Made famous by the Conficker worm, domain generation algorithms prevent cybercriminels from setting up a complex P2P network for a command and control server, without making it too easy for the security community to shut down the infrastructure by using IPs or domains hardcoded in the binary. The seed of these algorithms is often based on the date, but we remember of the good old Torpig which also used daily Twitter trends (a simple yet ingenious trick to prevent the registration of the domains before the current day). In this post, we are going to demonstrate a few methods to automate DGA domain generation, by taking the example of the small Locky ransomware.

Method #1: manual transcription

Here is the network trace of a Locky execution on April, 1st 2016:

locky_pcap

The binary tries to contact:

  • 4 IPs hardcoded in the unpacked binary
  • then, if no connection could be made, 8 domains looking like a DGA

The DGA algorithm of Locky is initialized by a 4-byte seed stored in a small configuration inside the unpacked binary:

locky_conf

A quick analysis reveals the goal of each configuration element:

  • AffiliateId: botnet identifier used as the affid parameter in the GET request to retrieve the public key
  • DGASeed: DGA initializing seed, 0x4d in this sample
  • Sleep: number of seconds to wait before creating the HKCU\Software\Locky registry key (default: 0x1e = 30s)
  • Svchost: copy under the svchost.exe name (default: no)
  • Reg: persistance via the Run key of HKCU (default: no)
  • ExcludeRussian: exclusion of Russian systems (default: yes)
  • ServerIPs: list of 4 IPs to contact

The DGA is simple and can be broken down into 3 stages:

1) the algorithm is initialized with the above seed as well as the current year, month and day, via a few simple arithmetic operations. An incremental index is also passed to the function via the ECX register

locky_dga1

2) a loop generates one character at a time

locky_dga2

3) the extension is generated by using an index into a fixed TLD string

locky_dga3

The algorithm is very simple and can be implemented by hand in a few lines of Python (v32() is the truncation to 32 bits):

def dga(seed, index, year, month, day):
  c1 = 0xb11924e1
  c2 = 0x27100001
  c3 = 0x2709A354
  eax = v32(v32((year + 0x1bf5)) * c1)
  eax = v32(v32((ror(eax, 7) + seed + c2)) * c1)
  eax = v32(ror(eax, 7) + (day>>1) + c2)
  eax = v32(ror(eax*c1, 7) + month + c3)
  eax = v32(ror(eax*c1, 7) + rol(index, 0x15))
  eax = v32(eax + rol(seed, 0x11) + c2)
  eax = v32(ror(eax*c1, 7) + c2)
  out = ""
  for i in range(5 + eax%0xb):
    eax = v32(rol(eax, i) * c1)
    eax = v32(ror(eax, 7) + c2)
    out += chr(ord('a') + eax%0x19)
  eax = v32(ror(eax*c1, 7) + c2)
  eax = v32(2*(eax%0xe))
  out += "." + "rupweuinytpmusfrdeitbeuknltf"[eax:eax+2]
  return out

Let’s execute this function for the date 2016/04/01:

>>> for i in range(8):
...   print dga(0x4d, i, 2016, 04, 01)
...
puyyjnyqg.de
rvayxaf.be
mfedlavimoyjwhx.it
oyleagxyifobk.pw
xehxcvwmwwm.de
sgdgacta.nl
uhlbso.eu
pqpmoscxhbrabk.yt

The 8 domains generated for this day match those on the Wireshark capture. But what about the other days? One could obviously modify the date in Windows and launch Wireshark for each day…  What a boring task. A better way is to generate all domains for April and May and compare the result with an automatically generated list to validate our implementation.

$ cat dga.py
[...]
seed = 0x4d
begin = datetime.datetime(2016, 4, 1)
end   = datetime.datetime(2016, 5, 31)
domains_per_day = 8
date = begin
while date <= end:
  # Seed/date header
  print "0x%x %s" % (seed, date.strftime("%Y-%m-%d"))
 
  for i in range(domains_per_day):
    # Domain
    print dga(seed, i, date.year, date.month, date.day)
  date += datetime.timedelta(days=1)
  print ""
$ ./dga.py > dga_0x4d_method1_manual.txt
$ md5sum dga_0x4d_method1_manual.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method1_manual.txt

Method #2 : traditional method with GDB/Python

Locky’s DGA is very simple and could be implemented rapidly, but if it had been more complex, it wouldn’t have been conceivable to do it entirely by hand. A method to address this issue is to instrument a virtual machine with GDB and Python, as shown in a previous post about Zeus P2P and Dyreza. Two breakpoints are to be used:

1. At the beginning of the function, for example just after the call to GetSystemTime(), so as to initialize the registers and the stack to the relevant state:

  • ESI contains the seed
  • EDI contains the index
  • a variable points to a SYSTEMTIME structure whose offsets 0, 2 and 6 contains the year, month and day, respectively

2. We let the piece of malware execute until it reaches a second breakpoint where the domain corresponding to these parameters has been generated. The index is then incremented and, by overwriting a few instructions, the program jumps back to the first breakpoint to generate another domain. If the index reaches 8, it is reset and the day is incremented.

The following GDB/Python script implements this method and automatically generates all domains for April and May 2016:

#! /usr/bin/env python
# Locky DGA generator based on GDB
# Sylvain SARMEJEANNE, CERT-LEXSI 04/2016
import gdb
import datetime
SEED = 0x4d
DATE_BEGIN = datetime.datetime(2016, 4, 1)
DATE_END = datetime.datetime(2016, 5, 31)
DOMAINS_PER_DAY = 8
OUTPUT_FILE = "/home/sylvain/malwares/locky/dga_0x%x_method2_gdb.txt" % SEED
BP_BEGIN = 0x406d67
BP_END = 0x406e95
SYSTEMTIME = 0x0012f7d8
pchar = gdb.lookup_type("char").pointer()
class LockyDGA(gdb.Breakpoint):
  def __init__(self, specs):
    for spec in specs:
      super(LockyDGA, self).__init__(spec, gdb.BP_BREAKPOINT)
    self.seed = SEED
    self.idx = 0
    self.datetime = DATE_BEGIN
 
    gdb.execute("set logging file %s" % OUTPUT_FILE)
    gdb.execute("set logging overwrite on")
    gdb.execute("set logging on")
    gdb.execute("set height 0")
  def stop(self):
    eip = long(gdb.parse_and_eval("$eip").cast(gdb.lookup_type("int").pointer())) & 0xffffffff
 
    if eip == BP_BEGIN:
      # Setting up the registers
      gdb.execute("set $esi=%i" % self.seed)
      gdb.execute("set $edi=%i" % self.idx)
 
      # Setting up the stack
      gdb.execute("set *(short*)0x%x=%i" % (SYSTEMTIME, self.datetime.year))
      gdb.execute("set *(short*)(0x%x+2)=%i" % (SYSTEMTIME, self.datetime.month))
      gdb.execute("set *(short*)(0x%x+6)=%i" % (SYSTEMTIME, self.datetime.day))
      if self.idx == 0:
        # Ouput the seed/date header
        gdb.write("0x%x %s\n" % (self.seed, self.datetime.strftime("%Y-%m-%d")), gdb.STDLOG)
      return False
    elif eip == BP_END:
      # Output the domain pointed to by edx
      gdb.write("%s\n" % gdb.Value(gdb.parse_and_eval("$edx")).cast(pchar).string(), gdb.STDLOG)
 
      self.idx += 1
 
      if self.idx == DOMAINS_PER_DAY:
        self.idx = 0
        self.datetime += datetime.timedelta(days=1)
 
        if self.datetime > DATE_END:
          gdb.execute("set logging off")
          return True
 
        gdb.write("\n", gdb.STDLOG)
      # Jump back to BP_BEGIN
      gdb.execute("set *0x406e95=0xfffecde9")
      gdb.execute("set *(0x406e95+4)=0xff")
      return False
LockyDGA(["*0x%x" % BP_BEGIN, "*0x%x" % BP_END])

The result is stored in dga_0x4d_method2_gdb.txt. As we are executing Locky itself in a complete virtual machine, these domains are considered as our reference. Let’s compare them with the result of our manual implementation (method #1):

$ md5sum dga_0x4d_method1_manual.txt dga_0x4d_method2_gdb.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method1_manual.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method2_gdb.txt

Our manual implementation seems correct.

Method #3: thin sandbox

The previous method works perfectly but for a simple malware like Locky, maybe we can find something easier to use. After all, why should we instrument a whole operating system when we only need to execute a few basic assembly instructions? Miasm provides a sandbox which quite addresses this.

Let’s begin by initializing the Miasm sandbox:

parser = Sandbox_Win_x86_32.parser(description="PE sandbox")
parser.add_argument("filename", help="Locky binary")
options = parser.parse_args()
sb = Sandbox_Win_x86_32(options.filename, options, globals())

We add a breakpoint where the domain is generated in memory:

def end(jitter):
  sb.jitter.run = False
  return True
sb.jitter.add_breakpoint(0x406e95, end)

The first bloc calls memset() to reset the zone where the domain is to be generated, let’s emulate this in Python with a second breakpoint:

def memsetzero(jitter):
  sb.jitter.vm.set_mem(domain, "\x00"*32)
  sb.jitter.pc = BP_CALL+5
  return True
sb.jitter.add_breakpoint(0x406de6, memsetzero)

Miasm has created a memory zone for the stack beginning at address 0x130000:

>>> sb.jitter.vm
Addr     Size    Access Comment
0x130000 0x10000 RW_    Stack

For example, we set up EBP to 0x13f000:

sb.jitter.cpu.EBP = 0x13f000

The domain will be generated at an arbitrary memory address and var_40 should point to it, as required by the DGA code:

domain = 0x900000
sb.jitter.vm.add_memory_page(domain, PAGE_READ | PAGE_WRITE, "\x00"*32)
sb.jitter.vm.set_mem(EBP-0x40, pack("<I", domain))

Finally, we set the seed, index and date (IDA indicates that the SYSTEMTIME structure pointer is off by 0x24):

sb.jitter.cpu.ESI = SEED
sb.jitter.cpu.EDI = idx
sb.jitter.vm.set_mem(EBP-0x24, pack("<H", 2016))
sb.jitter.vm.set_mem(EBP-0x24+2, pack("<H", 04))
sb.jitter.vm.set_mem(EBP-0x24+6, pack("<H", 01))

The execution will start from the address just after the call to GetSystemTime():

sb.run(0x406d67)

IDA indicates that the TLD string is at 0x413D5C; it has already been mapped by the Miasm PE parser as part of the unpacked binary .rdata section:

>>> sb.jitter.vm
Addr     Size   Access Comment
0x411000 0x7000 R__    'locky_unpack.exe': '.rdata\x00\x00'
>>> sb.jitter.vm.get_mem(0x413D5C, 28)
'rupweuinytpmusfrdeitbeuknltf'

If this mapping hadn’t been done automatically, it could have been added manually:

sb.jitter.set_str_ansi(0x413d5c, "rupweuinytpmusfrdeitbeuknltf")

The emulation correctly generates the first domain name for the date 2016/04/01:

$ python -i ./dga_miasm_sb.py locky_unpack.exe
>>> sb.jitter.vm.get_mem(0x900000, 12)
'puyyjnyqg.de'

Following almost the same layout as the GDB script from method #2, the following script generates the domain names for April and May 2016 with a Miasm sandbox:

#! /usr/bin/env python
# Locky DGA generator based on a thin Miasm sandbox
# Sylvain SARMEJEANNE, CERT-LEXSI 04/2016
from miasm2.analysis.sandbox import Sandbox_Win_x86_32
from miasm2.jitter.csts import PAGE_READ, PAGE_WRITE
from struct import pack
import datetime
SEED = 0x4d
DATE_BEGIN = datetime.datetime(2016, 4, 1)
DATE_END = datetime.datetime(2016, 5, 31)
DOMAINS_PER_DAY = 8
idx = 0
date = DATE_BEGIN
BP_BEGIN = 0x406d67
BP_CALL = 0x406de6
BP_END = 0x406e95
EBP = 0x13f000
SYSTEMTIME = EBP-0x24
domain = 0x900000
var_40 = EBP-0x40
var_2C = EBP-0x2C
def begin(jitter):
  # Setting up the registers
  sb.jitter.cpu.ESI = SEED
  sb.jitter.cpu.EDI = idx
  # Setting up the stack
  sb.jitter.vm.set_mem(SYSTEMTIME, pack("<H", date.year))
  sb.jitter.vm.set_mem(SYSTEMTIME+2, pack("<H", date.month))
  sb.jitter.vm.set_mem(SYSTEMTIME+6, pack("<H", date.day))
  if idx == 0:
    # Output the seed/date header
    print "0x%x %s" % (SEED, date.strftime("%Y-%m-%d"))
  return True
def memsetzero(jitter):
  sb.jitter.vm.set_mem(domain, "\x00"*32)
  sb.jitter.pc = BP_CALL+5
  return True
def end(jitter):
  global idx, date
  # Output the domain pointed to by edx
  print sb.jitter.get_str_ansi(sb.jitter.cpu.EDX)
 
  idx += 1
  if idx == DOMAINS_PER_DAY:
    idx = 0
    date += datetime.timedelta(days=1)
 
    if date > DATE_END:
      sb.jitter.run = False
      return True
    print ""
  # Jump back to BP_BEGIN
  sb.jitter.pc = BP_BEGIN
 
  return True
parser = Sandbox_Win_x86_32.parser(description="PE sandbox")
parser.add_argument("filename", help="Locky binary")
options = parser.parse_args()
sb = Sandbox_Win_x86_32(options.filename, options, globals())
sb.jitter.cpu.EBP = EBP
sb.jitter.vm.add_memory_page(domain, PAGE_READ | PAGE_WRITE, "\x00"*32)
sb.jitter.vm.set_mem(var_40, pack("<I", domain))
sb.jitter.vm.set_mem(var_2C, "\x11\x00\x00\x00")
sb.jitter.add_breakpoint(BP_BEGIN, begin)
sb.jitter.add_breakpoint(BP_CALL, memsetzero)
sb.jitter.add_breakpoint(BP_END, end)
sb.run(BP_BEGIN)

The domains generated this way match our reference:

$ ./dga_miasm_sb.py > dga_0x4d_method3_miasm_sb.txt 
$ md5sum dga_0x4d_method2_gdb.txt dga_0x4d_method3_miasm_sb.txt 
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method2_gdb.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method3_miasm_sb.txt

Method #4: symbolic execution

4.1 DGA initialization

In the automated method #2 and #3, we simply executed Locky’s code without trying to understand its inner workings. When the goal is to reimplement the algorithm is C or Python for example, these methods are not relevant. If the DGA complexity makes the manual transcription (method #1) unfeasible, we need a tool to automatically generate a high-level version of the algorithm from its assembly instructions. Miasm has a symbolic execution engin for this purpose.

Let’s begin by disassembling the bloc initializing Locky’s DGA, between addresses 0x406d67 and 0x406dfa (see method #1 for the 3 stages of the algorithm):

c = Container.from_stream(open("/home/sylvain/malwares/locky/locky_unpack.exe"))
machine = Machine('x86_32')
mdis = machine.dis_engine(c.bin_stream)
mn = machine.mn
bloc_begin = 0x406d67
mdis.dont_dis = [0x406dfa]
blocs = mdis.dis_multibloc(bloc_begin)
ira = machine.ira()
for bloc in blocs:
  ira.add_bloc(bloc)

The execution is launched on this first bloc:

sb = symbexec(ira, mn.regs.regs_init)
sb.emul_ir_blocs(ira, bloc_begin)

In the code, we can see that several operations are performed on the EAX register and that the resulting value is stored in var_14 (EBP+0xFFFFFFEC). Let’s have a look at the memory areas altered by the symbolic execution:

>>> sb.dump_mem()
@32[(EBP_init+0xFFFFFFEC)] (((((ESI_init <<< 0x11)+((EDI_init&0x7) <<< 0x15)+(((((((((ESI_init+((({@16[(EBP_init+0xFFFFFFDC)],0,16, 0x0,16,32}+0x1BF5)*0xB11924E1) >>> 0x7)+0x27100001)*0xB11924E1) >>> 0x7)+{@16[(EBP_init+0xFFFFFFE2)][1:16],0,15, 0x0,15,32}+0x27100001)*0xB11924E1) >>> 0x7)+{@16[(EBP_init+0xFFFFFFDE)],0,16, 0x0,16,32}+0x2709A354)*0xB11924E1) >>> 0x7)+0x27100001)*0xB11924E1) >>> 0x7)+0x27100001)

The expression may look complex at first glance but we can still recognize additions and multiplications with the constants already seen for method #1, left (<<<) and right (>>>) rotations, as well as the use of several stack variables. Let’s use Miasm to simplify it. We can begin by replacing ESI and EDI registers reads by the seed and index:

sb.symbols[machine.mn.regs.ESI] = ExprId("seed")
sb.symbols[machine.mn.regs.EDI] = ExprId("index")

We can also replace reads of the SYSTEMTIME structure on the stack by more readable identifiers:

VAR_SYSTEMTIME = 0xFFFFFFDC
sb.symbols[ExprMem(ExprId("EBP_init") + ExprInt(VAR_SYSTEMTIME,   32), 16)] = ExprId("year", 16)
sb.symbols[ExprMem(ExprId("EBP_init") + ExprInt(VAR_SYSTEMTIME+2, 32), 16)] = ExprId("month", 16)
sb.symbols[ExprMem(ExprId("EBP_init") + ExprInt(VAR_SYSTEMTIME+6, 32), 16)] = ExprId("day", 16)

These are the same initialisations as those made with GDB and the thin sandbox for methods #2 and #3. After another execution, we obtain:

>>> sb.dump_mem()
@32[(EBP_init+0xFFFFFFEC)] (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+((({year,0,16, 0x0,16,32}+0x1BF5)*0xB11924E1) >>> 0x7)+0x27100001)*0xB11924E1) >>> 0x7)+{day[1:16],0,15, 0x0,15,32}+0x27100001)*0xB11924E1) >>> 0x7)+{month,0,16, 0x0,16,32}+0x2709A354)*0xB11924E1) >>> 0x7)+0x27100001)*0xB11924E1) >>> 0x7)+0x27100001)

The expression can be simplified more by replacing constants by c1, c2 and c3 identifiers, by using the Miasm simplification engine:

def simplify_basic(expr):
  pattern1 = ExprInt(0xb11924e1, 32)
  replace1 = ExprId("c1")
  pattern2 = ExprInt(0x27100001, 32)
  replace2 = ExprId("c2")
  pattern3 = ExprInt(0x2709A354, 32)
  replace3 = ExprId("c3")
  simplifications = {
    pattern1:replace1,
    pattern2:replace2,
    pattern3:replace3,
  }
 
  return expr.replace_expr(simplifications)

As far as the year and month variables are concerned, they come from 16-bit fields inside a SYSTEMTIME structure and are zero-extended to 32 bits via the movzx instructions. In the Miasm intermediate language, this corresponds to an ExprCompose expression, noted with braces. To make it more readable, let’s simplify {year,0,16, 0x0,16,32} to year, and do the same for month:

pattern4 = ExprCompose([(ExprId("year", 16), 0, 16), (ExprInt(0, 16), 16, 32)])
replace4 = ExprId("year", 32)
pattern5 = ExprCompose([(ExprId("month", 16), 0, 16), (ExprInt(0, 16), 16, 32)])
replace5 = ExprId("month", 32)

For the day variable, Locky uses the shr ecx, 1 instruction, translated by Miasm into a 32-bit extended ecx[1:16] composition. Let’s simplify it to day >> 1 only:

pattern6 = ExprCompose([(ExprSlice(ExprId("day", 16), 1, 16), 0, 15), (ExprInt(0, 17), 15, 32)])
replace6 = ExprOp(">>", ExprId("day", 32), ExprInt(1, 32))

With these 6 basic simplifications, we obtain the following expression:

>>> var_14 = simplify_basic(sb.symbols.symbols_mem[ExprId("EBP_init", 32) + ExprInt(0xFFFFFFEC, 32)][1])
>>> "var_14 = %s" % var_14
'var_14 = (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2)'

The scary expression has been converted into a perfectly readable one-liner.

In the altered memory zones, we also notice the reset of the var_10 variable:

>>> sb.dump_mem()
@32[(EBP_init+0xFFFFFFF0)] 0x0

The DGA loop terminates by comparing var_10 with EBX computed in this first bloc:

>>> "nb_loops = %s" % simplify_basic(sb.symbols[machine.mn.regs.EBX])
'nb_loops = (({(((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)'

We recognize var_14, so let’s simplify the expression of var_14 by a var_14 identifier to make it more readable:

>>> ebx = simplify_basic(sb.symbols[machine.mn.regs.EBX])
>>> ebx = ebx.replace_expr({var_14:ExprId("var_14")})
>>> "nb_loops = %s" % ebx
'nb_loops = (({var_14,0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)'

This expression represents the size of the generated domain names: 5 + var_14%0xB.

In the registers altered by this initial bloc, EDI is set to 0x10:

>>> sb.dump_id()
EDI 0x10

The first bloc can therefore be summed up like this:

var_14 = (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2)
nb_loops = (({var_14,0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)
var_10 = 0x0
edi = 0x10

4.2 DGA loop

The initialization bloc being understood, let’s continue with the DGA loop. Its symbolic execution doesn’t reach the end of the bloc:

>>> sb.symbols[machine.mn.regs.EIP]
ExprCond(<cond>, ExprInt(uint32(0x406e1bL)), ExprInt(uint32(0x406e1eL)))

The symbolic execution engine stops at the conditional jump based on a comparison between var_2C and EDI and can’t tell which branch to take. We know that EDI is set to 0x10 by the previous bloc; let’s force the variable to zero as the jump has no impact on the algorithm itself:

sb.symbols[ExprId('EDI', 32)] = ExprInt(0x10, 32)
sb.symbols[ExprMem(ExprId('EBP_init', 32) + ExprInt(0xffffffd4, 32))] = ExprInt(0, 32)

Let’s also set the variables seen in the previous bloc:

sb.symbols[ExprMem(ExprId("EBP_init", 32) + ExprInt(0xFFFFFFEC, 32))] = ExprId("var_14", 32)
sb.symbols[ExprMem(ExprId('EBP_init', 32) + ExprInt(0xfffffff0, 32))] = ExprId("var_10", 32)

This time, the execution reaches the end of the bloc and the following memory areas have been altered:

>>> sb.dump_mem()
@32[(EBP_init+0xFFFFFFF0)] (var_10+0x1)
@32[(EBP_init+0xFFFFFFEC)] ((((var_14 <<< ({var_10[0:8],0,8, 0x0,8,32}&0x1F))*0xB11924E1) >>> 0x7)+0x27100001)

var_10, set to zero in the previous bloc, in incremented by 1 in the loop and is therefore a counter. var_14 can be simplified as shown earlier:

>>> var_14 = simplify_basic(sb.symbols.symbols_mem[ExprId("EBP_init", 32) + ExprInt(0xffffffec, 32)][1])
>>> "var_14 = %s" % var_14
'var_14 = ((((var_14 <<< ({var_10[0:8],0,8, 0x0,8,32}&0x1F))*c1) >>> 0x7)+c2)'

In this expression, var_10 is zero-extended to 32 bits and can be simplified:

pattern7 = ExprCompose(((ExprSlice(ExprId('var_10', 32), 0, 8), 0, 8), (ExprInt(uint24(0x0L)), 8, 32)))
replace7 = ExprId('var_10', 32)

We obtain the following expression:

>>> "var_14 = %s" % var_14
'var_14 = ((((var_14 <<< (var_10&0x1F))*c1) >>> 0x7)+c2)'

The character generated by the DGA is present in EDX at the end of the loop:

>>> edx = simplify_basic(sb.symbols[machine.mn.regs.EDX])
>>> "char = %s" % edx
'char = {(({((((var_14 <<< ({var_10[0:8],0,8, 0x0,8,32}&0x1F))*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0x19)[0:8]+0x61),0,8, ({((((var_14 <<< ({var_10[0:8],0,8, 0x0,8,32}&0x1F))*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0x19)[8:32],8,32}'

We recognize the expression of var_14 which leads us to the following simplification:

>>> edx = edx.replace_expr({simplify_basic(var_14):ExprId("var_14")})
>>> "char = %s" % edx
'char = {(({var_14,0,32, 0x0,32,64} umod 0x19)[0:8]+0x61),0,8, ({var_14,0,32, 0x0,32,64} umod 0x19)[8:32],8,32}'

It is simply 0x61 + var_14%25, with 0x61 = ord(‘a’). This loop can therefore be summed up as:

var_14 = ((((var_14 <<< (var_10&0x1F))*c1) >>> 0x7)+c2)
char = {(({var_14,0,32, 0x0,32,64} umod 0x19)[0:8]+0x61),0,8, ({var_14,0,32, 0x0,32,64} umod 0x19)[8:32],8,32}
var_10 = (var_10+0x1)

4.3 Extension

This is the last bloc left. At 0x406e74, the EAX register contains the index into the TLD string:

>>> "eax = %s" % simplify_basic(sb.symbols[machine.mn.regs.EAX])
'eax = (({(((@32[(EBP_init+0xFFFFFFEC)]*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0xE)[0:32]*0x2)'

EBP_init+0xFFFFFFEC is var_14 and can be simplified:

sb.symbols[ExprMem(ExprId("EBP_init", 32) + ExprInt(0xFFFFFFEC, 32), 32)] = ExprId("var_14", 32)

We then obtain:

>>> "eax = %s" % simplify_basic(sb.symbols[machine.mn.regs.EAX])
'eax = (({(((var_14*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0xE)[0:32]*0x2)'

We have now generated all the required algorithm elements in the form of Miasm expressions:

# Initialization
var_14 = (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2)

nb_loops = (({var_14,0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)
var_10 = 0x0
# Loops of nb_loops iterations
var_14 = ((((var_14 <<< (var_10&0x1F))*c1) >>> 0x7)+c2)
char = {(({var_14,0,32, 0x0,32,64} umod 0x19)[0:8]+0x61),0,8, ({var_14,0,32, 0x0,32,64} umod 0x19)[8:32],8,32}
var_10 = (var_10+0x1)
# Extension
eax = (({(((var_14*c1) >>> 0x7)+c2),0,32, 0x0,32,64} umod 0xE)[0:32]*0x2)

4.4 C code generation

As the goal is to translate the algorithm in C and Python, we are going to convert these Miasm expressions via translators. For example on the var_14 initialisation expression:

>>> "var_14 = %s" % var_14
'var_14 = (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2)'
>>> "var_14 = %s" % Translator.to_language('C').from_expr(var_14)
'var_14 = ((((rot_right(32, (((((((rot_left(32, seed, 0x11) &0xffffffff)&0xffffffff)+((rot_left(32, (((index&0xffffffff)&(0x7&0xffffffff))&0xffffffff), 0x15) &0xffffffff)&0xffffffff)+((rot_right(32, (((((((rot_right(32, (((((((rot_right(32, ((((((seed&0xffffffff)+((rot_right(32, ((((((year&0xffffffff)+(0x1bf5&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(shift_right_logic_32(day , 0x1)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(month&0xffffffff)+(c3&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff)'

For the iterations number:

>>> "nb_loops = %s" % ebx
'nb_loops = (({var_14,0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)'
>>> "nb_loops = %s" % Translator.to_language('C').from_expr(ebx)
'nb_loops = (((((umod64((vm_cpu_t*)jitcpu->cpu, ((((uint64_t)(var_14 & 0xFFFFFFFF)) << 0) | (((uint64_t)(0x0 & 0xFFFFFFFF)) << 32)), 0xb)>>0) & 0xFFFFFFFF)&0xffffffff)+(0x5&0xffffffff))&0xffffffff)'

We notice that the C translation sometimes uses external functions like rot_left(), rot_right(), shift_right_logic_32() or umod64(). These functions are implemented in miasm2/jitter/vm_mngr.{c,h}. By copying the C code generated this way by Miasm from all our simplified expressions, we obtain the following code:

int dga(unsigned int seed, unsigned short index, unsigned short year, unsigned short month, unsigned short day) {
 
  unsigned char c;
  unsigned int eax;
  unsigned int c1 = 0xb11924e1;
  unsigned int c2 = 0x27100001;
  unsigned int c3 = 0x2709A354;
  char *tld = "rupweuinytpmusfrdeitbeuknltf";
  // Seed/date header
  if(index == 0)
    printf("0x%x %i-%02i-%02i\n", seed, year, month, day);
  // Initialisation
  unsigned int var_14 = ((((rot_right(32, (((((((rot_left(32, seed, 0x11) &0xffffffff)&0xffffffff)+((rot_left(32, (((index&0xffffffff)&(0x7&0xffffffff))&0xffffffff), 0x15) &0xffffffff)&0xffffffff)+((rot_right(32, (((((((rot_right(32, (((((((rot_right(32, ((((((seed&0xffffffff)+((rot_right(32, ((((((year&0xffffffff)+(0x1bf5&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(shift_right_logic_32(day , 0x1)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(month&0xffffffff)+(c3&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff);
  unsigned int nb_loops = (((((umod64(((((uint64_t)(var_14 & 0xFFFFFFFF)) << 0) | (((uint64_t)(0x0 & 0xFFFFFFFF)) << 32)), 0xb)>>0) & 0xFFFFFFFF)&0xffffffff)+(0x5&0xffffffff))&0xffffffff);
  unsigned int var_10 = 0;
  // Loop
  while(1) {
 
    var_14 = ((((rot_right(32, ((((rot_left(32, var_14, (((var_10&0xffffffff)&(0x1f&0xffffffff))&0xffffffff)) &0xffffffff)&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff);
    
    var_10 = (var_10+0x1);
  
    c = ((((uint32_t)((((((umod64(((((uint64_t)(var_14 & 0xFFFFFFFF)) << 0) | (((uint64_t)(0x0 & 0xFFFFFFFF)) << 32)), 0x19)>>0) & 0xFF)&0xff)+(0x61&0xff))&0xff) & 0xFF)) << 0) | (((uint32_t)(((umod64(((((uint64_t)(var_14 & 0xFFFFFFFF)) << 0) | (((uint64_t)(0x0 & 0xFFFFFFFF)) << 32)), 0x19)>>8) & 0xFFFFFF) & 0xFFFFFF)) << 8));
 
    printf("%c", c);
 
    if(var_10 >= nb_loops)
      break;
  }
  // Extension
  eax = (((((umod64(((((uint64_t)(((((rot_right(32, (((var_14&0xffffffff)*(c1&0xffffffff))&0xffffffff), 0x7) &0xffffffff)&0xffffffff)+(c2&0xffffffff))&0xffffffff) & 0xFFFFFFFF)) << 0) | (((uint64_t)(0x0 & 0xFFFFFFFF)) << 32)), 0xe)>>0) & 0xFFFFFFFF)&0xffffffff)*(0x2&0xffffffff))&0xffffffff);

  printf(".%c%c\n", tld[eax], tld[eax+1]);
  return 0;
}

Let’s test it by generating the April and May 2016 domains:

int main() {
  int d, i;
  for(d=1;d<=30;d++) {
    for(i=0;i<8;i++)
      dga(0x4d, i, 2016, 04, d);
    printf("\n");
  }
  for(d=1;d<=31;d++) {
    for(i=0;i<8;i++)
      dga(0x4d, i, 2016, 05, d);
    printf("\n");
  }
 
  return 0;
}
$ gcc -W -Wall dga.c -o dga
$ ./dga > dga_0x4d_method4_miasm_symb_c.txt
$ md5sum dga_0x4d_method2_gdb.txt dga_0x4d_method4_miasm_symb_c.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method2_gdb.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method4_miasm_symb_c.txt

Miasm has correctly generated the algorithm and its C transcription is right.

4.5 Python code generation

Converting our expressions to Python is not as straightforward:

>>> "var_14 = %s" % Translator.to_language('Python').from_expr(var_14)
NotImplementedError: Unknown operator: >>>
>>> "nb_loops = %s" % Translator.to_language('Python').from_expr(ebx)
NotImplementedError: Unknown operator: umod

But as the textual representation of our expressions is very close to Python, we are going to generate valid Python code with the help of a few regular expression substitutions. The expression representing var_14 in the initial step is valid except for the left and right rotations:

>>> "var_14 = %s" % var_14
'var_14 = (((((seed <<< 0x11)+((index&0x7) <<< 0x15)+(((((((((seed+(((year+0x1BF5)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+(day >> 0x1)+c2)*c1) >>> 0x7)+month+c3)*c1) >>> 0x7)+c2)*c1) >>> 0x7)+c2)'

Let’s replace the a <<< b operation by a rol(a,b) identifier and a >>> b by ror(a,b) with the following code:

def expr2python_op(e_s, e):
  jok1 = ExprId("jok1", 32)
  jok2 = ExprId("jok2", 32)
  # seed <<< 0x11 => rol(seed,0x11)
  to_match = ExprOp("<<<", jok1, jok2)
  result = MatchExpr(e, to_match, [jok1, jok2])
  if result:
    return ExprId("rol(%s,%s)" % (result[jok1], result[jok2]))
  # bla >>> 0x7 => ror(bla, 0x7)
  to_match = ExprOp(">>>", jok1, jok2)
  result = MatchExpr(e, to_match, [jok1, jok2])
  if result:
    return ExprId("ror(%s,%s)" % (result[jok1], result[jok2]))
  return e

By adding this function to the Miasm simplification engin, we obtain valid Python code for var_14:

>>> expr2python = {ExprOp:[expr2python_op]}
>>> expr_simp.enable_passes(expr2python)
>>> "var_14 = %s" % expr_simp(var_14.copy())
'var_14 = (c2+ror((c1*(c2+rol((index&0x7),0x15)+rol(seed,0x11)+ror((c1*(c3+month+ror((c1*(c2+ror((c1*(c2+ror((c1*(year+0x1BF5)),0x7)+seed)),0x7)+(day >> 0x1))),0x7))),0x7))),0x7))'

The same for the modulo computation:

>>> "nb_loops = %s" % ebx
'nb_loops = (({var_14,0,32, 0x0,32,64} umod 0xB)[0:32]+0x5)'
def expr2python_slice(e_s, e):
  jok1 = ExprId("jok1", 32)
  jok2 = ExprId("jok2", 64)
  to_match = ExprSlice(ExprOp('umod', ExprCompose(((jok1, 0, 32), (ExprInt(uint32(0x0L)), 32, 64))), jok2), 0, 32)
  result = MatchExpr(e, to_match, [jok1, jok2])
  if result:
    return ExprId("(%s%%%s)" % (result[jok1], result[jok2]))
  return e
>>> expr2python = {ExprOp:[expr2python_op], ExprSlice:[expr2python_slice]}
>>> expr_simp.enable_passes(expr2python)
>>> "nb_loops = %s" % expr_simp(ebx.copy())
'nb_loops = ((var_14%0xB)+0x5)'

This is also valid Python code and the other expressions can be converted the same way. The only step left is to add calls to v32() to guarantee 32-bit arithmetic:

def exprv32(e_s, e):
  jok1 = ExprId("jok1", 32)
  jok2 = ExprId("jok2", 32)
  jok3 = ExprId("jok3", 32)
  jok4 = ExprId("jok4", 32)
  # Apply v32 on each +
  to_match = ExprOp("+", jok1, jok2)
  result = MatchExpr(e, to_match, [jok1, jok2])
  if result:
    return ExprId("v32(%s+%s)" % (result[jok1], result[jok2]))
  # Apply v32 on each *
  to_match = ExprOp("*", jok1, jok2)
  result = MatchExpr(e, to_match, [jok1, jok2])
  if result:
    return ExprId("v32(%s*%s)" % (simplify_basic(result[jok1]), simplify_basic(result[jok2])))
  return e
>>> exprv32 = {ExprOp:[exprv32]}
>>> expr_simp.enable_passes(exprv32)

Finally, var_14 and nb_loops have the following Python expression:

>>> "var_14 = %s" % expr_simp(var_14.copy())
'var_14 = v32(c2+ror(v32(c1*v32(c2+rol((index&0x7),0x15)+rol(seed,0x11)+ror(v32(c1*v32(c3+month+ror(v32(c1*v32(c2+ror(v32(c1*v32(c2+ror(v32(c1*v32(year+0x1BF5)),0x7)+seed)),0x7)+(day >> 0x1))),0x7))),0x7))),0x7))'
>>> "nb_loops = %s" % expr_simp(ebx)
'nb_loops = v32((var_14%0xB)+0x5)'

We just have to copy/paste all these Python expressions to a script:

def dga(seed, index, year, month, day):
  var_14 = v32(c2+ror((c1*(c2+rol((index&0x7),0x15)+rol(seed,0x11)+ror((c1*(c3+month+ror((c1*(c2+ror((c1*(c2+ror((c1*(year+0x1BF5)),0x7)+seed)),0x7)+(day >> 0x1))),0x7))),0x7))),0x7))
  var_10 = 0x0
  nb_loops = v32((var_14%0xB)+0x5)
  out = ''
  while True:
    var_14 = v32(c2+ror((c1*rol(var_14,var_10)),0x7))
    out += chr(v32((var_14%0x19)+0x61))
    var_10 = v32(var_10+0x1)
    if var_10 >= nb_loops:
      break
  eax = v32((v32(c2+ror(v32(c1*var_14),0x7))%0xE)*0x2)
  out += '.' + "rupweuinytpmusfrdeitbeuknltf"[eax:eax+2]
  return out

Let’s test it for April and May 2016:

seed = 0x4d
begin = datetime.datetime(2016, 4, 1)
end = datetime.datetime(2016, 5, 31)
domains_per_day = 8
date = begin
while date <= end:
  # Seed/date header
  print "0x%x %s" % (seed, date.strftime("%Y-%m-%d"))
  for i in range(domains_per_day):
    # Domain
    print dga(seed, i, date.year, date.month, date.day)
  date += datetime.timedelta(days=1)
  print ""
$ ./dga_miasm_symb.py > dga_0x4d_method4_miasm_symb.txt
$ md5sum dga_0x4d_method2_gdb.txt dga_0x4d_method4_miasm_symb.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method2_gdb.txt
116cd7fdb273cd47f60c59841b81c769 dga_0x4d_method4_miasm_symb.txt

Perfect!

Conclusion

DGAs used by current malware are generally simple and can be easily translated into any language by a human. But should the algorithm be more complex than excepted, or updated frequently, it can be useful to automate this task. Via a classical virtual machine, GDB can do the job but if one is looking for a lightweight solution, the Miasm framework comes in handy. Its sandbox can emulate code from any binary and quickly generate the domain names, without the need to understand all the DGA implementation details.

On the contrary, if the goal is to rebuild the algorithm to implement it in another language, the Miasm symbolic execution module can generate expressions in an intermediate language, corresponding to modified registers or memory areas, which can then be reduced with its simplification engine. Translating them in C or Python can then be performed directly with the Translator module or via regular expressions, so as to obtain the algorithm itself instead of the result of its execution only.