QEMU implements a GDB server, which makes it possible to attach to the guest operating system from outside the virtual machine using the debugger's target remote command. When analysing Windows malware, this method is useful for bypassing anti-debugging techniques, but it has a big drawback: GDB has no knowledge of the underlying system and therefore cannot display any symbols to ease the analysis. As an example, let’s see how to add information from the import table.

Here is the code located at the original entry point of an unpacked malware sample, as seen by GDB. The DWORD PTR ds:<addr> operands pointing to imported Windows APIs are not resolved to names (although they are in IDA, for example):

gdb$ x/20i $eip
=> 0x4334d5 push ebp
   0x4334d6 mov ebp,esp
   0x4334d8 sub esp,0xc
   0x4334db push ebx
   0x4334dc push esi
   0x4334dd mov esi,DWORD PTR ds:0x401278 
   [...]
   0x4334fc call DWORD PTR ds:0x401170
   0x433502 mov DWORD PTR [ebp-0x4],eax
   0x433505 cmp eax,ebx
   0x433507 je 0x433579
   [...]
   0x424617 call DWORD PTR ds:0x4011b4 
   0x42461d lea eax,[ebp-0x8]
   0x424620 push eax
   0x424621 call DWORD PTR ds:0x4012f4
   0x424627 push eax
   0x424628 call DWORD PTR ds:0x40133c

GDB supports the COFF binary format, from which the Windows PE format derives, and even implements PE-specific functions. If GDB is given a PE file for symbol retrieval, here is what happens:

gdb$ symbol-file malware_depack.exe
Reading symbols from malware_depack.exe...(no debugging symbols found)...done.

gdb$ maint print msymbols malware_depack_symbols.txt

$ cat malware_depack_symbols.txt
Object file malware_depack.exe:
No minimal symbols found.

In the coff_symtab_read() function (gdb/coffread.c) which creates a COFF symbol description structure, the export table is parsed:

if ((nsyms == 0) && (pe_file))
  {
    /* We've got no debugging symbols, but it's a portable
       executable, so try to read the export table. */
    read_pe_exported_syms (objfile);
  }

The import table, however, is not parsed. GDB therefore has to be patched to create symbols from this table.

Reminder on PE imports

The relative virtual address (RVA) of the import table is located in the 2nd entry of the IMAGE_DATA_DIRECTORY array of the PE optional header. It points to a sequence of IMAGE_IMPORT_DESCRIPTOR structures, each of them describing the imports related to a single DLL:

typedef struct _IMAGE_IMPORT_DESCRIPTOR {
    _ANONYMOUS_UNION union {
    ULONG Characteristics;
    ULONG OriginalFirstThunk;
    } DUMMYUNIONNAME;
    ULONG TimeDateStamp;
    ULONG ForwarderChain;
    ULONG Name; // DLL name (ex: "KERNEL32.dll")
    ULONG FirstThunk;
} IMAGE_IMPORT_DESCRIPTOR, *PIMAGE_IMPORT_DESCRIPTOR;

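The FirstThunk field contains the RVA of the first import address table slot for this DLL, that is, the location that holds the address of the first imported function once the loader has resolved it. The OriginalFirstThunk field points to a zero-terminated array of IMAGE_THUNK_DATA structures, each of which is a 4-byte union whose meaning varies: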

  • if the most significant bit of these 4 bytes is 0, they have to be interpreted as a relative address to an IMAGE_IMPORT_BY_NAME structure where the name of the function is recorded at offset 2
  • otherwise, the other bits are the ordinal of the function as exported by the DLL. Its name is not available in the binary and a table is needed to display it properly

By walking these two arrays in parallel, the name of each imported function and the virtual address of its slot can be retrieved for each DLL.
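The walk described above can be sketched in plain C. This is only an illustration and not the GDB patch itself: struct pe_import and walk_imports() are hypothetical names, the image is assumed to be 32-bit (4-byte thunks) and laid out so that RVAs can be used directly as offsets into the buffer, and the RVA of the import table is assumed to have already been read from the data directory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One imported function (hypothetical structure, for illustration only). */
struct pe_import {
    const char *dll;      /* DLL name, e.g. "KERNEL32.dll" */
    const char *name;     /* function name, or NULL if imported by ordinal */
    uint16_t    ordinal;  /* meaningful only when name is NULL */
    uint32_t    slot_rva; /* RVA of the FirstThunk slot for this import */
};

/* Read a little-endian 32-bit value at rva; returns 0 if out of bounds. */
static int rd32(const uint8_t *img, size_t size, uint32_t rva, uint32_t *out)
{
    if (rva > size || size - rva < 4)
        return 0;
    *out = (uint32_t)img[rva] | (uint32_t)img[rva + 1] << 8 |
           (uint32_t)img[rva + 2] << 16 | (uint32_t)img[rva + 3] << 24;
    return 1;
}

/* Return the NUL-terminated string at rva, or NULL if it is not valid. */
static const char *str_at(const uint8_t *img, size_t size, uint32_t rva)
{
    if (rva >= size || memchr(img + rva, 0, size - rva) == NULL)
        return NULL;
    return (const char *)(img + rva);
}

/* Walk the import descriptors starting at import_rva and store at most max
   entries in out. Returns the number stored, or -1 on malformed data. */
static int walk_imports(const uint8_t *img, size_t size, uint32_t import_rva,
                        struct pe_import *out, int max)
{
    int count = 0;
    assert(img != NULL && out != NULL);

    /* Each IMAGE_IMPORT_DESCRIPTOR is 5 ULONGs, i.e. 20 bytes. */
    for (uint32_t desc = import_rva; ; desc += 20) {
        uint32_t oft, name_rva, ft;
        if (!rd32(img, size, desc, &oft) ||
            !rd32(img, size, desc + 12, &name_rva) ||
            !rd32(img, size, desc + 16, &ft))
            return -1;
        if (name_rva == 0 && ft == 0)
            return count;               /* empty descriptor ends the table */

        const char *dll = str_at(img, size, name_rva);
        if (dll == NULL)
            return -1;
        if (oft == 0)
            oft = ft;                   /* fall back when OriginalFirstThunk is unset */

        for (uint32_t i = 0; count < max; i++) {
            uint32_t thunk;
            if (!rd32(img, size, oft + 4 * i, &thunk))
                return -1;
            if (thunk == 0)
                break;                  /* zero thunk ends this DLL */

            struct pe_import *imp = &out[count++];
            imp->dll = dll;
            imp->slot_rva = ft + 4 * i; /* same index in the FirstThunk array */
            if (thunk & 0x80000000u) {  /* most significant bit set: ordinal */
                imp->name = NULL;
                imp->ordinal = (uint16_t)(thunk & 0xffff);
            } else {                    /* RVA of IMAGE_IMPORT_BY_NAME */
                imp->ordinal = 0;
                imp->name = str_at(img, size, thunk + 2); /* skip 2-byte hint */
                if (imp->name == NULL)
                    return -1;
            }
        }
        if (count == max)
            return count;
    }
}
```

Adding the image base to slot_rva gives the virtual address that appears in operands such as DWORD PTR ds:0x401170. On a real file on disk, each RVA would first have to be translated to a file offset through the section table, which this sketch leaves out.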

GDB minimal symbols

This is implemented in GDB with the BFD (Binary File Descriptor) API, which gives access to the binary content at the required offsets. The prim_record_minimal_symbol_and_info() function can be used to add a minimal symbol, namely a symbol that is not tied to any debugging information (just a virtual address and a name in our case). A start symbol for the entry point can also be added. Here is the result:

gdb$ symbol-file malware_depack.exe
Reading symbols from malware_depack.exe...
KERNEL32.dll
0x401104 GetTempPathW
0x401108 GetFileSizeEx
0x40110c OpenMutexW
[...]
WS2_32.dll
0x4015e4 WSAResetEvent
0x4015e8 closesocket [ordinal 3]
0x4015ec WSACreateEvent
[...]
(no debugging symbols found)...done.

gdb$ x/20i $eip
=> 0x4334d5 <start>:   push ebp
   0x4334d6 <start+1>: mov ebp,esp
   0x4334d8 <start+3>: sub esp,0xc
   0x4334db <start+6>: push ebx
   0x4334dc <start+7>: push esi
   0x4334dd <start+8>: mov esi,DWORD PTR ds:0x401278
   [...]
   0x4334f0 <start+27>: mov BYTE PTR ds:0x441510,bl
   0x4334f6 <start+33>: mov DWORD PTR ds:0x441514,ebx
   0x4334fc <start+39>: call DWORD PTR ds:0x401170
   0x433502 <start+45>: mov DWORD PTR [ebp-0x4],eax
   0x433505 <start+48>: cmp eax,ebx
   0x433507 <start+50>: je 0x433579 <start+164>

The result is somewhat mixed inasmuch as only the start symbol appears. The call to the function pointed to by 0x401170 is not recognized although GDB knows the corresponding symbol:

gdb$ x/x 0x401170
0x401170 <CreateEventA>: 0x7c81e4bd

After some wandering in the GDB source code maze, we stumble upon the print_insn() function (opcodes/i386-dis.c) responsible for displaying the result of the disassembly to the user, containing the following code:

for (i = 0; i < MAX_OPERANDS; ++i)
  if (*op_txt[i])
    {
      if (needcomma)
        (*info->fprintf_func) (info->stream, ",");

      // goes here for je 0x433579 <start+164>
      if (op_index[i] != -1 && !op_riprel[i])
        (*info->print_address_func) ((bfd_vma) op_address[op_index[i]], info);

      // goes here for call DWORD PTR ds:0x401170
      else
        (*info->fprintf_func) (info->stream, "%s", op_txt[i]);

      needcomma = 1;
    }

There are two different branches, depending on whether the operand is:

  • an address (ex: conditional jump to 0x433579): the function pointer print_address_func() calls print_address_symbolic() (gdb/printcmd.c) which looks for a symbol corresponding to the address, and displays it if it exists
  • or a string (ex: DWORD PTR ds:0x401170): the string is displayed as is
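To make the first branch concrete, here is a minimal standalone sketch of an address-to-symbol lookup in the spirit of what print_address_symbolic() does. It is not GDB's code: struct msym and format_address() are hypothetical names, and the table is scanned linearly for the closest symbol at or below the address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* A minimal symbol: just an address and a name (hypothetical structure). */
struct msym {
    uint32_t    addr;
    const char *name;
};

/* Write addr into buf, followed by <name> or <name+offset> when a symbol
   at or below addr exists. Returns the value returned by snprintf(). */
static int format_address(char *buf, size_t len, uint32_t addr,
                          const struct msym *tab, size_t n)
{
    const struct msym *best = NULL;
    assert(buf != NULL && (tab != NULL || n == 0));

    /* Keep the symbol with the highest address that does not exceed addr. */
    for (size_t i = 0; i < n; i++)
        if (tab[i].addr <= addr && (best == NULL || tab[i].addr > best->addr))
            best = &tab[i];

    if (best == NULL)
        return snprintf(buf, len, "0x%x", (unsigned)addr);
    if (best->addr == addr)
        return snprintf(buf, len, "0x%x <%s>", (unsigned)addr, best->name);
    return snprintf(buf, len, "0x%x <%s+%u>", (unsigned)addr, best->name,
                    (unsigned)(addr - best->addr));
}
```

With a table containing start at 0x4334d5, formatting 0x433579 yields the string 0x433579 <start+164> seen in the listing above. The second branch never reaches this kind of lookup, which is why the call operand stays unannotated.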

By precisely tracing how this string is generated, we stumble upon the OP_E_memory() function (opcodes/i386-dis.c), containing this code where we can see GDB displaying the segment name followed by the “:” character and the value:

if (intel_syntax)
  {
    if (modrm.mod != 0 || base == 5) // base = modrm.rm
      {
        if (!active_seg_prefix)
          {
            oappend (names_seg[ds_reg - es_reg]);
            oappend (":");
          }
        print_operand_value (scratchbuf, 1, disp);
        oappend (scratchbuf);
      }
  }

scratchbuf directly receives the disp value created a few lines above:

switch (modrm.mod)
  {
    case 0:
      if (base == 5) // base = modrm.rm
        {
          havebase = 0;
          if (address_mode == mode_64bit && !havesib)
            riprel = 1;
          disp = get32s ();
        }
    [...]
  }

The call DWORD PTR ds:0x401170 instruction is encoded as ff 15 70 11 40 00: opcode 0xff followed by the ModRM byte 0x15=0b00 010 101, namely mod=0b00, reg=0b010=2 and rm=0b101=5. In the code above, disp therefore receives the numerical value of the address (the four little-endian bytes 70 11 40 00) via the get32s() function.
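The decoding above can be checked with a few lines of C. This is a simplified sketch rather than the disassembler's logic: decode_modrm() and absolute_disp32() are hypothetical helpers, and the instruction is assumed to use 32-bit addressing with a one-byte opcode and no prefixes, so that the ModRM byte is the second byte.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The three fields of a ModRM byte. */
struct modrm {
    unsigned mod; /* bits 7-6 */
    unsigned reg; /* bits 5-3 */
    unsigned rm;  /* bits 2-0 */
};

static struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

/* If the instruction addresses memory through a bare 32-bit displacement
   (mod=0, rm=5), store that displacement in *disp and return 1.
   Otherwise return 0 and leave *disp untouched. */
static int absolute_disp32(const uint8_t *insn, size_t len, uint32_t *disp)
{
    assert(insn != NULL && disp != NULL);
    if (len < 6)
        return 0;                 /* opcode + ModRM + 4 displacement bytes */

    struct modrm m = decode_modrm(insn[1]);
    if (m.mod != 0 || m.rm != 5)
        return 0;

    /* The displacement is stored in little-endian order after the ModRM byte. */
    *disp = (uint32_t)insn[2] | (uint32_t)insn[3] << 8 |
            (uint32_t)insn[4] << 16 | (uint32_t)insn[5] << 24;
    return 1;
}
```

Feeding it the bytes ff 15 70 11 40 00 gives mod=0, reg=2, rm=5 and a displacement of 0x401170. Note that get32s() returns a signed value, whereas the sketch keeps the displacement unsigned for simplicity.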

Since GDB treats this value as a displacement inside a segment, it doesn’t bother checking it for a symbol. After adding this feature by calling print_address_func() on the numerical value of the displacement, the result is now:

gdb$ x/20i $eip
=> 0x4334d5 <start>:   push ebp
   0x4334d6 <start+1>: mov ebp,esp
   0x4334d8 <start+3>: sub esp,0xc
   0x4334db <start+6>: push ebx
   0x4334dc <start+7>: push esi
   0x4334dd <start+8>: mov esi,DWORD PTR ds:0x401278 <CreateThread>
   [...]
   0x4334fc <start+39>: call DWORD PTR ds:0x401170 <CreateEventA>
   0x433502 <start+45>: mov DWORD PTR [ebp-0x4],eax
   0x433505 <start+48>: cmp eax,ebx
   0x433507 <start+50>: je 0x433579 <start+164>
   [...]
   0x424617 call DWORD PTR ds:0x4011b4 <SetErrorMode>
   0x42461d lea eax,[ebp-0x8]
   0x424620 push eax
   0x424621 call DWORD PTR ds:0x4012f4 <GetCommandLineW>
   0x424627 push eax
   0x424628 call DWORD PTR ds:0x40133c <CommandLineToArgvW>

A small tip from Lexsi specialists to enhance the readability of Windows code under GDB 🙂