Leveraging the PE parsing technique to write x86 shellcode

1. Introduction

Shellcode is often used alongside an exploit to subvert a running program, or by an injector performing process injection. To work reliably across different Windows versions, shellcode must locate the Win32 API functions it needs dynamically. For that task, it typically uses LoadLibraryA and GetProcAddress, both exported from "kernel32.dll".

In this post, we will explore Win32 shellcode development using what we learned about the PEB structure in the previous post, and specifically, we will see how shellcode leverages the PE parsing technique.

Everything will be done directly within the debugger in IDA Pro, which also lets us easily get the opcodes and test our shellcode step by step. Finally, to truly understand the structure of kernel32.dll, we will use CFF Explorer to view the contents of this precious DLL. Now, let's fasten our seat belts and start!

2. Finding Kernel32 Base Address

In our previous post, "Digging into Windows PEB", we concluded that whenever an executable file is loaded into memory, Windows also loads the core libraries kernel32.dll and ntdll.dll alongside it and records their base addresses in the loader data that the PEB points to. The figure below describes the data structures that are followed to find the base address of kernel32.dll:

So, we will retrieve the base address of kernel32.dll from the PEB as shown in the following sample assembly code:

BITS 32

global _start

section .text

_start:
	xor eax, eax               ; Avoid Null Bytes
	mov eax, [fs:eax + 0x30]   ; EAX = PEB
	mov eax, [eax + 0xc]       ; EAX = PEB->Ldr
	mov esi, [eax + 0x14]      ; ESI = PEB->Ldr.InMemoryOrderModuleList
	lodsd                      ; EAX = Second module (ntdll.dll)
	xchg eax, esi              ; move to next element
	lodsd                      ; EAX = Third (kernel32.dll)
	mov eax, [eax + 0x10]      ; EAX = Base address

To understand this sample of assembly code, take a look at my previous post.

With this assembly code, we can find the kernel32.dll base address and store it in the eax register. Next, we need to assemble it with nasm. As the program is written in x86 assembly, the elf32 file type is specified using the -f flag, and the result is then disassembled into opcodes using objdump:

Now, let's test our shellcode within the context of a C program. The shellcode can be placed in a test program (titled runner.c in this example) written in C, as shown below:

#include <windows.h>

const char main[] = "\x31\xc0\x64\x8b\x40\x30\x8b\x40\x0c\x8b\x70\x14\xad\x96\xad\x8b\x40\x10";

This program should be compiled and executed in IDA Pro for debugging purposes:

Now the eax register holds the memory address 0x75720000, which indicates that we got the base address of kernel32.dll successfully. We can substantiate this result with the fact that we're pointing at e_magic, the first member of the MS-DOS header of kernel32.dll:

The first field, e_magic, is also called the magic number. This field is used to identify an MS-DOS-compatible file type. All MS-DOS-compatible executable files set this value to 0x5A4D, which represents the ASCII characters MZ. At this point, we have retrieved the memory address where kernel32.dll is loaded!

3. Finding the export table of Kernel32.dll

Before diving into this part, I would like to highlight some mandatory definitions:

Relative Virtual Address (RVA): In an image file, this is the address of an item after it is loaded into memory, with the base address of the image file subtracted from it. The RVA of an item almost always differs from its position within the file on disk (file pointer). --> RVA = VA - BaseAddress
Virtual Address (VA): Same as RVA, except that the base address of the image file is not subtracted. The address is called a VA because Windows creates a distinct VA space for each process, independent of physical memory. For almost all purposes, a VA should be considered just an address. A VA is not as predictable as an RVA because the loader might not load the image at its preferred location. --> VA = RVA + BaseAddress
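The two conversions can be sketched in a few lines of portable C. This is only an illustration of the arithmetic, assuming 32-bit addresses as in the rest of the post; the function names `rva_to_va` and `va_to_rva` are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* VA = RVA + BaseAddress (32-bit address space, as in the post). */
uint32_t rva_to_va(uint32_t image_base, uint32_t rva)
{
    return image_base + rva;
}

/* RVA = VA - BaseAddress; the VA is expected to lie at or above the base. */
uint32_t va_to_rva(uint32_t image_base, uint32_t va)
{
    assert(va >= image_base);
    return va - image_base;
}
```

With the base address seen in the debugger (0x75720000), `rva_to_va(0x75720000, 0xF8)` gives 0x757200F8, which is the address of the PE signature used later in this post.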

We found the base address of kernel32.dll in memory. Now we need to parse this PE file and find the export directory:

	mov ebx, [eax + 0x3c]      ; RVA of PE signature
	add ebx, eax               ; VA of PE signature
	mov ebx, [ebx + 0x78]      ; RVA of the export directory
	add ebx, eax               ; VA of the export directory
	mov esi, [ebx + 0x20]      ; RVA of the exported function names table
	add esi, eax               ; VA of the exported function names table

e_lfanew is a 4-byte offset into the file where the PE file header is located. It is necessary to use this offset to locate the PE header in the file.

(Lines 1-2) We know that we can find the “e_lfanew” pointer at the offset 0x3C:

After the instruction mov ebx, [eax + 0x3c], ebx should hold the value 0xF8, as depicted in the following figure:

Now, we can find the address of the PE signature by adding the kernel32 base address and the PE signature RVA: 0x75720000 + 0xF8 = 0x757200F8, and we find the PE signature there:

As you know the PE header is a structure that contains the following information:

typedef struct _IMAGE_NT_HEADERS {
  DWORD                   Signature;
  IMAGE_FILE_HEADER       FileHeader;
  IMAGE_OPTIONAL_HEADER32 OptionalHeader;
} IMAGE_NT_HEADERS32, *PIMAGE_NT_HEADERS32;

The Signature member identifies the file as a PE image. Its value is 0x00004550, which represents the ASCII characters "PE" followed by two null bytes. In memory the bytes appear as 50 45 00 00 because of little-endian ordering, as you can see above in our debugging process.
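The two checks made so far (MZ at the start of the image, then the PE signature at the offset stored in e_lfanew) can be sketched in portable C. This is a minimal sketch working on a plain byte buffer rather than a real loaded module; `find_pe_header`, `read_u16`, `read_u32`, and `demo_fake_image` are hypothetical helper names, and the fake image only contains the three fields being checked.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read little-endian values from an in-memory image. */
uint16_t read_u16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

uint32_t read_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Returns the offset of the PE signature, or -1 if the headers look wrong. */
long find_pe_header(const uint8_t *image, size_t size)
{
    assert(image != NULL);
    if (size < 0x40 || read_u16(image) != 0x5A4D)     /* e_magic: "MZ" */
        return -1;
    uint32_t e_lfanew = read_u32(image + 0x3C);       /* offset of the PE header */
    if (e_lfanew > size - 4)
        return -1;
    if (read_u32(image + e_lfanew) != 0x00004550)     /* "PE\0\0" */
        return -1;
    return (long)e_lfanew;
}

/* Builds a fake 0x100-byte image with e_lfanew = 0xF8, as seen for kernel32.dll. */
long demo_fake_image(void)
{
    uint8_t image[0x100] = {0};
    image[0] = 'M';
    image[1] = 'Z';
    image[0x3C] = 0xF8;
    memcpy(image + 0xF8, "PE\0\0", 4);
    return find_pe_header(image, sizeof image);
}
```

On the fake image, `demo_fake_image()` returns 0xF8, the same offset the debugger showed in ebx.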

(Lines 3-4) The IMAGE_OPTIONAL_HEADER is a structure containing more useful information for us:

typedef struct _IMAGE_OPTIONAL_HEADER {
  WORD                 Magic;
  BYTE                 MajorLinkerVersion;
  BYTE                 MinorLinkerVersion;
  DWORD                SizeOfCode;
  DWORD                SizeOfInitializedData;
....
  IMAGE_DATA_DIRECTORY DataDirectory[IMAGE_NUMBEROF_DIRECTORY_ENTRIES];
} IMAGE_OPTIONAL_HEADER32, *PIMAGE_OPTIONAL_HEADER32;

It contains our main member which is DataDirectory that contains information such as imported and exported functions.

At the offset 0x78 of the PE header, we can find the RVA of Export Directory:

Most of you will ask how we get this offset. It is very simple:

sizeof(PE_Signature) + sizeof(IMAGE_FILE_HEADER) + offsetof(IMAGE_OPTIONAL_HEADER, DataDirectory) = 4 + 20 + 96 = 120 bytes (0x78 in hex)
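The same sum can be checked with `offsetof`. The sketch below redefines the 32-bit headers locally so that it compiles without windows.h; the type names `PeFileHeader`, `PeDataDirEntry`, `PeOptionalHeader32`, and `PeNtHeaders32` are stand-ins chosen for this example, with field layouts following the PE32 format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t Machine;
    uint16_t NumberOfSections;
    uint32_t TimeDateStamp;
    uint32_t PointerToSymbolTable;
    uint32_t NumberOfSymbols;
    uint16_t SizeOfOptionalHeader;
    uint16_t Characteristics;
} PeFileHeader;                         /* 20 bytes */

typedef struct {
    uint32_t VirtualAddress;
    uint32_t Size;
} PeDataDirEntry;

typedef struct {
    uint16_t Magic;
    uint8_t  MajorLinkerVersion;
    uint8_t  MinorLinkerVersion;
    uint32_t SizeOfCode;
    uint32_t SizeOfInitializedData;
    uint32_t SizeOfUninitializedData;
    uint32_t AddressOfEntryPoint;
    uint32_t BaseOfCode;
    uint32_t BaseOfData;
    uint32_t ImageBase;
    uint32_t SectionAlignment;
    uint32_t FileAlignment;
    uint16_t MajorOperatingSystemVersion;
    uint16_t MinorOperatingSystemVersion;
    uint16_t MajorImageVersion;
    uint16_t MinorImageVersion;
    uint16_t MajorSubsystemVersion;
    uint16_t MinorSubsystemVersion;
    uint32_t Win32VersionValue;
    uint32_t SizeOfImage;
    uint32_t SizeOfHeaders;
    uint32_t CheckSum;
    uint16_t Subsystem;
    uint16_t DllCharacteristics;
    uint32_t SizeOfStackReserve;
    uint32_t SizeOfStackCommit;
    uint32_t SizeOfHeapReserve;
    uint32_t SizeOfHeapCommit;
    uint32_t LoaderFlags;
    uint32_t NumberOfRvaAndSizes;
    PeDataDirEntry DataDirectory[16];   /* entry 0 is the export directory */
} PeOptionalHeader32;

typedef struct {
    uint32_t           Signature;
    PeFileHeader       FileHeader;
    PeOptionalHeader32 OptionalHeader;
} PeNtHeaders32;

/* Offset, from the PE signature, of the export directory RVA. */
size_t export_rva_offset(void)
{
    return offsetof(PeNtHeaders32, OptionalHeader) +
           offsetof(PeOptionalHeader32, DataDirectory);
}
```

`export_rva_offset()` evaluates to 24 + 96 = 120, which is the 0x78 used in mov ebx, [ebx + 0x78].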

Again, we add the base address held in the eax register to this value, and we are now positioned on the export directory of kernel32.dll.

The export directory is the following structure:

typedef struct _IMAGE_EXPORT_DIRECTORY {
     DWORD   Characteristics;
     DWORD   TimeDateStamp;
     WORD    MajorVersion;
     WORD    MinorVersion;
     DWORD   Name;
     DWORD   Base;
     DWORD   NumberOfFunctions;
     DWORD   NumberOfNames;
     DWORD   AddressOfFunctions;     
     DWORD   AddressOfNames;        
     DWORD   AddressOfNameOrdinals;  
 };

The relevant fields in the _IMAGE_EXPORT_DIRECTORY:

AddressOfFunctions is an array of RVAs that point to the actual exported functions. It is indexed by an export ordinal. The shellcode needs to map the export name to the ordinal to use this array.

This mapping is done via AddressOfNames and AddressOfNameOrdinals arrays. These two arrays exist in parallel. They have the same number of entries, and equivalent indices into these arrays are directly related.

AddressOfNames is an array of 32-bit RVAs that point to the strings of symbol names.
AddressOfNameOrdinals is an array of 16-bit ordinals. For a given index id into these arrays, the symbol at AddressOfNames[id] has the export ordinal value at AddressOfNameOrdinals[id].
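The relationship between the three arrays can be sketched with small synthetic tables. This is not kernel32's real data: the names and RVAs are made up, `ExportTables`, `lookup_export_rva`, and `demo_lookup` are hypothetical names, and the sketch assumes the entries of the ordinal array are already zero-based indexes into the function array.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    const char *const *names;        /* stands in for AddressOfNames */
    const uint16_t *name_ordinals;   /* stands in for AddressOfNameOrdinals */
    const uint32_t *functions;       /* stands in for AddressOfFunctions (RVAs) */
    size_t number_of_names;
} ExportTables;

/* Name -> index in the name table -> ordinal -> function RVA (0 if not found). */
uint32_t lookup_export_rva(const ExportTables *t, const char *wanted)
{
    assert(t != NULL && wanted != NULL);
    for (size_t i = 0; i < t->number_of_names; i++) {
        if (strcmp(t->names[i], wanted) == 0) {
            uint16_t index = t->name_ordinals[i];   /* parallel array, same i */
            return t->functions[index];
        }
    }
    return 0;
}

/* Synthetic tables: three made-up exports whose ordinals are not in name order. */
uint32_t demo_lookup(const char *wanted)
{
    static const char *const names[] = { "Alpha", "Beta", "Gamma" };
    static const uint16_t name_ordinals[] = { 2, 0, 1 };
    static const uint32_t functions[] = { 0x1000, 0x2000, 0x3000 };
    const ExportTables t = { names, name_ordinals, functions, 3 };
    return lookup_export_rva(&t, wanted);
}
```

Here "Alpha" sits at index 0 of the name table, its ordinal entry is 2, and so its RVA is `functions[2]`, that is 0x3000. The name index and the function index are different numbers, which is why the ordinal array is needed.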

(Lines 5-6) The IMAGE_EXPORT_DIRECTORY structure contains, at offset 0x20, the RVA of the exported function names table, which is 0x000945B4:

Again, most of you will ask how we get this offset. It is very simple: the fields that precede AddressOfNames in the structure occupy 4 + 4 + 2 + 2 + 4 + 4 + 4 + 4 + 4 = 32 bytes (0x20 in hex):

Let's retrieve the address of the exported function names table by adding the Name Pointer Table RVA 0x000945B4 to the kernel32 base address 0x75720000, which results in 0x757B45B4. That address holds the RVA of the first exported function's name, 0x00096BCA:
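The offsets into the export directory, and the address computed above, can be double-checked with a short C sketch. The struct `ExportDirectory` mirrors the fields listed earlier but is a local stand-in (so the example compiles without windows.h), and `names_table_va` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t Characteristics;
    uint32_t TimeDateStamp;
    uint16_t MajorVersion;
    uint16_t MinorVersion;
    uint32_t Name;
    uint32_t Base;
    uint32_t NumberOfFunctions;
    uint32_t NumberOfNames;
    uint32_t AddressOfFunctions;
    uint32_t AddressOfNames;
    uint32_t AddressOfNameOrdinals;
} ExportDirectory;

/* The three offsets used by the shellcode, verified at compile time. */
static_assert(offsetof(ExportDirectory, AddressOfFunctions) == 0x1c, "EAT offset");
static_assert(offsetof(ExportDirectory, AddressOfNames) == 0x20, "names offset");
static_assert(offsetof(ExportDirectory, AddressOfNameOrdinals) == 0x24, "ordinals offset");

/* VA of the names table = image base + AddressOfNames (an RVA). */
uint32_t names_table_va(uint32_t image_base, const ExportDirectory *dir)
{
    assert(dir != NULL);
    return image_base + dir->AddressOfNames;
}

/* Uses the values observed in the debugger in this post. */
uint32_t demo_names_table_va(void)
{
    ExportDirectory dir = {0};
    dir.AddressOfNames = 0x000945B4;
    return names_table_va(0x75720000u, &dir);
}
```

`demo_names_table_va()` gives 0x757B45B4, matching the address inspected in the debugger.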

4. Find LoadLibraryA using Hashed export Names

It's not always a good idea to embed ASCII or Unicode strings, since they make our shellcode bigger and also easy to spot. So it is better to use a hash value to look up our targeted Win32 API functions.

For that reason, we use a C program from Stack Overflow that enumerates all exported Win32 API functions in kernel32.dll (the same thing our assembly version does), and in the callback invoked for each export we generate a hash of the function name via the following code snippet:

#include <Windows.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>


void EnumExportedFunctions (char *, void (*callback)(char*));
int Rva2Offset (unsigned int);

typedef struct {
    unsigned char Name[8];
    unsigned int VirtualSize;
    unsigned int VirtualAddress;
    unsigned int SizeOfRawData;
    unsigned int PointerToRawData;
    unsigned int PointerToRelocations;
    unsigned int PointerToLineNumbers;
    unsigned short NumberOfRelocations;
    unsigned short NumberOfLineNumbers;
    unsigned int Characteristics;
} sectionHeader;

sectionHeader *sections;
unsigned int NumberOfSections = 0;

int Rva2Offset (unsigned int rva) {
    int i = 0;

    for (i = 0; i < NumberOfSections; i++) {
        unsigned int x = sections[i].VirtualAddress + sections[i].SizeOfRawData;

        if (x >= rva) {
            return sections[i].PointerToRawData + (rva + sections[i].SizeOfRawData) - x;
        }
    }

    return -1;
}

void EnumExportedFunctions (char *szFilename, void (*callback)(char*)) {
    FILE *hFile = fopen (szFilename, "rb");

    if (hFile != NULL) {
        if (fgetc (hFile) == 'M' && fgetc (hFile) == 'Z') {
            unsigned int e_lfanew = 0;
            unsigned int NumberOfRvaAndSizes = 0;
            unsigned int ExportVirtualAddress = 0;
            unsigned int ExportSize = 0;
            int i = 0;

            fseek (hFile, 0x3C, SEEK_SET);
            fread (&e_lfanew, 4, 1, hFile);
            fseek (hFile, e_lfanew + 6, SEEK_SET);
            fread (&NumberOfSections, 2, 1, hFile);
            fseek (hFile, 108, SEEK_CUR);
            fread (&NumberOfRvaAndSizes, 4, 1, hFile);

            if (NumberOfRvaAndSizes == 16) {
                fread (&ExportVirtualAddress, 4, 1, hFile);
                fread (&ExportSize, 4, 1, hFile);

                if (ExportVirtualAddress > 0 && ExportSize > 0) {
                    fseek (hFile, 120, SEEK_CUR);

                    if (NumberOfSections > 0) {
                        sections = (sectionHeader *) malloc (NumberOfSections * sizeof (sectionHeader));

                        for (i = 0; i < NumberOfSections; i++) {
                            fread (sections[i].Name, 8, 1, hFile);
                            fread (&sections[i].VirtualSize, 4, 1, hFile);
                            fread (&sections[i].VirtualAddress, 4, 1, hFile);
                            fread (&sections[i].SizeOfRawData, 4, 1, hFile);
                            fread (&sections[i].PointerToRawData, 4, 1, hFile);
                            fread (&sections[i].PointerToRelocations, 4, 1, hFile);
                            fread (&sections[i].PointerToLineNumbers, 4, 1, hFile);
                            fread (&sections[i].NumberOfRelocations, 2, 1, hFile);
                            fread (&sections[i].NumberOfLineNumbers, 2, 1, hFile);
                            fread (&sections[i].Characteristics, 4, 1, hFile);
                        }

                        unsigned int NumberOfNames = 0;
                        unsigned int AddressOfNames = 0;

                        int offset = Rva2Offset (ExportVirtualAddress);
                        fseek (hFile, offset + 24, SEEK_SET);
                        fread (&NumberOfNames, 4, 1, hFile);

                        fseek (hFile, 4, SEEK_CUR);
                        fread (&AddressOfNames, 4, 1, hFile);

                        unsigned int namesOffset = Rva2Offset (AddressOfNames), pos = 0;
                        fseek (hFile, namesOffset, SEEK_SET);

                        for (i = 0; i < NumberOfNames; i++) {
                            unsigned int y = 0;
                            fread (&y, 4, 1, hFile);
                            pos = ftell (hFile);
                            fseek (hFile, Rva2Offset (y), SEEK_SET);

                            char c = fgetc (hFile);
                            int szNameLen = 0;

                            while (c != '\0') {
                                c = fgetc (hFile);
                                szNameLen++;
                            }

                            fseek (hFile, (-szNameLen)-1, SEEK_CUR);
                            char* szName = calloc (szNameLen + 1, 1);
                            fread (szName, szNameLen, 1, hFile);

                            callback (szName);
                            free (szName);   // release the name buffer once the callback is done

                            fseek (hFile, pos, SEEK_SET);
                        }
                    }
                }
            }
        }

        fclose (hFile);
    }
}

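void calculate_hash(char* szName) {
    DWORD hash = 0;
    size_t len = strlen(szName);

    for (size_t i = 0; i < len; i++) {
        hash <<= 1;                          // shift the running hash left by 1
        hash += (unsigned char) szName[i];   // then add the current character
    }
    printf("%s:0x%08lx\n", szName, (unsigned long) hash);
}

int main (int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "Usage: %s <path-to-dll>\n", argv[0]);
        return 1;
    }
    printf("Loading %s\n", argv[1]);
    EnumExportedFunctions(argv[1], calculate_hash);
    return 0;
}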

Notice that the calculate_hash function above is basically a for loop: on each iteration it shifts the value in the hash variable left by 1 and then adds szName[i], the current character of the exported function name.

Generate all hashes of exported functions that actually exist in kernel32.dll:

.\dll_hash_calculator.exe C:\Windows\SysWOW64\kernel32.dll > hashes.txt

As a result:

Let's first inspect our first exported function in memory using the following assembly code:

	push 0x00059ba3            ; hash of LoadLibraryA
	xor ecx, ecx               ; prepare counter
	mov edx, eax               ; save eax into edx
	call _find_addr

_find_addr:
	inc ecx                    ; increment name index counter
	lodsd                      ; load name RVA into eax and increment esi by 4 to next RVA
	add eax, edx               ; add kernel32 base address to get VA of function name

(Line 1) We push the precomputed hash value of LoadLibraryA on the stack, since we will use it later to find our targeted function.

(Line 2) We set the ecx register to 0. It will count the exported names we walk through, so that we can later map the name of the targeted Win32 API function (LoadLibraryA) to its ordinal and retrieve its address from the AddressOfFunctions array.

(Line 3) We save the eax register, which holds the base address of kernel32.dll, into edx, because we will use lodsd afterwards and it overwrites eax.

The instruction reference describes lodsd as follows: it loads a byte, word, or doubleword from the source operand into the AL, AX, or EAX register, respectively.

(Line 6-9) We create a procedure called "_find_addr", represented by a label in our asm code. It uses lodsd, which reads through the esi register, the pointer to the first function name RVA. The lodsd instruction places in eax the RVA of the function name ("AcquireSRWLockExclusive"), and we add edx (the kernel32 base address) to it to get the correct pointer. Note that lodsd also increments the esi register by 4! This helps us because we do not have to increment it manually; we just need to execute lodsd again to get the next function name pointer:

Remember that we're incrementing the ecx register, which counts the function names visited so far and will be used to look up the function ordinal number.

Next step, we need to calculate the hash of every exported function name as we did in the C language version:

_find_addr:
	inc ecx
	lodsd
	add eax,edx
	call _calculate_hash
	
_calculate_hash:
	push ecx
	push edx

	xor ecx, ecx
	mov edi, ecx
	mov edx, edi
	
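	_loop:
		shl edi, 1                 ; hash <<= 1
		mov dl, BYTE [eax + ecx]   ; dl = current character of the name
		add edi, edx               ; hash += character
		inc ecx                    ; move to the next character
		cmp BYTE [eax + ecx], 0    ; reached the terminating null byte?
		jne _loop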

	pop edx
	pop ecx
	ret

(Line 7-13) We save the values of ecx and edx on the stack since we will need them afterwards, then we clear ecx, edi, and edx in turn. We do not clear eax, since it now points to the first byte of the first exported function name:

(Line 15-21) The _loop block is the asm representation of the C hash function shown previously. The instruction shl edi, 1 shifts the value stored in edi left by 1. We load the current byte of the exported function name into dl (the low 8 bits of edx) via mov dl, BYTE [eax + ecx], where ecx keeps track of the byte we are at, and then we add edx to edi. We also need to detect the end of the exported function name, which is done via cmp BYTE [eax + ecx], 0 since every exported function name is a null-terminated string. Finally, we keep looping until ZF is set to 1:

Now edi holds the hash of the first exported function name, 2A992F1D. We can confirm that by grepping in the hashes.txt file generated previously:

(Line 23-25) Finally, we restore the register values that we pushed on the stack and return to our function _find_addr via the ret instruction.

Now, we need to check whether the value stored in edi matches the hash of LoadLibraryA, which is already pushed on the stack:

_find_addr:
	inc ecx
	lodsd
	add eax,edx
	call _calculate_hash
	cmp edi, [esp + 4]
	jnz _find_addr
	ret

(Line 1-5) Already explained previously.

(Line 6-8) Keep in mind that whenever _calculate_hash is called, it returns the hash of an exported function name in the edi register. To check whether we have found the LoadLibraryA hash, we use the instruction cmp edi, [esp + 4] (inside _find_addr, [esp] is the return address and [esp + 4] is the hash we pushed) and keep looping until ZF is set to 1. For debugging purposes, we will set a breakpoint at the ret instruction:

At that point we have basically reached our goal, which is finding the name whose hash matches LoadLibraryA:

eax points to the beginning of the string LoadLibraryA.
edi holds the hash value of LoadLibraryA, 0x00059ba3.
The most precious value for retrieving the address of LoadLibraryA is:
- ecx = 0x000003C6, the counter value at which the name matched, which we use next to look up the function ordinal number.
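The search loop can be mirrored in C. This sketch walks an array of names, hashes each one, and returns a 1-based counter like ecx in `_find_addr`. Unlike the assembly, it stops at the end of the array instead of looping forever when nothing matches. The names `shift_add_hash`, `find_name_by_hash`, and `demo_find` are hypothetical, and the three-entry name list is a made-up sample, not kernel32's real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same shift-and-add hash as calculate_hash. */
uint32_t shift_add_hash(const char *name)
{
    uint32_t hash = 0;
    for (const unsigned char *p = (const unsigned char *)name; *p != '\0'; p++)
        hash = (hash << 1) + *p;
    return hash;
}

/* Returns the 1-based position of the first name whose hash matches
   (the value ecx would hold at ret), or 0 if no name matches. */
size_t find_name_by_hash(const char *const *names, size_t count, uint32_t wanted)
{
    assert(names != NULL);
    for (size_t i = 0; i < count; i++) {
        if (shift_add_hash(names[i]) == wanted)
            return i + 1;
    }
    return 0;
}

/* Made-up sample list for illustration. */
size_t demo_find(uint32_t wanted)
{
    static const char *const names[] = {
        "GetProcAddress", "LoadLibraryA", "LoadLibraryW"
    };
    return find_name_by_hash(names, 3, wanted);
}
```

In this sample, `demo_find(0x00059ba3)` returns 2 because LoadLibraryA is the second name. A bounded loop would be a sensible addition to the assembly as well, using NumberOfNames from the export directory.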

5. Find the address of LoadLibraryA function

At this point, we have only found the position of the LoadLibraryA function in the name table, but we can use it to find the actual address of this function:

call _get_addr
push edi

_get_addr:
	mov esi, [ebx + 0x24]          ; RVA of function ordinal table
	add esi, edx                   ; VA of function ordinal table
	mov cx, WORD [esi + ecx * 2]   ; get LoadLibraryA biased_ordinal
	dec ecx                        ; get LoadLibraryA ordinal
	mov esi, [ebx + 0x1c]          ; RVA of AddressOfFunctions = the Export Address Table
	add esi, edx                   ; VA of the Export Address Table
	mov edi, [esi + ecx * 4]       ; RVA of LoadLibraryA
	add edi, edx                   ; VA of LoadLibraryA
	ret

(Line 4-5) At this point, ebx holds a pointer to the IMAGE_EXPORT_DIRECTORY structure. At offset 0x24 of the structure, we can find the "AddressOfNameOrdinals" RVA. In line 5, we add to it the edx register, which holds the base address of kernel32.dll, so we get a valid pointer to the name ordinals table. Some of you may ask about the logic behind 0x24:

(Lines 6-7) The esi register contains the pointer to the name ordinals array.

The name ordinals array (export ordinal table) is an array of 16-bit unbiased indexes into the export address table. Ordinals are biased by the Ordinal Base field of the export directory table. In other words, the ordinal base must be subtracted from the ordinals to obtain true indexes into the export address table.

This array contains two-byte numbers. Up to now, ecx holds the name counter for LoadLibraryA. Indexing the name ordinals array with it gives us the biased_ordinal, from which we get the function's ordinal (index). This will help us to get the function address.

Some of you may be confused by the instruction mov cx, [esi + ecx * 2]. We want element number ecx = 0x3C6 of the name ordinals array, whose elements have type T: that is [arraystart + (ecx * sizeof(T))] --> [esi + ecx * 2], because the ordinal array stores each ordinal in 2 bytes (sizeof(T) = 2). We store the value in cx, the 2-byte version of ecx.

We have to subtract OrdinalBase from biased_ordinal to get the ordinal number of our function:

ordinal = biased_ordinal - OrdinalBase; //represented by the instruction dec ecx

This works because in our case OrdinalBase is equal to 1:

We now have the ordinal number of LoadLibraryA stored in the ecx register, as depicted below:

(Lines 8-9) At offset 0x1c, we can find the RVA of the "Export Address Table" array. We just add the base address of kernel32.dll and we are positioned at the beginning of the array. Some of you may again ask about the logic behind 0x1c:

(Lines 10-11) Now that we have the correct index into the "Export Address Table" array in ecx, we can find the LoadLibraryA function pointer (the RVA of LoadLibraryA) at the AddressOfFunctions[ecx] location:

We use "ecx * 4" because each entry is 4 bytes wide and esi points to the beginning of the array.
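The scaled addressing in [esi + ecx * 2] and [esi + ecx * 4] is ordinary array indexing over raw bytes. The sketch below reads a 2-byte entry and then a 4-byte entry from two made-up tables laid out as little-endian bytes; `read_word_at`, `read_dword_at`, and `demo_resolve` are hypothetical names. The last entry, 0x00340BD0, is simply the LoadLibraryA address from this post minus the kernel32 base (0x75A60BD0 - 0x75720000), and the rest of the data is invented.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Element i of a 16-bit table: bytes at table + i * 2. */
uint16_t read_word_at(const uint8_t *table, size_t i)
{
    const uint8_t *p = table + i * 2;
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Element i of a 32-bit table: bytes at table + i * 4. */
uint32_t read_dword_at(const uint8_t *table, size_t i)
{
    const uint8_t *p = table + i * 4;
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Follows one 16-bit index into a 32-bit table and returns the RVA found. */
uint32_t demo_resolve(void)
{
    static const uint8_t words[] = {
        0x00, 0x00,   0x02, 0x00,   0x01, 0x00
    };
    static const uint8_t dwords[] = {
        0x00, 0x10, 0x00, 0x00,
        0x00, 0x20, 0x00, 0x00,
        0xD0, 0x0B, 0x34, 0x00
    };
    uint16_t index = read_word_at(words, 1);   /* bytes 2..3  -> 2 */
    return read_dword_at(dwords, index);       /* bytes 8..11 -> 0x00340BD0 */
}
```

Adding the base address to the result, 0x75720000 + 0x00340BD0, gives 0x75A60BD0.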

In the end, we add the base address so that edi holds the pointer to the LoadLibraryA function:

Finally, we have dynamically resolved at runtime the address of LoadLibraryA, which is 75A60BD0, and of course we use the ret instruction to return from the procedure _get_addr.

(Line 2) We push the address of LoadLibraryA on the stack through push edi because we will use it later in our main procedure.

6. Find the address of GetProcAddress function

The steps are roughly the same as those explained in depth in the previous section for finding the LoadLibraryA address. However, I will clarify some instructions:

_start:
	xor eax, eax               ; Avoid Null Bytes
	mov eax, [fs:eax + 0x30]   ; EAX = PEB
	mov eax, [eax + 0xc]       ; EAX = PEB->Ldr
	mov esi, [eax + 0x14]      ; ESI = PEB->Ldr.InMemoryOrderModuleList
	lodsd                      ; EAX = Second module (ntdll.dll)
	xchg eax, esi              ; move to next element
	lodsd                      ; EAX = Third (kernel32)
	mov eax, [eax + 0x10]      ; EAX = Base address

	mov ebx, [eax + 0x3c]      ; RVA of PE signature
	add ebx, eax               ; VA of PE signature
	mov ebx, [ebx + 0x78]      ; RVA of the export directory
	add ebx, eax               ; VA of the export directory
	mov esi, [ebx + 0x20]      ; RVA of the exported function names table
	add esi, eax               ; VA of the exported function names table
	mov edx, eax               ; save eax into edx
	push esi                   ; save the VA of the exported function names table

	push 0x00059ba3            ; hash of LoadLibraryA
	xor ecx, ecx               ; counter used to find the ordinal number of LoadLibraryA
	call _find_addr
	call _get_addr
	push edi


	mov esi, [esp + 8]         ; restore VA of the exported function names table


	push 0x0015bdfd            ; hash of GetProcAddress
	xor ecx, ecx               ; prepare counter
	call _find_addr
	call _get_addr
	push edi
	
_get_addr: ...
_find_addr: ...
_calculate_hash: ...

First, let's agree that the precomputed hash of GetProcAddress is 0x0015bdfd.

(Line 18) Why are we saving the value of esi on the stack? As you know, the esi register holds the address of the exported function names table, and once we start the process of finding the address of LoadLibraryA this value will be overwritten, because each lodsd instruction increments esi by 4. Hence, we push it on the stack and retrieve it after finding the address of LoadLibraryA via the instruction mov esi, [esp + 8]. Then we can smoothly start the process of finding the GetProcAddress address.
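To see why the saved esi ends up at [esp + 8], the three pushes can be replayed on a tiny model of the stack. This is only a sketch: `Stack32`, `push32`, `peek32`, and `demo_esp_plus_8` are hypothetical names, the stack is a plain array that grows downward, and the pushed values are the ones observed earlier in this post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define STACK_SLOTS 16

typedef struct {
    uint32_t slots[STACK_SLOTS];
    size_t top;                  /* plays the role of esp, counted in dwords */
} Stack32;

/* push: esp moves down by one dword, then the value is stored at [esp]. */
void push32(Stack32 *s, uint32_t value)
{
    assert(s != NULL && s->top > 0);
    s->slots[--s->top] = value;
}

/* Reads the dword at [esp + byte_offset]. */
uint32_t peek32(const Stack32 *s, size_t byte_offset)
{
    size_t index = s->top + byte_offset / 4;
    assert(s != NULL && byte_offset % 4 == 0 && index < STACK_SLOTS);
    return s->slots[index];
}

/* Replays: push esi / push 0x00059ba3 / push edi, then reads [esp + 8]. */
uint32_t demo_esp_plus_8(void)
{
    Stack32 s = { {0}, STACK_SLOTS };
    push32(&s, 0x757B45B4u);     /* push esi: VA of the names table */
    push32(&s, 0x00059ba3u);     /* push hash of LoadLibraryA       */
    push32(&s, 0x75A60BD0u);     /* push edi: VA of LoadLibraryA    */
    return peek32(&s, 8);
}
```

After the three pushes, [esp] is the LoadLibraryA address, [esp + 4] is the hash, and [esp + 8] is the saved names table pointer, so `demo_esp_plus_8()` returns 0x757B45B4. The calls to _find_addr and _get_addr do not change this picture because each ret removes the return address that its call pushed.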

(Line 34) We take the same approach as before, pushing the address of GetProcAddress on the stack through push edi since we will use it later in our main procedure.

7. Load user32.dll library & Get MessageBox function address

We previously found the LoadLibraryA function address. We will now use it to load the "user32.dll" library into memory. It contains the MessageBoxA function, which we will use as a PoC for the technique discussed in this post:

HMODULE LoadLibraryA(
  LPCSTR lpLibFileName
);

lpLibFileName is the name of the module which will be in our case "user32.dll".

_do_main:
	mov edi, [esp + 8]
	push "ll"
	push "32.d"
	push "user"
	push esp
	call edi
	
	push "oxA"
	push "ageB"
	push "Mess"
	push esp
	push eax
	mov edi, [esp + 32]
	call edi

(Lines 1-7) We define the procedure _do_main, which represents our main function. As you noticed previously, we pushed the "LoadLibraryA" address on the stack, so we retrieve it relative to the stack pointer esp. Now, we want to call LoadLibraryA("user32.dll"), so we need to place the user32.dll string on the stack.
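The way a string is cut into pushes can be sketched in C: the string, including its terminating null byte, is split into 4-byte little-endian values, and the last chunk is pushed first so that the first characters end up at the lowest address. `pack_stack_string` and `demo_user32_push` are hypothetical helper names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Splits s (with its NUL terminator) into 32-bit little-endian values and
   writes them to out in push order (last chunk first).
   Returns the number of values, or 0 if out is too small. */
size_t pack_stack_string(const char *s, uint32_t *out, size_t out_len)
{
    assert(s != NULL && out != NULL);
    size_t len = strlen(s) + 1;               /* keep the NUL terminator */
    size_t chunks = (len + 3) / 4;
    if (chunks > out_len)
        return 0;
    for (size_t c = 0; c < chunks; c++) {
        uint32_t value = 0;
        for (size_t b = 0; b < 4; b++) {
            size_t idx = c * 4 + b;
            uint8_t ch = idx < len ? (uint8_t)s[idx] : 0;   /* pad with zeros */
            value |= (uint32_t)ch << (8 * b);
        }
        out[chunks - 1 - c] = value;          /* reversed: push order */
    }
    return chunks;
}

/* Returns the n-th value to push for "user32.dll", or 0xFFFFFFFF past the end. */
uint32_t demo_user32_push(size_t n)
{
    uint32_t out[4] = {0};
    size_t count = pack_stack_string("user32.dll", out, 4);
    return n < count ? out[n] : 0xFFFFFFFFu;
}
```

For "user32.dll" the values come out as 0x00006c6c ("ll" plus two null bytes), 0x642e3233 ("32.d"), and 0x72657375 ("user"), in that push order. Note that the first value contains null bytes, which matters if the shellcode has to pass through a string copy.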

At esp, we have the "user32.dll" string. We push a pointer to it as the parameter and call LoadLibraryA, which returns in eax the base address where the user32.dll library is loaded into memory. We will need it later:

We have loaded the user32.dll library into memory. Now we want to call GetProcAddress to get the address of the MessageBoxA function.

FARPROC GetProcAddress(
  HMODULE hModule,
  LPCSTR  lpProcName
);

hModule: A handle to the DLL module that contains the function or variable. The LoadLibraryA function returns this handle.
lpProcName: The function or variable name, or the function's ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero.

(Line 9-15) We want to call GetProcAddress(user32.dll, "MessageBoxA"), so again we need to place the MessageBoxA string on the stack. At esp, we have the "MessageBoxA" string. We push a pointer to it, then push the eax register, which contains the user32.dll base address, and call the edi register, which holds the GetProcAddress function address:

After the call, GetProcAddress returns the address of MessageBoxA in eax, as depicted above. We will need it next.

8. Call MessageBox function

Now we have all the ingredients to call MessageBox function, we just need to prepare the right parameters for it:

int MessageBox(
  HWND    hWnd,
  LPCTSTR lpText,
  LPCTSTR lpCaption,
  UINT    uType
);

hWnd: A handle to the owner window of the message box to be created. If this parameter is NULL, the message box has no owner window.
lpText: The message to be displayed. If the string consists of more than one line, you can separate the lines using a carriage return and/or linefeed character between each line.
lpCaption: The dialog box title. If this parameter is NULL, the default title is Error.
uType: The contents and behavior of the dialog box. This parameter can be a combination of flags from the following groups of flags.

As an example, we want to call:

int MessageBoxA(
        NULL,
        "T3nb3w",
        "T3nb3w",
         MB_OK
    );

Remember that with the stdcall calling convention used by the Win32 API on x86, arguments are pushed in reverse order (right to left):

Thus, we can do that via the following asm code:

	push "3w"
	push "T3nb"
	mov esi, esp               ; ESI = pointer to "T3nb3w"

	xor ebx, ebx
	push ebx                   ; uType = MB_OK = 0x00000000L
	push esi                   ; lpCaption = "T3nb3w"
	push esi                   ; lpText = "T3nb3w"
	push ebx                   ; hWnd = NULL
	call eax                   ; MessageBoxA

(Line 1-3) We need to place the "T3nb3w" string on the stack. At esp, we have the "T3nb3w" string, so we copy this pointer into esi.

(Line 5-10) We clear the ebx register and push, in order, ebx, esi, esi, and ebx. Finally, we call eax, which already holds the address of MessageBoxA.

Our assembly is shown below in a debugger. The MessageBox pops up after the call eax instruction is executed:

9. Final Shellcode

Now we just need to add all parts together and the final shellcode is the following:

BITS 32

global _start

section .text

_start:
	xor eax, eax               ; Avoid Null Bytes
	mov eax, [fs:eax + 0x30]   ; EAX = PEB
	mov eax, [eax + 0xc]       ; EAX = PEB->Ldr
	mov esi, [eax + 0x14]      ; ESI = PEB->Ldr.InMemoryOrderModuleList
	lodsd                      ; EAX = Second module (ntdll.dll)
	xchg eax, esi              ; move to next element
	lodsd                      ; EAX = Third (kernel32)
	mov eax, [eax + 0x10]      ; EAX = Base address

	mov ebx, [eax + 0x3c]      ; RVA of PE signature
	add ebx, eax               ; VA of PE signature
	mov ebx, [ebx + 0x78]      ; RVA of the export directory
	add ebx, eax               ; VA of the export directory
	mov esi, [ebx + 0x20]      ; RVA of the exported function names table
	add esi, eax               ; VA of the exported function names table
	mov edx, eax               ; save eax into edx
	push esi                   ; save the VA of the exported function names table

	push 0x00059ba3
	xor ecx, ecx			        ; prepare counter		
	call _find_addr
	call _get_addr
	push edi

	
	mov esi, [esp + 8]        ; restore VA of the exported function names table

	
	push 0x0015bdfd
	xor ecx, ecx			        ; prepare counter	
	call _find_addr
	call _get_addr
	push edi
	jmp _do_main

_get_addr:
	mov esi, [ebx + 0x24]          ; RVA of function ordinal table
	add esi, edx                   ; VA of function ordinal table
	mov cx, WORD [esi + ecx * 2]   ; get biased_ordinal of the function
	dec ecx                        ; get ordinal of the function
	mov esi, [ebx + 0x1c]          ; RVA of AddressOfFunctions (the Export Address Table)
	add esi, edx                   ; VA of the Export Address Table
	mov edi, [esi + ecx * 4]       ; RVA of the function
	add edi, edx                   ; VA of the function
	ret


_find_addr:
	inc ecx                        ; increment name index counter
	lodsd                          ; load name RVA into eax and increment esi by 4 to next RVA
	add eax, edx                   ; add kernel32.dll base address to get VA of function name
	call _calculate_hash           ; get the hash
	cmp edi, [esp + 4]             ; compare our hash
	jnz _find_addr                 ; loop if not matching
	ret                            ; return, ecx now holds the name counter of our function
	
_calculate_hash:
	push ecx
	push edx

	xor ecx, ecx
	mov edi, ecx
	mov edx, edi

	_loop:
		shl edi,1
		mov dl, BYTE [eax + ecx]
		add edi, edx
		inc ecx
		cmp BYTE[eax + ecx], 0
		jne _loop

	pop edx
	pop ecx
	ret

_do_main:
	mov edi, [esp + 8]
	push "ll"
	push "32.d"
	push "user"
	push esp
	call edi

	push "oxA"
	push "ageB"
	push "Mess"
	push esp
	push eax
	mov edi, [esp + 32]
	call edi
	
	push "3w"
	push "T3nb"
	mov esi, esp

	xor ebx, ebx
	push ebx
	push esi
	push esi
	push ebx
	call eax

Note that this shellcode is meant for learning the PE export table parsing technique through debugging; it is not 100% operational.

Our shellcode will only work in processes that already have kernel32.dll loaded. However, if you create a suspended process, the only loaded modules will be the exe and ntdll.dll, so the shellcode wouldn't work if you inject it into a brand new suspended process. In this case, we could alter the shellcode to use ntdll!LdrLoadDll (an undocumented function) instead of kernel32!LoadLibraryA.

10. Conclusion

Frankly, it took me time to understand this wonderful technique of parsing the PE export table, which is used by sophisticated malware, so I tried to explain it from scratch. Using the shellcode's own PE parsing instead of GetProcAddress has the additional benefit of making reverse-engineering of the shellcode more difficult. Also, using hashes of Win32 API function names is a good way to hide them from casual inspection.

I hope you have learned, step by step, how we can leverage PE export table parsing to write Windows shellcode and then resolve all of the libraries the shellcode needs so that it can interact with the system.

Final Note: I am not an expert shellcode developer, I'm just a learner. If you think I said anything incorrect anywhere, feel free to reach out and correct me; I would highly appreciate that. And finally, thank you very much for taking the time to read this post.

References

Book: Michael Sikorski, Andrew Honig - Practical Malware Analysis: The Hands-On Guide to Dissecting Malicious Software - No Starch Press (2012)

Portable Executable File Format (kjk)

PE Format - Win32 apps (Microsoft Docs)

IMAGE_NT_HEADERS32 (winnt.h) - Win32 apps (Microsoft Docs)

Win32 API to enumerate dll export functions? (Stack Overflow)

Control: x86 Instruction Set Reference

Introduction to Windows shellcode development – Part 3 (Security Café)

MessageBox function (winuser.h) - Win32 apps (Microsoft Docs)
