Leveraging PE parsing techniques to write x86 shellcode
Shellcode is often used alongside an exploit to subvert a running program, or by an injector performing process injection. To work reliably across different Windows versions, shellcode must locate the Win32 API functions it needs dynamically, and for that task it typically uses LoadLibraryA and GetProcAddress, which are exported from "kernel32.dll".
In this post, we will explore Win32 shellcode development using what we learned in the previous blog about the PEB structure, and specifically, we will see how shellcode takes advantage of the PE parsing technique.
Everything will be done directly within the debugger, IDA Pro, so that we can easily get the opcodes and test our shellcode step by step. Finally, to truly understand the structure of kernel32.dll, we will use CFF Explorer to view the contents of this precious DLL. Now, let's fasten our seat belts and start!
In our previous post, "Digging into Windows PEB", we concluded that whenever an executable file is loaded into memory, Windows loads the core libraries kernel32.dll and ntdll.dll alongside it and records where they were loaded in structures reachable from the PEB. The figure below describes the data structures that are followed to find the base address of kernel32.dll:
So, we will retrieve the base address of kernel32.dll from the PEB as shown in the following sample assembly code:
To understand this assembly code, take a look at my previous post.
With this assembly code, we can find the kernel32.dll base address and store it in the eax register, so we now need to assemble it with nasm. As the program is written in x86 assembly, the elf32 file type is specified using the -f flag, and the result is then disassembled into opcodes using objdump:
Now, let's test our shellcode within the context of a C program. The shellcode can be placed in a test program (titled runner.c in this example) written in C, as shown below:
This program should be compiled and then run under IDA Pro for debugging purposes:
Now, the eax register points to the memory address 0x75720000, which indicates that we got the base address of kernel32.dll successfully. We can confirm this result by noting that we're pointing at e_magic, which is a member of the MS-DOS header of kernel32.dll:
The first field, e_magic, is also called the magic number. This field is used to identify an MS-DOS-compatible file type. All MS-DOS-compatible executable files set this value to 0x5A4D, which represents the ASCII characters MZ. At this point, we have retrieved the memory address where kernel32.dll is loaded!
Before diving into this part, I would like to highlight some essential definitions:
Relative Virtual Address (RVA): In an image file, this is the address of an item after it is loaded into memory, with the base address of the image file subtracted from it. The RVA of an item almost always differs from its position within the file on disk (file pointer). --> RVA = VA - BaseAddress
Virtual Address (VA): Same as RVA, except that the base address of the image file is not subtracted. The address is called a VA because Windows creates a distinct VA space for each process, independent of physical memory. For almost all purposes, a VA should be considered just an address. A VA is not as predictable as an RVA because the loader might not load the image at its preferred location. --> VA = RVA + BaseAddress
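The two formulas above can be sketched in a few lines of Python. This is only an illustration: the helper names are mine, the base address is the one observed for kernel32.dll in this post (it will differ on your machine), and the RVA 0x1000 is a made-up value.

```python
# Minimal sketch of the RVA <-> VA conversions, assuming 32-bit addresses.
KERNEL32_BASE = 0x75720000  # base address seen in this post; not a constant


def rva_to_va(base: int, rva: int) -> int:
    """VA = RVA + BaseAddress (wrapped to 32 bits like a register)."""
    return (base + rva) & 0xFFFFFFFF


def va_to_rva(base: int, va: int) -> int:
    """RVA = VA - BaseAddress."""
    return (va - base) & 0xFFFFFFFF


example_va = rva_to_va(KERNEL32_BASE, 0x1000)  # 0x1000 is a hypothetical RVA
print(hex(example_va))
print(hex(va_to_rva(KERNEL32_BASE, example_va)))
```

In the shellcode itself, the same conversion is just an add instruction with the base address as one operand.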
We found the base address of kernel32.dll in memory. Now we need to parse this PE file and find the export directory:
e_lfanew is a 4-byte offset into the file where the PE file header is located. It is necessary to use this offset to locate the PE header in the file.
(Lines 1-2) We know that we can find the “e_lfanew” pointer at the offset 0x3C:
After the instruction mov ebx, [eax + 0x3c], ebx should hold the value 0xF8, as depicted in the following figure:
Now, we can find the address of the PE signature by adding the kernel32 base address and the PE signature RVA: 0x75720000 + 0xF8 = 0x757200F8, and we find the PE signature there:
As you know, the PE header is a structure that contains the following information:
Signature: a member identifying the file as a PE image. Its value is 0x00004550; in memory the bytes appear as 50 45 00 00 because of little-endian ordering, which represents the ASCII characters "PE" followed by two null bytes, as you can see above in our debugging session.
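To make the e_magic, e_lfanew, and Signature checks concrete, here is a minimal Python sketch that performs them on a synthetic buffer. The buffer is not a real kernel32.dll image: I only plant the three values discussed above (MZ at offset 0, 0xF8 at offset 0x3C, and the PE signature at 0xF8), and the function name is hypothetical.

```python
import struct


def find_pe_header(image: bytes) -> int:
    """Return the offset of the PE signature inside a mapped image."""
    (e_magic,) = struct.unpack_from("<H", image, 0)
    if e_magic != 0x5A4D:  # "MZ" read as a little-endian WORD
        raise ValueError("missing MZ signature")
    (e_lfanew,) = struct.unpack_from("<I", image, 0x3C)
    if image[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("missing PE signature")
    return e_lfanew


# Synthetic stand-in for the first bytes of a loaded DLL (assumption).
image = bytearray(0x200)
image[0:2] = b"MZ"
struct.pack_into("<I", image, 0x3C, 0xF8)  # e_lfanew value seen in the post
image[0xF8:0xFC] = b"PE\0\0"

pe_offset = find_pe_header(bytes(image))
(signature,) = struct.unpack_from("<I", image, pe_offset)
print(hex(pe_offset), hex(signature))
```

Reading the four signature bytes as a little-endian DWORD gives 0x4550, which is why the debugger shows 50 45 first.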
(Lines 3-4) The IMAGE_OPTIONAL_HEADER is a structure containing more useful information for us:
It contains the member we care about most, DataDirectory, which holds information such as the imported and exported functions.
At the offset 0x78 of the PE header, we can find the RVA of Export Directory:
Most of you will ask how we get this offset. It is very simple:
or sizeof(PE_Signature) + sizeof(IMAGE_FILE_HEADER) + offsetof(IMAGE_OPTIONAL_HEADER, DataDirectory) = 4 + 20 + 96 = 120 bytes (0x78 in hex)
Again, we add this value to the eax register and we are now positioned at the export directory of kernel32.dll.
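If you want to double-check the 0x78 figure without a C compiler, the sum can be reproduced with Python's struct module. This is a sketch under the assumption of a 32-bit image (IMAGE_OPTIONAL_HEADER32); the format strings below are my own transcription of the field sizes, not something taken from the shellcode.

```python
import struct

SIGNATURE_SIZE = 4  # "PE\0\0"

# IMAGE_FILE_HEADER: Machine, NumberOfSections (WORD), TimeDateStamp,
# PointerToSymbolTable, NumberOfSymbols (DWORD), SizeOfOptionalHeader,
# Characteristics (WORD).
FILE_HEADER = struct.Struct("<HHIIIHH")

# IMAGE_OPTIONAL_HEADER32 fields that come before DataDirectory:
# Magic (WORD), two linker version bytes, 9 DWORDs up to FileAlignment,
# 6 version WORDs, 4 DWORDs up to CheckSum, Subsystem and
# DllCharacteristics (WORD), then 6 DWORDs up to NumberOfRvaAndSizes.
OPTIONAL_HEADER_BEFORE_DATADIR = struct.Struct("<HBB9I6H4I2H6I")

# The export directory is the first DataDirectory entry, so its RVA sits
# right where DataDirectory starts.
export_dir_entry_offset = (
    SIGNATURE_SIZE + FILE_HEADER.size + OPTIONAL_HEADER_BEFORE_DATADIR.size
)
print(FILE_HEADER.size, OPTIONAL_HEADER_BEFORE_DATADIR.size)
print(hex(export_dir_entry_offset))
```

For a 64-bit image the optional header layout is different, so this offset should not be reused there.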
The export directory is the following structure:
The relevant fields in the _IMAGE_EXPORT_DIRECTORY:
AddressOfFunctions is an array of RVAs that point to the actual exported functions. It is indexed by an export ordinal. The shellcode needs to map the export name to the ordinal to use this array.
This mapping is done via AddressOfNames and AddressOfNameOrdinals arrays. These two arrays exist in parallel. They have the same number of entries, and equivalent indices into these arrays are directly related.
AddressOfNames is an array of 32-bit RVAs that point to the strings of symbol names.
AddressOfNameOrdinals is an array of 16-bit ordinals. For a given index id into these arrays, the symbol at AddressOfNames[id] has the export ordinal value at AddressOfNameOrdinals[id].
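The relationship between the three arrays can be sketched with plain Python lists. Everything here is made up for illustration (the names, ordinals, and RVAs are not from kernel32.dll), and the sketch uses the value from AddressOfNameOrdinals directly as the index into AddressOfFunctions, which is my reading of the PE format documentation.

```python
def resolve_export(name, names, name_ordinals, functions):
    """Map an export name to its function RVA via the parallel arrays."""
    idx = names.index(name)       # position in AddressOfNames
    ordinal = name_ordinals[idx]  # same position in AddressOfNameOrdinals
    return functions[ordinal]     # index into AddressOfFunctions


# Hypothetical export tables: three names, deliberately not in the same
# order as the function table.
names = ["Alpha", "Beta", "Gamma"]
name_ordinals = [2, 0, 1]
functions = [0x1000, 0x2000, 0x3000]

print(hex(resolve_export("Beta", names, name_ordinals, functions)))
```

The shellcode does the same walk, except that it compares hashes instead of strings and reads the arrays straight from memory.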
(Lines 5-6) In the IMAGE_EXPORT_DIRECTORY structure, the field at offset 0x20 contains the RVA of the exported function names table, which is 0x000945B4:
Again :p most of you will ask how we get this offset. It is very simple:
Let's retrieve the address of the exported function names table by adding the Name Pointer Table RVA 0x000945B4 to the kernel32 base address 0x75720000, which results in 0x757B45B4. That address stores the RVA of the first exported function name, 0x00096BCA:
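The offsets used in this post for the export directory can be recomputed by summing the field sizes of IMAGE_EXPORT_DIRECTORY. The sketch below is a hedged illustration: the field list is my transcription of the structure, and the helper name is hypothetical.

```python
import struct

# (field name, struct code): I is a DWORD, H is a WORD.
EXPORT_DIRECTORY_FIELDS = [
    ("Characteristics", "I"),
    ("TimeDateStamp", "I"),
    ("MajorVersion", "H"),
    ("MinorVersion", "H"),
    ("Name", "I"),
    ("Base", "I"),
    ("NumberOfFunctions", "I"),
    ("NumberOfNames", "I"),
    ("AddressOfFunctions", "I"),
    ("AddressOfNames", "I"),
    ("AddressOfNameOrdinals", "I"),
]


def field_offsets(fields):
    """Return {field name: byte offset} for a packed little-endian struct."""
    offsets, position = {}, 0
    for name, code in fields:
        offsets[name] = position
        position += struct.calcsize("<" + code)
    return offsets


OFFSETS = field_offsets(EXPORT_DIRECTORY_FIELDS)
for field in ("AddressOfFunctions", "AddressOfNames", "AddressOfNameOrdinals"):
    print(field, hex(OFFSETS[field]))

# Address arithmetic from the post: names table RVA + kernel32 base.
names_table_va = 0x75720000 + 0x000945B4
print(hex(names_table_va))
```

These are the three offsets (0x1c, 0x20, 0x24) that the assembly reads relative to the export directory pointer.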
It's not always a good idea to use ASCII or Unicode strings, since they just make our shellcode bigger and also easier to spot. So it would be better to use a hash value to look up our targeted Win32 API functions.
For that reason, we used a C program from StackOverflow that enumerates all exported Win32 API functions in kernel32.dll (we're doing the same in the assembly version), and in every callback we generate a hash for the corresponding exported function via the following code snippet:
As you can see above, the calculate_hash function is basically a for loop that shifts the value in the hash variable left by 1 and then adds szName[i], the current character of the exported function name, to it.
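For readers who want to reproduce the hashes without compiling the C program, here is a small Python sketch of the same shift-and-add idea. It is my own re-implementation based on the description above, and the 32-bit mask is an assumption that models the width of the register used by the assembly version.

```python
def calculate_hash(name: str) -> int:
    """Shift the running hash left by 1, then add the next character."""
    value = 0
    for byte in name.encode("ascii"):
        value = ((value << 1) + byte) & 0xFFFFFFFF  # keep it in 32 bits
    return value


for api in ("AcquireSRWLockExclusive", "LoadLibraryA"):
    print(api, f"0x{calculate_hash(api):08x}")
```

With this definition, AcquireSRWLockExclusive hashes to 0x2a992f1d and LoadLibraryA to 0x00059ba3, which are the values that show up in the debugger in this post.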
Generate all hashes of exported functions that actually exist in kernel32.dll:
As a result:
Let's first inspect our first exported function in memory using the following assembly code:
(Line 1) We push the precomputed hash value of LoadLibraryA on the stack since we will use it later to find our targeted function.
(Line 2) We set the ecx register to 0. It serves as the counter that maps the export name of the targeted Win32 API function (LoadLibraryA) to its ordinal, which we then use to retrieve its address from the AddressOfFunctions array.
(Line 3) We save the eax register, which holds the base address of kernel32.dll, into edx, because we will later use lodsd, which overwrites eax.
Loads a byte, word, or doubleword from the source operand into the AL, AX, or EAX register, respectively.
(Line 6-9) We create a procedure called "_find_addr", represented by a label in our asm code, and we use lodsd, which reads from the address held in the esi register, the pointer to the first function name entry. The lodsd instruction places in eax the RVA of the function name (“AcquireSRWLockExclusive”), and we add edx (the kernel32 base address) to it to obtain the correct pointer. Note that the lodsd instruction also increments the esi register by 4! This helps us because we do not have to increment it manually; we just need to execute lodsd again to get the next function name pointer:
Remember we're incrementing ecx register, which will be the counter of our functions and the function ordinal number.
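The behaviour of lodsd in this loop can be modelled in Python to see what happens to esi and eax at each step. This is a toy model on a synthetic buffer: the table offsets are invented, the buffer starts at offset 0 so an RVA is also a buffer offset, and the second name is only a plausible example rather than the actual second entry of kernel32.dll.

```python
import struct


def lodsd(memory, esi):
    """Model lodsd: load the dword at esi into eax, then advance esi by 4."""
    (eax,) = struct.unpack_from("<I", memory, esi)
    return eax, esi + 4


def read_cstring(memory, offset):
    """Read a null-terminated ASCII string starting at offset."""
    end = memory.index(b"\0", offset)
    return bytes(memory[offset:end]).decode("ascii")


# Synthetic image: a names table at 0x100 holding two RVAs, and the two
# strings they point to (all offsets are hypothetical).
image = bytearray(0x300)
for offset, text in ((0x200, b"AcquireSRWLockExclusive\0"),
                     (0x220, b"AcquireSRWLockShared\0")):
    image[offset:offset + len(text)] = text
struct.pack_into("<II", image, 0x100, 0x200, 0x220)

esi = 0x100
names = []
for ecx in range(2):  # ecx plays the role of the name counter
    eax, esi = lodsd(image, esi)
    names.append(read_cstring(image, eax))

print(names, hex(esi))
```

After two iterations esi has moved forward by 8 bytes, which is exactly why the assembly never needs to increment it by hand.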
Next step, we need to calculate the hash of every exported function name as we did in the C language version:
(Line 7-13) We save the values of ecx and edx on the stack since we will need them later, then we clear ecx, edi, and edx in turn. Note that we do not clear eax, since it now points to the first BYTE of the first exported function name:
(Line 15-21) The _loop function is mainly an asm representation of the C hash function mentioned previously. The instruction shl edi, 1 shifts the value stored in edi left by one. We load the current BYTE of the exported function name into dl (the 8-bit version of edx) via mov dl, BYTE [eax + ecx], where ecx keeps track of which BYTE we are at, and then we add edx to edi. However, we also need to check whether we have reached the end of the exported function name. Since every exported function name is a null-terminated string, this is done by comparing the next byte of the name with 0, cmp BYTE [eax + ecx], 0, and we keep looping until ZF is set to 1:
Now, edi holds the hash of the first exported function name, 2A992F1D, which we can confirm by grepping the hash.txt file generated previously:
(Line 23-25) Finally, we need to restore all the register values that we pushed on the stack and get back to our function _find_addr via the ret instruction.
Now, we need to check whether the value stored in edi matches the hash of LoadLibraryA, which was already pushed on the stack:
(Line 1- 5 ) Already explained previously.
(Line 6-8) Keep in mind that whenever the function _calculate_hash is called, it returns the hash value of an exported function name in the edi register. To check whether we have found the LoadLibraryA hash, we use the instruction cmp edi, [esp + 4] and keep looping until ZF is set to 1. For debugging purposes, we will set a breakpoint at the ret instruction:
At this point we have basically reached our goal, which is finding the hash value of LoadLibraryA:
eax points to the beginning of the LoadLibraryA name string.
edi holds the hash value of LoadLibraryA, 0x00059ba3.
The most precious value for retrieving the address of LoadLibraryA is:
ecx = 0x000003C6, which is the function ordinal number.
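The search loop as a whole (walk the names, hash each one, stop when the hash matches) can be sketched in Python as follows. The export list is a tiny hypothetical stand-in, so the resulting counter is 1 here rather than the 0x3C6 seen in the real kernel32.dll, and calculate_hash is my re-implementation of the shift-and-add hash described earlier.

```python
def calculate_hash(name: str) -> int:
    """Shift-and-add hash, kept to 32 bits like the edi register."""
    value = 0
    for byte in name.encode("ascii"):
        value = ((value << 1) + byte) & 0xFFFFFFFF
    return value


def find_name_index(names, target_hash):
    """Return the position of the first name whose hash matches, else None."""
    for ecx, name in enumerate(names):  # ecx is the running counter
        if calculate_hash(name) == target_hash:
            return ecx
    return None


LOADLIBRARYA_HASH = 0x00059BA3  # precomputed value pushed by the shellcode

# Hypothetical, very short export name table.
export_names = ["GetProcAddress", "LoadLibraryA", "LoadLibraryW"]
index = find_name_index(export_names, LOADLIBRARYA_HASH)
print(index)
```

Because the loop stops at the first match, a hash collision with an earlier export would return the wrong function, which is worth keeping in mind when choosing a hash this simple.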
At this point, we have only found the ordinal number of the LoadLibraryA function, but we can use it to find the actual address of this function:
(Line 4-5) At this point, we have in ebx a pointer to the IMAGE_EXPORT_DIRECTORY structure. At offset 0x24 of the structure, we can find the “AddressOfNameOrdinals” RVA. In line 5, we add the edx register, which holds the base address of kernel32.dll, to it so that we get a valid pointer to the name ordinals table. Some of you may ask about the logic behind 0x24:
(Lines 6-7) The esi register contains the pointer to the name ordinals array.
The name ordinals array (export ordinal table) is an array of 16-bit unbiased indexes into the export address table. Ordinals are biased by the Ordinal Base field of the export directory table. In other words, the ordinal base must be subtracted from the ordinals to obtain true indexes into the export address table.
This array contains two-byte numbers. Up to now, we have the biased_ordinal of the LoadLibraryA function in the ecx register, so this way we get the function address ordinal (index). This will help us to get the function address.
Some of you may be confused by the instruction mov cx, [esi + ecx * 2].
In fact, we want element number ecx (0x3C6) of the name ordinals array, whose elements have type T: you compute [arraystart + (ecx * sizeof(T))] --> [esi + ecx * 2], because the ordinal array stores each ordinal in 2 bytes (sizeof(T) = 2), and finally we store this value in cx, the 2-byte version of ecx.
We have to subtract OrdinalBase from biased_ordinal to get the ordinal number of our function:
Since in our case OrdinalBase equals 1:
By now, we have the ordinal number of LoadLibraryA stored in the ecx register, as depicted below:
(Lines 8-9) At offset 0x1c, we can find the RVA of the “Export Address Table” array. We just add the base address of kernel32.dll and we are positioned at the beginning of the array. Some of you may ask again about the logic behind 0x1c:
(Lines 10-11) Now that we have the correct index for the “Export Address Table” array in ecx, we can find the LoadLibraryA function pointer (RVA of LoadLibraryA) at the AddressOfFunctions[ecx] location:
We use "ecx * 4" because each pointer has 4 bytes and esi points to the beginning of the array.
In the end, we add the base address so we will have in the edi the pointer to the LoadLibraryA function:
Finally, we have dynamically resolved the address of LoadLibraryA at runtime, which is 0x75A60BD0, and of course we use the ret instruction to return from the current procedure _get_addr.
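The last two table lookups (ecx * 2 into the name ordinals array, then ecx * 4 into the export address table) can be replayed in Python on a synthetic buffer. All table contents below are invented. Note also that this sketch uses the 16-bit value read from the name ordinals array directly as the index, which is how I understand the PE format documentation, so the OrdinalBase adjustment discussed above is not modelled here.

```python
import struct


def resolve_by_name_index(image, base, export_dir, name_index):
    """Follow AddressOfNameOrdinals and AddressOfFunctions to a function VA."""
    (functions_rva,) = struct.unpack_from("<I", image, export_dir + 0x1C)
    (ordinals_rva,) = struct.unpack_from("<I", image, export_dir + 0x24)
    # 2-byte entries, hence name_index * 2.
    (index,) = struct.unpack_from("<H", image, ordinals_rva + name_index * 2)
    # 4-byte entries, hence index * 4.
    (func_rva,) = struct.unpack_from("<I", image, functions_rva + index * 4)
    return base + func_rva


BASE = 0x75720000   # kernel32 base seen in this post
EXPORT_DIR = 0x100  # hypothetical offset of the export directory

# Synthetic image: buffer offsets double as RVAs in this sketch.
image = bytearray(0x400)
struct.pack_into("<I", image, EXPORT_DIR + 0x1C, 0x200)      # AddressOfFunctions
struct.pack_into("<I", image, EXPORT_DIR + 0x24, 0x300)      # AddressOfNameOrdinals
struct.pack_into("<3I", image, 0x200, 0x1111, 0x2222, 0x3333)  # function RVAs
struct.pack_into("<3H", image, 0x300, 2, 0, 1)               # name ordinals

address = resolve_by_name_index(image, BASE, EXPORT_DIR, 0)
print(hex(address))
```

The final addition of the base address corresponds to the last add instruction before the function pointer lands in edi.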
(Line 2) We push the address of LoadLibraryA on the stack through push edi because we will use it later in our main procedure.
Finding GetProcAddress follows roughly the same steps that we explained in depth in the previous section for finding the LoadLibraryA address; however, I will clarify some instructions:
First, let's agree that the precomputed hash of GetProcAddress is: 0x0015bdfd
(Line 18) We save the value of esi on the stack. Why? As you know, the esi register holds the address of the exported function names table, and since the process of finding the address of LoadLibraryA uses lodsd, which increments esi by 4 each time, this value will be overwritten. Hence, we push it on the stack and restore it after finding the address of LoadLibraryA via the instruction mov esi, [esp + 8]; then we can smoothly start the process of finding the GetProcAddress address.
(Line 34) We take the same approach as before, pushing the address of GetProcAddress on the stack through push edi, since we will use it later in our main procedure.
We previously found the LoadLibraryA function address. We will now use it to load into memory the "user32.dll" library, which contains the MessageBox function that we will use as a POC for the technique discussed in this blog:
lpLibFileName
is the name of the module which will be in our case "user32.dll".
(Lines 1-7) We define the procedure _do_main, which represents our main function. As you noticed previously, we pushed the "LoadLibraryA" address on the stack, so we retrieve it through the stack pointer esp. Now, we want to call LoadLibraryA("user32.dll"), so we need to place the user32.dll string on the stack.
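One common way to place a string on the stack is to push it as 4-byte chunks, last chunk first, so that esp ends up pointing at the first character. The Python helper below is a sketch of that idea and prints the push instructions it would take; the helper name is hypothetical, and the actual assembly in this post may build the string differently, for example to avoid null bytes in the immediates.

```python
def string_to_pushes(text: str):
    """Return the dwords to push (in push order) so esp points at text."""
    data = text.encode("ascii") + b"\0"   # null terminator
    data += b"\0" * (-len(data) % 4)      # pad to a multiple of 4 bytes
    chunks = [data[i:i + 4] for i in range(0, len(data), 4)]
    # The stack grows downwards, so the last chunk is pushed first.
    return [int.from_bytes(chunk, "little") for chunk in reversed(chunks)]


for value in string_to_pushes("user32.dll"):
    print(f"push 0x{value:08x}")
```

Each dword is the little-endian view of four characters, which is why "user" shows up as 0x72657375.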
At esp, we have the "user32.dll" string. We push this parameter on the stack to load the library, and the call returns in eax the base address at which user32.dll is loaded into memory. We will need it later:
We have loaded the user32.dll library into memory; now we want to call GetProcAddress to get the address of the MessageBox function.
hModule
A handle to the DLL module that contains the function or variable. The LoadLibraryA function returns this handle.
lpProcName
The function or variable name, or the function's ordinal value. If this parameter is an ordinal value, it must be in the low-order word; the high-order word must be zero.
(Line 9-15) We want to call GetProcAddress(user32.dll, "MessageBox"), so again we need to place the MessageBox string on the stack. At esp, we have the "MessageBox" string. We push this parameter on the stack, then push the eax register, which contains the user32.dll base address, and call the edi register, which holds the address of the GetProcAddress function:
After the call, GetProcAddress returns in eax the address of MessageBox, as depicted above, which we will need later.
Now that we have all the ingredients to call the MessageBox function, we just need to prepare the right parameters for it:
hWnd
A handle to the owner window of the message box to be created. If this parameter is NULL, the message box has no owner window.
lpText
The message to be displayed. If the string consists of more than one line, you can separate the lines using a carriage return and/or linefeed character between each line.
lpCaption
The dialog box title. If this parameter is NULL, the default title is Error.
uType
The contents and behavior of the dialog box. This parameter can be a combination of flags from the following groups of flags.
As an example, we want to call:
Remember that with the stdcall calling convention used by the Win32 API on x86, arguments are pushed in reverse order:
Thus, we can do that via the following asm code:
(Line 1-3) So, we need to place the "T3nb3w" string on the stack. At esp, we have the "T3nb3w" string, then we move this pointer into esi.
(Line 5-10) We clear the ebx register and push the registers ebx, esi, esi, and ebx in that order. Finally, we call eax, which already holds the address of MessageBox.
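The push order can be checked with a tiny Python model of the stack. This is only an illustration: the argument labels are placeholders for the real values (ebx is 0, esi points to the "T3nb3w" string), and the helper name is mine.

```python
def push_order(arguments):
    """Arguments are pushed right to left, so the last one goes first."""
    return list(reversed(arguments))


# MessageBox(hWnd, lpText, lpCaption, uType) as called in this post.
arguments = ["hWnd (ebx = 0)", "lpText (esi)", "lpCaption (esi)", "uType (ebx = 0)"]
pushes = push_order(arguments)

stack = []
for value in pushes:
    stack.append(value)  # model of "push"

for value in pushes:
    print("push", value)

# The most recently pushed value sits on top of the stack, which is where
# the callee expects its first argument.
print("top of stack:", stack[-1])
```

This matches the ebx, esi, esi, ebx sequence described above: uType first, hWnd last.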
Below is our assembly in a debugger. The MessageBox pops up after the call eax instruction is executed:
Now we just need to add all parts together and the final shellcode is the following:
Note that this shellcode is meant for learning the PE export table parsing technique through debugging; it's not 100% operational.
Our shellcode will only work in processes that already have kernel32.dll loaded. However, if you create a suspended process, the only loaded modules will be the exe and ntdll.dll, so the shellcode wouldn't work if you inject it into a brand-new suspended process. In this case, we could alter the shellcode to use ntdll!LdrLoadDll (an undocumented function) instead of kernel32!LoadLibrary.
Frankly, it took me time to understand this wonderful technique of PE export table parsing used by sophisticated malware, so I tried to explain it from scratch. Using the shellcode's PE parsing ability instead of GetProcAddress has the additional benefit of making reverse engineering of the shellcode more difficult. Also, using a hash of the Win32 API function name is a good way to hide it from casual inspection.
I hope you have learned, step by step, how we can use PE export table parsing to write Windows shellcode and then resolve all the libraries and functions the shellcode needs so that it can interact with the system.
Final Note: I am not an expert shellcode developer, just a learner. If you think I said anything incorrect anywhere, feel free to reach out and correct me; I would highly appreciate that. And finally, thank you very much for taking the time to read this post.