Bridging C++ and x64 Shellcode Development (Windows)

Introduction

This technical deep-dive explores the intersection of traditional C++ Windows programming and low-level x64 shellcode development. Understanding these concepts is crucial for security research, exploit development, and gaining deeper insight into how Windows executables operate at the binary level.

The x64 Calling Convention Context

In x64 Windows (Microsoft calling convention), registers fall into two categories that determine how functions interact with the CPU state:

Volatile (Caller-saved) Registers

Functions can modify these freely without preserving their values:

RAX, RCX, RDX, R8, R9, R10, R11 - General purpose registers
XMM0-XMM5 - Floating point/SIMD registers

Non-volatile (Callee-saved) Registers

Functions MUST preserve these if they use them:

RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 - General purpose registers
XMM6-XMM15 - Floating point/SIMD registers

This distinction is critical when writing shellcode because you need to know which registers you can use freely and which require preservation to avoid crashing the target process.

Register Usage in Function Calls

The Microsoft x64 calling convention uses a fastcall-style approach:

RCX - 1st integer/pointer argument
RDX - 2nd integer/pointer argument
R8 - 3rd integer/pointer argument
R9 - 4th integer/pointer argument
Stack - 5th and subsequent arguments (stored in right-to-left order above the 32-byte shadow space; compilers usually write them with mov rather than push)
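As a rough illustration of the list above, the following sketch maps a zero-based integer/pointer argument index to the place where the callee finds it on entry. ArgLocationAtEntry is a hypothetical helper written for this article, not part of any Windows API, and it ignores floating-point arguments (which travel in XMM0-XMM3).

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: where the callee finds integer/pointer argument
// `index` (0-based) immediately after the call instruction has executed.
std::string ArgLocationAtEntry(unsigned index) {
    static const char* const kRegs[] = {"RCX", "RDX", "R8", "R9"};
    if (index < 4)
        return kRegs[index];

    // On entry: [RSP] = return address, [RSP+0x08..0x27] = shadow space,
    // so the 5th argument (index 4) sits at [RSP+0x28], the 6th at [RSP+0x30].
    char buf[32];
    std::snprintf(buf, sizeof buf, "[RSP+0x%X]", 0x28u + 8u * (index - 4));
    return buf;
}
```

For example, ArgLocationAtEntry(1) yields RDX and ArgLocationAtEntry(4) yields [RSP+0x28].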

Shadow Space Requirement

Understanding Shadow Space Allocation

Every non-leaf function (a function that calls other functions) must allocate shadow space for its callees. The shadow space is a reserved area on the stack that the callee can use to save the four register-passed arguments (RCX, RDX, R8, R9). Each slot is 8 bytes, so the shadow space is always 32 bytes (0x20), even when the callee takes fewer than four arguments.

However, there's a critical detail that's often overlooked: the actual stack allocation is typically 0x28 (40 bytes), not just 0x20 (32 bytes). The stack must be 16-byte aligned at every call instruction, and call itself pushes an 8-byte return address, so when a function begins execution RSP is misaligned by 8 bytes. Allocating 0x28 bytes provides the 32 bytes of shadow space plus the extra 8 bytes that bring RSP back to a 16-byte boundary. From the callee's point of view, the shadow space sits immediately above its return address (at [RSP+0x08] through [RSP+0x27] on entry), and any arguments beyond the first four sit above the shadow space, starting at [RSP+0x28].

The Math Behind 0x28

Let's break down why we use 0x28 instead of 0x20:

Before call instruction: RSP is 16-byte aligned (RSP mod 16 = 0)
After call instruction: RSP is misaligned by 8 bytes (RSP mod 16 = 8) because the return address was pushed
Required shadow space: 32 bytes (0x20)
Required alignment: RSP must be 16-byte aligned before calling other functions
Solution: Allocate 0x28 (40 bytes) = 0x20 (shadow space) + 0x8 (alignment correction)

This way, writing RSP0 for the caller's 16-byte-aligned RSP just before the call:

RSP0 - 0x8 (return address) - 0x28 (allocation) = RSP0 - 0x30
0x30 is a multiple of 16, so (RSP0 - 0x30) mod 16 = 0 and the stack is aligned again
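The arithmetic above can be sketched as a small compile-time helper. FrameSize and AlignedAfterPrologue are hypothetical names, and the model assumes a prologue that pushes no nonvolatile registers; real compilers may choose larger frames.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the `sub rsp, N` amount for a non-leaf function.
// maxCalleeArgs: largest argument count among the functions it calls.
// localBytes:    extra space for locals (0 in the article's example).
constexpr std::uint32_t FrameSize(std::uint32_t maxCalleeArgs,
                                  std::uint32_t localBytes = 0) {
    // Shadow space always covers four slots, even for fewer arguments.
    const std::uint32_t slots = maxCalleeArgs < 4 ? 4 : maxCalleeArgs;
    std::uint32_t size = slots * 8 + localBytes;
    size = (size + 7) & ~std::uint32_t{7};  // keep 8-byte granularity
    // Entry RSP is 8 mod 16, so the allocation must also be 8 mod 16.
    if (size % 16 == 0)
        size += 8;
    return size;
}

// True if RSP is 16-byte aligned after subtracting frameSize from entryRsp.
constexpr bool AlignedAfterPrologue(std::uint64_t entryRsp,
                                    std::uint32_t frameSize) {
    return (entryRsp - frameSize) % 16 == 0;
}

static_assert(FrameSize(4) == 0x28, "four register arguments -> sub rsp,28h");
```

Under this model a callee with five arguments still fits in 0x28, because the alignment slot at RSP+0x20 doubles as the 5th argument slot, while six arguments need 0x38.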

Proving It in WinDbg

Create a simple test program:

// compile with: cl /Zi /Od test.cpp
#include <windows.h>

// Minimal non-leaf function
void __declspec(noinline) ChildFunction(int a, int b, int c, int d) {
    volatile int result = a + b + c + d;
}

void __declspec(noinline) ParentFunction() {
    ChildFunction(1, 2, 3, 4);
}

int main() {
    ParentFunction();
    return 0;
}

Set a breakpoint at ParentFunction and run to it (for example, bp ShadowSpace!ParentFunction followed by g):

On entry, the RSP value is misaligned by 8 (RSP mod 16 = 8).

Disassemble the function

Step through and verify alignment:

0:000> p
Breakpoint 0 hit
ShadowSpace!ParentFunction:
00007ff7`0f7271d0 4883ec28        sub     rsp,28h         # Execute 'sub rsp,28h'
0:000> p
ShadowSpace!ParentFunction+0x4:
00007ff7`0f7271d4 41b904000000    mov     r9d,4
0:000> r rsp
rsp=000000b0d8affb10
0:000> ? @rsp & 0xf
Evaluate expression: 0 = 00000000`00000000  ← NOW ALIGNED!
0:000> dq @rsp L8 # After the `sub rsp,28h` instruction, examine the stack:
000000b0`d8affb10  00007ff7`0f7a0110 00000000`00000000 # ← Shadow space for the callee (RCX home, RDX home)
000000b0`d8affb20  06100800`000906ea bfebfbff`7ffafbbf # ← Shadow space for the callee (R8 home, R9 home)
000000b0`d8affb30  00007ff7`0f7a0690 00007ff7`0f727209 # ← Alignment padding (RSP+0x20), return address into main (RSP+0x28)
000000b0`d8affb40  00000000`0000001f 00000000`00000000 # ← Caller's (main's) frame
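The addresses in the dump above can be cross-checked with a few lines of arithmetic. FrameView is a hypothetical helper for this article; it only models a frame created by a single sub rsp instruction, as in ParentFunction.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical view of a frame created by a single `sub rsp, allocation`.
struct FrameView {
    std::uint64_t rspAfterSub;  // RSP once the prologue has run
    std::uint64_t allocation;   // bytes subtracted by the prologue

    // Address holding this function's return address (RSP at entry).
    constexpr std::uint64_t ReturnAddressSlot() const {
        return rspAfterSub + allocation;
    }
    // The caller's RSP just before its call instruction executed.
    constexpr std::uint64_t CallerRspBeforeCall() const {
        return ReturnAddressSlot() + 8;
    }
    // Home slot this function provides to its callee for argument i (0..3).
    constexpr std::uint64_t CalleeHomeSlot(unsigned i) const {
        return rspAfterSub + 8ull * i;
    }
};
```

With rspAfterSub = 0xb0d8affb10 and allocation = 0x28, the return address slot comes out at 0xb0d8affb38, which is where the dump shows 00007ff7`0f727209.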

Key Takeaways

Shadow space is 0x20 (32 bytes) - four 8-byte slots for RCX, RDX, R8, R9
Typical allocation is 0x28 (40 bytes) - 0x20 shadow + 0x8 alignment correction
Stack alignment requirement: RSP must be 16-byte aligned at every call instruction
5th+ arguments are placed by the caller at RSP+0x20 and beyond (just above the shadow space it reserves for the callee)
After sub rsp,28h, the function's own return address is at RSP+0x28

The 0x28 allocation elegantly solves both the shadow space requirement and the alignment constraint in a single sub rsp,28h instruction.

Understanding PE Headers for Shellcode

When writing shellcode, you typically cannot rely on the Import Address Table (IAT) like normal executables do. Instead, you must manually locate function addresses by parsing the Process Environment Block (PEB) and walking export tables. This requires understanding the PE (Portable Executable) structure.

Why Parse PE Headers in Shellcode?

Shellcode needs to:

Parse PEB to find loaded modules (like ntdll.dll, kernel32.dll)
Walk the export table to find function addresses dynamically
Understand how Windows structures executables in memory
Avoid hardcoded addresses that break with ASLR

Key PE Structures for Shellcode

The PE format has a hierarchical structure:

PE Header Layout:

DOS Header (IMAGE_DOS_HEADER)
 ├─ e_magic (0x5A4D - "MZ")
 └─ e_lfanew (offset to PE header)

PE Header (IMAGE_NT_HEADERS64)
 ├─ Signature (0x4550 - "PE")
 ├─ FileHeader (IMAGE_FILE_HEADER)
 └─ OptionalHeader (IMAGE_OPTIONAL_HEADER64)
     ├─ AddressOfEntryPoint
     ├─ ImageBase
     └─ DataDirectory[]
         └─ Export Table (index 0)

Export Directory (IMAGE_EXPORT_DIRECTORY)
 ├─ AddressOfFunctions
 ├─ AddressOfNames
 └─ AddressOfNameOrdinals

Important Offsets (x64)

PEB Structure (Process Environment Block)

GS:[0x60] = PEB address (in x64, FS:[0x30] in x86)
PEB+0x18 = PEB_LDR_DATA pointer
PEB_LDR_DATA+0x20 = InMemoryOrderModuleList

LDR_DATA_TABLE_ENTRY

+0x10 = InMemoryOrderLinks (LIST_ENTRY)
+0x30 = DllBase (base address of the module)
+0x38 = EntryPoint
+0x40 = SizeOfImage
+0x48 = FullDllName (UNICODE_STRING)
+0x58 = BaseDllName (UNICODE_STRING)
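One way to sanity-check these offsets is to mirror the structures with fixed-width fields and let the compiler verify the layout. The type names below (ListEntry64, UnicodeString64, PebLdrData64, LdrDataTableEntry64) are hypothetical stand-ins for this article, not the Windows SDK definitions, and only the leading fields are modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical x64 mirrors of the loader structures (leading fields only).
struct ListEntry64 { std::uint64_t Flink, Blink; };

struct UnicodeString64 {
    std::uint16_t Length, MaximumLength;
    std::uint32_t pad;            // explicit padding before the pointer
    std::uint64_t Buffer;
};

struct PebLdrData64 {
    std::uint32_t Length;
    std::uint8_t  Initialized;
    std::uint8_t  pad[3];
    std::uint64_t SsHandle;
    ListEntry64   InLoadOrderModuleList;            // +0x10
    ListEntry64   InMemoryOrderModuleList;          // +0x20
    ListEntry64   InInitializationOrderModuleList;  // +0x30
};

struct LdrDataTableEntry64 {
    ListEntry64     InLoadOrderLinks;            // +0x00
    ListEntry64     InMemoryOrderLinks;          // +0x10
    ListEntry64     InInitializationOrderLinks;  // +0x20
    std::uint64_t   DllBase;                     // +0x30
    std::uint64_t   EntryPoint;                  // +0x38
    std::uint32_t   SizeOfImage;                 // +0x40
    std::uint32_t   pad;
    UnicodeString64 FullDllName;                 // +0x48
    UnicodeString64 BaseDllName;                 // +0x58
};

static_assert(offsetof(PebLdrData64, InMemoryOrderModuleList) == 0x20, "Ldr+0x20");
static_assert(offsetof(LdrDataTableEntry64, DllBase) == 0x30, "entry+0x30");
static_assert(offsetof(LdrDataTableEntry64, BaseDllName) == 0x58, "entry+0x58");

// A Flink taken from InMemoryOrderModuleList points 0x10 bytes into the entry.
constexpr std::uint64_t EntryBaseFromMemoryOrderLink(std::uint64_t link) {
    return link - offsetof(LdrDataTableEntry64, InMemoryOrderLinks);
}
```

The last helper captures why shellcode that walks InMemoryOrderModuleList reads DllBase at link+0x20 rather than link+0x30.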

PE Headers

DllBase+0x3C = e_lfanew (offset to PE header)
PE+0x88 = Export Directory RVA (in OptionalHeader.DataDirectory[0])

Export Directory (IMAGE_EXPORT_DIRECTORY)

+0x14 = NumberOfFunctions
+0x18 = NumberOfNames
+0x1C = AddressOfFunctions RVA
+0x20 = AddressOfNames RVA
+0x24 = AddressOfNameOrdinals RVA
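The PE and export-directory offsets can be checked the same way. The sketch below uses hypothetical mirror types (prefixed Image..., with unneeded fields collapsed into skipped arrays) rather than the real winnt.h headers, so it compiles on any platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mirrors of the PE32+ headers; unneeded fields are collapsed.
struct ImageDosHeader {
    std::uint16_t e_magic;      // +0x00: 0x5A4D ("MZ")
    std::uint16_t skipped[29];  // +0x02..+0x3B
    std::int32_t  e_lfanew;     // +0x3C: offset of the NT headers
};

struct ImageFileHeader {        // 20 bytes
    std::uint16_t Machine, NumberOfSections;
    std::uint32_t TimeDateStamp, PointerToSymbolTable, NumberOfSymbols;
    std::uint16_t SizeOfOptionalHeader, Characteristics;
};

struct ImageDataDirectory { std::uint32_t VirtualAddress, Size; };

struct ImageOptionalHeader64 {
    std::uint16_t Magic;                  // +0x00: 0x20B for PE32+
    std::uint8_t  skipped1[14];
    std::uint32_t AddressOfEntryPoint;    // +0x10
    std::uint32_t BaseOfCode;             // +0x14
    std::uint64_t ImageBase;              // +0x18
    std::uint8_t  skipped2[0x50];         // +0x20..+0x6F
    ImageDataDirectory DataDirectory[16]; // +0x70, index 0 = export table
};

struct ImageNtHeaders64 {
    std::uint32_t         Signature;      // +0x00: 0x00004550 ("PE\0\0")
    ImageFileHeader       FileHeader;     // +0x04
    ImageOptionalHeader64 OptionalHeader; // +0x18
};

struct ImageExportDirectory {
    std::uint32_t Characteristics, TimeDateStamp;
    std::uint16_t MajorVersion, MinorVersion;
    std::uint32_t Name, Base;
    std::uint32_t NumberOfFunctions;      // +0x14
    std::uint32_t NumberOfNames;          // +0x18
    std::uint32_t AddressOfFunctions;     // +0x1C
    std::uint32_t AddressOfNames;         // +0x20
    std::uint32_t AddressOfNameOrdinals;  // +0x24
};

static_assert(offsetof(ImageDosHeader, e_lfanew) == 0x3C, "DllBase+0x3C");
static_assert(offsetof(ImageNtHeaders64, OptionalHeader) +
              offsetof(ImageOptionalHeader64, DataDirectory) == 0x88, "PE+0x88");
static_assert(offsetof(ImageExportDirectory, AddressOfNameOrdinals) == 0x24, "+0x24");
```

The 0x88 figure is simply 0x18 (signature plus file header) added to 0x70 (the position of DataDirectory inside the 64-bit optional header).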

Process module enumeration using the PEB

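Position-independent shellcode locates loaded modules by reading the PEB pointer from GS:[0x60] (GS addresses the TEB, and TEB+0x60 holds the PEB address), then following PEB → Ldr → InLoadOrderModuleList. Each LDR_DATA_TABLE_ENTRY provides the module base address (DllBase) and name, enabling shellcode to locate modules such as kernel32.dll without relying on imports. The C++ code later in this article walks InMemoryOrderModuleList instead; both lists link the same entries, and the order shown below is typical rather than guaranteed.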

PEB InLoadOrderModuleList — Typical Module Order

PEB
 |
 v
PEB->Ldr
 |
 v
PEB_LDR_DATA
 |
 v
InLoadOrderModuleList
 |
 v
+------------------------------------------------+
| LDR_DATA_TABLE_ENTRY                           |
|------------------------------------------------|
| BaseDllName : your_program.exe                 |
| DllBase     : ImageBase of EXE                 |
| InLoadOrderLinks (FLINK / BLINK)               |
+----------------------+-------------------------+
                       |
                       v
+------------------------------------------------+
| LDR_DATA_TABLE_ENTRY                           |
|------------------------------------------------|
| BaseDllName : ntdll.dll                        |
| DllBase     : ImageBase of ntdll               |
| InLoadOrderLinks                               |
+----------------------+-------------------------+
                       |
                       v
+------------------------------------------------+
| LDR_DATA_TABLE_ENTRY                           |
|------------------------------------------------|
| BaseDllName : kernel32.dll                     |
| DllBase     : ImageBase of kernel32            |
| InLoadOrderLinks                               |
+----------------------+-------------------------+
                       |
                       v
+------------------------------------------------+
| LDR_DATA_TABLE_ENTRY                           |
|------------------------------------------------|
| BaseDllName : KernelBase.dll                   |
| DllBase     : ImageBase of KernelBase          |
+------------------------------------------------+

PE header traversal for manual export resolution

Starting from a module’s image base, shellcode parses the DOS and NT headers to locate the Export Directory. By resolving function names (often via hashing) and converting RVAs to virtual addresses, shellcode can dynamically locate API functions without relying on the Import Address Table.

        ImageBase (DllBase)
              |
              v
     +----------------------+
     | IMAGE_DOS_HEADER     |
     |----------------------|
     | e_magic   = "MZ"     |
     | e_lfanew  -----------+------------------+
     +----------------------+                  |
                                               v
                                   +----------------------------+
                                   | IMAGE_NT_HEADERS64         |
                                   |----------------------------|
                                   | Signature = "PE"           |
                                   | OptionalHeader             |
                                   +-------------+--------------+
                                                 |
                                                 v
                                   +----------------------------+
                                   | IMAGE_OPTIONAL_HEADER64    |
                                   |----------------------------|
                                   | DataDirectory[EXPORT] -----+------+
                                   +----------------------------+      |
                                                                       v
                               +----------------------------------------------+
                               | IMAGE_EXPORT_DIRECTORY                       |
                               |----------------------------------------------|
                               | AddressOfFunctions (RVA array)               |
                               | AddressOfNames     (RVA array)               |
                               | AddressOfNameOrdinals                        |
                               +------------------+---------------------------+
                                                  |
                                                  v
                                  +--------------------------------+
                                  | Resolved Function Address      |
                                  | (VA = ImageBase + RVA)         |
                                  +--------------------------------+

Bridging Theory to Practice: C++ Implementation

Let's examine how these concepts translate to practical C++ code that mirrors shellcode techniques.

Accessing the PEB

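#include <windows.h>
#include <winternl.h>   // PEB, PPEB
#include <intrin.h>     // __readgsqword

// Returns the PEB of the current process (x64 only).
PPEB GetPeb()
{
#if defined(_M_X64)
    // GS addresses the TEB; the field at TEB+0x60 holds the PEB pointer.
    return reinterpret_cast<PPEB>(__readgsqword(0x60));
#else
#error Unsupported architecture
#endif
}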

On x64 Windows, the GS segment base points at the current thread's TEB, and the field at TEB+0x60 holds the address of the PEB. The __readgsqword intrinsic reads a quadword (8 bytes) from the GS segment at the specified offset, so __readgsqword(0x60) returns the PEB pointer. This is the starting point for all manual module resolution.

Finding Module Base by Name

PVOID GetModuleBaseByName(const wchar_t* moduleName)
{
    if (!moduleName)
        return nullptr;
    
    PPEB peb = GetPeb();
    if (!peb || !peb->Ldr)
        return nullptr;
    
    auto& list = peb->Ldr->InMemoryOrderModuleList;
    
    for (auto entry = list.Flink; entry != &list; entry = entry->Flink)
    {
        auto ldr = CONTAINING_RECORD(
            entry,
            LDR_DATA_TABLE_ENTRY_T,
            InMemoryOrderLinks
        );
        
        if (EqualsInsensitive(ldr->BaseDllName, moduleName))
        {
            return ldr->DllBase;
        }
    }
    
    return nullptr;
}

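This function demonstrates the core technique used in shellcode. LDR_DATA_TABLE_ENTRY_T and EqualsInsensitive are not defined in this article; they are presumably a custom loader-entry definition that exposes BaseDllName (the public winternl.h structure does not name that field) and a case-insensitive comparison helper. The steps are: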

Access the PEB through GS:[0x60]
Navigate to the loader data structure (PEB_LDR_DATA)
Walk the InMemoryOrderModuleList (a doubly-linked list)
Use CONTAINING_RECORD macro to get the full LDR_DATA_TABLE_ENTRY from the list link
Compare module names until we find our target
Return the DllBase address
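The list walk and the CONTAINING_RECORD step above can be tried without touching a real PEB. The following is a portable sketch: MockLdrEntry is a deliberately simplified stand-in (not the real Windows layout), and EqualsInsensitive, InsertTail and FindModuleBase are hypothetical helpers written for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cwctype>

struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

// Mock loader entry: only the fields this sketch needs, NOT the real layout.
struct MockLdrEntry {
    ListEntry      InLoadOrderLinks;
    ListEntry      InMemoryOrderLinks;
    void*          DllBase;
    const wchar_t* BaseDllName;
};

// Same idea as CONTAINING_RECORD(link, MockLdrEntry, InMemoryOrderLinks):
// step back from the embedded link to the start of the enclosing entry.
inline MockLdrEntry* EntryFromMemoryOrderLink(ListEntry* link) {
    return reinterpret_cast<MockLdrEntry*>(
        reinterpret_cast<char*>(link) - offsetof(MockLdrEntry, InMemoryOrderLinks));
}

// Hypothetical case-insensitive comparison of two wide strings.
inline bool EqualsInsensitive(const wchar_t* a, const wchar_t* b) {
    for (; *a && *b; ++a, ++b)
        if (std::towlower(*a) != std::towlower(*b))
            return false;
    return *a == *b;  // both strings must end together
}

// Appends node to the circular list whose sentinel is head.
inline void InsertTail(ListEntry& head, ListEntry& node) {
    node.Flink = &head;
    node.Blink = head.Blink;
    head.Blink->Flink = &node;
    head.Blink = &node;
}

// Walks the list until it wraps back to the sentinel, as the loop above does.
inline void* FindModuleBase(ListEntry& head, const wchar_t* name) {
    for (ListEntry* e = head.Flink; e != &head; e = e->Flink) {
        MockLdrEntry* entry = EntryFromMemoryOrderLink(e);
        if (EqualsInsensitive(entry->BaseDllName, name))
            return entry->DllBase;
    }
    return nullptr;
}
```

The key point is that Flink points at the InMemoryOrderLinks member inside each entry, not at the entry itself, which is why the offset has to be subtracted before any field is read.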

WinDbg - PEB Structure

Quick PEB overview:

!peb

Detailed PEB structure:

dt ntdll!_PEB @$peb

Show the Ldr pointer:

dt ntdll!_PEB @$peb Ldr

Dereference and show the loader data:

dt ntdll!_PEB_LDR_DATA poi(@$peb+0x18)

`dt` tells WinDbg: Show me the layout and values of a data structure.

It works with symbols, so it knows field names, offsets, and nested structures.

`@$peb` is a pseudo-register in WinDbg.

It evaluates to: the address of the PEB for the current process

WinDbg - InMemoryOrderModuleList Traversal

Steps:

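Get the list head:

? poi(@$peb+0x18)+0x20

Show first module (subtract 0x10 because the link points at InMemoryOrderLinks, not at the start of the entry):

dt ntdll!_LDR_DATA_TABLE_ENTRY poi(poi(@$peb+0x18)+0x20)-0x10

Display module name:

dt ntdll!_LDR_DATA_TABLE_ENTRY poi(poi(@$peb+0x18)+0x20)-0x10 BaseDllName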

`poi(...)` - pointer-sized dereference

poi() means: read a pointer-sized value from this address, i.e. treat the expression as a pointer and dereference it.

So: poi(@$peb + 0x18) means:

*(PEB + 0x18) → PEB->Ldr → pointer to _PEB_LDR_DATA
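In C++ terms, poi() corresponds roughly to reading a pointer-sized value from an address. The Poi function below is a hypothetical illustration of that idea, not a debugger API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical equivalent of WinDbg's poi(): read a pointer-sized value
// stored at `address`. memcpy avoids alignment and aliasing problems.
inline std::uintptr_t Poi(const void* address) {
    std::uintptr_t value;
    std::memcpy(&value, address, sizeof value);
    return value;
}
```

Given a buffer that stands in for a PEB, Poi(buffer + 0x18) returns whatever pointer value was stored at that offset, which is the same read that poi(@$peb + 0x18) performs in the debugger.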

Minimal Export Resolver

This is the smallest reusable logic unit for resolving APIs without imports:

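// Requires <windows.h> (PE structures) and <cstring> (strcmp).
// Returns the address of the named export, or nullptr if it is not found.
void* ResolveExport(void* moduleBase, const char* name)
{
    auto base = (BYTE*)moduleBase;
    auto dos = (IMAGE_DOS_HEADER*)base;
    if (dos->e_magic != IMAGE_DOS_SIGNATURE)        // "MZ"
        return nullptr;
    auto nt = (IMAGE_NT_HEADERS64*)(base + dos->e_lfanew);
    if (nt->Signature != IMAGE_NT_SIGNATURE)        // "PE\0\0"
        return nullptr;

    auto& dir = nt->OptionalHeader.DataDirectory[IMAGE_DIRECTORY_ENTRY_EXPORT]; // index 0
    if (!dir.VirtualAddress)
        return nullptr;

    auto exp = (IMAGE_EXPORT_DIRECTORY*)(base + dir.VirtualAddress);
    auto names = (DWORD*)(base + exp->AddressOfNames);       // RVAs of name strings
    auto ords = (WORD*)(base + exp->AddressOfNameOrdinals);  // name index -> function index
    auto funcs = (DWORD*)(base + exp->AddressOfFunctions);   // RVAs of functions

    for (DWORD i = 0; i < exp->NumberOfNames; i++)
    {
        const char* fn = (const char*)(base + names[i]);
        // Note: for a forwarded export this RVA points at a forwarder
        // string inside the export directory, not at code.
        if (!strcmp(fn, name))
            return base + funcs[ords[i]];
    }
    return nullptr;
}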

Conceptual steps

Locate PEB
Walk loader list
Find module base
Parse PE headers
Resolve export by name hash or string

Shellcode note:

Replace strcmp with inline comparison or hashing.
Avoid loops with large stack frames.
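As a sketch of the hashing variant mentioned in the note above, the code below resolves an export by name hash using only raw offsets, the way shellcode would, and runs against a tiny fake image built in memory. Everything here is an assumption for illustration: HashName uses djb2 (real shellcode often uses other hashes such as ROR13), MakeFakeImage produces a buffer that is not a loadable PE, and no bounds checking is done.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical name hash (djb2), chosen only for brevity.
constexpr std::uint32_t HashName(const char* s) {
    std::uint32_t h = 5381;
    while (*s) h = h * 33 + static_cast<unsigned char>(*s++);
    return h;
}

template <typename T>
T ReadAt(const std::uint8_t* base, std::uint32_t off) {
    T v;
    std::memcpy(&v, base + off, sizeof v);
    return v;
}

// Returns the RVA of the export whose name hashes to `wanted`, or 0.
std::uint32_t ResolveExportRvaByHash(const std::uint8_t* image, std::uint32_t wanted) {
    if (ReadAt<std::uint16_t>(image, 0) != 0x5A4D) return 0;           // "MZ"
    const auto nt = ReadAt<std::uint32_t>(image, 0x3C);                // e_lfanew
    if (ReadAt<std::uint32_t>(image, nt) != 0x00004550) return 0;      // "PE\0\0"
    const auto exp = ReadAt<std::uint32_t>(image, nt + 0x88);          // export dir RVA
    if (!exp) return 0;
    const auto count = ReadAt<std::uint32_t>(image, exp + 0x18);       // NumberOfNames
    const auto funcs = ReadAt<std::uint32_t>(image, exp + 0x1C);       // AddressOfFunctions
    const auto names = ReadAt<std::uint32_t>(image, exp + 0x20);       // AddressOfNames
    const auto ords  = ReadAt<std::uint32_t>(image, exp + 0x24);       // AddressOfNameOrdinals
    for (std::uint32_t i = 0; i < count; ++i) {
        const auto nameRva = ReadAt<std::uint32_t>(image, names + 4 * i);
        if (HashName(reinterpret_cast<const char*>(image + nameRva)) == wanted) {
            const auto ord = ReadAt<std::uint16_t>(image, ords + 2 * i);
            return ReadAt<std::uint32_t>(image, funcs + 4 * ord);
        }
    }
    return 0;
}

// Builds a minimal fake image with two named exports (RVA == buffer offset).
std::vector<std::uint8_t> MakeFakeImage() {
    std::vector<std::uint8_t> img(0x400, 0);
    auto put = [&](std::uint32_t off, auto v) { std::memcpy(&img[off], &v, sizeof v); };
    put(0x00, std::uint16_t{0x5A4D});
    put(0x3C, std::uint32_t{0x80});           // e_lfanew
    put(0x80, std::uint32_t{0x00004550});
    put(0x80 + 0x88, std::uint32_t{0x200});   // export directory RVA
    put(0x200 + 0x14, std::uint32_t{2});      // NumberOfFunctions
    put(0x200 + 0x18, std::uint32_t{2});      // NumberOfNames
    put(0x200 + 0x1C, std::uint32_t{0x240});
    put(0x200 + 0x20, std::uint32_t{0x250});
    put(0x200 + 0x24, std::uint32_t{0x260});
    put(0x240, std::uint32_t{0x1000}); put(0x244, std::uint32_t{0x2000});  // function RVAs
    put(0x250, std::uint32_t{0x280});  put(0x254, std::uint32_t{0x290});   // name RVAs
    put(0x260, std::uint16_t{1});      put(0x262, std::uint16_t{0});       // name -> function index
    std::memcpy(&img[0x280], "Alpha", 6);
    std::memcpy(&img[0x290], "Beta", 5);
    return img;
}
```

Note how the ordinal table decouples name order from function order: in the fake image, the first name maps to the second function slot. Because HashName is constexpr, the wanted hash can be computed at compile time, so no API name strings need to appear in the final payload.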

Conclusion

Understanding the relationship between high-level C++ code and low-level shellcode techniques provides invaluable insight into Windows internals. The x64 calling convention, PE structure parsing, and PEB traversal are fundamental skills for security researchers and developers working at the system level.

Key takeaways:

The x64 calling convention dictates precise register usage and stack alignment
PE headers provide a roadmap for finding functions dynamically
The PEB is the gateway to all loaded modules in a process
C++ can mirror shellcode techniques using intrinsics and structure offsets
Understanding these concepts enhances both offensive and defensive security capabilities
