Bridging C++ and x64 Shellcode Development (Windows)

Introduction

This technical deep-dive explores the intersection of traditional C++ Windows programming and low-level x64 shellcode development. Understanding these concepts is crucial for security research, exploit development, and gaining deeper insight into how Windows executables operate at the binary level.

The x64 Calling Convention Context

In x64 Windows (Microsoft calling convention), registers fall into two categories that determine how functions interact with the CPU state:

Volatile (Caller-saved) Registers

Functions can modify these freely without preserving their values:

  • RAX, RCX, RDX, R8, R9, R10, R11 - General purpose registers

  • XMM0-XMM5 - Floating point/SIMD registers

Non-volatile (Callee-saved) Registers

Functions MUST preserve these if they use them:

  • RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 - General purpose registers

  • XMM6-XMM15 - Floating point/SIMD registers

This distinction is critical when writing shellcode because you need to know which registers you can use freely and which require preservation to avoid crashing the target process.

Register Usage in Function Calls

The Microsoft x64 calling convention uses a fastcall-style approach:

  • RCX - 1st integer/pointer argument

  • RDX - 2nd integer/pointer argument

  • R8 - 3rd integer/pointer argument

  • R9 - 4th integer/pointer argument

  • Stack - 5th and subsequent arguments, stored in right-to-left order so that the 5th argument ends up at the lowest address, directly above the shadow space
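To make the mapping concrete, here is a small illustrative function. The name Sum6 is hypothetical, and the comments describe where each argument lives only when the code is built for x64 Windows with the Microsoft convention; other platforms assign registers differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical six-argument function. Under the Microsoft x64 convention:
//   a -> RCX, b -> RDX, c -> R8, d -> R9
//   e -> [RSP+0x28] at function entry (first slot above the shadow space)
//   f -> [RSP+0x30] at function entry
// [RSP] holds the return address and [RSP+0x08..0x27] is the shadow space.
// The result is returned in RAX.
std::int64_t Sum6(std::int64_t a, std::int64_t b, std::int64_t c,
                  std::int64_t d, std::int64_t e, std::int64_t f) {
    return a + b + c + d + e + f;
}
```

Disassembling a call such as Sum6(1, 2, 3, 4, 5, 6) in an unoptimized x64 Windows build is a quick way to see the four register loads and the two stack stores.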

Shadow Space Requirement

Understanding Shadow Space Allocation

All non-leaf functions (functions that call other functions) must allocate shadow space for the functions they call. The shadow space is a reserved area on the stack that the callee can use to save the four register-passed arguments (RCX, RDX, R8, R9). Since each argument is 8 bytes in x64 architecture, this results in a minimum of 32 bytes (0x20) of shadow space.

However, there's a critical detail that's often overlooked: the actual stack allocation is typically 0x28 (40 bytes), not just 0x20 (32 bytes). The stack must be 16-byte aligned at the point of a call, and the call instruction itself pushes an 8-byte return address. When a function begins execution, RSP is therefore misaligned by 8 bytes. To restore 16-byte alignment while also providing the required 32 bytes of shadow space, functions typically allocate 0x28 bytes.

From the callee's point of view, the shadow space sits immediately above its return address, at [RSP+0x08] through [RSP+0x27] on entry. Arguments beyond the first four are stored above the shadow space, at higher addresses, starting at [RSP+0x28] on entry.

The Math Behind 0x28

Let's break down why we use 0x28 instead of 0x20:

  1. Before call instruction: RSP is 16-byte aligned (RSP mod 16 = 0)

  2. After call instruction: RSP is misaligned by 8 bytes (RSP mod 16 = 8) because the return address was pushed

  3. Required shadow space: 32 bytes (0x20)

  4. Required alignment: RSP must be 16-byte aligned before calling other functions

  5. Solution: Allocate 0x28 (40 bytes) = 0x20 (shadow space) + 0x8 (alignment correction)

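This way, with RSP denoting the value just before the call instruction:

  • The call pushes 8 bytes and the prolog subtracts 0x28, so RSP drops by 8 + 0x28 = 0x30 (48) bytes in total

  • 0x30 is a multiple of 16, so if RSP was aligned before the call, (RSP - 0x30) mod 16 = 0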
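The arithmetic above can be checked at compile time. This is a minimal sketch; the helper names (IsAligned16, RspAfterProlog, MinimalAlloc) are made up for illustration, and 0x1000 is just an arbitrary aligned starting value for RSP.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kShadowSpace = 0x20;   // four 8-byte home slots
constexpr std::uint64_t kReturnAddress = 0x8;  // pushed by the call instruction

constexpr bool IsAligned16(std::uint64_t value) { return value % 16 == 0; }

// RSP inside the callee after `call` and `sub rsp, alloc`.
constexpr std::uint64_t RspAfterProlog(std::uint64_t rspBeforeCall,
                                       std::uint64_t alloc) {
    return rspBeforeCall - kReturnAddress - alloc;
}

// Smallest allocation that holds the shadow space plus `extraBytes` of
// stack arguments or locals and still leaves RSP 16-byte aligned.
constexpr std::uint64_t MinimalAlloc(std::uint64_t extraBytes) {
    return ((kShadowSpace + extraBytes + kReturnAddress + 15) &
            ~std::uint64_t{15}) - kReturnAddress;
}

static_assert(!IsAligned16(RspAfterProlog(0x1000, 0x20)), "0x20 alone is misaligned");
static_assert(IsAligned16(RspAfterProlog(0x1000, 0x28)), "0x28 restores alignment");
static_assert(MinimalAlloc(0) == 0x28, "shadow space only");
static_assert(MinimalAlloc(8) == 0x28, "a 5th argument fits in the padding slot");
static_assert(MinimalAlloc(16) == 0x38, "5th and 6th arguments need 0x38");
```

The same rounding explains why allocations such as 0x38 or 0x48 appear in functions that pass more stack arguments or keep locals.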

Proving It in WinDbg

Create a simple test program:
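The original listing is not reproduced here, so the following is only a stand-in sketch. ParentFunction is the name used in the steps below; ChildFunction and the NOINLINE macro are hypothetical additions that keep the call visible in the disassembly.

```cpp
#include <cassert>

#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

// Hypothetical callee with five arguments, so the caller has to provide
// shadow space plus one stack slot for the fifth argument.
NOINLINE int ChildFunction(int a, int b, int c, int d, int e) {
    return a + b + c + d + e;
}

// Non-leaf function: its prolog is where the `sub rsp, ...` appears.
NOINLINE int ParentFunction() {
    return ChildFunction(1, 2, 3, 4, 5) + 1;
}
```

Call ParentFunction() from main and build for x64. The exact allocation size in the prolog depends on the compiler and optimization settings, so treat the value seen in the debugger as the ground truth.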

  • Set breakpoint at ParentFunction:

On entry to the function, the rsp value is misaligned by 8.

Disassemble the function

  • Step through and verify alignment:

Key Takeaways

  1. Shadow space is 0x20 (32 bytes) - four 8-byte slots for RCX, RDX, R8, R9

  2. Typical allocation is 0x28 (40 bytes) - 0x20 shadow + 0x8 alignment correction

  3. Stack alignment requirement: RSP must be 16-byte aligned before call instructions

  4. 5th+ arguments are placed at RSP+0x20 and beyond (after the shadow space)

  5. Return address is at RSP+0x28 (after the allocation)

The 0x28 allocation elegantly solves both the shadow space requirement and the alignment constraint in a single sub rsp,28h instruction.

Understanding PE Headers for Shellcode

When writing shellcode, you typically cannot rely on the Import Address Table (IAT) like normal executables do. Instead, you must manually locate function addresses by parsing the Process Environment Block (PEB) and walking export tables. This requires understanding the PE (Portable Executable) structure.

Why Parse PE Headers in Shellcode?

Shellcode needs to:

  • Parse PEB to find loaded modules (like ntdll.dll, kernel32.dll)

  • Walk the export table to find function addresses dynamically

  • Understand how Windows structures executables in memory

  • Avoid hardcoded addresses that break with ASLR

Key PE Structures for Shellcode

The PE format has a hierarchical structure:

Important Offsets (x64)

PEB Structure (Process Environment Block)

  • GS:[0x60] = PEB address (in x64, FS:[0x30] in x86)

  • PEB+0x18 = PEB_LDR_DATA pointer

  • PEB_LDR_DATA+0x20 = InMemoryOrderModuleList

LDR_DATA_TABLE_ENTRY

  • +0x10 = InMemoryOrderLinks (LIST_ENTRY)

  • +0x30 = DllBase (base address of the module)

  • +0x38 = EntryPoint

  • +0x40 = SizeOfImage

  • +0x48 = FullDllName (UNICODE_STRING)

  • +0x58 = BaseDllName (UNICODE_STRING)
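The offsets above can be sanity-checked with simplified mirror structures. This is a sketch, not the official definitions: the type names are hypothetical, pointers are stored as 64-bit integers so the layout holds on any host, and only the fields discussed in this article are named.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ListEntry64 { std::uint64_t Flink; std::uint64_t Blink; };

struct UnicodeString64 {
    std::uint16_t Length;         // in bytes, terminator not included
    std::uint16_t MaximumLength;  // in bytes
    std::uint64_t Buffer;         // pointer to UTF-16 characters
};

struct PebLdrData64 {
    std::uint8_t Reserved[0x10];
    ListEntry64 InLoadOrderModuleList;    // +0x10
    ListEntry64 InMemoryOrderModuleList;  // +0x20
};

struct LdrDataTableEntry64 {
    ListEntry64 InLoadOrderLinks;            // +0x00
    ListEntry64 InMemoryOrderLinks;          // +0x10
    ListEntry64 InInitializationOrderLinks;  // +0x20
    std::uint64_t DllBase;                   // +0x30
    std::uint64_t EntryPoint;                // +0x38
    std::uint32_t SizeOfImage;               // +0x40
    UnicodeString64 FullDllName;             // +0x48
    UnicodeString64 BaseDllName;             // +0x58
};

static_assert(offsetof(PebLdrData64, InMemoryOrderModuleList) == 0x20, "Ldr list");
static_assert(offsetof(LdrDataTableEntry64, InMemoryOrderLinks) == 0x10, "links");
static_assert(offsetof(LdrDataTableEntry64, DllBase) == 0x30, "DllBase");
static_assert(offsetof(LdrDataTableEntry64, FullDllName) == 0x48, "FullDllName");
static_assert(offsetof(LdrDataTableEntry64, BaseDllName) == 0x58, "BaseDllName");
```

Because InMemoryOrderLinks sits at +0x10, a pointer taken from InMemoryOrderModuleList points 0x10 bytes into the entry. That is why hand-written shellcode often reads DllBase at link+0x20 rather than entry+0x30.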

PE Headers

  • DllBase+0x3C = e_lfanew (offset to PE header)

  • PE+0x88 = Export Directory RVA (in OptionalHeader.DataDirectory[0])
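The 0x88 figure is not arbitrary; it falls out of the header sizes for a 64-bit (PE32+) image. A small sketch of the derivation, with constant and function names chosen for illustration:

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t kSignatureSize = 4;    // "PE\0\0"
constexpr std::uint32_t kFileHeaderSize = 20;  // IMAGE_FILE_HEADER
// Offset of DataDirectory inside IMAGE_OPTIONAL_HEADER64.
// (A 32-bit PE32 optional header uses 0x60 here, giving PE+0x78 instead.)
constexpr std::uint32_t kDataDirectoryInOptionalHeader64 = 0x70;
constexpr std::uint32_t kDataDirectoryEntrySize = 8;  // 4-byte RVA + 4-byte size

// Offset from the PE signature to DataDirectory[index].VirtualAddress.
constexpr std::uint32_t DataDirectoryOffset64(std::uint32_t index) {
    return kSignatureSize + kFileHeaderSize + kDataDirectoryInOptionalHeader64 +
           index * kDataDirectoryEntrySize;
}

static_assert(DataDirectoryOffset64(0) == 0x88, "export directory entry");
static_assert(DataDirectoryOffset64(1) == 0x90, "import directory entry");
```

The 4-byte size field of the export directory follows its RVA, at PE+0x8C.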

Export Directory (IMAGE_EXPORT_DIRECTORY)

  • +0x1C = AddressOfFunctions RVA

  • +0x20 = AddressOfNames RVA

  • +0x24 = AddressOfNameOrdinals RVA

  • +0x14 = NumberOfFunctions

  • +0x18 = NumberOfNames
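Laid out in address order, the export directory looks like the structure below. This is a mirror written with fixed-width types for illustration (the struct name is hypothetical); the field names follow IMAGE_EXPORT_DIRECTORY.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ExportDirectoryMirror {
    std::uint32_t Characteristics;        // +0x00
    std::uint32_t TimeDateStamp;          // +0x04
    std::uint16_t MajorVersion;           // +0x08
    std::uint16_t MinorVersion;           // +0x0A
    std::uint32_t Name;                   // +0x0C RVA of the DLL name string
    std::uint32_t Base;                   // +0x10 starting ordinal number
    std::uint32_t NumberOfFunctions;      // +0x14 entries in AddressOfFunctions
    std::uint32_t NumberOfNames;          // +0x18 entries in the two name arrays
    std::uint32_t AddressOfFunctions;     // +0x1C RVA of uint32 function RVAs
    std::uint32_t AddressOfNames;         // +0x20 RVA of uint32 name RVAs
    std::uint32_t AddressOfNameOrdinals;  // +0x24 RVA of uint16 indexes
};

static_assert(sizeof(ExportDirectoryMirror) == 0x28, "40 bytes");
static_assert(offsetof(ExportDirectoryMirror, NumberOfFunctions) == 0x14, "");
static_assert(offsetof(ExportDirectoryMirror, AddressOfFunctions) == 0x1C, "");
static_assert(offsetof(ExportDirectoryMirror, AddressOfNameOrdinals) == 0x24, "");
```

The three arrays work together: entry i of AddressOfNames names a function, entry i of AddressOfNameOrdinals gives a 16-bit index, and that index selects the function's RVA in AddressOfFunctions.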

Process module enumeration using the PEB

Position-independent shellcode locates loaded modules by reading the PEB pointer from GS:[0x60] (a field of the TEB) and following PEB → Ldr → InLoadOrderModuleList. The InMemoryOrderModuleList used elsewhere in this article can be walked the same way. Each LDR_DATA_TABLE_ENTRY provides the module base address (DllBase) and name, enabling shellcode to locate modules such as kernel32.dll without relying on imports.

PEB InLoadOrderModuleList — Typical Module Order

PE header traversal for manual export resolution.

Starting from a module’s image base, shellcode parses the DOS and NT headers to locate the Export Directory. By resolving function names (often via hashing) and converting RVAs to virtual addresses, shellcode can dynamically locate API functions without relying on the Import Address Table.

Bridging Theory to Practice: C++ Implementation

Let's examine how these concepts translate to practical C++ code that mirrors shellcode techniques.

Accessing the PEB

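In x64 Windows, user-mode code reaches the PEB through the GS segment: GS points at the current thread's TEB, and the TEB stores a pointer to the PEB at offset 0x60. The __readgsqword intrinsic reads a quadword (8 bytes) from the GS segment at the specified offset, so __readgsqword(0x60) returns the PEB address. This is the starting point for all manual module resolution.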
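A minimal sketch of that read follows. GetPebAddress is a hypothetical helper name, and the non-Windows branch exists only so the snippet compiles elsewhere; it does no real work there.

```cpp
#include <cassert>

#if defined(_MSC_VER) && defined(_M_X64)
#include <intrin.h>

// Reads the 8-byte PEB pointer stored at offset 0x60 of the TEB (GS base).
inline void* GetPebAddress() {
    return reinterpret_cast<void*>(__readgsqword(0x60));
}
#else
// Fallback for other compilers and platforms: there is no PEB to read.
inline void* GetPebAddress() {
    return nullptr;
}
#endif
```

In assembly the same operation is a single instruction, mov rax, gs:[0x60], which is why it appears at the top of so many shellcode stubs.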

Finding Module Base by Name

This function demonstrates the core technique used in shellcode:

  1. Access the PEB through GS:[0x60]

  2. Navigate to the loader data structure (PEB_LDR_DATA)

  3. Walk the InMemoryOrderModuleList (a doubly-linked list)

  4. Use CONTAINING_RECORD macro to get the full LDR_DATA_TABLE_ENTRY from the list link

  5. Compare module names until we find our target

  6. Return the DllBase address
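The steps above can be sketched as follows. To keep the example self-contained, it uses simplified mirror structures (hypothetical names ListEntry, UnicodeString, LdrEntry) instead of the Windows headers, and FindModuleBase takes the list head as a parameter. On a real x64 process the head would be the InMemoryOrderModuleList field reached through PEB+0x18 and then +0x20. The name comparison is ASCII-only, which is an assumption that holds for system DLL names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct ListEntry { ListEntry* Flink; ListEntry* Blink; };
struct UnicodeString {
    std::uint16_t Length;  // in bytes
    std::uint16_t MaximumLength;
    const char16_t* Buffer;
};
struct LdrEntry {
    ListEntry InLoadOrderLinks;
    ListEntry InMemoryOrderLinks;
    ListEntry InInitializationOrderLinks;
    void* DllBase;
    void* EntryPoint;
    std::uint32_t SizeOfImage;
    UnicodeString FullDllName;
    UnicodeString BaseDllName;
};
static_assert(offsetof(LdrEntry, DllBase) == 0x30, "expects a 64-bit build");

// ASCII-only, case-insensitive match of a counted UTF-16 name and a C string.
inline bool NameEquals(const UnicodeString& s, const char* wanted) {
    const std::size_t count = s.Length / sizeof(char16_t);
    for (std::size_t i = 0; i < count; ++i) {
        char16_t a = s.Buffer[i];
        char16_t b = static_cast<unsigned char>(wanted[i]);
        if (b == 0) return false;  // wanted is shorter than the module name
        if (a >= u'A' && a <= u'Z') a += 32;
        if (b >= u'A' && b <= u'Z') b += 32;
        if (a != b) return false;
    }
    return wanted[count] == '\0';
}

// Walks the circular list from `head` and returns the DllBase of the first
// entry whose BaseDllName matches, or nullptr if none does.
inline void* FindModuleBase(ListEntry* head, const char* wanted) {
    for (ListEntry* link = head->Flink; link != head; link = link->Flink) {
        // CONTAINING_RECORD equivalent: step back from the embedded link
        // to the start of the enclosing entry.
        auto* entry = reinterpret_cast<LdrEntry*>(
            reinterpret_cast<char*>(link) - offsetof(LdrEntry, InMemoryOrderLinks));
        if (NameEquals(entry->BaseDllName, wanted)) return entry->DllBase;
    }
    return nullptr;
}
```

The loop stops when it returns to the head, because the head itself is a bare LIST_ENTRY inside PEB_LDR_DATA and not part of any module entry.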

WinDbg - PEB Structure

Quick PEB overview:

Detailed PEB structure

Show Ldr pointer

Dereference and show loader data


dt tells WinDbg: Show me the layout and values of a data structure.

It works with symbols, so it knows field names, offsets, and nested structures.

@$peb is a pseudo-register in WinDbg.

It evaluates to: the address of the PEB for the current process

WinDbg - InMemoryOrderModuleList Traversal

Steps:

  1. Get the list head:

  2. Show first module:

  3. Display module name:


poi(...) reads pointer-sized data from the given address.

poi() means: Treat this as a pointer and dereference it.

So: poi(@$peb + 0x18) means:

*(PEB + 0x18) → PEB->Ldr → pointer to _PEB_LDR_DATA

Minimal Export Resolver

This is the smallest reusable logic unit for resolving APIs without imports:

Conceptual steps

  1. Locate PEB

  2. Walk loader list

  3. Find module base

  4. Parse PE headers

  5. Resolve export by name hash or string
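Steps 4 and 5 can be sketched as a single function that works on any mapped PE32+ image. The names ReadAt and ResolveExport are hypothetical, and the sketch assumes a little-endian host, trusts the image (no bounds checks), and does not handle forwarded exports. Steps 1 to 3 are assumed to have produced the base pointer already.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads a value of type T at base + offset without alignment assumptions.
template <typename T>
inline T ReadAt(const std::uint8_t* base, std::uint32_t offset) {
    T value;
    std::memcpy(&value, base + offset, sizeof(T));
    return value;
}

// Resolves an export by name. Returns nullptr if the headers do not look
// like a PE32+ image, if there is no export directory, or if the name is
// not found.
inline const void* ResolveExport(const std::uint8_t* base, const char* name) {
    if (ReadAt<std::uint16_t>(base, 0) != 0x5A4D) return nullptr;       // "MZ"
    const std::uint32_t pe = ReadAt<std::uint32_t>(base, 0x3C);         // e_lfanew
    if (ReadAt<std::uint32_t>(base, pe) != 0x00004550) return nullptr;  // "PE\0\0"
    if (ReadAt<std::uint16_t>(base, pe + 0x18) != 0x20B) return nullptr;  // PE32+
    const std::uint32_t exportRva = ReadAt<std::uint32_t>(base, pe + 0x88);
    if (exportRva == 0) return nullptr;

    const std::uint32_t nameCount = ReadAt<std::uint32_t>(base, exportRva + 0x18);
    const std::uint32_t functions = ReadAt<std::uint32_t>(base, exportRva + 0x1C);
    const std::uint32_t names = ReadAt<std::uint32_t>(base, exportRva + 0x20);
    const std::uint32_t ordinals = ReadAt<std::uint32_t>(base, exportRva + 0x24);

    for (std::uint32_t i = 0; i < nameCount; ++i) {
        const std::uint32_t nameRva = ReadAt<std::uint32_t>(base, names + 4 * i);
        const char* candidate = reinterpret_cast<const char*>(base + nameRva);
        if (std::strcmp(candidate, name) != 0) continue;
        // The ordinal table maps name index -> index into AddressOfFunctions.
        const std::uint16_t index = ReadAt<std::uint16_t>(base, ordinals + 2 * i);
        const std::uint32_t functionRva =
            ReadAt<std::uint32_t>(base, functions + 4 * index);
        return base + functionRva;  // RVA -> virtual address
    }
    return nullptr;
}
```

With a module base obtained from the loader list, a call such as ResolveExport(base, "LoadLibraryA") would return the function's address, which can then be cast to the matching function pointer type.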

Shellcode note:

  • Replace strcmp with inline comparison or hashing.

  • Avoid loops with large stack frames.
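As one possible way to replace the string comparison, the sketch below uses a rotate-right-by-13 additive hash, a scheme commonly seen in public shellcode. It is shown as an example choice, not as the method this article's resolver uses; the function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t RotateRight13(std::uint32_t v) {
    return (v >> 13) | (v << 19);
}

// For each character: rotate the running hash right by 13 bits, then add
// the character. Written recursively so it can run at compile time.
constexpr std::uint32_t Ror13Hash(const char* s, std::uint32_t hash = 0) {
    return *s == '\0'
               ? hash
               : Ror13Hash(s + 1, RotateRight13(hash) +
                                      static_cast<std::uint8_t>(*s));
}

// Compares an export name against a precomputed hash instead of a string.
inline bool NameMatchesHash(const char* exportName, std::uint32_t wanted) {
    return Ror13Hash(exportName) == wanted;
}

static_assert(Ror13Hash("") == 0, "empty string hashes to zero");
static_assert(Ror13Hash("a") == 0x61, "single character is its own code");
static_assert(Ror13Hash("ab") == 0x03080062, "rotate, then add");
```

Storing a 4-byte constant such as Ror13Hash("LoadLibraryA") in place of the name keeps readable API strings out of the payload and reduces the comparison to a single integer test.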

Conclusion

Understanding the relationship between high-level C++ code and low-level shellcode techniques provides invaluable insight into Windows internals. The x64 calling convention, PE structure parsing, and PEB traversal are fundamental skills for security researchers and developers working at the system level.

Key takeaways:

  • The x64 calling convention dictates precise register usage and stack alignment

  • PE headers provide a roadmap for finding functions dynamically

  • The PEB is the gateway to all loaded modules in a process

  • C++ can mirror shellcode techniques using intrinsics and structure offsets

  • Understanding these concepts enhances both offensive and defensive security capabilities
