Bridging C++ and x64 Shellcode Development (Windows)
Introduction
This technical deep-dive explores the intersection of traditional C++ Windows programming and low-level x64 shellcode development. Understanding these concepts is crucial for security research, exploit development, and gaining deeper insight into how Windows executables operate at the binary level.
The x64 Calling Convention Context
In x64 Windows (Microsoft calling convention), registers fall into two categories that determine how functions interact with the CPU state:
Volatile (Caller-saved) Registers
Functions can modify these freely without preserving their values:
RAX, RCX, RDX, R8, R9, R10, R11 - General purpose registers
XMM0-XMM5 - Floating point/SIMD registers
Non-volatile (Callee-saved) Registers
Functions MUST preserve these if they use them:
RBX, RBP, RDI, RSI, RSP, R12, R13, R14, R15 - General purpose registers
XMM6-XMM15 - Floating point/SIMD registers
This distinction is critical when writing shellcode because you need to know which registers you can use freely and which require preservation to avoid crashing the target process.
Register Usage in Function Calls
The Microsoft x64 calling convention uses a fastcall-style approach:
RCX - 1st integer/pointer argument
RDX - 2nd integer/pointer argument
R8 - 3rd integer/pointer argument
R9 - 4th integer/pointer argument
Stack - 5th and subsequent arguments (laid out as if pushed right to left, so the 5th argument sits at the lowest address, just above the shadow space)
Shadow Space Requirement
Understanding Shadow Space Allocation
Every non-leaf function (a function that calls other functions) must allocate shadow space for its callees. The shadow space is a reserved area on the stack that the callee can use to save the four register-passed arguments (RCX, RDX, R8, R9). Since each argument is 8 bytes on x64, this amounts to a minimum of 32 bytes (0x20) of shadow space.
However, there's a critical detail that's often overlooked: the actual stack allocation is typically 0x28 (40 bytes), not just 0x20 (32 bytes). The stack must be 16-byte aligned at the point of every call, and the call instruction itself pushes an 8-byte return address. When a function begins execution, RSP is therefore misaligned by 8 bytes. Allocating 0x28 bytes restores 16-byte alignment while also providing the 32 bytes of shadow space for the functions it calls. From the callee's point of view, the shadow space sits immediately above its own return address (RSP+0x08 through RSP+0x27 on entry). Arguments beyond the first four are passed on the stack directly above the shadow space: the 5th argument is at RSP+0x20 in the caller just before the call, which is RSP+0x28 on entry to the callee.
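As a rough illustration of where stack-passed arguments live, the sketch below computes the offset of the n-th integer argument relative to RSP, both at the call site and on entry to the callee. The helper names StackArgOffsetAtCallSite and StackArgOffsetAtEntry are made up for this example and are not part of any Windows API.

```cpp
#include <cstdint>

// Hypothetical helpers: byte offsets, relative to RSP, of the n-th
// integer/pointer argument (n is 1-based and must be >= 5).
// Arguments 1-4 travel in RCX, RDX, R8 and R9; the first four stack
// slots are their home slots, i.e. the 0x20-byte shadow space.

// In the caller, immediately before the call instruction executes.
constexpr uint32_t StackArgOffsetAtCallSite(uint32_t n) {
    return 8 * (n - 1);
}

// In the callee, on entry: the call pushed an 8-byte return address,
// so every slot appears 8 bytes further from RSP.
constexpr uint32_t StackArgOffsetAtEntry(uint32_t n) {
    return StackArgOffsetAtCallSite(n) + 8;
}

static_assert(StackArgOffsetAtCallSite(5) == 0x20, "5th argument sits just above the shadow space");
static_assert(StackArgOffsetAtEntry(5) == 0x28, "the return address shifts every slot by 8");
```

Both helpers are constexpr, so the checks run at compile time and cost nothing at run time.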
The Math Behind 0x28
Let's break down why we use 0x28 instead of 0x20:
Before call instruction: RSP is 16-byte aligned (RSP mod 16 = 0)
After call instruction: RSP is misaligned by 8 bytes (RSP mod 16 = 8) because the return address was pushed
Required shadow space: 32 bytes (0x20)
Required alignment: RSP must be 16-byte aligned before calling other functions
Solution: Allocate 0x28 (40 bytes) = 0x20 (shadow space) + 0x8 (alignment correction)
This way:
Writing RSP for the 16-byte-aligned value just before the call: RSP - 8 - 0x28 = RSP - 0x30
0x30 is a multiple of 16, so (RSP - 0x30) mod 16 = RSP mod 16 = 0
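The same arithmetic can be checked mechanically. This is a minimal sketch with invented helper names (IsAligned16, RspAtEntry, RspAfterProlog) and an arbitrary example value for RSP; it only models the pointer arithmetic, not a real stack.

```cpp
#include <cstdint>

// True when the given stack pointer value is 16-byte aligned.
constexpr bool IsAligned16(uint64_t rsp) { return rsp % 16 == 0; }

// RSP as seen by the callee on entry: call pushed an 8-byte return address.
constexpr uint64_t RspAtEntry(uint64_t rspBeforeCall) { return rspBeforeCall - 8; }

// RSP after the callee executes "sub rsp, alloc" in its prolog.
constexpr uint64_t RspAfterProlog(uint64_t rspBeforeCall, uint64_t alloc) {
    return RspAtEntry(rspBeforeCall) - alloc;
}

// Arbitrary, 16-byte-aligned example value (an assumption for illustration).
constexpr uint64_t kExampleRsp = 0x000000000014FF00;

static_assert(IsAligned16(kExampleRsp), "aligned before the call");
static_assert(!IsAligned16(RspAtEntry(kExampleRsp)), "off by 8 on entry");
static_assert(!IsAligned16(RspAfterProlog(kExampleRsp, 0x20)), "0x20 alone leaves RSP misaligned");
static_assert(IsAligned16(RspAfterProlog(kExampleRsp, 0x28)), "0x28 restores alignment");
```

Any allocation of the form 0x20 + 8 + 16k keeps the property; 0x28 is simply the smallest one.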
Proving It in WinDbg
Create a simple test program:
Set breakpoint at ParentFunction:

The rsp value is misaligned by 8.
Disassemble the function

Step through and verify alignment:
Key Takeaways
Shadow space is 0x20 (32 bytes) - four 8-byte slots for RCX, RDX, R8, R9
Typical allocation is 0x28 (40 bytes) - 0x20 shadow + 0x8 alignment correction
Stack alignment requirement: RSP must be 16-byte aligned before call instructions
5th+ arguments are placed at RSP+0x20 and beyond (after the shadow space)
The function's own return address is at RSP+0x28 once sub rsp,28h has executed
The 0x28 allocation elegantly solves both the shadow space requirement and the alignment constraint in a single sub rsp,28h instruction.
Understanding PE Headers for Shellcode
When writing shellcode, you typically cannot rely on the Import Address Table (IAT) like normal executables do. Instead, you must manually locate function addresses by parsing the Process Environment Block (PEB) and walking export tables. This requires understanding the PE (Portable Executable) structure.
Why Parse PE Headers in Shellcode?
Shellcode needs to:
Parse PEB to find loaded modules (like ntdll.dll, kernel32.dll)
Walk the export table to find function addresses dynamically
Understand how Windows structures executables in memory
Avoid hardcoded addresses that break with ASLR
Key PE Structures for Shellcode
The PE format has a hierarchical structure:
Important Offsets (x64)
PEB Structure (Process Environment Block)
GS:[0x60] = PEB address on x64 (FS:[0x30] on x86)
PEB+0x18 = PEB_LDR_DATA pointer
PEB_LDR_DATA+0x20 = InMemoryOrderModuleList
LDR_DATA_TABLE_ENTRY
+0x10 = InMemoryOrderLinks (LIST_ENTRY)
+0x30 = DllBase (base address of the module)
+0x38 = EntryPoint
+0x40 = SizeOfImage
+0x48 = FullDllName (UNICODE_STRING)
+0x58 = BaseDllName (UNICODE_STRING)
PE Headers
DllBase+0x3C = e_lfanew (offset to PE header)
PE+0x88 = Export Directory RVA (in OptionalHeader.DataDirectory[0])
Export Directory (IMAGE_EXPORT_DIRECTORY)
+0x14 = NumberOfFunctions
+0x18 = NumberOfNames
+0x1C = AddressOfFunctions RVA
+0x20 = AddressOfNames RVA
+0x24 = AddressOfNameOrdinals RVA
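These offsets can be sanity-checked at compile time. The sketch below re-declares trimmed versions of the structures using fixed-width integers (pointers become uint64_t). The type names ending in 64 or 32 and the padding fields are local to this example; they are not the Windows SDK declarations, only layouts that should match them on x64.

```cpp
#include <cstddef>
#include <cstdint>

struct ListEntry64     { uint64_t Flink; uint64_t Blink; };
struct UnicodeString64 { uint16_t Length; uint16_t MaximumLength; uint32_t Pad; uint64_t Buffer; };

struct PebLdrData64 {                      // models PEB_LDR_DATA
    uint32_t    Length;
    uint8_t     Initialized;
    uint8_t     Pad[3];
    uint64_t    SsHandle;
    ListEntry64 InLoadOrderModuleList;
    ListEntry64 InMemoryOrderModuleList;
    ListEntry64 InInitializationOrderModuleList;
};

struct LdrDataTableEntry64 {               // models the start of LDR_DATA_TABLE_ENTRY
    ListEntry64     InLoadOrderLinks;
    ListEntry64     InMemoryOrderLinks;
    ListEntry64     InInitializationOrderLinks;
    uint64_t        DllBase;
    uint64_t        EntryPoint;
    uint32_t        SizeOfImage;
    uint32_t        Pad;
    UnicodeString64 FullDllName;
    UnicodeString64 BaseDllName;
};

struct ImageExportDirectory32 {            // models IMAGE_EXPORT_DIRECTORY
    uint32_t Characteristics, TimeDateStamp;
    uint16_t MajorVersion, MinorVersion;
    uint32_t Name, Base;
    uint32_t NumberOfFunctions, NumberOfNames;
    uint32_t AddressOfFunctions, AddressOfNames, AddressOfNameOrdinals;
};

// Signature (4) + IMAGE_FILE_HEADER (20) + offset of DataDirectory in the PE32+ optional header (0x70).
constexpr uint32_t kExportDirRvaOffset = 4 + 20 + 0x70;

static_assert(offsetof(PebLdrData64, InMemoryOrderModuleList) == 0x20, "PEB_LDR_DATA+0x20");
static_assert(offsetof(LdrDataTableEntry64, InMemoryOrderLinks) == 0x10, "entry+0x10");
static_assert(offsetof(LdrDataTableEntry64, DllBase) == 0x30, "entry+0x30");
static_assert(offsetof(LdrDataTableEntry64, FullDllName) == 0x48, "entry+0x48");
static_assert(offsetof(LdrDataTableEntry64, BaseDllName) == 0x58, "entry+0x58");
static_assert(offsetof(ImageExportDirectory32, AddressOfFunctions) == 0x1C, "export+0x1C");
static_assert(kExportDirRvaOffset == 0x88, "PE+0x88");
```

Using fixed-width fields rather than native pointers keeps the checks meaningful even when the file is compiled on a non-Windows toolchain.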
Process module enumeration using the PEB
Position-independent shellcode locates loaded modules by reading the PEB pointer from GS:[0x60] (GS points at the TEB, and offset 0x60 within it holds the PEB address), then following PEB → Ldr → InLoadOrderModuleList. Each LDR_DATA_TABLE_ENTRY in that linked list provides the module base address (DllBase) and name, enabling shellcode to locate modules such as kernel32.dll without relying on imports.
PEB InLoadOrderModuleList — Typical Module Order
PE header traversal for manual export resolution.
Starting from a module’s image base, shellcode parses the DOS and NT headers to locate the Export Directory. By resolving function names (often via hashing) and converting RVAs to virtual addresses, shellcode can dynamically locate API functions without relying on the Import Address Table.
Bridging Theory to Practice: C++ Implementation
Let's examine how these concepts translate to practical C++ code that mirrors shellcode techniques.
Accessing the PEB
In x64 Windows, a pointer to the PEB is stored at offset 0x60 of the TEB, which user-mode code reaches through the GS segment register. The __readgsqword intrinsic reads a quadword (8 bytes) from the GS segment at the specified offset, so __readgsqword(0x60) returns the PEB address. This is the starting point for all manual module resolution.

Finding Module Base by Name
This function demonstrates the core technique used in shellcode:
Access the PEB through GS:[0x60]
Navigate to the loader data structure (PEB_LDR_DATA)
Walk the InMemoryOrderModuleList (a doubly-linked list)
Use the CONTAINING_RECORD macro to get the full LDR_DATA_TABLE_ENTRY from the list link
Compare module names until we find our target
Return the DllBase address
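The list-walking part of these steps can be sketched portably against a mock list instead of the live PEB. Everything below is an assumption made for illustration: MockLdrEntry is a trimmed stand-in for LDR_DATA_TABLE_ENTRY, EntryFromMemoryOrderLink plays the role of CONTAINING_RECORD, and the name comparison is exact, whereas real module-name matching is normally case-insensitive.

```cpp
#include <cstddef>
#include <cstdint>

struct ListEntry { ListEntry* Flink; ListEntry* Blink; };

// Hypothetical, trimmed stand-in for LDR_DATA_TABLE_ENTRY.
struct MockLdrEntry {
    ListEntry       InLoadOrderLinks;
    ListEntry       InMemoryOrderLinks;
    void*           DllBase;
    const char16_t* BaseDllName;
};

// Same idea as CONTAINING_RECORD: step back from the embedded link
// to the start of the structure that contains it.
inline MockLdrEntry* EntryFromMemoryOrderLink(ListEntry* link) {
    return reinterpret_cast<MockLdrEntry*>(
        reinterpret_cast<uint8_t*>(link) - offsetof(MockLdrEntry, InMemoryOrderLinks));
}

// Exact comparison of two NUL-terminated UTF-16 strings.
inline bool SameName(const char16_t* a, const char16_t* b) {
    while (*a && *a == *b) { ++a; ++b; }
    return *a == *b;
}

// Append a node to a circular doubly-linked list (used to build the mock).
inline void InsertTail(ListEntry* head, ListEntry* node) {
    node->Flink = head;
    node->Blink = head->Blink;
    head->Blink->Flink = node;
    head->Blink = node;
}

// Walk the list until it wraps back to the head; return DllBase on a match.
inline void* FindModuleBase(ListEntry* head, const char16_t* name) {
    for (ListEntry* link = head->Flink; link != head; link = link->Flink) {
        MockLdrEntry* entry = EntryFromMemoryOrderLink(link);
        if (SameName(entry->BaseDllName, name)) return entry->DllBase;
    }
    return nullptr;
}
```

The loop terminates when the link wraps back to the head, which matters because the loader lists are circular and the head itself is not embedded in a module entry.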
WinDbg - PEB Structure
Quick PEB overview:

Detailed PEB structure

Show Ldr pointer

Dereference and show loader data

dt tells WinDbg: Show me the layout and values of a data structure.
WinDbg - InMemoryOrderModuleList Traversal
Steps:
Get the list head:

Show first module:

Display module name:

Minimal Export Resolver
This is the smallest reusable logic unit for resolving APIs without imports:
Conceptual steps
Locate PEB
Walk loader list
Find module base
Parse PE headers
Resolve export by name hash or string
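One possible shape for the last two steps (parsing the PE headers and resolving an export by name) is sketched below. It works on a plain byte buffer laid out like a mapped PE32+ image, so it can be exercised against the tiny synthetic image built by MakeDemoImage. FindExportRva and MakeDemoImage are names invented for this example. The sketch assumes a little-endian host and skips bounds checks, forwarded exports and ordinal-only exports.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Unaligned little-endian reads at a byte offset from the image base.
inline uint32_t Rd32(const uint8_t* base, uint32_t off) { uint32_t v; std::memcpy(&v, base + off, 4); return v; }
inline uint16_t Rd16(const uint8_t* base, uint32_t off) { uint16_t v; std::memcpy(&v, base + off, 2); return v; }

// Returns the RVA of the named export, or 0 if it is not found.
inline uint32_t FindExportRva(const uint8_t* image, const char* name) {
    uint32_t nt     = Rd32(image, 0x3C);          // e_lfanew
    uint32_t expRva = Rd32(image, nt + 0x88);     // DataDirectory[0].VirtualAddress
    if (expRva == 0) return 0;                    // module exports nothing
    uint32_t numNames = Rd32(image, expRva + 0x18);
    uint32_t funcs    = Rd32(image, expRva + 0x1C);
    uint32_t names    = Rd32(image, expRva + 0x20);
    uint32_t ordinals = Rd32(image, expRva + 0x24);
    for (uint32_t i = 0; i < numNames; ++i) {
        const char* cur = reinterpret_cast<const char*>(image + Rd32(image, names + 4 * i));
        if (std::strcmp(cur, name) == 0) {
            uint16_t index = Rd16(image, ordinals + 2 * i);   // name i -> function index
            return Rd32(image, funcs + 4 * index);
        }
    }
    return 0;
}

// Builds a synthetic buffer containing only the fields the resolver reads.
inline std::vector<uint8_t> MakeDemoImage() {
    std::vector<uint8_t> img(0x400, 0);
    auto wr32 = [&](uint32_t off, uint32_t v) { std::memcpy(&img[off], &v, 4); };
    auto wr16 = [&](uint32_t off, uint16_t v) { std::memcpy(&img[off], &v, 2); };
    wr32(0x3C, 0x80);                             // NT headers at 0x80
    wr32(0x80 + 0x88, 0x200);                     // export directory at RVA 0x200
    wr32(0x200 + 0x14, 2);  wr32(0x200 + 0x18, 2);       // NumberOfFunctions, NumberOfNames
    wr32(0x200 + 0x1C, 0x240);                    // AddressOfFunctions
    wr32(0x200 + 0x20, 0x250);                    // AddressOfNames
    wr32(0x200 + 0x24, 0x260);                    // AddressOfNameOrdinals
    wr32(0x240, 0x1000);    wr32(0x244, 0x2000);  // function RVAs
    wr32(0x250, 0x300);     wr32(0x254, 0x310);   // name RVAs
    wr16(0x260, 1);         wr16(0x262, 0);       // "Alpha" -> funcs[1], "Beta" -> funcs[0]
    std::memcpy(&img[0x300], "Alpha", 6);
    std::memcpy(&img[0x310], "Beta", 5);
    return img;
}
```

The demo image deliberately maps the first name to the second function slot, to show that the name index and the function index are linked only through the ordinal table. Adding the returned RVA to the module base gives the function's virtual address.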
Shellcode note:
Replace strcmp with inline comparison or hashing.
Avoid loops with large stack frames.
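As an example of the hashing option, the sketch below uses 32-bit FNV-1a. The choice of FNV-1a is an assumption made for illustration; the text does not name a specific hash, and any function with few collisions over the export names of the target module would serve the same purpose.

```cpp
#include <cstdint>

// 32-bit FNV-1a over a NUL-terminated ASCII string.
// constexpr lets the expected hash be computed at compile time, so the
// name being searched for never has to be stored as a string.
constexpr uint32_t HashName(const char* s) {
    uint32_t h = 2166136261u;            // FNV offset basis
    while (*s) {
        h ^= static_cast<uint8_t>(*s++);
        h *= 16777619u;                  // FNV prime
    }
    return h;
}

// Compare an export name against a precomputed hash instead of calling strcmp.
constexpr bool NameMatches(const char* exportName, uint32_t wantedHash) {
    return HashName(exportName) == wantedHash;
}

static_assert(HashName("") == 2166136261u, "empty string yields the offset basis");
static_assert(HashName("a") == 0xE40C292Cu, "known FNV-1a test vector");
```

In a resolver loop, NameMatches(cur, wantedHash) would take the place of the string comparison, with wantedHash computed ahead of time for the API being looked up.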
Conclusion
Understanding the relationship between high-level C++ code and low-level shellcode techniques provides invaluable insight into Windows internals. The x64 calling convention, PE structure parsing, and PEB traversal are fundamental skills for security researchers and developers working at the system level.
Key takeaways:
The x64 calling convention dictates precise register usage and stack alignment
PE headers provide a roadmap for finding functions dynamically
The PEB is the gateway to all loaded modules in a process
C++ can mirror shellcode techniques using intrinsics and structure offsets
Understanding these concepts enhances both offensive and defensive security capabilities