Polymorphism and Virtual Function Reversal in C++
Introduction
While exploring the internals of COM objects and their underlying implementation, particularly focusing on the feature of "separating implementation from the definition of behavior," I realized that COM objects heavily rely on C++ runtime polymorphism. To gain a deeper understanding of this key C++ characteristic, I spent some time delving into it, along with conducting some reverse engineering to fully grasp the concept.
Polymorphic code refers to code that can be written once and reused with different types. In C++, polymorphism is achieved through two main approaches: compile-time polymorphism, where types are determined at compile time, and runtime polymorphism, which enables dynamic behavior during program execution. Essentially, polymorphism allows objects to behave as "another type" and exhibit multiple forms. For reverse engineers, this concept is closely tied to dynamic dispatch, where virtual methods play a crucial role. Virtual methods enable calling the appropriate function of the most derived class, even if it's overridden from a base class, providing a mechanism for dynamic dispatch that allows flexible and extensible code behavior.
Motivating Example
Suppose you need to implement several types of logger, such as a remote server logger, a local file logger, or even a logger that sends data to a printer. The program should also allow switching between these loggers dynamically at runtime. One easy approach is to use a scoped enumeration to manage and switch between the different loggers:
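The original listing is not reproduced here, so the following is only a minimal sketch of the enum-based approach. The member names conslogger and filelogger come from the text; LoggerType, makeTransfer, the in-memory lines buffers, and the message format are assumptions made for illustration.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-ins for the concrete loggers: each one records its
// messages in memory so the sketch stays self-contained.
struct ConsoleLogger {
    std::vector<std::string> lines;
    void logTransfer(long from, long to, double amount) {
        lines.push_back("[cons] " + std::to_string(from) + " -> " +
                        std::to_string(to) + ": " + std::to_string(amount));
    }
};

struct FileLogger {
    std::vector<std::string> lines;
    void logTransfer(long from, long to, double amount) {
        lines.push_back("[file] " + std::to_string(from) + " -> " +
                        std::to_string(to) + ": " + std::to_string(amount));
    }
};

// Scoped enumeration that selects the active logger (assumed name).
enum class LoggerType { Console, File };

class Bank {
public:
    void setLogger(LoggerType type) { type_ = type; }

    void makeTransfer(long from, long to, double amount) {
        // Bank has to know every concrete logger: a new logger type means
        // a new enumerator, a new member, and a new case in this switch.
        switch (type_) {
        case LoggerType::Console:
            conslogger.logTransfer(from, to, amount);
            break;
        case LoggerType::File:
            filelogger.logTransfer(from, to, amount);
            break;
        }
    }

    const ConsoleLogger& console() const { return conslogger; }
    const FileLogger& file() const { return filelogger; }

private:
    LoggerType type_ = LoggerType::Console;
    ConsoleLogger conslogger;
    FileLogger filelogger;
};
```

Calling setLogger(LoggerType::File) between two makeTransfer calls routes the second message to the file logger, but only because Bank owns and knows both concrete types.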
The solution above still has some limitations and isn't as flexible as it could be. The primary issue is that the Bank class depends directly on specific logger implementations (ConsoleLogger and FileLogger). This design violates the Open/Closed Principle: adding a new logger requires modifying the Bank class, which makes the code less scalable and harder to maintain.
A better approach is to use runtime polymorphism and dependency injection by introducing a common interface (an abstract class) for loggers. This way, we can pass any logger implementation to the Bank class without modifying it.
Runtime Polymorphism
Runtime polymorphism, also known as dynamic polymorphism or late binding, lets you conveniently "program in the general" rather than "program in the specific". With runtime polymorphism, a function call is resolved at run time rather than at compile time.
To refactor the code above by utilizing polymorphism and dependency injection, we should first clarify these two concepts:
Polymorphism: This allows us to use different types of loggers (e.g., ConsoleLogger, FileLogger) through a common interface. This means our Bank class can operate with any logger that implements a common interface, without knowing the specifics of each logger.
Dependency Injection: Instead of creating dependencies directly inside the Bank class (like ConsoleLogger conslogger; and FileLogger filelogger;), dependency injection allows us to "inject" these dependencies (in this case, the logger) from outside. This makes our Bank class more flexible, allowing us to change the logger without altering the class itself. We achieve this by passing the logger instance to Bank, either in the constructor or via a setter method.
Let's apply these concepts to the code.
Step 1: Create an ILogger Interface
We'll define an abstract base class called ILogger with a pure virtual logTransfer method. The concrete classes ConsoleLogger and FileLogger will both inherit from this interface and provide their own implementations.
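A minimal sketch of what such an interface could look like. The class and method names come from the text; the logTransfer parameter list, the message format, and the injectable output stream are assumptions made so the example is self-contained.

```cpp
#include <fstream>
#include <iostream>
#include <sstream>
#include <string>

// Abstract base class: pure virtual logTransfer plus a virtual destructor
// so derived loggers can be destroyed through an ILogger pointer.
class ILogger {
public:
    virtual ~ILogger() = default;
    virtual void logTransfer(long from, long to, double amount) = 0;
};

class ConsoleLogger : public ILogger {
public:
    // Writes to std::cout by default; another stream can be supplied.
    explicit ConsoleLogger(std::ostream& out = std::cout) : out_(out) {}
    void logTransfer(long from, long to, double amount) override {
        out_ << "[cons] " << from << " -> " << to << ": " << amount << '\n';
    }
private:
    std::ostream& out_;
};

class FileLogger : public ILogger {
public:
    // Appends to the given file (the path is chosen by the caller).
    explicit FileLogger(const std::string& path)
        : file_(path, std::ios::app) {}
    void logTransfer(long from, long to, double amount) override {
        file_ << "[file] " << from << " -> " << to << ": " << amount << '\n';
    }
private:
    std::ofstream file_;
};

// Usage sketch: the caller only sees an ILogger reference.
std::string logThroughInterface() {
    std::ostringstream captured;
    ConsoleLogger console(captured);
    ILogger& logger = console;
    logger.logTransfer(1000, 2000, 50.0);
    return captured.str();
}
```

The call in logThroughInterface goes through ILogger, yet the ConsoleLogger override is the one that runs.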
Step 2: Use Dependency Injection in the Bank Class
Instead of hard-coding the ConsoleLogger or FileLogger objects in Bank, we'll inject an ILogger pointer (or reference) into Bank. This allows Bank to work with any ILogger-derived object.
Dependency Injection
What is Dependency Injection? Dependency Injection is a design pattern that helps you pass dependencies (objects a class needs to function) to a class instead of creating them inside the class. This makes your code more flexible and testable.
The Bank class has a private logger attribute of type std::shared_ptr<ILogger>, injected via:
Constructor Injection: In this example, we use constructor injection by passing the ILogger instance (the dependency) into the Bank constructor:
This means the Bank class can work with any ILogger implementation, and we can switch between different logger types without changing the Bank class itself.
Property Injection: We also use property injection with the setLogger method:
This lets us change the logger at runtime, adding more flexibility. Property injection is useful if you need to change dependencies after the object is created.
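Both injection styles can be sketched together as follows. The logger attribute, its std::shared_ptr<ILogger> type, and setLogger are taken from the text; makeTransfer and the in-memory loggers are hypothetical stand-ins used only to make the behavior observable.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

class ILogger {
public:
    virtual ~ILogger() = default;
    virtual void logTransfer(long from, long to, double amount) = 0;
};

// Hypothetical loggers that record messages instead of doing real I/O.
class ConsoleLogger : public ILogger {
public:
    std::vector<std::string> lines;
    void logTransfer(long from, long to, double) override {
        lines.push_back("[cons] " + std::to_string(from) + " -> " +
                        std::to_string(to));
    }
};

class FileLogger : public ILogger {
public:
    std::vector<std::string> lines;
    void logTransfer(long from, long to, double) override {
        lines.push_back("[file] " + std::to_string(from) + " -> " +
                        std::to_string(to));
    }
};

class Bank {
public:
    // Constructor injection: the dependency arrives from outside.
    explicit Bank(std::shared_ptr<ILogger> newLogger)
        : logger(std::move(newLogger)) {}

    // Property (setter) injection: swap the dependency at runtime.
    void setLogger(std::shared_ptr<ILogger> newLogger) {
        logger = std::move(newLogger);
    }

    void makeTransfer(long from, long to, double amount) {
        // Bank only knows the interface; the override is chosen at run time.
        if (logger) {
            logger->logTransfer(from, to, amount);
        }
    }

private:
    std::shared_ptr<ILogger> logger;
};
```

A caller would construct Bank with std::make_shared<ConsoleLogger>() and later call setLogger(std::make_shared<FileLogger>()). A test can pass in any ILogger it controls in the same way.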
Benefits of Dependency Injection:
Flexibility: The Bank class works with any ILogger implementation. We can easily add a new DatabaseLogger in the future and use it with Bank.
Testability: Dependency injection allows us to pass mock loggers to Bank during testing, making it easier to isolate and test its behavior.
Code Reuse and Decoupling: Bank is not tightly coupled to specific logger implementations. It depends only on the ILogger interface, allowing us to reuse the Bank class in different contexts.
Polymorphism in Action
By defining an ILogger interface with logTransfer as a virtual function, both ConsoleLogger and FileLogger provide their own implementations. The Bank class does not need to know which specific ILogger it's using; it just calls logTransfer, and the correct function is called based on the actual ILogger type injected. This is polymorphism in action: different behaviors (ConsoleLogger and FileLogger) are accessed through a common interface (ILogger).
Simple diagram for Dependency Injection in Bank class.
This diagram highlights where dependency injection occurs, showing Bank relying on the ILogger interface rather than on specific implementations, thus enabling runtime polymorphism.
Reversing C++ Virtual Functions
Before delving into reversing C++ virtual functions, it is essential to outline some fundamental C++ concepts relevant to the task. While the benefits of Object-Oriented Programming (OOP) for programmers are undeniable, it’s worth considering whether these advantages extend to reverse engineers analyzing applications.
Advanced OOP features such as polymorphism and dynamic binding are commonly used, so they show up regularly in the binaries we analyze. It's also important to note that objects can be created on the stack or allocated on the heap.
At the assembly level, structs and classes are equivalent in terms of memory footprint: fundamentally, both are blocks of memory in which each member of a given type lives at a fixed offset.
Class Constructor
The constructor is a crucial concept for understanding the OOP structure in C++. Let’s examine how constructors are declared and defined.
In the example below, the object is created dynamically (allocated on the heap), in contrast to the previous straightforward class example.
Before the constructor is invoked, the address of the heap-allocated memory that will hold our object is placed in the rcx register. This is followed by a call to the parameterized constructor of the Material class, which is an ordinary subroutine.
The this pointer, the hidden pointer to our Material object, reaches the constructor through the rcx register. The arguments passed at initialization are then moved into the memory allocated for the object, each with a store that matches its size.
Copy Constructor
Both the constructor and the copy constructor initialize an object.
The copy constructor receives a reference to an existing object, which is passed as a pointer at the assembly level:
The copy constructor Person::Person(Person &) is called with two key arguments:
rdx holds the address of the old object (t), which is being copied.
rcx holds the address of the location (this) where the new object will be created.
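The two arguments can be made visible from the source side with a small sketch. The Person members name_ and age_ are assumptions; the non-const reference parameter mirrors the Person::Person(Person &) signature quoted above, and copiedFrom_ exists only to record the address of the source object.

```cpp
#include <string>
#include <utility>

class Person {
public:
    Person(std::string name, int age)
        : name_(std::move(name)), age_(age) {}

    // Copy constructor: 'this' is the new object, 'other' is the old one.
    // At the assembly level the reference is just the old object's address.
    Person(Person& other)
        : name_(other.name_), age_(other.age_), copiedFrom_(&other) {}

    const std::string& name() const { return name_; }
    int age() const { return age_; }
    const Person* copiedFrom() const { return copiedFrom_; }

private:
    std::string name_;
    int age_;
    const Person* copiedFrom_ = nullptr;  // address of the source object
};
```

After Person t("Ada", 36); Person copy(t);, copy.copiedFrom() equals &t: the constructor received the old object's address alongside its own this.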
Stepping into the copy constructor:
Object's methods
Under the hood, the getter methods receive the hidden this pointer in the rcx register, which is a telltale sign of OOP code.
Inheritance
Inheritance is a fundamental concept that defines the inter-class relationships and extension structure in Object-Oriented Programming (OOP). It is essential to understand the distinction between the base class and the derived class, as these two concepts play a critical role in how classes interact and extend functionality.
Single inheritance
In this situation, the derived class has only one base class.
Let's disassemble the code to understand the flow of constructor calls for the base class and the derived classes. Here is a detailed breakdown of what is happening in the assembly instructions:
lea rcx, [rbp+110h_var_108]
The lea (Load Effective Address) instruction loads the address of the local variable into the rcx register. In this case, it loads the address of the object being constructed. rbp+110h_var_108 is the stack location of that object, and its address becomes the this pointer.
Both derived objects are created on the stack, and their constructors, Tree() and Fruit(), are then called in turn.
call Tree::Tree(void)
This line calls the constructor of the Tree class (Tree::Tree()).
The rcx register holds the this pointer, which is passed to the constructor to initialize the Tree object.
Stepping into one of these constructors, we notice that the this pointer is used to call the base constructor:
The base class constructor always runs first, followed by the constructor of the derived class.
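The ordering rule can be checked from the source side with a short sketch. Tree and Fruit are named in the text; the base class name Plant and the trace helper are assumptions, since the original listing is not shown.

```cpp
#include <string>

// Shared log of constructor calls, in the order they run.
std::string& trace() {
    static std::string log;
    return log;
}

class Plant {
public:
    Plant() { trace() += "Plant "; }
};

class Tree : public Plant {
public:
    Tree() { trace() += "Tree "; }   // body runs after Plant::Plant()
};

class Fruit : public Plant {
public:
    Fruit() { trace() += "Fruit "; } // body runs after Plant::Plant()
};

// Creates both derived objects on the stack and returns the call order.
std::string constructionOrder() {
    trace().clear();
    Tree tree;
    Fruit fruit;
    return trace();
}
```

constructionOrder() returns "Plant Tree Plant Fruit ", matching the disassembly: each derived constructor first forwards its this pointer to the base constructor.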
Multiple inheritance
The derived class has more than one base class.
Again, the Tree object is created on the stack, and rcx carries the address of the object as the this pointer:
Stepping into the Tree() constructor:
The constructor calls always start with the base class constructors, in the order the bases are listed in the class declaration (left to right), followed by the derived class constructor.
In multiple inheritance, we can see the values of the private members laid out in the corresponding order:
Plant Base class attributes.
Forrest Base class attributes.
Tree Derived class attributes.
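A hedged sketch of the same hierarchy, using the class names from the text. The private members leaves_, area_, and height_ are invented for illustration. The sketch logs the constructor order and measures where the Forrest subobject sits inside a Tree.

```cpp
#include <cstddef>
#include <string>

std::string& trace() {
    static std::string log;
    return log;
}

class Plant {
public:
    Plant() : leaves_(1) { trace() += "Plant "; }
private:
    int leaves_;   // Plant base class attribute (assumed)
};

class Forrest {
public:
    Forrest() : area_(2) { trace() += "Forrest "; }
private:
    int area_;     // Forrest base class attribute (assumed)
};

// Bases are constructed in the order they are listed here.
class Tree : public Plant, public Forrest {
public:
    Tree() : height_(3) { trace() += "Tree "; }
private:
    int height_;   // Tree derived class attribute (assumed)
};

std::string constructionOrder() {
    trace().clear();
    Tree tree;
    return trace();
}

// Byte offset of the Forrest subobject inside a Tree. The exact value is
// implementation-defined; mainstream compilers place it after Plant.
std::ptrdiff_t forrestOffset() {
    Tree tree;
    const char* whole = reinterpret_cast<const char*>(&tree);
    const char* part =
        reinterpret_cast<const char*>(static_cast<const Forrest*>(&tree));
    return part - whole;
}
```

The order of the subobjects in memory usually follows the order of the base list, which is why the members show up as Plant, then Forrest, then Tree in a memory dump.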
Polymorphism
Code example
After disassembling the code above, we first look at the pseudocode of the derived class constructor:
The this pointer is obtained from the memory allocated by the C++ new operator. The instruction lea rcx, const Tree::`vftable` highlights the setup for virtual functions, which support runtime polymorphism. Unlike with plain inheritance, the object's memory layout includes a vftable pointer in its first 8 bytes, pointing to a table that holds pointers to the class's virtual functions.
The base class also has its own vftable.
During debugging, I noticed that the first 8 bytes of the object held the address of the base class's vftable while the base class constructor was executing. When control returned to the derived class constructor, these 8 bytes were updated to point to the vftable of the derived class, reflecting the correct virtual function overrides, so that Tree::Create() will be called.
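This vftable switch has a visible effect at the source level: a virtual call made inside the base constructor resolves to the base version, because the derived part does not exist yet. A minimal sketch, assuming a base class named Plant (the text only names Tree::Create()) and an invented seenByBaseCtor_ member:

```cpp
#include <string>

class Plant {
public:
    // While this constructor runs, the object's dynamic type is Plant,
    // so the virtual call below resolves to Plant::Create().
    Plant() { seenByBaseCtor_ = Create(); }
    virtual ~Plant() = default;

    virtual std::string Create() const { return "Plant::Create"; }

    const std::string& seenByBaseCtor() const { return seenByBaseCtor_; }

private:
    std::string seenByBaseCtor_;
};

class Tree : public Plant {
public:
    // Once Tree's constructor has taken over, calls dispatch here.
    std::string Create() const override { return "Tree::Create"; }
};
```

For a Tree object, seenByBaseCtor() yields "Plant::Create", while calling Create() through a Plant reference after construction yields "Tree::Create".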
The screenshot below demonstrates the concrete usage of the vftable. First, the object pointer, which points at the vftable pointer stored in the object's first 8 bytes, is loaded into the rax register from memory (mov rax, qword ptr [plt]). The next instruction dereferences this pointer to fetch the address of the vftable itself. The address of Tree::Create() is then read from the vftable at offset 0x0, the slot for the Create() method. Finally, the call instruction invokes Tree::Create() indirectly through the vftable, illustrating the dynamic dispatch mechanism at runtime.
Devirtualize a virtual function call
The basics
Let's start by looking at how the compiler implements virtual functions. Suppose we have the following polymorphism implementation:
With the following main function:
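The original listings are not reproduced here. The sketch below is a reconstruction that is consistent with the rest of this section, not the author's exact code. The class names, the four virtual functions (destructor, run, walk, move), and the fact that walk is pure in Mammal while move is not overridden come from the text; the return values, the handling of run, and the helper names makeMammal and walkRandomMammal are assumptions.

```cpp
#include <cstdlib>
#include <string>

class Mammal {
public:
    virtual ~Mammal() = default;
    virtual std::string run() const { return "Mammal::run"; }   // assumed body
    virtual std::string walk() const = 0;                        // pure virtual
    virtual std::string move() const { return "Mammal::move"; }
};

class Cat : public Mammal {
public:
    std::string walk() const override { return "Cat::walk"; }
};

class Dog : public Mammal {
public:
    std::string walk() const override { return "Dog::walk"; }
};

// The concrete type depends on a value that is only known at run time.
Mammal* makeMammal(int r) {
    Mammal* m;
    if (r % 2 == 0) {
        m = new Cat();
    } else {
        m = new Dog();
    }
    return m;
}

// Stand-in for main: pick a type with rand(), call walk(), destroy.
std::string walkRandomMammal() {
    Mammal* m = makeMammal(std::rand());
    std::string result = m->walk();  // virtual call through the vftable
    delete m;                        // virtual destructor call
    return result;
}
```

Raw new and delete are used here only to mirror what the decompiled code shows; in everyday code std::unique_ptr would be the more usual choice.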
We know that the value of m depends on rand(), which is not determined until run time. The compiler cannot know this ahead of time, so how does it call the right function?
The answer is that for each type having a virtual function, the compiler inserts a table of function pointers, called a vftable, into the resulting binary.
Each instance of such a type is given an additional member called vptr that points to the correct vftable for that object. Code to initialize this pointer with the right value is added to the constructor. When the program wants to call a virtual function, it simply reads the correct entry in the object's vftable and calls it.
The entries in the table must be in the same order for each related type.
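The mechanism described above can be imitated by hand with plain structs and function pointers. This is only an illustration of the idea, not what any particular compiler emits; every name in it (Object, VfTable, catTable, callWalk, and so on) is hypothetical.

```cpp
#include <string>

struct Object;  // forward declaration for the function-pointer types

// One slot per virtual function, in the same order for every related type.
struct VfTable {
    std::string (*walk)(const Object* self);
    std::string (*move)(const Object* self);
};

// Every instance carries a pointer to the table for its type.
struct Object {
    const VfTable* vptr;
};

std::string catWalk(const Object*) { return "Cat::walk"; }
std::string dogWalk(const Object*) { return "Dog::walk"; }
std::string mammalMove(const Object*) { return "Mammal::move"; }

// One table per type; the move slot is shared because it is not overridden.
const VfTable catTable = {catWalk, mammalMove};
const VfTable dogTable = {dogWalk, mammalMove};

// "Constructors": their job here is to set the vptr to the right table.
Object makeCat() { return Object{&catTable}; }
Object makeDog() { return Object{&dogTable}; }

// A "virtual call": load the vptr, pick the slot, call through it.
std::string callWalk(const Object* obj) { return obj->vptr->walk(obj); }
std::string callMove(const Object* obj) { return obj->vptr->move(obj); }
```

callWalk needs no knowledge of the concrete type. It always reads the same slot, and the table that the object points to decides which function runs.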
We would expect to find three tables in the binary, one each for Mammal, Cat, and Dog. We can locate them quickly by looking through the .rdata section:
Decompiling main()
The main function decompiles to:
To make the reversing more realistic, I disassembled the binary without symbols, then renamed the variables based on the case we're studying:
🔧Memory Allocation
In both branches of the if/else statement, 8 bytes of memory are allocated using the new operator.
This allocation size of 8 bytes is consistent with the size of a single pointer on a 64-bit system, which matches the size of a virtual pointer (vptr).
🔧Virtual Pointer
When an object of a class with virtual functions is created, the compiler inserts a hidden member into the object, the virtual pointer (vptr).
This vptr points to the virtual function table (vftable) of the object's class, which holds the addresses of that class's virtual functions.
🔧Object construction
After allocating memory, the constructors Cat::Cat(ptr_this) and Dog::Dog(ptr_this) are called, initializing the object.
These constructors set up the vptr to point to the appropriate vftable for Cat or Dog.
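The 8-byte allocation can be cross-checked from the source side. Object layout is implementation-defined, so the sketch below shows what mainstream compilers do rather than anything the language guarantees. The classes are a reduced, assumed version of the Mammal hierarchy with no data members.

```cpp
class Mammal {
public:
    virtual ~Mammal() = default;
    virtual void walk() = 0;
};

class Cat : public Mammal {
public:
    void walk() override {}
};

// With no data members, the only thing stored in a Cat is the hidden vptr,
// so on common ABIs the object is exactly one pointer wide (8 bytes on
// a 64-bit target).
bool catIsOnePointerWide() {
    return sizeof(Cat) == sizeof(void*);
}
```

If the class had data members, the size passed to new would grow accordingly, which is a handy way to estimate an object's layout from the allocation call alone.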
We can see the virtual function calls on lines 24 and 26. In the first, the compiler dereferences the object pointer (to get the pvft) and adds 16 bytes to reach the 3rd entry in the vftable. Line 26 fetches the 1st entry in the table, which in most cases is the destructor.
Looking at the tables, the 3rd entries for each class are:
j__purecall (Mammal, the abstract class)
sub_140011005 (Cat derived class)
sub_14001112C (Dog derived class)
There are 4 entries in each vftable:
Destructor
run
walk
move
Notice that because neither Cat nor Dog implements move(), they both inherit the definition from Mammal, and so the move entries in their vftables are the same.
Create Structures
To declare the functions inside a structure X as function pointers in IDA, you need to know each function's signature:
Calling convention
Return type
Parameter types
Once you have this information, you can correctly define the function pointers in your struct.
At this point, it is useful to start defining some structures. We've already seen that the only member of each of the Mammal, Cat, and Dog structures is its vptr.
We should also create a structure for each vftable. The objective is to get the decompiler output to show us which function would actually be called if m had a particular type. We can then cycle through these possibilities and examine all of the options:
As mentioned previously, we should set the right signature for each virtual function declared within its structure:
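As a rough sketch, the declarations could look like the following, written so that they also compile as ordinary C++. The struct and field names are assumptions, as are the return types. The destructor slot is given the (this, flags) shape that MSVC's deleting destructors typically have. In IDA you would normally also spell out the calling convention (for example __fastcall on x64), which is left out here to keep the code portable.

```cpp
#include <cstddef>

struct Mammal;  // forward declaration so the table can refer to the object

// One function-pointer field per vftable entry, in table order:
// destructor, run, walk, move.
struct Mammal_vftable {
    void* (*destructor)(Mammal* self, unsigned int flags);
    void (*run)(Mammal* self);
    void (*walk)(Mammal* self);
    void (*move)(Mammal* self);
};

// The object itself: its only member is the pointer to its vftable.
struct Mammal {
    Mammal_vftable* vptr;
};

// Offset of the walk slot: two pointers in, i.e. 16 bytes on a 64-bit
// target, matching the "+16" seen in the decompiled virtual call.
std::size_t walkSlotOffset() {
    return offsetof(Mammal_vftable, walk);
}
```

Equivalent Cat and Dog structures would have the same shape, each with its own vftable type, so that retyping m lets the decompiler name the slot being called.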
If we go back to the decompiled code for main, we can now rename the local variable to m and set its type to Cat* or Dog*.
We could set m to be Mammal*, but we will see some problems if we do that:
Notice that if the type of m were Mammal*, the call at line 24 would be to a pure virtual function. This should never happen!
The dynamic type will be Cat or Dog, and we know which functions will be called in either case by looking at their vftable entries.
Conclusion
Polymorphism is a cornerstone of C++ that significantly contributes to the implementation of COM in Windows. Therefore, understanding this feature from both a programming and a reverse engineering perspective is crucial for comprehending the underlying mechanics. I believe that before diving into security aspects, it is essential to acquire fundamental knowledge, as these foundational concepts will guide you in the process of identifying bugs.
Final Note: I am not an expert in C++ reverse engineering or programming; I am merely a learner. If you notice any inaccuracies in my statements, please feel free to reach out and correct me—I would greatly appreciate it. Thank you very much for taking the time to read this post!