Polymorphism and Virtual Function Reversal in C++

Introduction

While exploring the internals of COM objects and their underlying implementation, particularly focusing on the feature of "separating implementation from the definition of behavior," I realized that COM objects heavily rely on C++ runtime polymorphism. To gain a deeper understanding of this key C++ characteristic, I spent some time delving into it, along with conducting some reverse engineering to fully grasp the concept.

Polymorphic code is code that is written once and reused with different types. C++ offers two main forms: compile-time polymorphism, where the types are resolved by the compiler (templates and overloading), and runtime polymorphism, where behavior is selected while the program executes. In both cases an object can be treated as "another type" and take multiple forms. For reverse engineers, the runtime form is the interesting one because it is implemented through dynamic dispatch, where virtual methods play the central role. A virtual call made through a base-class pointer or reference runs the override of the most derived class, which is what makes the code flexible and extensible.

Motivating Example

#include <iostream>
#include <format>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("{} -> {}: {:.2f}\n", from, to, amount);
	}
};

struct Bank
{
	void makeTransfer(long from, long to, double amount)
	{
		logger.logTransfer(from, to, amount);
	}
private:
	ConsoleLogger logger;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}

Suppose you need to implement several kinds of logger, such as a remote server logger, a local file logger, or even a logger that sends data to a printer. The program should also be able to switch between these loggers at runtime. One easy approach is to use a scoped enumeration to select the active logger:

#include <iostream>
#include <format>
#include <stdexcept>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[CONS] {} -> {}: {:.2f}\n",from, to, amount);
	}
};

struct FileLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
	}
};

enum class LoggerType
{
	Console,
	File
};

struct Bank
{
	Bank() : type{LoggerType::Console} {}
	void set_logger(LoggerType new_type)
	{
		type = new_type;
	}
	void makeTransfer(long from, long to, double amount)
	{
		switch (type)
		{
		case LoggerType::Console:
			conslogger.logTransfer(from, to, amount);
			break;
		case LoggerType::File:
			filelogger.logTransfer(from, to, amount);
			break;
		default:
			throw std::logic_error("Unknown Logger type encountered");
			break;
		}
		
	}
private:
	ConsoleLogger conslogger;
	FileLogger filelogger;
	LoggerType type;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.set_logger(LoggerType::File);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}

The solution provided above still has some limitations and isn't as flexible as it could be. The primary issue here is that the Bank class directly depends on specific logger implementations (ConsoleLogger and FileLogger). This design violates the Open/Closed Principle, as adding a new logger would require modifying the Bank class, making it less scalable and maintainable.

A better approach would be to use runtime polymorphism and dependency injection by introducing a common interface (abstract class) for loggers. This way, we can pass any logger implementation to the Bank class without modifying it.

Runtime Polymorphism

Runtime polymorphism lets you "program in the general" rather than "program in the specific". It is also known as dynamic polymorphism or late binding, because the target of a function call is resolved at run time rather than at compile time.

To refactor the code above by utilizing polymorphism and dependency injection, we should first clarify these two concepts:

Polymorphism: This allows us to use different types of loggers (e.g., ConsoleLogger, FileLogger) through a common interface. This means our Bank class can operate with any logger that implements a common interface, without knowing the specifics of each logger.
Dependency Injection: Instead of creating dependencies directly inside the Bank class (like ConsoleLogger conslogger; and FileLogger filelogger;), dependency injection allows us to "inject" these dependencies (in this case, the logger) from outside. This makes our Bank class more flexible, allowing us to change the logger without altering the class itself. We achieve this by passing the logger instance to Bank, either in the constructor or via a setter method.

Let's apply these concepts to the code.

Step 1: Create an `ILogger` Interface

We'll define an abstract base class called ILogger with a pure virtual logTransfer method. Both the concrete classes ConsoleLogger and FileLogger will inherit from this interface and provide their own implementations.

Step 2: Use Dependency Injection in the `Bank` Class

Instead of hard-coding the ConsoleLogger or FileLogger objects in Bank, we’ll inject an ILogger pointer (or reference) into Bank. This allows Bank to work with any ILogger-derived object.

#include <iostream>
#include <format>
#include <memory>
#include <stdexcept>
#include <utility> // std::move

// Step 1: Create the ILogger interface
struct ILogger
{
    virtual void logTransfer(long from, long to, double amount) = 0; // pure virtual function
    virtual ~ILogger() = default; // virtual destructor for proper cleanup of derived classes
};

// ConsoleLogger inherits from ILogger
struct ConsoleLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[CONS] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// FileLogger inherits from ILogger
struct FileLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// Bank class now depends on the ILogger interface, not a specific implementation
struct Bank
{
    // Constructor takes an ILogger pointer, which allows dependency injection
    Bank(std::shared_ptr<ILogger> logger) : m_logger(std::move(logger)) {}

    // Setter to change the logger at runtime if needed
    void setLogger(std::shared_ptr<ILogger> new_logger)
    {
        m_logger = std::move(new_logger);
    }

    void makeTransfer(long from, long to, double amount)
    {
        if (!m_logger)
        {
            throw std::logic_error("Logger is not set!");
        }
        m_logger->logTransfer(from, to, amount);
    }

private:
    std::shared_ptr<ILogger> m_logger; // Pointer to an ILogger, allows polymorphic behavior
};

// Main function to demonstrate dependency injection
int main()
{
    // Inject a ConsoleLogger into Bank
    auto consoleLogger = std::make_shared<ConsoleLogger>();
    Bank bank(consoleLogger); // Dependency injection via constructor

    bank.makeTransfer(1000, 2000, 50.09);

    // Switch to a FileLogger at runtime
    auto fileLogger = std::make_shared<FileLogger>();
    bank.setLogger(fileLogger); // Dependency injection via setter

    bank.makeTransfer(2000, 4000, 20.00);

    return 0;
}

Dependency Injection

What is Dependency Injection? Dependency Injection is a design pattern that helps you pass dependencies (objects a class needs to function) to a class instead of creating them inside the class. This makes your code more flexible and testable.

The Bank class holds a private m_logger member of type std::shared_ptr<ILogger>, which is injected in two ways:

Constructor Injection: In this example, we use constructor injection by passing the ILogger instance (dependency) into the Bank constructor:

Bank bank(consoleLogger);

This means the Bank class can work with any ILogger implementation, and we can switch between different logger types without changing the Bank class itself.

Property Injection: We also use property injection with the setLogger method:

bank.setLogger(fileLogger);

This lets us change the logger at runtime, adding more flexibility. Property injection is useful if you need to change dependencies after the object is created.

Benefits of Dependency Injection:

Flexibility: The Bank class works with any ILogger implementation. We can easily add a new DatabaseLogger in the future and use it with Bank.
Testability: Dependency injection allows us to pass mock loggers to Bank during testing, making it easier to isolate and test its behavior.
Code Reuse and Decoupling: Bank is not tightly coupled to specific logger implementations. It depends only on the ILogger interface, allowing us to reuse the Bank class in different contexts.

Polymorphism in Action

By defining an ILogger interface with logTransfer as a virtual function, both ConsoleLogger and FileLogger provide their own implementations. The Bank class does not need to know which specific ILogger it’s using; it just calls logTransfer, and the correct function is called based on the actual type of the injected logger. This is polymorphism in action: different behaviors (ConsoleLogger and FileLogger) are accessed through a common interface (ILogger).

Simple diagram for Dependency Injection in Bank class.

          +--------------------+
          |      ILogger       |  <--- Interface
          |--------------------|
          | + logTransfer(...) |
          +--------------------+
                    ▲
                    |
         +----------+----------+
         |                     |
+-----------------+   +------------------+
|  ConsoleLogger  |   |   FileLogger     |  <--- Concrete Implementations
|-----------------|   |------------------|
| + logTransfer() |   | + logTransfer()  |
+-----------------+   +------------------+

                    ▲
                    |
               +------------+
               |    Bank    |
               |------------|
               | - m_logger |   <--- ILogger dependency (Injected)
               | + setLogger()     (Setter Injection)
               | + makeTransfer()  (uses logger)
               +------------+

This diagram highlights where dependency injection occurs, showing Bank relying on the ILogger interface rather than specific implementations, thus enabling runtime polymorphism.

Reversing C++ Virtual Functions

Before delving into reversing C++ virtual functions, it is essential to outline some fundamental C++ concepts relevant to the task. While the benefits of Object-Oriented Programming (OOP) for programmers are undeniable, it’s worth considering whether these advantages extend to reverse engineers analyzing applications.

Advanced OOP features like polymorphism and dynamic binding are common in real-world binaries. Keep in mind that objects can live either on the stack or on the heap, which changes where the this pointer comes from in the disassembly (a stack address versus the return value of operator new).

At the assembly level, structs and classes are equivalent in terms of memory layout: both are simply contiguous blocks of memory in which each member sits at a fixed offset.

Class Constructor

The constructor is a crucial concept for understanding the OOP structure in C++. Let’s examine how constructors are declared and defined.

In the example below, the object is allocated dynamically with new, in contrast to the previous straightforward class example.
Before the constructor is invoked, the address of the heap block that will hold the object is placed in the rcx register. The parameterized constructor of the Material class is then called like any other subroutine.

The this pointer, the hidden pointer to our Material object, reaches the constructor through the rcx register. The arguments passed at initialization are then written into the object's memory at offsets that follow from the size of each member.

Copy Constructor

Neither the constructor nor the copy constructor allocates memory itself; both initialize an object in storage that the caller has already reserved.

The copy constructor additionally receives a pointer to the source object as a parameter:

The copy constructor Person::Person(Person &) is called with two key arguments:
rdx holds the address of the existing object (t), which is being copied.
rcx holds the address of the location (this) where the new object will be created.

Stepping into the copy constructor:

Object's methods

Under the hood, these getter methods also receive the hidden this pointer in the rcx register, which is a reliable sign that we are looking at member functions of a class.

Inheritance

Inheritance is a fundamental concept that defines the inter-class relationships and extension structure in Object-Oriented Programming (OOP). It is essential to understand the distinction between the base class and the derived class, as these two concepts play a critical role in how classes interact and extend functionality.

Single inheritance

In this case, the derived class has exactly one base class.

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } 
	{
		std::cout << "Call Plant() Base Constructor\n";
	};
	
private:
	int m_age{};
};

class Tree : public Plant
{
public:
	Tree() :m_leafcount{ 0 } {
	
		std::cout << "Call Tree() Derived Constructor\n";
	};
private:
	int m_leafcount{};
};

class Fruit : public Plant
{
public:
	Fruit() :m_waterpercent{ 0 } {
		std::cout << "Call Fruit() Derived Constructor\n";
	};
private:
	int m_waterpercent{};
};

int main()
{
	Tree oak;
	Fruit apple;
	return 0;
}

Let's disassemble the code to follow the flow of constructor calls for the base class and the derived classes. Here is a detailed breakdown of the relevant assembly instructions:

lea rcx, [rbp+110h+var_108]
- The lea (Load Effective Address) instruction loads the address of a local variable into the rcx register. In this case, it is the address of the object being constructed.
- rbp+110h+var_108 is the stack location of that object, so the value in rcx becomes its this pointer.
- Both derived objects are created on the stack, and their constructors, Tree() and Fruit(), are called in turn.
call Tree::Tree(void)
- This line calls the constructor of the Tree class (Tree::Tree()).
- The rcx register holds the this pointer, which is passed to the constructor to initialize the Tree object.

Stepping into one of these constructors, we notice that the same this pointer is used to call the base-class constructor.

The base-class constructor always runs first, followed by the body of the derived-class constructor.

Multiple inheritance

Here the derived class has more than one base class.

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
private:
	int m_age{};
};

class Forest
{
public:
	Forest() :numof_trees{} { std::cout << "Forest::Forest()\n"; }
private:
	int numof_trees{};
};


class Tree : public Plant, Forest
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
private:
	int leaf_count{};
};



int main()
{
	Tree tr;
	return 0;
}

Again, the Tree object is created on the stack, and rcx carries its address, which becomes the this pointer:

Stepping into the Tree() constructor:

The base-class constructors always run first, in the order the bases are listed in the class declaration (left to right), followed by the derived-class constructor.

With multiple inheritance, the private members appear in the object's memory in the corresponding order:

Plant base class attributes.
Forest base class attributes.
Tree derived class attributes.

Polymorphism

Code example

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
	virtual void Create() { std::cout << "New Plant type created\n"; }
	void del() { std::cout << "Plant type deleted!\n"; }
private:
	int m_age{};
};


class Tree final : public Plant
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
	void Create() override { std::cout << "New Tree type created\n"; }
	void del() { std::cout << "Tree type deleted!\n"; }
private:
	int leaf_count{};
};

int main() {
	Tree* oak = new Tree;
	Plant* plt{ oak };

	plt->Create();
	plt->del();

	return 0;
}

After disassembling the code above, we first look at the pseudocode of the derived-class constructor:

The this pointer is the address returned by the C++ new operator. The instruction lea rcx, const Tree::`vftable` sets up the virtual function table that supports runtime polymorphism. Unlike the plain inheritance examples above, the object's memory layout now begins with a vftable pointer in its first 8 bytes, pointing to a table that holds the addresses of the class's virtual functions.

The base class has its own vftable as well.

During debugging, I noticed that the first 8 bytes of the object held the address of the base class's vftable while the base-class constructor was executing. When control returned to the derived-class constructor, these 8 bytes were overwritten with the address of the derived class's vftable, so that the correct override, Tree::Create(), is called.

The screenshot below demonstrates the vftable in use. First, the object pointer is loaded into the rax register from memory (mov rax, qword ptr [plt]). The next instruction dereferences this pointer to fetch the vftable address stored in the object's first 8 bytes. The address of Tree::Create() is then read from the vftable at offset 0x0, the slot of the Create() method. Finally, a call instruction invokes Tree::Create() indirectly through that slot, illustrating dynamic dispatch at runtime.

Devirtualize a virtual function call

The basics

Let's start by looking at how the compiler implements virtual functions. Suppose we have the following polymorphic class hierarchy:

#pragma once
#include <iostream>

struct Mammal
{
	Mammal() { std::cout << "Mammal::Mammal()\n"; }
	virtual ~Mammal() { std::cout << "Mammal::~Mammal()\n"; }

	virtual void run() = 0;
	virtual void walk() = 0;
	virtual void move() { walk(); }
};

struct Cat : Mammal
{
	Cat() { std::cout << "Cat::Cat()\n"; }
	virtual ~Cat() { std::cout << "Cat::~Cat()\n"; }

	void run() override { std::cout << "Cat::run()\n"; }
	void walk() override { std::cout << "Cat::walk()\n"; }
};

struct Dog : Mammal
{
	Dog() { std::cout << "Dog::Dog()\n"; }
	virtual ~Dog() { std::cout << "Dog::~Dog()\n"; }

	void run() override { std::cout << "Dog::run()\n"; }
	void walk() override { std::cout << "Dog::walk()\n"; }
};

With the following main function:

#include <iostream>
#include <cstdlib>
#include "reversing_1.h"

int main()
{
	Mammal* m;
	if (rand() % 2)
	{
		m = new Cat();
	}
	else
	{
		m = new Dog();
	}
	m->walk();

	delete m;
	return 0;
}

The value of m depends on rand(), which is not known until run time. The compiler cannot know this ahead of time, so how does it call the right function?

The answer is that for each type that has virtual functions, the compiler emits a table of function pointers, called the vftable, into the resulting binary.

Each instance of such a type is given an additional hidden member, the vptr, which points to the correct vftable for that object. The code that initializes this pointer is added to the constructor. When the program wants to call a virtual function, it simply reads the corresponding entry from the object's vftable and calls it.

The entries in the table must be in the same order for each related type.

We would expect to find three tables in the binary, one each for Mammal, Cat, and Dog. We can locate them quickly by looking through the .rdata section:

Decompiling main()

The main function decompiles to:

To make the reversing more realistic, I disassembled the binary without symbols and then renamed the variables to match the case we are studying:

🔧Memory Allocation:

In both branches of the if/else statement, 8 bytes of memory are allocated with the new operator.

This allocation size of 8 bytes is the size of a single pointer on a 64-bit system, which matches the size of the virtual pointer (vptr), the only data these objects contain.

🔧Virtual Pointer

When an object of a class with virtual functions is created, the compiler inserts a hidden member in the object, the virtual pointer (vptr).

This vptr points to the virtual function table vftable of the object, which holds the addresses of the virtual functions of that class.

🔧Object construction

After allocating memory, the constructors Cat::Cat(ptr_this) and Dog::Dog(ptr_this) are called, initializing the object.

These constructors set up the vptr to point to the appropriate vftable for Cat or Dog.

We can see the virtual function calls on lines 24 and 26 of the decompiled output. In the first, the code dereferences the object pointer to get the vftable pointer and adds 16 bytes to reach the 3rd entry in the vftable. Line 26 fetches the 1st entry in the table, which in most cases is the destructor.

Looking at the tables, the 3rd entries for each class are:

j__purecall (Mammal, the abstract class)
sub_140011005 (Cat, derived class)
sub_14001112C (Dog, derived class)

There are 4 entries in each vftable:

Destructor
run
walk
move

Notice that because neither Cat nor Dog overrides move(), they both inherit the definition from Mammal, so the move entries in their vftables are identical.

Create Structures

To declare the functions inside a structure as function pointers in IDA, you need to know each function's signature:

Calling convention
Return type
Parameter types

Once you have this information, you can correctly define the function pointers in your struct.

At this point it is useful to start defining some structures. We've already seen that the only member of the Mammal, Cat, and Dog structures will be their vptr.

We should also create a structure for each vftable. The objective is to get the decompiler output to show which function would actually be called if m had a particular type. We can then cycle through the possibilities and examine each option:

As mentioned previously, we should set the right signature for each virtual function declared within its structure:

struct Catvftable
{
	void (__thiscall *Cat_destructor)(void *);
	void (__thiscall *Cat_run)(void *);
	void (__thiscall *Cat_walk)(void *);
	void (__thiscall *Mammal_move)(void *);
};

If we go back to the decompiled code for main, we can now rename the local variable to m and set its type to Cat* or Dog*.

We could set the type of m to Mammal*, but we would run into a problem if we did:

Notice that if the type of m were Mammal*, the call at line 24 would resolve to a pure virtual function, which should never happen at run time.

The dynamic type will be Cat or Dog, and we know which functions will be called in either case by looking at their vftable entries.

Conclusion

Polymorphism is a cornerstone of C++ that significantly contributes to the implementation of COM in Windows. Therefore, understanding this feature from both a programming and a reverse engineering perspective is crucial for comprehending the underlying mechanics. I believe that before diving into security aspects, it is essential to acquire fundamental knowledge, as these foundational concepts will guide you in the process of identifying bugs.

Final Note: I am not an expert in C++ reverse engineering or programming; I am merely a learner. If you notice any inaccuracies in my statements, please feel free to reach out and correct me—I would greatly appreciate it. Thank you very much for taking the time to read this post!

Polymorphism and Virtual Function Reversal in C++

Introduction

Motivating Example

#include <iostream>
#include <format>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("{} -> {}: {:.2f}\n", from, to, amount);
	}
};

struct Bank
{
	void makeTransfer(long from, long to, double amount)
	{
		logger.logTransfer(from, to, amount);
	}
private:
	ConsoleLogger logger;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}

#include <iostream>
#include <format>
#include <stdexcept>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[CONS] {} -> {}: {:.2f}\n",from, to, amount);
	}
};

struct FileLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
	}
};

enum class LoggerType
{
	Console,
	File
};

struct Bank
{
	Bank() : type{LoggerType::Console} {}
	void set_logger(LoggerType new_type)
	{
		type = new_type;
	}
	void makeTransfer(long from, long to, double amount)
	{
		switch (type)
		{
		case LoggerType::Console:
			conslogger.logTransfer(from, to, amount);
			break;
		case LoggerType::File:
			filelogger.logTransfer(from, to, amount);
			break;
		default:
			throw std::logic_error("Unknown Logger type encountered");
			break;
		}
		
	}
private:
	ConsoleLogger conslogger;
	FileLogger filelogger;
	LoggerType type;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.set_logger(LoggerType::File);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}

Runtime Polymorphism

To refactor the code above by utilizing polymorphism and dependency injection, we should first clarify these two concepts:

Polymorphism: This allows us to use different types of loggers (e.g., ConsoleLogger, FileLogger) through a common interface. This means our Bank class can operate with any logger that implements a common interface, without knowing the specifics of each logger.
Dependency Injection: Instead of creating dependencies directly inside the Bank class (like ConsoleLogger conslogger; and FileLogger filelogger;), dependency injection allows us to "inject" these dependencies (in this case, the logger) from outside. This makes our Bank class more flexible, allowing us to change the logger without altering the class itself. We achieve this by passing the logger instance to Bank, either in the constructor or via a setter method.

Let's apply these concepts to the code.

Step 1: Create a `ILogger` Interface

Step 2: Use Dependency Injection in the `Bank` Class

#include <iostream>
#include <format>
#include <memory>
#include <stdexcept>

// Step 1: Create the ILogger interface
struct ILogger
{
    virtual void logTransfer(long from, long to, double amount) = 0; // pure virtual function
    virtual ~ILogger() = default; // virtual destructor for proper cleanup of derived classes
};

// ConsoleLogger inherits from ILogger
struct ConsoleLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[CONS] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// FileLogger inherits from ILogger
struct FileLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// Bank class now depends on the ILogger interface, not a specific implementation
struct Bank
{
    // Constructor takes a Logger pointer, which allows dependency injection
    Bank(std::shared_ptr<ILogger> logger) : m_logger(std::move(logger)) {}

    // Setter to change the logger at runtime if needed
    void setLogger(std::shared_ptr<ILogger> new_logger)
    {
        m_logger = std::move(new_logger);
    }

    void makeTransfer(long from, long to, double amount)
    {
        if (!m_logger)
        {
            throw std::logic_error("Logger is not set!");
        }
        m_logger->logTransfer(from, to, amount);
    }

private:
    std::shared_ptr<ILogger> m_logger; // Pointer to a ILogger, allows polymorphic behavior
};

// Main function to demonstrate dependency injection
int main()
{
    // Inject a ConsoleLogger into Bank
    auto consoleLogger = std::make_shared<ConsoleLogger>();
    Bank bank(consoleLogger); // Dependency injection via constructor

    bank.makeTransfer(1000, 2000, 50.09);

    // Switch to a FileLogger at runtime
    auto fileLogger = std::make_shared<FileLogger>();
    bank.setLogger(fileLogger); // Dependency injection via setter

    bank.makeTransfer(2000, 4000, 20.00);

    return 0;
}

Dependency Injection

What is Dependency Injection? Dependency Injection is a design pattern that helps you pass dependencies (objects a class needs to function) to a class instead of creating them inside the class. This makes your code more flexible and testable.

The Bank class has a private logger attribute of type std::shared_ptr<ILogger>, injected via:

Constructor Injection: In this example, we use constructor injection by passing the ILogger instance (dependency) into the Bank constructor:

Bank bank(consoleLogger);

This means the Bank class can work with any Logger implementation, and we can switch between different Logger types without changing the Bank class itself.

Property Injection: We also use property injection with the setLogger method:

bank.setLogger(fileLogger);

This lets us change the logger at runtime, adding more flexibility. Property injection is useful if you need to change dependencies after the object is created.

Benefits of Dependency Injection:

Flexibility: The Bank class works with any Logger implementation. We can easily add a new DatabaseLogger in the future and use it with Bank.
Testability: Dependency injection allows us to pass mock loggers to Bank during testing, making it easier to isolate and test its behavior.
Code Reuse and Decoupling: Bank is not tightly coupled to specific logger implementations. It depends only on the ILogger interface, allowing us to reuse the Bank class in different contexts.

Polymorphism in Action

Simple diagram for Dependency Injection in Bank class.

          +--------------------+
          |      Logger        |  <--- Interface
          |--------------------|
          | + logTransfer(...) |
          +--------------------+
                    ▲
                    |
         +----------+----------+
         |                     |
+-----------------+   +------------------+
|  ConsoleLogger  |   |   FileLogger     |  <--- Concrete Implementations
|-----------------|   |------------------|
| + logTransfer() |   | + logTransfer()  |
+-----------------+   +------------------+

                    ▲
                    |
               +------------+
               |    Bank    |
               |------------|
               | - logger   |   <--- Logger dependency (Injected)
               | + setLogger()     (Setter Injection)
               | + makeTransfer()  (uses logger)
               +------------+

This diagram highlights where dependency injection occurs, showing Bank relying on the ILogger interface rather than specific implementations, thus enabling runtime polymorphism.

Reversing C++ Virtual Functions

Advanced OOP features like polymorphism and dynamic binding are commonly used, and it's important to note that resources can be initialized on the stack or allocated on the heap.

At the assembly level, structs and classes are equivalent in terms of memory footprint, and fundamentally, both are collections of memory addresses corresponding to various types.

Class Constructor

The constructor is a crucial concept for understanding the OOP structure in C++. Let’s examine how constructors are declared and defined.

In the example below, we utilize dynamic initialization, contrasting with the previous straightforward class example.
Before invoking the constructor, the heap-allocated memory space designated to hold our class pointer is assigned to the rcx register, followed by a call to the parameterized constructor of the Material class, which acts as a subroutine.

Copy Constructor

Both the Constructor and the Copy constructor allocates an object.

Copy Constructor receives an object pointer as parameter:

The Copy Constructor Person::Person(Person &) is called with two key arguments:
rdx holds the address of the old object (t), which is being copied.
rcx holds the address of the location (this) where the new object will be created.

Stepping into the Copy Constructor :

Object's methods

Under the hood, these getter methods receive the hidden this pointer in the rcx register, which is a telltale sign of a member function call.

Inheritance

Single inheritance

In single inheritance, a derived class has exactly one base class.

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } 
	{
		std::cout << "Call Plant() Base Constructor\n";
	};
	
private:
	int m_age{};
};

class Tree : public Plant
{
public:
	Tree() :m_leafcount{ 0 } {
	
		std::cout << "Call Tree() Derived Constructor\n";
	};
private:
	int m_leafcount{};
};

class Fruit : public Plant
{
public:
	Fruit() :m_waterpercent{ 0 } {
		std::cout << "Call Fruit() Derived Constructor\n";
	};
private:
	int m_waterpercent{};
};

int main()
{
	Tree oak;
	Fruit apple;
	return 0;
}

Let's disassemble the code to understand the flow of constructor calls for the base class and each derived class. Here is a detailed breakdown of the assembly instructions:

lea rcx, [rbp+110h+var_108]
- The lea (Load Effective Address) instruction loads the address of the local variable into the rcx register. In this case, it is the address of the object being constructed.
- rbp+110h+var_108 is the stack location of the object, so this address becomes the this pointer.
- Both derived objects are created on the stack, and their constructors, Tree() and Fruit(), are called in turn.
call Tree::Tree(void)
- This line calls the constructor of the Tree class (Tree::Tree()).
- The rcx register holds the this pointer, which is passed to the constructor to initialize the Tree object.

Stepping into one of these constructors, we notice that the this pointer is used to call the base constructor:

The base class constructor is always called first, followed by the constructor of the derived class.

Multiple inheritance

In multiple inheritance, a derived class has more than one base class.

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
private:
	int m_age{};
};

class Forest
{
public:
	Forest() :numof_trees{} { std::cout << "Forest::Forest()\n"; }
private:
	int numof_trees{};
};


class Tree : public Plant, Forest
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
private:
	int leaf_count{};
};



int main()
{
	Tree tr;
	return 0;
}

Again, the Tree object is created on the stack, and rcx carries the address of the object as the this pointer:

Stepping into the Tree() constructor :

The base constructors are always called first, in the order the bases are declared (left to right), followed by the derived class constructor.

In multiple inheritance, the values of the private members appear in memory in the corresponding order:

Plant base class attributes.
Forest base class attributes.
Tree derived class attributes.

Polymorphism

Code example

#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
	virtual void Create() { std::cout << "New Plant type created\n"; }
	void del() { std::cout << "Plant type deleted!\n"; }
private:
	int m_age{};
};


class Tree final : public Plant
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
	void Create() override { std::cout << "New Tree type created\n"; }
	void del() { std::cout << "Tree type deleted!\n"; }
private:
	int leaf_count{};
};

int main() {
	Tree* oak = new Tree;
	Plant* plt{ oak };

	plt->Create();
	plt->del();

	delete oak; // release the object allocated with new
	return 0;
}

After disassembling the code above, we first look at the pseudocode of the derived class constructor:

The this pointer is the address of the memory returned by the C++ new operator. The instruction lea rcx, const Tree::`vftable` shows the setup for virtual functions that supports runtime polymorphism. Unlike the plain inheritance examples above, the object's memory layout now begins with a vftable pointer in its first 8 bytes, pointing to a table that holds the addresses of the class's virtual functions.

The base class also has its own vftable.

Devirtualize a virtual function call

The basics

Let's start by looking at how the compiler implements virtual functions. Suppose we have the following polymorphic class hierarchy:

#pragma once
#include <iostream>

struct Mammal
{
	Mammal() { std::cout << "Mammal::Mammal()\n"; }
	virtual ~Mammal() { std::cout << "Mammal::~Mammal()\n"; }

	virtual void run() = 0;
	virtual void walk() = 0;
	virtual void move() { walk(); }
};

struct Cat : Mammal
{
	Cat() { std::cout << "Cat::Cat()\n"; }
	virtual ~Cat() { std::cout << "Cat::~Cat()\n"; }

	void run() override { std::cout << "Cat::run()\n"; }
	void walk() override { std::cout << "Cat::walk()\n"; }
};

struct Dog : Mammal
{
	Dog() { std::cout << "Dog::Dog()\n"; }
	virtual ~Dog() { std::cout << "Dog::~Dog()\n"; }

	void run() override { std::cout << "Dog::run()\n"; }
	void walk() override { std::cout << "Dog::walk()\n"; }
};

With the following main function:

#include <iostream>
#include <cstdlib>
#include "reversing_1.h"

int main()
{
	Mammal* m;
	if (rand() % 2)
	{
		m = new Cat();
	}
	else
	{
		m = new Dog();
	}
	m->walk();

	delete m;
	return 0;
}

The value of m depends on rand(), so it is not determined until runtime. The compiler cannot know this ahead of time, so how does it call the right function?

The answer is that for each type that has a virtual function, the compiler inserts a table of function pointers, called a vftable, into the resulting binary.

The entries in the table must be in the same order for each related type.

We would expect to find three tables in the binary, one each for Mammal, Cat, and Dog. We can locate them quickly by looking through the .rdata section:

Decompiling main()

The main function decompiles to:

To make the reversing more realistic, I disassembled the binary without symbols and then renamed the variables to match the case we are studying:

🔧Memory Allocation:

In both branches of the if/else statement, 8 bytes of memory are allocated using the new operator.

This allocation size of 8 bytes is consistent with the size of a single pointer on a 64-bit system, which matches the size of a virtual pointer (vptr).

🔧Virtual Pointer

When an object of a class with virtual functions is created, the compiler inserts a hidden member in the object, the virtual pointer (vptr).

This vptr points to the virtual function table vftable of the object, which holds the addresses of the virtual functions of that class.

🔧Object construction

After allocating memory, the constructors Cat::Cat(ptr_this) and Dog::Dog(ptr_this) are called, initializing the object.

These constructors set up the vptr to point to the appropriate vftable for Cat or Dog.

Looking at the tables, the 3rd entry for each class is:

j__purecall (Mammal, the abstract class)
sub_140011005 (Cat derived class)
sub_14001112C (Dog derived class)

There are 4 entries in each vftable:

Destructor
run
walk
move

Notice that because neither Cat nor Dog implements move(), they both inherit the definition from Mammal, and so the move entries in their vftables are the same.

Create Structures

To declare the functions inside a structure X as function pointers in IDA, you need to know each function's signature:

Calling convention
Return type
Parameter types

Once you have this information, you can correctly define the function pointers in your struct.

At this point it is useful to start defining some structures. We've already seen that the only member of the Mammal, Cat, and Dog structures is the vptr.

As mentioned previously, we should set the right signature for each virtual function declared within its structure:

struct Catvftable
{
	void (__thiscall *Cat_destructor)(void *);
	void (__thiscall *Cat_run)(void *);
	void (__thiscall *Cat_walk)(void *);
	void (__thiscall *Mammal_move)(void *);
};