# Polymorphism and Virtual Function Reversal in C++

## Introduction

While exploring the internals of COM objects and their underlying implementation, particularly focusing on the feature of "**separating implementation from the definition of behavior**," I realized that COM objects heavily rely on C++ runtime polymorphism. To gain a deeper understanding of this key C++ characteristic, I spent some time delving into it, along with conducting some reverse engineering to fully grasp the concept.

Polymorphic code is code that can be written once and reused with different types. In C++, polymorphism is achieved through two main approaches: **compile-time polymorphism**, where types are resolved at compile time, and **runtime polymorphism**, where the behavior is selected during program execution. Essentially, polymorphism allows an object to be used through "another type" and to exhibit multiple forms. For reverse engineers, this concept is closely tied to **dynamic dispatch**, where **virtual methods** play a crucial role. A call to a **virtual method** through a base-class pointer or reference runs the override of the object's most derived class, which is what makes code flexible and extensible at runtime.

## Motivating Example

```cpp
#include <iostream>
#include <format>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("{} -> {}: {:.2f}\n", from, to, amount);
	}
};

struct Bank
{
	void makeTransfer(long from, long to, double amount)
	{
		logger.logTransfer(from, to, amount);
	}
private:
	ConsoleLogger logger;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}
```

Suppose you need to implement various types of loggers, such as a remote server logger, a local file logger, or even a logger that sends data to a printer. Additionally, the program should allow switching between these loggers at runtime. One easy approach is to use a **scoped enumeration** to manage and switch between the different loggers:

```cpp
#include <iostream>
#include <format>
#include <stdexcept>

struct ConsoleLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[CONS] {} -> {}: {:.2f}\n",from, to, amount);
	}
};

struct FileLogger
{
	void logTransfer(long from, long to, double amount)
	{
		std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
	}
};

enum class LoggerType
{
	Console,
	File
};

struct Bank
{
	Bank() : type{LoggerType::Console} {}
	void set_logger(LoggerType new_type)
	{
		type = new_type;
	}
	void makeTransfer(long from, long to, double amount)
	{
		switch (type)
		{
		case LoggerType::Console:
			conslogger.logTransfer(from, to, amount);
			break;
		case LoggerType::File:
			filelogger.logTransfer(from, to, amount);
			break;
		default:
			throw std::logic_error("Unknown Logger type encountered");
			break;
		}
		
	}
private:
	ConsoleLogger conslogger;
	FileLogger filelogger;
	LoggerType type;
};

int main()
{
	Bank bank;
	bank.makeTransfer(1000, 2000, 50.09);
	bank.set_logger(LoggerType::File);
	bank.makeTransfer(2000, 4000, 20.00);
	return 0;
}
```

The solution provided above still has some limitations and isn't as flexible as it could be. The primary issue here is that the **`Bank`** class directly depends on specific logger implementations (**`ConsoleLogger`** and **`FileLogger`**). This design violates the Open/Closed Principle, as adding a new logger would require modifying the Bank class, making it less scalable and maintainable.

A better approach would be to use **runtime polymorphism** and **dependency injection** by introducing a common interface (abstract class) for loggers. This way, we can pass any logger implementation to the **`Bank`** class without modifying it.

## Runtime Polymorphism

Runtime polymorphism enables you to conveniently "**program in general**" rather than "**program in specific**". It is also known as dynamic polymorphism or late binding: the function call is resolved at run time.

To refactor the code above by utilizing polymorphism and dependency injection, we should first clarify these two concepts:

* ***Polymorphism***: This allows us to use different types of loggers (e.g., **`ConsoleLogger`**, **`FileLogger`**) through a common interface. This means our `Bank` class can operate with any logger that implements a common interface, without knowing the specifics of each logger.
* ***Dependency Injection***: Instead of creating dependencies directly inside the `Bank` class (like **`ConsoleLogger conslogger;`** and **`FileLogger filelogger;`**), dependency injection allows us to "inject" these dependencies (in this case, the logger) from outside. This makes our `Bank` class more flexible, allowing us to change the logger without altering the class itself. We achieve this by passing the logger instance to `Bank`, either in the constructor or via a setter method.

Let's apply these concepts to the code.

#### Step 1: Create an `ILogger` Interface

We'll define an abstract base class called **`ILogger`** with a pure virtual **`logTransfer`** method. Both the concrete classes **`ConsoleLogger`** and **`FileLogger`** will inherit from this interface and provide their own implementations.

#### Step 2: Use Dependency Injection in the `Bank` Class

Instead of hard-coding the **`ConsoleLogger`** or **`FileLogger`** objects in **`Bank`**, we’ll inject an **`ILogger`** pointer (or reference) into **`Bank`**. This allows the **`Bank`** to work with any `ILogger`-derived object.

```cpp
#include <iostream>
#include <format>
#include <memory>
#include <stdexcept>

// Step 1: Create the ILogger interface
struct ILogger
{
    virtual void logTransfer(long from, long to, double amount) = 0; // pure virtual function
    virtual ~ILogger() = default; // virtual destructor for proper cleanup of derived classes
};

// ConsoleLogger inherits from ILogger
struct ConsoleLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[CONS] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// FileLogger inherits from ILogger
struct FileLogger : public ILogger
{
    void logTransfer(long from, long to, double amount) override
    {
        std::cout << std::format("[FILE] {} -> {}: {:.2f}\n", from, to, amount);
    }
};

// Bank class now depends on the ILogger interface, not a specific implementation
struct Bank
{
    // Constructor takes an ILogger pointer, which allows dependency injection
    Bank(std::shared_ptr<ILogger> logger) : m_logger(std::move(logger)) {}

    // Setter to change the logger at runtime if needed
    void setLogger(std::shared_ptr<ILogger> new_logger)
    {
        m_logger = std::move(new_logger);
    }

    void makeTransfer(long from, long to, double amount)
    {
        if (!m_logger)
        {
            throw std::logic_error("Logger is not set!");
        }
        m_logger->logTransfer(from, to, amount);
    }

private:
    std::shared_ptr<ILogger> m_logger; // Pointer to an ILogger, allows polymorphic behavior
};

// Main function to demonstrate dependency injection
int main()
{
    // Inject a ConsoleLogger into Bank
    auto consoleLogger = std::make_shared<ConsoleLogger>();
    Bank bank(consoleLogger); // Dependency injection via constructor

    bank.makeTransfer(1000, 2000, 50.09);

    // Switch to a FileLogger at runtime
    auto fileLogger = std::make_shared<FileLogger>();
    bank.setLogger(fileLogger); // Dependency injection via setter

    bank.makeTransfer(2000, 4000, 20.00);

    return 0;
}

```

### Dependency Injection

* **What is Dependency Injection?**\
  Dependency Injection is a design pattern that helps you pass dependencies (objects a class needs to function) to a class instead of creating them inside the class. This makes your code more flexible and testable.

The **`Bank`** class has a private **`m_logger`** member of type **`std::shared_ptr<ILogger>`**, injected via:

* **Constructor Injection**:\
  In this example, we use **constructor injection** by passing the **`ILogger`** instance (dependency) into the **`Bank`** constructor:

```cpp
Bank bank(consoleLogger);
```

This means the **`Bank`** class can work with any **`ILogger`** implementation, and we can switch between different logger types without changing the **`Bank`** class itself.

* **Property Injection**:\
  We also use property injection (often called setter injection) with the **`setLogger`** method:

```cpp
bank.setLogger(fileLogger);
```

This lets us change the logger at runtime, adding more flexibility. Property injection is useful if you need to change dependencies after the object is created.

**Benefits of Dependency Injection**:

* **Flexibility**: The **`Bank`** class works with any **`ILogger`** implementation. We can easily add a new **`DatabaseLogger`** in the future and use it with **`Bank`**.
* **Testability**: Dependency injection allows us to pass mock loggers to **`Bank`** during testing, making it easier to isolate and test its behavior.
* **Code Reuse and Decoupling**: **`Bank`** is not tightly coupled to specific logger implementations. It depends only on the **`ILogger`** interface, allowing us to reuse the **`Bank`** class in different contexts.

### Polymorphism in Action

By defining an **`ILogger`** interface with **`logTransfer`** as a pure virtual function, both **`ConsoleLogger`** and **`FileLogger`** provide their own implementations. The **`Bank`** class does not need to know which specific **`ILogger`** it’s using: it just calls **`logTransfer`**, and the correct function is called based on the actual **`ILogger`** type injected. This is polymorphism in action: different behaviors (**`ConsoleLogger`** and **`FileLogger`**) are accessed through a common interface (**`ILogger`**).

### Simple diagram for Dependency Injection in Bank class.

```
          +--------------------+
          |      ILogger       |  <--- Interface
          |--------------------|
          | + logTransfer(...) |
          +--------------------+
                    ▲
                    |
         +----------+----------+
         |                     |
+-----------------+   +------------------+
|  ConsoleLogger  |   |   FileLogger     |  <--- Concrete Implementations
|-----------------|   |------------------|
| + logTransfer() |   | + logTransfer()  |
+-----------------+   +------------------+

                    ▲
                    |
               +-------------------+
               |       Bank        |
               |-------------------|
               | - m_logger        |   <--- ILogger dependency (injected)
               | + setLogger()     |   (setter injection)
               | + makeTransfer()  |   (uses m_logger)
               +-------------------+
```

This diagram highlights where dependency injection occurs, showing Bank relying on the **`ILogger`** interface rather than specific implementations, thus enabling runtime polymorphism.

## Reversing C++ Virtual Functions

Before delving into reversing C++ virtual functions, it is essential to outline some fundamental C++ concepts relevant to the task. While the benefits of Object-Oriented Programming (OOP) for programmers are undeniable, it’s worth considering whether these advantages extend to reverse engineers analyzing applications.

Advanced OOP features like polymorphism and dynamic binding are commonly used, and it's important to note that objects can live either on the stack or on the heap.

At the assembly level, structs and classes are equivalent in terms of memory footprint: both are just contiguous blocks of memory holding their data members at fixed offsets.

### Class Constructor

The ***constructor*** is a crucial concept for understanding the OOP structure in C++. Let’s examine how constructors are declared and defined.

* In the example below, the object is ***dynamically allocated*** on the heap, in contrast with the previous straightforward class example.
* Before the constructor is invoked, the address of the heap-allocated memory that will hold the object is loaded into the **`rcx`** register, followed by a call to the parameterized constructor of the Material class, which acts as a subroutine.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FlMbsoYtjQjsSN3LEoy4w%2Farg_this.png?alt=media&#x26;token=8afcf68a-382e-4749-9c66-e04b7d38c572" alt=""><figcaption></figcaption></figure>

The **`this`** pointer, the hidden pointer to our Material object, is passed to the constructor in the **`rcx`** register. The arguments supplied at initialization are then stored into the memory allocated for the object, according to their size.

### Copy Constructor

Both the constructor and the copy constructor initialize an object; the difference is in what they receive.

The copy constructor receives a pointer to an existing object as its parameter:

* The copy constructor **`Person::Person(Person &)`** is called with two key arguments:
  * **`rdx`** holds the address of the old object (`t`), which is being copied.
  * **`rcx`** holds the address of the location (`this`) where the new object will be created.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Fhfrj1bXXAbgdGEHd6uHc%2FC.png?alt=media&#x26;token=79a1d430-9843-44c1-93cc-fad4500d05b8" alt=""><figcaption></figcaption></figure>

Stepping into the copy constructor:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FBDW10BTbkWhSw1lwLIGp%2F2024-09-04_003518.png?alt=media&#x26;token=f642298a-6a5e-45f5-913f-3f16cffcf805" alt=""><figcaption></figcaption></figure>

### Object's methods

Under the hood, these getter methods receive the hidden **`this`** pointer via the **`rcx`** register, which is a telltale sign of object-oriented code.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FP5W9ZmaW8j5RE9bozfVy%2Fthis_method.png?alt=media&#x26;token=72102f10-c0fb-4d27-86fb-66295c3b6a8a" alt=""><figcaption></figcaption></figure>

### Inheritance

Inheritance is a fundamental concept that defines the inter-class relationships and extension structure in Object-Oriented Programming (OOP). It is essential to understand the distinction between the **base** class and the **derived** class, as these two concepts play a critical role in how classes interact and extend functionality.

#### Single inheritance

In this situation, the derived class has only one base class.

```cpp
#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } 
	{
		std::cout << "Call Plant() Base Constructor\n";
	};
	
private:
	int m_age{};
};

class Tree : public Plant
{
public:
	Tree() :m_leafcount{ 0 } {
	
		std::cout << "Call Tree() Derived Constructor\n";
	};
private:
	int m_leafcount{};
};

class Fruit : public Plant
{
public:
	Fruit() :m_waterpercent{ 0 } {
		std::cout << "Call Fruit() Derived Constructor\n";
	};
private:
	int m_waterpercent{};
};

int main()
{
	Tree oak;
	Fruit apple;
	return 0;
}
```

Let's disassemble the code to understand the flow of constructor calls for the base class and the derived classes. Here's a detailed breakdown of what's happening in the given assembly instructions:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Fx1RvkIa1OdiS1p3wmqCQ%2Fderivedclass.png?alt=media&#x26;token=af0e04ee-619f-4b3b-8fee-702ba62e14c3" alt=""><figcaption></figcaption></figure>

* **`lea rcx, [rbp+110h_var_108]`**
  * The **`lea`** (Load Effective Address) instruction loads the address of a local variable into the **`rcx`** register. In this case, it's loading the address of the object that is being constructed.
  * **`rbp+110h_var_108`** is the location of the object on the **stack**; this address becomes the **`this`** pointer.
  * Both derived objects are created on the stack, and their constructors, **`Tree()`** and **`Fruit()`**, are then called in turn.
* **`call Tree::Tree(void)`**
  * This line calls the constructor of the **`Tree`** class (**`Tree::Tree()`**).
  * The **`rcx`** register holds the **`this`** pointer, which is passed to the constructor to initialize the **`Tree`** object.

Stepping into one of these constructors, we notice that the **`this`** pointer is used to call the **Base** constructor:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2F82DEvOv1EY4kgI1LHZiO%2Fderivedclass_1.png?alt=media&#x26;token=b467b070-c9eb-4668-b169-9e0c2d3a74b8" alt=""><figcaption></figcaption></figure>

{% hint style="info" %}
***The constructor of the base class is always called first, followed by the constructor of the derived class***.
{% endhint %}

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FooGae9lr3yJoRI2Cxld4%2Fbasefirst.png?alt=media&#x26;token=9192b669-a4e7-4614-9182-e7f5cdad9753" alt=""><figcaption></figcaption></figure>

#### Multiple inheritance

The derived class has more than one base class.

```cpp
#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
private:
	int m_age{};
};

class Forest
{
public:
	Forest() :numof_trees{} { std::cout << "Forest::Forest()\n"; }
private:
	int numof_trees{};
};


class Tree : public Plant, Forest
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
private:
	int leaf_count{};
};



int main()
{
	Tree tr;
	return 0;
}
```

Again the Tree object is created on the stack, and **`rcx`** carries the address of the object as the **`this`** pointer:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FhM3RSHVC0h9nRKAvYae8%2Fmultiple%20inheri.png?alt=media&#x26;token=cd1e1d13-92b8-479d-9639-264f10080b71" alt=""><figcaption></figcaption></figure>

Stepping into the **`Tree()`** constructor :

{% hint style="info" %}
***Constructor calls always start with the base constructors, in the order the bases are listed (left to right), followed by the derived class constructor.***
{% endhint %}

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Fnje1eX1exWsnqzGg26HZ%2Fmultiple%20inheri2.png?alt=media&#x26;token=aa1d545f-45a8-43ec-b320-5fcdf01d2ec5" alt=""><figcaption></figcaption></figure>

In multiple inheritance, the private members are laid out in memory in the following order:

1. Plant base class attributes.
2. Forest base class attributes.
3. Tree derived class attributes.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FCf0Di7nLcXeT6BxXpUbK%2Forderinmemmultiple.png?alt=media&#x26;token=7f3c877d-7db2-4af3-8e68-7f1c2837c5ec" alt=""><figcaption></figcaption></figure>

### Polymorphism

#### Code example

```cpp
#include <iostream>

class Plant
{
public:
	Plant() : m_age{ 0 } { std::cout << "Plant::Plant()\n";}
	virtual void Create() { std::cout << "New Plant type created\n"; }
	void del() { std::cout << "Plant type deleted!\n"; }
private:
	int m_age{};
};


class Tree final : public Plant
{
public:
	Tree() :leaf_count{ 0 } { std::cout << "Tree::Tree()\n"; }
	void Create() override { std::cout << "New Tree type created\n"; }
	void del() { std::cout << "Tree type deleted!\n"; }
private:
	int leaf_count{};
};

int main() {
	Tree* oak = new Tree;
	Plant* plt{ oak };

	plt->Create();
	plt->del();

	return 0;
}
```

After disassembling the above code, we first look at the pseudocode of the derived class constructor:

* The **`this`** pointer is obtained from the memory allocated by the C++ **`new`** operator. The instruction **`` lea rcx, const Tree::`vftable` ``** highlights the setup for virtual functions, supporting the runtime polymorphism. Unlike the non-virtual inheritance examples above, the object's memory layout includes a **`vftable`** pointer as the first **8 bytes**, pointing to a table that holds the addresses of the class's virtual functions.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Fzab8iByuehx02FJFi7EB%2Fvtable.png?alt=media&#x26;token=aff9e542-465d-4fea-9e71-52ce6cc6adf8" alt=""><figcaption></figcaption></figure>

The base class also has its own **`vftable`**:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FlPQ4SyVRK72GACh6krtM%2Fvftablebase.png?alt=media&#x26;token=0738fe57-034e-4e74-bcde-f68ff01d945a" alt=""><figcaption></figcaption></figure>

During debugging, I noticed that **the initial 8 bytes** of the object held the address of the base class's **`vftable`** during the execution of the base class constructor. When control returned to the derived class constructor, these 8 bytes were updated to point to the **`vftable`** of the derived class, so that a later virtual call resolves to the override, **`Tree::Create()`**.

The screenshot below demonstrates the concrete usage of the **`vftable`**. Initially, the object pointer is loaded into the **`rax`** register from memory (**`mov rax, qword ptr[plt]`**). The next instruction dereferences this pointer to fetch the address of the **`vftable`**, which is stored in the object's first 8 bytes. The address of **`Tree::Create()`** is then read from the **`vftable`** at offset **`0x0`**, the slot of the **`Create()`** method. Finally, the **`call`** instruction invokes **`Tree::Create()`** indirectly through the **`vftable`**, illustrating the **dynamic dispatch** mechanism at runtime.

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Ft4VNwdEOM25jSLashTNz%2FcallflowCreate.png?alt=media&#x26;token=3c14ed0c-8356-499c-8045-654ffc85e7fd" alt=""><figcaption></figcaption></figure>

### Devirtualize a virtual function call

#### The basics

Let's start by looking at how the compiler implements virtual functions. Suppose we have the following polymorphism implementation:

```cpp
#pragma once
#include <iostream>

struct Mammal
{
	Mammal() { std::cout << "Mammal::Mammal()\n"; }
	virtual ~Mammal() { std::cout << "Mammal::~Mammal()\n"; }

	virtual void run() = 0;
	virtual void walk() = 0;
	virtual void move() { walk(); }
};

struct Cat : Mammal
{
	Cat() { std::cout << "Cat::Cat()\n"; }
	virtual ~Cat() { std::cout << "Cat::~Cat()\n"; }

	void run() override { std::cout << "Cat::run()\n"; }
	void walk() override { std::cout << "Cat::walk()\n"; }
};

struct Dog : Mammal
{
	Dog() { std::cout << "Dog::Dog()\n"; }
	virtual ~Dog() { std::cout << "Dog::~Dog()\n"; }

	void run() override { std::cout << "Dog::run()\n"; }
	void walk() override { std::cout << "Dog::walk()\n"; }
};
```

With the following main function:

```cpp
#include <iostream>
#include <cstdlib>
#include "reversing_1.h"

int main()
{
	Mammal* m;
	if (rand() % 2)
	{
		m = new Cat();
	}
	else
	{
		m = new Dog();
	}
	m->walk();

	delete m;
	return 0;
}
```

The value of **`m`** depends on **`rand()`**, which is not determined until runtime. The compiler cannot know this ahead of time, so how does it call the right function?

The answer is that for each type having a virtual function, the compiler inserts a table of function pointers called a **`vftable`** into the resulting binary.

Each instance of such a type is given an additional member called **`vptr`** that points to the correct **`vftable`** for that object. Code to initialize this pointer with the right value is added to the constructor. When the program wants to call a virtual function, it simply reads the correct entry in the object's **`vftable`** and calls it.

The entries in the table ***must be in the same order for each related type***.

We would expect to find three tables in the binary for **Mammal**, **Cat**, and **Dog**. We can locate them quickly by looking through the **`.rdata`** section:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2Fvx0tTYgj64GCZ2PEpQke%2Fvftablemmal.png?alt=media&#x26;token=f25bfffe-913e-4a76-a161-460fbbaf8ae0" alt=""><figcaption></figcaption></figure>

#### **Decompiling main()**

The main function decompiles to:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FyTIqBUoZpYzQbdkjCsNz%2Fdecompile.png?alt=media&#x26;token=7c68c884-5a53-432a-9d9e-b7bb3c23cba4" alt=""><figcaption></figcaption></figure>

To make the reversing more realistic, I disassembled it without symbols and renamed the variables to match the case we're studying:

* 🔧**Memory Allocation**:

In both branches of the **`if/else`** statement, 8 bytes of memory are allocated using the **`new`** operator.

This allocation size of 8 bytes is consistent with the size of a single pointer on a 64-bit system, which matches the size of a virtual pointer (**`vptr`**).

* 🔧**Virtual Pointer**

When an object of a class with virtual functions is created, the compiler inserts a hidden member in the object, the virtual pointer (**`vptr`**).

This **`vptr`** points to the virtual function table **`vftable`** of the object, which holds the addresses of the virtual functions of that class.

* 🔧**Object construction**

After allocating memory, the constructors **`Cat::Cat(ptr_this)`** and **`Dog::Dog(ptr_this)`** are called, initializing the object.

These constructors set up the **`vptr`** to point to the appropriate **`vftable`** for **`Cat`** or **`Dog`**.

We can see the virtual function calls on lines **24** and **26**. In the first, the compiler dereferences the object pointer (to get the **`pvft`**) and adds **16 bytes** to access the **3rd** entry in the **`vftable`**. Line **26** reads the **1st** entry in the table, which in most cases is the **destructor**.

Looking at the tables, the 3rd entries for each class are:

* **j\_\_purecall** (Mammal the abstract class)
* **sub\_140011005** (Cat derived class)

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FG4168l85ZfKP2rE6mj5H%2FCatwalk.png?alt=media&#x26;token=0ce9a8e6-1eb0-4019-80d5-ee91eb0c11c3" alt=""><figcaption></figcaption></figure>

* **sub\_14001112C** (Dog derived class)

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2F3SaMWcBEsMrRGCOEU0s6%2Fdogwalk.png?alt=media&#x26;token=998ca364-a7ad-4897-81a6-6514cb4e80f5" alt=""><figcaption></figcaption></figure>

There are 4 entries in each **`vftable`**:

1. Destructor
2. run
3. walk
4. move

Notice that because neither **`Cat`** nor **`Dog`** overrides **`move()`**, they both inherit the definition from **`Mammal`**, and so the **`move`** entries in their **`vftable`** are the same.

**Create Structures**

To declare the functions inside a structure X as function pointers in IDA, you need to know ***the signature*** of each one:

* Calling convention
* Return type
* Parameter types

Once you have this information, you can correctly define the function pointers in your struct.

At this point it is useful to start defining some structures. We've already seen that the only member of the Mammal, Cat, and Dog structures will be their **`vptr`**.

We should also create a structure for each **`vftable`**. The objective here is to get the decompiler output to show us which function would actually be called if **`m`** had a particular type. We can then cycle through these possibilities and examine all of the options:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2F2GoMAYTWFpAPPSTe6VI2%2Fcratestruct.png?alt=media&#x26;token=8bf4b7de-3bc7-43f1-aa10-29ab3409c0aa" alt=""><figcaption></figcaption></figure>

As mentioned previously, we should set the right signature for each virtual function declared within its structure:

```cpp
struct Catvftable
{
	void (__thiscall *Cat_desctructor)(void *);
	void (__thiscall *Cat_run)(void *);
	void (__thiscall *Cat_walk)(void *);
	void (__thiscall *Mammal_move)(void *);
};
```

If we go back to the decompiled code for main, we can now rename the local variable to **`m`** and set its type to **`Cat*`** or **`Dog*`**:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FQ7anBaKawFzwSz7dgdxe%2Fdecompilecart.png?alt=media&#x26;token=9ceb26be-0fb4-4bb7-9a91-170360ddf76f" alt=""><figcaption></figcaption></figure>

We could set **`m`** to be **`Mammal*`**, but we will see some problems if we do that:

<figure><img src="https://615064086-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2F-MXlxki-LGPmhYCBAzg5%2Fuploads%2FdvXat9l1HiODSXXb6YZn%2Fdecompilermammal.png?alt=media&#x26;token=4fd0eb31-67ef-42cf-8916-1c391733fc49" alt=""><figcaption></figcaption></figure>

Notice that if the type of **`m`** were **`Mammal*`**, the call at line **24** would be to a pure virtual function. This should never happen!

The dynamic type will be **`Cat`** or **`Dog`**, and we know which functions will be called in either case by looking at their **`vftable`** entries.

## Conclusion

Polymorphism is a cornerstone of C++ that significantly contributes to the implementation of COM in Windows. Therefore, understanding this feature from both a programming and a reverse engineering perspective is crucial for comprehending the underlying mechanics. I believe that before diving into security aspects, it is essential to acquire fundamental knowledge, as these foundational concepts will guide you in the process of identifying bugs.

**Final Note:** I am not an expert in C++ reverse engineering or programming; I am merely a learner. If you notice any inaccuracies in my statements, please feel free to reach out and correct me—I would greatly appreciate it. Thank you very much for taking the time to read this post!

## References

{% embed url="<https://www.amazon.co.uk/Crash-Course-Joshua-Alfred-Lospinoso/dp/1593278888>" %}

{% embed url="<https://www.amazon.co.uk/C-Programming-Language-Bjarne-Stroustrup/dp/0321958322>" %}

{% embed url="<https://fatihsensoy.com/posts/reversing-cpp-0x00/>" %}

{% embed url="<https://alschwalm.com/blog/static/2016/12/17/reversing-c-virtual-functions/>" %}

{% embed url="<https://alschwalm.com/blog/static/2017/01/24/reversing-c-virtual-functions-part-2-2/>" %}
