In my introductory post I had been talking about dynamic memory allocation and I referenced various solutions that are used to tackle this problem. In this post I am going to focus on the GNU C library’s memory allocation which is based on ptmalloc2 (pthreads malloc) and it is derived from dlmalloc (Doug Lea’s malloc).
The heap, assigned to each program, is organised according to three major components:
Recall the parking lot from my previous post ? if we liken the Arena as a parking building with multiple levels and slots, then we would have a structure similar to the one depicted below:
As P-Acme’s (our parking company) demands keep growing, new buildings are built and new employees are hired to manage them. Similarly when a new thread is created the system assigns a new arena to it. There is a limit to the number of arenas, so when this limit is reached the program has to reuse the existing ones. The program below creates two threads, while each one of them will get a separate arena:
Now that we have an idea about the heap structure lets get a full overview of the ptmalloc allocator.
To allocate memory for the “main arena”, malloc invokes the sbrk function and despite the requested size, the system will assign 132 KB of memory. Further malloc invocations within the main thread will keep using the same arena while in case the memory is exhausted, the heap size will be increased. For every new thread a different arena (thread arena) is created by calling the mmap function and once again 132 KB are allocated. The maximum number of arenas is eight times the number of CPUs in the system (unless the user specifies otherwise).
Glibc uses a C struct called malloc_state to maintain information about each arena. For the main arena this information is stored in a global variable while for the rest it is stored in the thread arenas themselves:
Next comes the heap_info C struct, with heap metadata information:
Note that, every heap has its own header except the one that belongs to the main arena. Finally, we have the malloc_chunk C struct which, as you probably guessed, contains information about a chunk. A high level overview about the concepts that we discussed so far can be found below:
As Glibc’s malloc is chunk-oriented, we will focus on the malloc_chunk (the green block above) and get into more details about how it is structured.
If you examine the malloc_chunk C struct, you will notice that the fd,bk, fb_nextsize and bk_nextsize properties are used only when the chunk is free (see code snippet bellow) while the mchunk_prev_size is used only in case the previous chunk is free:
The mchunk_size indicates the size of the current chunk and since it is a multiple of 8 bytes, the last 3 bits (lets call them |A|M|P|) are used as flags to indicate one of the following states:
Note that, since chunks are adjacent to each other in memory, if you know the address of the first chunk (lowest address) in a heap, you can iterate through all the chunks in the heap by using the size information.
Let us for now forget about multiple arenas and focus on the main one. In order to understand how the chunk assignment works we are going to use the parking slots concept and see how Alice (a parking employee) handles a request:
And that’s all for now :), stay in touch for the next parts !
[1] https://virtual-index.com/programming/pointers
[2] http://blog.k3170makan.com/2018/11/glibc-heap-exploitation-basics.html
[3] https://sensepost.com/blog/2017/painless-intro-to-the-linux-userland-heap/
[4] http://core-analyzer.sourceforge.net/index_files/Page335.html
[5] http://blog.k3170makan.com/2018/12/glibc-heap-exploitation-basics.html
[6] https://blog.k3170makan.com/2019/03/glibc-heap-exploitation-basics.html
[7] https://azeria-labs.com/heap-exploitation-part-1-understanding-the-glibc-heap-implementation/