A memory allocator built from scratch in C++ using mmap for pool-based allocation. Implements mymalloc, myfree, block coalescing, and block splitting without any standard library allocation functions.
A 1MB memory pool is created upfront via a single mmap call. All allocations carve blocks out of this contiguous pool. Each block stores metadata (size, free status, next pointer) immediately before the user data. The free list is kept sorted by memory address so adjacent free blocks can be merged correctly. When a reused block is larger than needed, block splitting carves it into an allocated chunk and a smaller free leftover, reducing wasted space.
Before testing the custom allocator, standard malloc and free behavior is verified. After freeing a pointer, reading from it returns garbage since the memory is no longer owned.

mymalloc carves a block from the pool and returns a pointer past the metadata. myfree marks the block as free without releasing it back to the OS, keeping it available for reuse.

After freeing a block, the next allocation of the same size reuses it instead of consuming new pool memory. Verified by comparing pointer addresses before and after.

Two adjacent blocks are allocated and freed. The allocator merges them into a single larger block via coalescing. A subsequent allocation larger than either individual block reuses the merged space.

Benchmarked over 1,000,000 iterations on Linux. Lower is better.
| Test | mymalloc | malloc |
|---|---|---|
| Single size (4 bytes) | 7.82 ms | 16.61 ms |
| Mixed sizes (8-512 bytes) | 16.49 ms | 7.89 ms |
| Bulk allocate 1000 then free all | 1.55 ms | 0.026 ms |
| Reverse order free | 1.88 ms | 0.011 ms |
Single size allocation is competitive with and occasionally beats standard malloc due to cache-friendly free list reuse. Mixed sizes and bulk free patterns are slower because sorted insertion into the free list, required for coalescing correctness, makes each free an O(n) operation.
| Metric | Value |
|---|---|
| Total time | 47.27 ms |
| Instructions | 542,714,619 |
| CPU cycles | 192,681,943 |
| Instructions per cycle | 2.8 |
| Branch misses | 19,919 / 107,406,979 (0.018%) |
| Backend bound | 27.6% |
| Retiring | 54.4% |
2.8 instructions per cycle indicates the CPU is kept busy. 54.4% retiring means the majority of cycles are doing real useful work. The 27.6% backend bound points to memory latency stalls from walking the linked list during sorted insertion, which confirms why bulk free is slow. Branch prediction is nearly perfect at 0.018% miss rate, meaning the list traversal logic is highly predictable. The bottleneck is memory access patterns, not computation.
g++ main.cpp -o mall
g++ -O2 -o benchmark benchmark.cpp -std=c++17