Using Buddy Allocators to Reduce Inter-Page Fragmentation in ZGC

Casper Norrbin on January 8, 2025

The work presented here is performed as part of the joint research project between Oracle, Uppsala University and KTH.

My name is Casper, and I recently finished my 5-year degree in Computer and Information Engineering at Uppsala University. For my masters’ thesis, I had the privilege of collaborating with the GC team at Oracle’s Stockholm office, alongside Joel Sikström and Niclas Gärds, who pursued related projects concurrently. In this post, I’ll share an overview of the work I carried out this past spring.

Problem Background

ZGC uses bump-pointer allocation in its pages, which allows for fast sequential allocations, but leads to fragmentation over time. As objects die, gaps of unused memory are created behind the bump-pointer. To reuse this memory, ZGC needs to relocate live objects to another page before resetting the bump-pointer on the current page. My research explored how combining ZGC with a secondary free-list-based allocator could work, specifically a buddy allocator, which would allow for the gaps of memory to be reused without first needing to reset the page.

Exploring Buddy Allocators

Buddy allocators are an alternative allocation strategy that involves splitting memory into blocks of sizes that are powers of two, which can then be dynamically merged or split as needed. I specifically evaluated three variations:

Binary Buddy Allocator is the original design, which splits blocks recursively until a suitable size is found.
Binary Tree Buddy Allocator uses a binary tree structure instead of a free-list to track and store unused blocks.
Inverse Buddy Allocator pre-splits blocks, and instead merges blocks to satisfy larger allocation requests.

Each of the buddy allocators has unique strengths and weaknesses. The Binary Buddy Allocator is versatile and simple to understand, the Binary Tree Buddy Allocator is faster, at the cost of extra metadata overhead, and the Inverse Buddy Allocator is fast at allocating small blocks but slower for larger blocks.

Adaptation for ZGC

Integrating an allocator with ZGC presents a unique challenge. The allocator does not have complete control over the memory it is allocating in and instead has to work with its given memory. Each allocator is tied to a single ZGC page and is initially marked as full, with no space for new allocations. Then, each free memory range is marked as available, essentially “carving” out areas in the page for the allocator to work with.

Buddy Allocator Adaptations

In addition to adapting the allocators to ZGC, I also explored some other optimizations to improve the general performance of the allocators:

Lazy Splitting and Merging: We can reduce splitting and merging operations by delaying the merging of free memory blocks. A “lazy layer” keeps track of deallocated blocks, merging only when necessary.
Allocator Regions: The allocator memory is divided into regions to improve concurrency, allowing for simultaneous allocations across different regions. By prioritizing some regions, fragmentation is reduced and space is kept for larger allocations.

Results

The allocators were evaluated based on allocation speed, fragmentation, and memory overhead. While they performed well in allocation speed and memory overhead, fragmentation is the most important metric for this use case. Overall, they performed solidly, utilizing memory well, but the inherent limitation of buddy-style allocators becomes apparent. Specifically, allocation sizes must be rounded to the nearest power of two and aligned accordingly, leading to unavoidable memory waste that is challenging to minimize. This structural inefficiency results in underutilized space and increases fragmentation, presenting a significant trade-off in their use.

Conclusions and Future Work

My research demonstrates that integrating an allocator with ZGC is feasible and that an existing allocator can be adapted to suit this purpose. This would help reduce fragmentation, enable memory reuse, and decrease the number of needed relocations. If using a buddy-style allocator, specific considerations must be assessed due to its previously mentioned limitations.

The logical next step would be integrating the allocator with ZGC, a topic Niclas Gärds explored concurrently in his thesis. Future work could focus on further refining the allocator and developing hybrid approaches combining different buddy-style allocators to offset their weaknesses. Additionally, preparation for the new minimum allocation size introduced by project Lilliput is essential if the allocator is to be used.

If this post piques your interest, I encourage you to explore my full published report on DiVA, where I delve deeper into the topics covered here and discuss additional areas beyond this post.

In closing, I want to extend my heartfelt gratitude to everyone at the Stockholm office for their warm welcome. A special thank you to Erik Österlund and Tobias Wrigstad for their invaluable guidance and unwavering support throughout the project. Finally, thank you to Joel and Niclas for making this project and spring exciting.