Addressing Fragmentation in ZGC through Custom Allocators


The work presented here is performed as part of the joint research project between Oracle, Uppsala University and KTH. Follow the blog series here at inside.java to read more about the JVM research performed at the Oracle Development office in Stockholm.


Addressing Fragmentation in ZGC through Custom Allocators: A Summary

My name is Joel and I am currently finishing my five-year degree in Computer and Information Engineering at Uppsala University, with a focus on Software Engineering. This post details my master’s thesis work as part of the GC team at Oracle’s Stockholm office in the spring of 2024. I’ve worked closely with Casper Norrbin and Niclas Gärds, who have also conducted their master’s theses at Oracle this spring in the same area as I have.

Problem Statement

ZGC and other garbage collectors typically use bump-pointer allocation, which is efficient for sequential allocations but can lead to fragmentation over time. Fragmentation occurs when memory gaps are created that cannot be easily reused, necessitating costly relocation of live objects. The goal of this research is to reduce the need for relocation in ZGC by using a free-list-based allocator alongside a bump-pointer allocator, which can track and utilize fragmented memory more effectively in certain scenarios.
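To make the problem concrete, here is a minimal Python sketch of a page served by bump-pointer allocation. It is a toy model and not ZGC code: the `BumpPage` class, its methods, and the sizes used are hypothetical and chosen only for illustration. It shows how freed memory below the bump pointer stays unusable until live objects are relocated.

```python
class BumpPage:
    """Toy model of a page served by bump-pointer allocation (hypothetical)."""

    def __init__(self, size):
        self.size = size
        self.top = 0      # next free offset; it only ever moves forward
        self.live = {}    # offset -> size of each live object

    def allocate(self, nbytes):
        if self.top + nbytes > self.size:
            return None   # the page is full from the bump pointer's view
        offset = self.top
        self.top += nbytes
        self.live[offset] = nbytes
        return offset

    def free(self, offset):
        # The object dies, but the bump pointer cannot move back into the gap.
        del self.live[offset]

    def wasted(self):
        # Bytes below the bump pointer that no live object occupies.
        return self.top - sum(self.live.values())


page = BumpPage(size=256)
offsets = [page.allocate(64) for _ in range(4)]  # four objects fill the page
page.free(offsets[1])
page.free(offsets[2])
print(page.wasted())       # 128 bytes are free ...
print(page.allocate(64))   # ... yet a 64-byte request fails: None
```

A free-list allocator would record the two freed gaps and could hand them out again, which is the behaviour the rest of this post builds on.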

Methodology

My research focuses on adapting the Two-Level Segregated Fit (TLSF) allocator by Masmano et al. to make it better suited for use in ZGC. My main contributions are the following adaptations:

  • 0-byte Header: By utilizing information within ZGC, the allocator introduces a 0-byte header, which significantly reduces internal fragmentation. The image below shows 1) the reference design, 2) the general design, and 3) the optimized 0-byte header.

Block Header Comparison

  • ZGC Small Pages: By restricting the allocator to ZGC’s small pages, which have a fixed size (2 MB) and a bounded allocation size range ([16 B, 256 KB]), internal representations can be stored and used more efficiently. The image below shows how the large number of first and second levels is flattened into a single 64-bit word.

Bitmap Flattening

  • Concurrency: Concurrent operations on the allocator are supported through a lock-free mechanism designed to handle a range of problems and use cases.
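The bitmap flattening described in the list above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the layout used in the thesis: the choice of four subdivisions per power-of-two range, the names `size_class`, `search_class` and `FlatFreeIndex`, and the omission of block splitting are all assumptions made for the example. With sizes limited to [16 B, 256 KB], this particular mapping yields 57 classes, so one 64-bit word can record which classes hold free blocks.

```python
MIN_SHIFT = 4            # 16 B == 2**4, the smallest supported size
MAX_SIZE = 256 * 1024    # 256 KB, the largest supported size
SL_BITS = 2              # assumed: 4 subdivisions per power-of-two range


def _index(size):
    fl = size.bit_length() - 1                           # floor(log2(size))
    sl = (size >> (fl - SL_BITS)) & ((1 << SL_BITS) - 1)
    return ((fl - MIN_SHIFT) << SL_BITS) | sl


def size_class(size):
    """Flat class index of a free block of `size` bytes (rounds down)."""
    assert 16 <= size <= MAX_SIZE
    return _index(size)


def search_class(request):
    """First class whose blocks are all large enough for `request`."""
    assert 16 <= request <= MAX_SIZE
    fl = request.bit_length() - 1
    return _index(request + (1 << (fl - SL_BITS)) - 1)   # round up


class FlatFreeIndex:
    def __init__(self):
        self.bitmap = 0                        # bit i set <=> class i non-empty
        self.free_lists = [[] for _ in range(64)]

    def insert(self, offset, size):
        i = size_class(size)
        self.free_lists[i].append((offset, size))
        self.bitmap |= 1 << i

    def take(self, request):
        start = search_class(request)
        candidates = (self.bitmap >> start) << start     # drop smaller classes
        if candidates == 0:
            return None
        i = (candidates & -candidates).bit_length() - 1  # lowest set bit
        block = self.free_lists[i].pop()
        if not self.free_lists[i]:
            self.bitmap &= ~(1 << i)
        return block    # a real allocator would split off any remainder


idx = FlatFreeIndex()
idx.insert(0, 48)
idx.insert(1024, 4096)
print(idx.take(100))   # (1024, 4096)
print(idx.take(40))    # (0, 48)
print(idx.take(16))    # None
```

Because the whole index fits in one word, finding a suitable class is a mask followed by a lowest-set-bit search, rather than a walk over separate first- and second-level bitmaps.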

The 0-byte header is especially noteworthy because it is made possible by a series of smaller adaptations: deferring coalescing, reducing the supported heap size to that of ZGC’s small pages, and leveraging information that is already part of the Java object header. Concurrency can be solved in many ways, but these same adaptations make the lock-free solution in my research significantly easier to implement. Without them, a lock-free solution would be much more complex.
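As an illustration of what deferring coalescing can look like, the Python sketch below merges adjacent free blocks in a single later pass instead of at every deallocation. It is only an assumption about the general technique: the post does not say when or how the adapted allocator coalesces, and the `coalesce` function and the `(offset, size)` representation are hypothetical. The point is that merging in a batch needs no per-block boundary information to locate neighbours, which is one reason a block header can be avoided.

```python
def coalesce(free_blocks):
    """Merge physically adjacent free blocks in one deferred pass.

    `free_blocks` is an iterable of (offset, size) pairs in any order.
    Returns a new list sorted by offset with adjacent blocks merged.
    """
    merged = []
    for offset, size in sorted(free_blocks):
        if merged and merged[-1][0] + merged[-1][1] == offset:
            # This block starts exactly where the previous one ends: extend it.
            prev_offset, prev_size = merged[-1]
            merged[-1] = (prev_offset, prev_size + size)
        else:
            merged.append((offset, size))
    return merged


freed = [(192, 64), (0, 32), (32, 32), (128, 64)]
print(coalesce(freed))   # [(0, 64), (128, 128)]
```

With immediate coalescing, each deallocation would instead have to inspect its physical neighbours, which is typically done by storing size information in a header next to every block.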

Results

The adapted allocator shows promise for use in ZGC, particularly when it comes to allocating memory.

  • Performance: For single allocations, the new allocator performed on par with the reference implementation. However, it was slightly slower for single deallocations and real-world allocation patterns. This trade-off is considered acceptable given the significant reduction in fragmentation.

  • Memory Efficiency: The introduction of the 0-byte header and other optimizations led to a notable decrease in internal fragmentation. This improvement in memory efficiency suggests that the new allocator is effective in managing fragmented memory.

Conclusion

My work demonstrates that customizing allocators for use within garbage collectors like ZGC is a viable approach to addressing memory fragmentation. The adapted allocator not only reduces the need for costly relocations but also enhances overall memory efficiency. I’ve shown that there is significant potential for adaptations to be made to TLSF for use in ZGC, which might also apply to other allocators. The most apparent next step of my work is to integrate the allocator into ZGC (which Niclas Gärds, who conducted his thesis at Oracle in parallel to mine, has done). Other areas to consider include the new minimum allocation size in Java from the Lilliput project and addressing starvation in the adapted allocator’s concurrency implementation.

I would like to end by giving a huge thank you to everyone at the Oracle office in Stockholm for sharing their knowledge, making me feel part of the team, and fostering an environment that inspires learning. Thank you to Erik Österlund and Tobias Wrigstad for your steady support, knowledge, and guidance throughout the project. Finally, thank you to Casper and Niclas for making this spring special and exciting!

If you want to read more about my work, have a look at my published report at DiVA, which provides more detail and depth to the concepts explained in this post, as well as additional areas not covered here.