Master Thesis Proposals

Java is one of the most popular programming languages in the world and it runs on billions of devices scaling from credit cards to multi-machine servers. Oracle is the main contributor to the Java programming language, developed through the OpenJDK project. The Java Virtual Machine (JVM) is the core piece of technology that enables Java’s “write once, run anywhere” - the ability to run the same Java program on multiple hardware architectures and operating systems without having to recompile the code. The JVM also implements the Java memory management with a garbage collector that handles all the details for you, and just-in-time (JIT) compilers that enable Java performance to be better than what is possible with any statically compiled language.

The Oracle development office in Stockholm hosts a large part of the JVM development team. We have world leading expertise in areas such as garbage collection, managed runtimes, and compilers. The thesis work will be performed on-site in the Stockholm office in collaboration with Oracle experts and academic researchers who are doing scholarly work on the JVM. For the prospective thesis student, this means the availability of both expert advice on the JVM domain as well as expert advice on writing, and the academic discipline.

To apply, or for more information, contact jesper.wilhelmsson@oracle.com.

Current Project Proposals


Automatic Dynamic Optimization of Remembered Set Containers in G1

Summary: Let the G1 collector automatically determine remembered set container options for reduced memory usage.

Description
In the JVM's default garbage collector, G1, remembered sets store locations of incoming references to a particular region of the heap. This data structure is basically an implementation of a sparse set of integers: the entire range of possible values is split into evenly sized areas. A top level concurrent hash table stores values in areas that are "in" the set in a so-called remembered set container. Such a container is represented, depending on the number of values to be stored in that area it covers, by different kinds of data structures, e.g. arrays, bitmaps, or even single special integers.

The remembered set implementation switches between containers on the fly depending on current occupancy of an area.

G1 currently sizes these containers statically, i.e. independent of actual distribution of values in a given remembered set. So a particular container has a fixed size being able to hold a fixed amount of values, eg. an "array" remembered set container always has 128 entries, regardless of what the typical occupancy of such an array container is. This wastes memory, because different types of applications (and remembered sets for different areas of the heap) exhibit different occupancy characteristics.

The task is to change G1 to let it reconfigure the remembered set containers based on statistics that need to be gathered while an application is running to optimize its memory footprint, and evaluate the effectiveness of these optimizations on several benchmarks.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

G1 garbage collector Full GC improvements

Summary: Make G1 Full GC Faster and More Versatile.

Description
In JDK 10 the Hotspot G1 garbage collector received a parallel full-heap collector. It uses a parallelized mark-sweep-compact algorithm. While its performance is on par with the Parallel GC Full GC algorithm, there are opportunities to improve the algorithm related to work distribution, exploiting pre-existing work and handling various edge cases better.

Some of these improvements could include

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Handling Irreducibility in a Just-In-Time Compiler

Summary: This thesis studies approaches to make the graph-based internal representation of C2 (JVM's highly optimizing just-in-time compiler) more maintainable, accessible, and amenable to optimizations.

Description
Most compilers represent the code under compilation as a control-flow graph (CFG). A key property of a CFG is whether it is reducible, i.e. it does not contain any jump from outside a loop into the middle of it. Irreducible CFGs arise from the use of unstructured language constructs such as C's 'goto' or Kotlin coroutines, from the application of on-stack replacement techniques in virtual machines, and from compiler transformations such as tail call optimization.

Irreducible CFGs make code analysis and optimization more complex, computationally expensive, and error-prone. For this reason, many compilers transform irreducible CFGs into a reducible form, applying techniques such as node splitting or dispatching, before further analyzing and optimizing the code under compilation.

C2 aims at handling all kinds of CFGs, which has resulted in tens of complex and expensive bugs over its lifetime. This thesis studies different approaches to restrict the structure of CFGs in C2, with the goal of improving its maintainability. Potential approaches include reducibility transformations at different compilation stages, falling back to other JVM execution modes in certain cases, or combinations of these. A successful study has the potential of making Java's main virtual machine more accessible and amenable to further optimizations.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Humongous Object Aware Region Allocation in G1

Summary: Explore options to improve G1 humongous object inner and outer fragmentation.

Description
The Hotspot G1 garbage collector is a regional collector: the Java heap is strictly split into same-sized regions. Objects larger than a single region ("humongous objects") are allocated using separate contiguous sets of regions, and are unmovable for performance reasons. This poses a few problems, for example:

This project should explore options to mitigate one or both of these fragmentation problems. There are several ideas to change existing heap management strategies to improve either. Examples could include better region selection for evacuation and placement of humongous objects, automatic region level defragmentation efforts during garbage collection, over-provisioning of the reserved heap area, more aggressive attempts to reclaim humongous objects, and regular object allocation at the end of a humongous object, or a special humongous object allocation area like ZGC.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Stable Visualization of Intermediate Program Transformations

Summary: This thesis studies the feasibility, efficiency, and effectiveness of using stable graph layout algorithms to visualize the transformations performed by a JIT compiler.

Description
Modern optimizing compilers are designed around a well-specified representation of the code under compilation: an Intermediate Representation (IR).

To understand, improve, and debug optimizing compilers, it is useful to represent the IR as a graph and the compilation process as a sequence of graphs. Different tools exist to support visualizing and stepping through IR graph sequences. Common to these tools is the use of unstable graph layout algorithms that depict a single graph without taking into account its context. While ignoring context improves the quality of individual graphs, it makes it difficult to answer a basic question when stepping through a graph sequence: what changes from one step to the next one?

This thesis proposes investigating the feasibility, efficiency, and effectiveness of applying contextual, stable graph layout algorithms that balance the quality of individual graphs with the ease of identifying changes visually when stepping through the graph sequence.

The study will be conducted on the Ideal Graph Visualizer, an open-source tool that is part of OpenJDK. A successful conduction of this study has the potential of improving the ability of compiler engineers to understand, improve, and debug the main JIT compiler for a programming language that is used by millions of developers.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.