Master Thesis Proposals

Java is one of the most popular programming languages in the world and it runs on billions of devices, ranging from credit cards to multi-machine servers. Oracle is the main contributor to the Java programming language, which is developed through the OpenJDK project. The Java Virtual Machine (JVM) is the core piece of technology that enables Java’s “write once, run anywhere”: the ability to run the same Java program on multiple hardware architectures and operating systems without having to recompile the code. The JVM also implements Java's automatic memory management, with a garbage collector that handles all the details for you, and just-in-time (JIT) compilers that allow Java performance to rival, and in some cases exceed, that of statically compiled languages.

The Oracle development office in Stockholm hosts a large part of the JVM development team. We have world-leading expertise in areas such as garbage collection, managed runtimes, and compilers. The thesis work will be performed on-site in the Stockholm office, in collaboration with Oracle experts and academic researchers who do scholarly work on the JVM. For the prospective thesis student, this means access to expert advice both on the JVM domain and on academic writing and discipline.

To apply, or for more information, contact jesper.wilhelmsson@oracle.com.

Current Project Proposals


JVM Tuning with Machine Learning on Garbage Collection Logs

Summary: Create machine learning models from garbage collection logging data, use them to generate tuning recommendations for the Java Virtual Machine, and optionally apply those recommendations at runtime.

Description

The performance of an application running on the JVM depends not only on the way the application is coded, but also on various tuning options in the JVM itself. Many parameters can be tuned, for example the size of the Java heap or code cache, or the thresholds determining when various activities, such as garbage collections, are performed. No set of tuning options is optimal for every application; which options are most beneficial depends on the actual behavior of the application at runtime.
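For illustration, the sketch below inspects a few commonly tuned HotSpot parameters through the HotSpotDiagnosticMXBean; the same bean can also change the subset of flags that are writable ("manageable") while the application runs. The flags listed are only examples, and which flags exist or are writable depends on the JDK version and the selected garbage collector.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.ManagementFactory;

/** Minimal sketch: read a few JVM tuning parameters at runtime, and show whether they
 *  can be changed while the application is running. The selection of flags is only an
 *  example; availability and writability vary between JDK versions and GCs. */
final class TuningFlags {
    public static void main(String[] args) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        for (String flag : new String[] {"MaxHeapSize", "ReservedCodeCacheSize",
                                         "MaxGCPauseMillis", "InitiatingHeapOccupancyPercent"}) {
            VMOption opt = bean.getVMOption(flag);
            System.out.printf("%-32s = %-12s (writable at runtime: %s)%n",
                              opt.getName(), opt.getValue(), opt.isWriteable());
        }
        // Manageable flags can be adjusted on the fly, e.g.:
        // bean.setVMOption("MinHeapFreeRatio", "20");
    }
}
```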

Garbage collection (GC) logs generated by a JVM contain valuable information about the nature of an application, such as its GC events, pause times, object allocation and promotion rates, and live-set information. These data fields can easily be extracted from GC logs and engineered into features. Interestingly, the data fields are related to each other; changing the values of one can impact the values of another. Leveraging these relationships, machine learning models can be trained, and the trained models can then be used to make tuning recommendations in order to achieve optimal performance in Java applications.
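As a concrete example, with unified logging enabled (java -Xlog:gc), pause events can be scraped from the log with a simple parser. The sketch below assumes pause lines roughly of the form "[1.234s][info][gc] GC(7) Pause Young (Normal) (G1 Evacuation Pause) 512M->128M(1024M) 4.321ms"; a real feature-extraction pipeline would have to handle more log tags, units, and event types.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch of extracting a few data fields from a unified-logging GC log.
 *  Assumes the pause-line format shown above; prints one CSV row per pause. */
final class GcLogFeatures {
    private static final Pattern PAUSE = Pattern.compile(
        "\\[(\\d+\\.\\d+)s\\].*GC\\((\\d+)\\) Pause (\\w+).*?(\\d+)M->(\\d+)M\\((\\d+)M\\) (\\d+\\.\\d+)ms");

    public static void main(String[] args) throws IOException {
        for (String line : Files.readAllLines(Path.of(args[0]))) {
            Matcher m = PAUSE.matcher(line);
            if (!m.find()) continue;
            double uptime   = Double.parseDouble(m.group(1));  // seconds since VM start
            long   gcId     = Long.parseLong(m.group(2));
            String kind     = m.group(3);                      // Young, Full, ...
            long   beforeMb = Long.parseLong(m.group(4));      // heap occupancy before the pause
            long   afterMb  = Long.parseLong(m.group(5));      // heap occupancy after the pause
            long   capMb    = Long.parseLong(m.group(6));      // committed heap size
            double pauseMs  = Double.parseDouble(m.group(7));
            System.out.printf("%f,%d,%s,%d,%d,%d,%f%n",
                    uptime, gcId, kind, beforeMb, afterMb, capMb, pauseMs);
        }
    }
}
```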

This thesis proposes to investigate the usefulness of machine learning methods for extracting JVM tuning recommendations from GC logs, and possibly applying those recommendations automatically at runtime. The work would likely include selecting appropriate ML methods for evaluation, implementing them in the OpenJDK source code, and evaluating the results on a variety of benchmarks.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Low-Overhead Profiling for Adaptive Optimization

Summary: The JVM profiles the application under execution to optimize it according to its runtime behavior. The current profiling techniques incur a significant overhead in certain phases of the executed application. This thesis proposes designing and evaluating profiling techniques that trade profile accuracy for lower overhead, with the overall aim of improving the JVM's efficiency.

Description
Adaptive optimization in a managed runtime environment such as the JVM improves application performance by continuously monitoring the application's behavior and using the gathered information to guide just-in-time (JIT) compiler optimizations. Optimizations that benefit from this information include inlining, speculative devirtualization, instruction scheduling, spill code placement, etc. Information about the behavior of the program is commonly gathered by instrumentation-based profiling, where different points in the application are extended with code that records every method invocation, loop iteration, caller type at a call site, etc. This technique produces rich and accurate information and allows JIT compilers to focus their optimizations effectively. Unfortunately, it incurs a significant overhead in the phases of the application execution in which it is enabled.

This thesis proposes designing and evaluating alternative profiling techniques, such as probabilistic profiling, with lower overhead than the JVM's traditional instrumentation-based profiling. The thesis studies the trade-off between profiling overhead and the quality of the adaptive optimizations performed by a JIT compiler, and ideally identifies the types of profile information that are most amenable to low-overhead profiling techniques. A successful thesis has the potential to improve the efficiency of adaptive optimization in Java's main virtual machine.
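To make the trade-off concrete, the sketch below contrasts exact event counting with a probabilistic counter that records an event only with probability p and scales the result by 1/p. This is only an illustration of the general idea, not HotSpot's actual profiling code.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.LongAdder;

/** Minimal sketch of the accuracy/overhead trade-off: instead of incrementing a shared
 *  counter on every event, record an event only with probability p and scale by 1/p. */
final class SampledCounter {
    private final double p;                        // sampling probability, e.g. 0.01
    private final LongAdder samples = new LongAdder();

    SampledCounter(double samplingProbability) { this.p = samplingProbability; }

    /** Called at the profiling point, e.g. on method entry. */
    void record() {
        // Most calls return without touching shared state, keeping overhead low.
        if (ThreadLocalRandom.current().nextDouble() < p) samples.increment();
    }

    /** Unbiased estimate of the true event count. */
    long estimate() { return Math.round(samples.sum() / p); }

    public static void main(String[] args) {
        SampledCounter c = new SampledCounter(0.01);
        for (int i = 0; i < 1_000_000; i++) c.record();
        System.out.println("estimated invocations ~ " + c.estimate()); // close to 1,000,000
    }
}
```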

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Automatic Dynamic Optimization of Remembered Set Containers in G1

Summary: Let the G1 collector automatically determine remembered set container options for reduced memory usage.

Description
In the JVM's default garbage collector, G1, remembered sets store the locations of incoming references to a particular region of the heap. This data structure is essentially an implementation of a sparse set of integers: the entire range of possible values is split into evenly sized areas, and a top-level concurrent hash table stores the values of each area that is "in" the set in a so-called remembered set container. Depending on the number of values to be stored in the area it covers, a container is represented by different kinds of data structures, e.g. arrays, bitmaps, or even single special integers.

The remembered set implementation switches between containers on the fly depending on current occupancy of an area.

G1 currently sizes these containers statically, i.e. independently of the actual distribution of values in a given remembered set. A particular container therefore has a fixed size and can hold a fixed number of values; e.g. an "array" remembered set container always has 128 entries, regardless of the typical occupancy of such array containers. This wastes memory, because different types of applications (and remembered sets for different areas of the heap) exhibit different occupancy characteristics.
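The sketch below illustrates the general container idea in a simplified, single-threaded form: a fixed-capacity array that is promoted to a bitmap when it overflows. The fixed ARRAY_CAPACITY constant stands in for the kind of static sizing the project proposes to make adaptive; G1's real containers are concurrent and more elaborate.

```java
import java.util.BitSet;

/** Simplified, single-threaded sketch of a per-area remembered set container that
 *  switches representation on the fly: a small fixed-capacity array while sparsely
 *  populated, a bitmap over the whole area once the array overflows. */
final class AreaContainer {
    private static final int ARRAY_CAPACITY = 128;  // statically chosen, independent of workload

    private final int areaSize;                 // number of possible values in this area
    private int[] values = new int[ARRAY_CAPACITY];
    private int count = 0;
    private BitSet bitmap = null;               // non-null once promoted

    AreaContainer(int areaSize) { this.areaSize = areaSize; }

    void add(int value) {
        if (bitmap != null) { bitmap.set(value); return; }
        for (int i = 0; i < count; i++) if (values[i] == value) return;   // already present
        if (count < ARRAY_CAPACITY) { values[count++] = value; return; }
        // The array container is full: promote it to a bitmap covering the whole area.
        bitmap = new BitSet(areaSize);
        for (int i = 0; i < count; i++) bitmap.set(values[i]);
        bitmap.set(value);
        values = null;
    }

    boolean contains(int value) {
        if (bitmap != null) return bitmap.get(value);
        for (int i = 0; i < count; i++) if (values[i] == value) return true;
        return false;
    }

    /** Rough memory footprint in bytes (ignoring object headers); a container holding
     *  only a handful of values still pays for all ARRAY_CAPACITY slots. */
    long footprintBytes() {
        return bitmap != null ? areaSize / 8 : (long) ARRAY_CAPACITY * Integer.BYTES;
    }
}
```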

The task is to change G1 so that it reconfigures the remembered set containers based on statistics gathered while the application is running, with the goal of optimizing the memory footprint, and to evaluate the effectiveness of these optimizations on several benchmarks.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

G1 Garbage Collector Full GC Improvements

Summary: Make the G1 Full GC faster and more versatile.

Description
In JDK 10 the Hotspot G1 garbage collector received a parallel full-heap collector. It uses a parallelized mark-sweep-compact algorithm. While its performance is on par with the Parallel GC's full collection algorithm, there are opportunities to improve the algorithm in areas such as work distribution, exploiting pre-existing work, and handling various edge cases better.
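For orientation, the toy sketch below walks through the four classic mark-compact phases (mark, compute forwarding addresses, adjust references, compact) over a heap modeled as an array of object records. It is a single-threaded, simplified illustration of the algorithm family; the real G1 full GC is parallel and operates on heap regions.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;

/** Toy, single-threaded sketch of mark-compact. The "heap" is an array of fixed-size
 *  object records addressed by slot index; dead objects are dropped during compaction. */
final class ToyMarkCompact {
    static final class Obj {
        boolean marked;
        int forward = -1;   // slot the object will move to after compaction
        int[] refs;         // slots of referenced objects
        Obj(int... refs) { this.refs = refs; }
    }

    /** heap: slot i holds an object or null; roots: slots of root objects (must be live). */
    static Obj[] collect(Obj[] heap, int[] roots) {
        // Phase 1: mark everything reachable from the roots.
        Deque<Integer> worklist = new ArrayDeque<>();
        for (int r : roots) worklist.push(r);
        while (!worklist.isEmpty()) {
            Obj o = heap[worklist.pop()];
            if (o == null || o.marked) continue;
            o.marked = true;
            for (int ref : o.refs) worklist.push(ref);
        }
        // Phase 2: compute forwarding addresses by sliding live objects towards slot 0.
        int next = 0;
        for (Obj o : heap)
            if (o != null && o.marked) o.forward = next++;
        // Phase 3: rewrite all references (including roots) to the forwarding addresses.
        for (Obj o : heap)
            if (o != null && o.marked)
                for (int j = 0; j < o.refs.length; j++) o.refs[j] = heap[o.refs[j]].forward;
        for (int k = 0; k < roots.length; k++) roots[k] = heap[roots[k]].forward;
        // Phase 4: compact by moving live objects to their new slots.
        Obj[] compacted = new Obj[heap.length];
        for (Obj o : heap)
            if (o != null && o.marked) { o.marked = false; compacted[o.forward] = o; o.forward = -1; }
        return compacted;
    }

    public static void main(String[] args) {
        // Slots: 0 -> {2}, 1 unreachable, 2 -> {4}, 3 unreachable, 4 -> {}.
        Obj[] heap = { new Obj(2), new Obj(), new Obj(4), new Obj(0), new Obj() };
        int[] roots = { 0 };
        Obj[] after = collect(heap, roots);
        long live = Arrays.stream(after).filter(o -> o != null).count();
        System.out.println("live objects after compaction: " + live + ", root now at slot " + roots[0]);
    }
}
```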

Some of these improvements could include

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Handling Irreducibility in a Just-In-Time Compiler

Summary: This thesis studies approaches to make the graph-based internal representation of C2 (the JVM's highly optimizing just-in-time compiler) more maintainable, accessible, and amenable to optimizations.

Description
Most compilers represent the code under compilation as a control-flow graph (CFG). A key property of a CFG is whether it is reducible, i.e. whether it contains no jumps from outside a loop into the middle of the loop. Irreducible CFGs arise from the use of unstructured language constructs such as C's 'goto' or Kotlin coroutines, from the application of on-stack replacement techniques in virtual machines, and from compiler transformations such as tail call optimization.
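For reference, reducibility can be tested with the classic T1/T2 reduction: a CFG is reducible exactly when repeatedly deleting self-loops (T1) and merging nodes that have a unique predecessor into that predecessor (T2) collapses it to a single node. The sketch below is a minimal, unoptimized version of that test on a toy edge-list graph, not C2's sea-of-nodes representation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Minimal reducibility test via T1/T2 reduction. Node 0 is assumed to be the entry. */
final class ReducibilityCheck {

    static boolean isReducible(int numNodes, int[][] edges) {
        List<Set<Integer>> succ = new ArrayList<>();
        List<Set<Integer>> pred = new ArrayList<>();
        for (int i = 0; i < numNodes; i++) { succ.add(new HashSet<>()); pred.add(new HashSet<>()); }
        for (int[] e : edges) { succ.get(e[0]).add(e[1]); pred.get(e[1]).add(e[0]); }

        Set<Integer> live = new HashSet<>();
        for (int i = 0; i < numNodes; i++) live.add(i);

        boolean changed = true;
        while (changed) {
            changed = false;
            for (int n : new ArrayList<>(live)) {
                // T1: remove a self-loop n -> n.
                if (succ.get(n).remove(n)) { pred.get(n).remove(n); changed = true; }
                // T2: merge n into its unique predecessor p (the entry node is never merged).
                if (n != 0 && pred.get(n).size() == 1) {
                    int p = pred.get(n).iterator().next();
                    for (int s : succ.get(n)) {
                        pred.get(s).remove(n);
                        // Edges n -> s become p -> s; an n -> p edge would become a
                        // self-loop on p, which T1 would delete, so it is dropped here.
                        if (s != p) { succ.get(p).add(s); pred.get(s).add(p); }
                    }
                    succ.get(p).remove(n);
                    succ.get(n).clear();
                    pred.get(n).clear();
                    live.remove(n);
                    changed = true;
                }
            }
        }
        return live.size() == 1;
    }

    public static void main(String[] args) {
        // Reducible: a structured loop 0 -> 1 -> 2 -> 1, with exit 2 -> 3.
        System.out.println(isReducible(4, new int[][] {{0,1},{1,2},{2,1},{2,3}}));  // true
        // Irreducible: a two-node cycle 1 <-> 2 entered from the outside at both 1 and 2.
        System.out.println(isReducible(3, new int[][] {{0,1},{0,2},{1,2},{2,1}}));  // false
    }
}
```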

Irreducible CFGs make code analysis and optimization more complex, computationally expensive, and error-prone. For this reason, many compilers transform irreducible CFGs into a reducible form, applying techniques such as node splitting or dispatching, before further analyzing and optimizing the code under compilation.

C2 aims at handling all kinds of CFGs, which has resulted in dozens of complex and expensive bugs over its lifetime. This thesis studies different approaches to restricting the structure of the CFGs in C2, with the goal of improving its maintainability. Potential approaches include reducibility transformations at different compilation stages, falling back to other JVM execution modes in certain cases, or combinations of these. A successful study has the potential to make Java's main virtual machine more accessible and amenable to further optimizations.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Humongous Object Aware Region Allocation in G1

Summary: Explore options to reduce the inner and outer fragmentation caused by humongous objects in G1.

Description
The Hotspot G1 garbage collector is a regional collector: the Java heap is strictly split into same-sized regions. Objects at least half the size of a region ("humongous objects") are allocated in dedicated, contiguous sets of regions and are never moved, for performance reasons. This poses a few problems, for example inner fragmentation (the space at the end of the last region occupied by a humongous object is wasted) and outer fragmentation (the allocation needs a contiguous run of free regions, which may not exist even when the heap has enough free space overall).
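A back-of-the-envelope sketch of the two effects, under assumed sizes (G1 chooses the region size ergonomically from the heap size; the values here are only for illustration):

```java
/** Illustration of inner and outer fragmentation for a humongous object allocated
 *  across contiguous, fixed-size regions. Region and object sizes are assumptions. */
final class HumongousFragmentation {
    public static void main(String[] args) {
        long regionSize = 4L * 1024 * 1024;      // assumed 4 MB regions
        long objectSize = 13L * 1024 * 1024;     // a 13 MB humongous object

        // Inner fragmentation: the tail of the last region cannot be used for other objects.
        long regionsUsed = (objectSize + regionSize - 1) / regionSize;   // 4 regions
        long tailWaste   = regionsUsed * regionSize - objectSize;        // 3 MB unused
        System.out.printf("regions used: %d, wasted tail space: %d KB%n",
                          regionsUsed, tailWaste / 1024);

        // Outer fragmentation: the 4 regions must be contiguous, so the allocation can
        // fail even if 4 or more free regions exist but are scattered across the heap.
    }
}
```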

This project should explore options to mitigate one or both of these fragmentation problems. There are several ideas for changing existing heap management strategies to improve either aspect. Examples include better region selection for evacuation and for placement of humongous objects, automatic region-level defragmentation during garbage collection, over-provisioning of the reserved heap area, more aggressive attempts to reclaim humongous objects, regular object allocation in the otherwise wasted tail of a humongous object's last region, or a special humongous object allocation area like in ZGC.

Requirements

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.