Thesis and Internship Proposals

Master Thesis Proposals

Java is one of the most popular programming languages in the world and it runs on billions of devices scaling from credit cards to multi-machine servers. Oracle is the main contributor to the Java programming language, developed through the OpenJDK project. The Java Virtual Machine (JVM) is the core piece of technology that enables Java’s “write once, run anywhere” - the ability to run the same Java program on multiple hardware architectures and operating systems without having to recompile the code. The JVM also implements the Java memory management with a garbage collector that handles all the details for you, and just-in-time (JIT) compilers that enable Java performance to be better than what is possible with any statically compiled language.

The Oracle development office in Stockholm hosts a large part of the JVM development team. We have world leading expertise in areas such as garbage collection, managed runtimes, and compilers. The thesis work will be performed on-site in the Stockholm office in collaboration with Oracle experts and academic researchers who are doing scholarly work on the JVM. For the prospective thesis student, this means the availability of both expert advice on the JVM domain as well as expert advice on writing, and the academic discipline.

To apply, or for more information, contact jesper.wilhelmsson@oracle.com.

Current Project Proposals

Universal GC Barriers for Ahead-of-Time Java Compilation

Summary: To reduce application warm-up overhead, the JVM can generate machine code offline that is just loaded and executed in production runs. Currently, this requires using the same garbage collector in the offline and production runs. This thesis proposes designing and evaluating techniques that make it possible to generate GC-agnostic machine code, with the aim of making offline ("ahead-of-time") code generation more widely applicable.

Description

The JVM employs just-in-time compilation to generate machine code while the application is running, adapting to the dynamic characteristics of the application. This approach gives excellent peak performance, but it incurs a significant overhead that is particularly noticeable during the initial phase of the application's execution. To mitigate this overhead, it is possible to compile parts of the application offline ("ahead-of-time") and then load and execute the machine code directly in a production run.

A current limitation of ahead-of-time Java compilation is that the same garbage collector (GC) needs to be used in the offline and production runs. This is because the generated machine code contains GC barriers (small fragments of code that record information about application memory accesses) that are tailored for a specific GC. Besides forcing users to select the exact same GC in offline and production runs, this limitation precludes applying ahead-of-time compilation to the JDK code itself.

This thesis proposes designing and evaluating GC barriers that are agnostic to the GC in use. Such GC barriers would make it possible to offline generate machine code that can be readily loaded and executed in production regardless of which GC is selected. A successful thesis has the potential of reducing the initial overhead of Java applications and making Java an (even more) attractive language for interactive applications and modern cloud deployments.

Requirements

Must be able to work in C++
Must be familiar with basic assembly programming
Compiler and garbage collection knowledge is a plus but not required

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

C2 Formalization

Summary: This thesis project investigates formalizing the intermediate representation of the HotSpot top-tier just-in-time compiler and validating selected compiler optimizations in a theorem prover.

Description

Correctness of compilers can be established using multiple approaches: random testing, full formal verification of a compiler, or verification of individual compilations ("Translation Validation"). Random testing has been successful in discovering test programs that expose bugs in many compiler projects, but the approach is limited to testing a compiler for fixed inputs and templated programs.

This project would entail formalizing part of the top tier HotSpot VM just-in-time (C2) graph-based intermediate representation ("Sea of Nodes") in a theorem prover, and validating optimizations over this representation, to give further assurance of compilation correctness.

One possible research direction is converting the intermediate representation output between C2 compiler phases/transformations to the format of an automated theorem prover, where soundness of transformations can be automatically validated.

Requirements

Working proficiency in C++ / Java
Must be familiar with basic compiler design
Must have an interest in formal methods

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Automatic Dynamic Optimization of Remembered Set Containers in G1

Summary: Let the G1 collector automatically determine remembered set container options for reduced memory usage.

Description
In the JVM's default garbage collector, G1, remembered sets store locations of incoming references to a particular region of the heap. This data structure is basically an implementation of a sparse set of integers: the entire range of possible values is split into evenly sized areas. A top level concurrent hash table stores values in areas that are "in" the set in a so-called remembered set container. Such a container is represented, depending on the number of values to be stored in that area it covers, by different kinds of data structures, e.g. arrays, bitmaps, or even single special integers.

The remembered set implementation switches between containers on the fly depending on current occupancy of an area.

G1 currently sizes these containers statically, i.e. independent of actual distribution of values in a given remembered set. So a particular container has a fixed size being able to hold a fixed amount of values, eg. an "array" remembered set container always has 128 entries, regardless of what the typical occupancy of such an array container is. This wastes memory, because different types of applications (and remembered sets for different areas of the heap) exhibit different occupancy characteristics.

The task is to change G1 to let it reconfigure the remembered set containers based on statistics that need to be gathered while an application is running to optimize its memory footprint, and evaluate the effectiveness of these optimizations on several benchmarks.

Requirements

Must be able to work in C++
Must be familiar with the concept of garbage collection
Must understand synchronization primitives such as locks, mutex, cas

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

G1 garbage collector Full GC improvements

Summary: Make G1 Full GC Faster and More Versatile.

Description
In JDK 10 the Hotspot G1 garbage collector received a parallel full-heap collector. It uses a parallelized mark-sweep-compact algorithm. While its performance is on par with the Parallel GC Full GC algorithm, there are opportunities to improve the algorithm related to work distribution, exploiting pre-existing work and handling various edge cases better.

Some of these improvements could include

the G1 full collection algorithm currently works in a vacuum: it performs all work from scratch, ignoring that there may be existing liveness information from a concurrent mark that this full gc just interrupted. This can decrease the duration of the full collection significantly.
to combat fragmentation of large objects, the full collection algorithms should also move these large objects as a last resort.
the compaction phase of the full collection algorithm is sometimes not well balanced: the algorithm distributes work units across threads without considering the actual amount of work in the work units, causing imbalance. Another problem can be that this work is distributed for the entire compaction phase that itself consists of several sub-phases that have different cost per work unit.
scale the number of worker threads per phase: currently thread scaling for the full collection algorithm is determined on a global basis. However, different parts of the full collector can be faster if the number of threads in used is scaled to the work.

Requirements

Must be able to work in C++
Must be familiar with the concept of garbage collection
Must understand synchronization primitives such as locks, mutex, cas

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Handling Irreducibility in a Just-In-Time Compiler

Summary: This thesis studies approaches to make the graph-based internal representation of C2 (JVM's highly optimizing just-in-time compiler) more maintainable, accessible, and amenable to optimizations.

Description
Most compilers represent the code under compilation as a control-flow graph (CFG). A key property of a CFG is whether it is reducible, i.e. it does not contain any jump from outside a loop into the middle of it. Irreducible CFGs arise from the use of unstructured language constructs such as C's 'goto' or Kotlin coroutines, from the application of on-stack replacement techniques in virtual machines, and from compiler transformations such as tail call optimization.

Irreducible CFGs make code analysis and optimization more complex, computationally expensive, and error-prone. For this reason, many compilers transform irreducible CFGs into a reducible form, applying techniques such as node splitting or dispatching, before further analyzing and optimizing the code under compilation.

C2 aims at handling all kinds of CFGs, which has resulted in tens of complex and expensive bugs over its lifetime. This thesis studies different approaches to restrict the structure of CFGs in C2, with the goal of improving its maintainability. Potential approaches include reducibility transformations at different compilation stages, falling back to other JVM execution modes in certain cases, or combinations of these. A successful study has the potential of making Java's main virtual machine more accessible and amenable to further optimizations.

Requirements

Must be able to work in C++
Must be familiar with graph and algorithms theory
Must understand basic compiler theory

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.

Humongous Object Aware Region Allocation in G1

Summary: Explore options to improve G1 humongous object inner and outer fragmentation.

Description
The Hotspot G1 garbage collector is a regional collector: the Java heap is strictly split into same-sized regions. Objects larger than a single region ("humongous objects") are allocated using separate contiguous sets of regions, and are unmovable for performance reasons. This poses a few problems, for example:

At the end of such a humongous region there is often a significant amount of space that is effectively wasted and unavailable for allocation, i.e. inner fragmentation.
Outer, region level fragmentation due to never moving these objects can cause unexpected Out-of-memory situations if there are not enough contiguous regions left for a given new allocation.

This project should explore options to mitigate one or both of these fragmentation problems. There are several ideas to change existing heap management strategies to improve either. Examples could include better region selection for evacuation and placement of humongous objects, automatic region level defragmentation efforts during garbage collection, over-provisioning of the reserved heap area, more aggressive attempts to reclaim humongous objects, and regular object allocation at the end of a humongous object, or a special humongous object allocation area like ZGC.

Requirements

Must be able to work in C++
Must be familiar with the concept of garbage collection
Must understand synchronization primitives such as locks, mutex, cas

This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.