Master Thesis Proposals
Java is one of the most popular programming languages in the world, running on billions of devices ranging from credit cards to multi-machine servers. Oracle is the main contributor to the Java programming language, which is developed through the OpenJDK project. The Java Virtual Machine (JVM) is the core piece of technology that enables Java’s “write once, run anywhere”: the ability to run the same Java program on multiple hardware architectures and operating systems without having to recompile the code. The JVM also implements Java's memory management with a garbage collector that handles all the details for you, and just-in-time (JIT) compilers that can make Java performance competitive with, and in some cases better than, that of statically compiled languages.
The Oracle development office in Stockholm hosts a large part of the JVM development team. We have world-leading expertise in areas such as garbage collection, managed runtimes, and compilers. The thesis work will be performed on-site in the Stockholm office in collaboration with Oracle experts and academic researchers who are doing scholarly work on the JVM. For the prospective thesis student, this means access to expert advice both on the JVM domain and on writing and academic practice.
To apply, or for more information, contact jesper.wilhelmsson@oracle.com.
Current Project Proposals
Detailed Study of Array Usage
Summary: Understanding how arrays are used in Java programs can help guide compiler optimizations and memory allocation. This in turn can lead to faster Java programs with lower memory requirements. Modifying the JVM to provide more detailed information will help take the next step towards a better understanding.
Description
Arrays are available in some form in almost all imperative programming languages. Their compact representation and constant-time random access make them attractive for certain kinds of algorithms and programs. A previous study has examined how arrays are used in practice in Java and Scala by manually instrumenting existing programs to output logging information regarding array creation, access patterns, element types, etc.
This thesis work aims to improve on that work by instead modifying the class loader of the JVM so that loaded classes are instrumented automatically. This means that any code running on the JVM can be investigated, including the Java standard library which was excluded in the original study. The work will also include developing a more efficient logging format and accounting for Java’s reflective capabilities and native methods on arrays.
The resulting artifact can be used to reproduce and improve upon the original study as well as expand the roster of investigated programs.
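To make the idea concrete, the sketch below shows, written out by hand, the kind of calls an instrumenting class loader might insert around array operations. `ArrayLog` and its methods are hypothetical names chosen for illustration, not part of the JVM or of the original study; a real implementation would emit such calls at the bytecode level (for example around `newarray`, `iaload`, and `iastore`). The reflective `Array.newInstance` call shows why reflection needs separate handling: it creates an array without executing a `newarray` bytecode in the calling code.

```java
import java.lang.reflect.Array;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical logging hooks that an instrumenting class loader could inject.
// Aggregated counters stand in for a compact log format (no line per event).
final class ArrayLog {
    static final LongAdder creations = new LongAdder();
    static final LongAdder loads = new LongAdder();
    static final LongAdder stores = new LongAdder();

    static void onNewArray(Class<?> elementType, int length) { creations.increment(); }
    static void onLoad(Object array, int index) { loads.increment(); }
    static void onStore(Object array, int index) { stores.increment(); }

    private ArrayLog() {}
}

final class InstrumentedExample {
    // Each array operation is preceded by the hook the instrumentation would insert.
    static int sumOfIndices(int n) {
        ArrayLog.onNewArray(int.class, n);
        int[] a = new int[n];
        for (int i = 0; i < n; i++) {
            ArrayLog.onStore(a, i);
            a[i] = i;
        }
        int sum = 0;
        for (int i = 0; i < n; i++) {
            ArrayLog.onLoad(a, i);
            sum += a[i];
        }
        return sum;
    }

    // Reflective creation: no newarray bytecode runs here, so the hook has to be
    // attached by special-casing the reflection API (or inside the VM).
    static Object reflectiveArray(int n) {
        ArrayLog.onNewArray(int.class, n);
        return Array.newInstance(int.class, n);
    }

    public static void main(String[] args) {
        System.out.println(sumOfIndices(4));                                      // 6
        reflectiveArray(3);
        System.out.println(ArrayLog.creations.sum());                             // 2
        System.out.println(ArrayLog.stores.sum() + " " + ArrayLog.loads.sum());   // 4 4
    }
}
```

Counting rather than printing each event is one possible direction for the more efficient logging format mentioned above; a real study would likely also record element types, lengths, and access patterns.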
Requirements
- Must be able to work in C++
- Must understand the basics of Java programming
- Understanding of Java byte code and class loading is a plus but not required
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.
LLM-Guided Optimization and Testing of C2
Summary: Large Language Models (LLMs) have emerged as a promising approach to solving several problems across many disciplines, including programming language implementation and compilers. This thesis project focuses on leveraging LLMs to improve the C2 just-in-time compiler in the HotSpot Java Virtual Machine.
Description
The general goal of the thesis project is to improve the performance and reliability of C2. There are many possible applications of LLMs that could advance this goal. The project can therefore, to a large degree, align with the student's interests. Example topics include:
- Use LLMs to automatically generate or tune compiler heuristics (e.g., inlining), leveraging the large amount of data, parameters, and source code available within HotSpot
- Use LLMs to guide fuzzing (a technique used to automatically generate tests) towards test cases that are more likely to reveal compiler bugs
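For readers new to compiler fuzzing, the sketch below shows the general shape of a generated test that an LLM could be prompted to produce: a small method is invoked often enough that HotSpot is likely to JIT-compile it, and later results are compared against an early one. The method body and the iteration count are illustrative assumptions; whether and when C2 compiles the method depends on VM flags and thresholds, and differential setups more commonly compare a run under `-Xint` (interpreter only) against a run with the JIT enabled.

```java
// Hypothetical generated test: loops, arithmetic, and array accesses,
// the kind of code that exercises C2's optimizations.
final class GeneratedFuzzTest {
    static long kernel(int seed) {
        int[] data = new int[64];
        long acc = seed;
        for (int i = 0; i < data.length; i++) {
            data[i] = (seed * 31 + i) ^ (i << 3);
        }
        for (int i = 1; i < data.length; i++) {
            acc += data[i] - data[i - 1];
            acc ^= (acc >>> 7);
        }
        return acc;
    }

    // The first result (most likely computed by interpreted or less-optimized code)
    // is the reference; a later mismatch would point to a possible miscompilation.
    static boolean runDifferential(int seed, int iterations) {
        long expected = kernel(seed);
        for (int i = 0; i < iterations; i++) {
            if (kernel(seed) != expected) {
                System.out.println("Mismatch at iteration " + i);
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 20_000 iterations is an assumption meant to exceed typical compilation thresholds.
        System.out.println(runDifferential(42, 20_000) ? "consistent" : "possible miscompilation");
    }
}
```

An LLM-guided fuzzer would vary the body of `kernel` towards shapes more likely to stress specific optimizations, while the comparison harness stays the same.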
Requirements
- Must be able to understand and work in Java and C++
- Must be familiar with basic compiler design
- Must have a good understanding of LLMs
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.
Simplified Weak References
Summary: This thesis project will explore the concept of References in Java. Here we refer to the Java class called Reference, not traditional C-style pointers. We aim to simplify the Java language by cleaning up some features that were introduced in the early days of Java and that have been recognized as not being in the best interest of the Java platform.
Description
The java.lang.ref.Reference class is used to implement many different flavours of non-strong references in Java. WeakReference is implemented as a subclass of Reference and provides users with weak semantics. FinalReference is another type of Reference, used to implement finalizers, which are now recognized as a mistake and are deprecated. Another type of Reference is PhantomReference, which differs only subtly from finalization, itself a mistake. The last type of Reference is SoftReference, which was intended to be useful for implementing caches. However, some consider it a failed experiment.
The golden nugget today seems to be WeakReference. It supports 1) having the GC clear the referent when it is no longer strongly reachable, and 2) having the GC send a notification so that cleanup code can run. GC-triggered cleanup code is useful, for example, for masking bugs where a user has forgotten to close a native resource. However, it can also introduce bugs in otherwise correct user code unless that code has been appropriately annotated with Reference.reachabilityFence().
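Both capabilities can be illustrated with the existing API. The sketch below registers a `WeakReference` subclass with a `ReferenceQueue`, which is how the GC notification is delivered, and uses `Reference.reachabilityFence` to keep an object strongly reachable while a native-style resource is still in use. `NativeHandle` and its `long` handle are hypothetical stand-ins for a real native resource; because GC timing is not deterministic, the example does not rely on the referent actually being cleared.

```java
import java.lang.ref.Reference;
import java.lang.ref.ReferenceQueue;
import java.lang.ref.WeakReference;

// Hypothetical wrapper around a native resource identified by a raw handle.
final class NativeHandle {
    final long handle;
    NativeHandle(long handle) { this.handle = handle; }
}

// Weak reference that remembers the handle, so cleanup code can run
// after the GC has cleared the referent and enqueued this reference.
final class HandleRef extends WeakReference<NativeHandle> {
    final long handle;
    HandleRef(NativeHandle referent, ReferenceQueue<NativeHandle> queue) {
        super(referent, queue);
        this.handle = referent.handle;
    }
}

final class WeakRefDemo {
    static final ReferenceQueue<NativeHandle> QUEUE = new ReferenceQueue<>();

    // Non-blocking poll of the queue; returns how many handles were cleaned up.
    static int drainQueue() {
        int cleaned = 0;
        Reference<? extends NativeHandle> ref;
        while ((ref = QUEUE.poll()) != null) {
            long h = ((HandleRef) ref).handle;
            System.out.println("would release native handle " + h);
            cleaned++;
        }
        return cleaned;
    }

    static long useHandle(NativeHandle nh) {
        try {
            long h = nh.handle;   // last use of nh itself
            return h * 2;         // stands in for a native call that uses h
        } finally {
            // Without this fence, nh could be considered unreachable after the read above,
            // and GC-triggered cleanup could release h while it is still in use.
            Reference.reachabilityFence(nh);
        }
    }

    public static void main(String[] args) {
        NativeHandle nh = new NativeHandle(21);
        HandleRef ref = new HandleRef(nh, QUEUE);
        System.out.println(useHandle(nh));    // 42
        System.out.println(ref.get() == nh);  // true: nh is still strongly reachable
        System.out.println(drainQueue());     // 0: nothing is cleared while nh is reachable
        Reference.reachabilityFence(nh);
    }
}
```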
This project is an exploration to see if the simplest case of having a weak reference with no GC callback can be implemented in a simpler and more efficient fashion by allowing reference fields to be declared as weak.
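For comparison, the simplest case, a weak reference without any GC callback, is written today as a separate `WeakReference` object per weakly held value, created without a `ReferenceQueue`. The sketch below shows that pattern; the commented-out `weak` field modifier is purely hypothetical syntax used to illustrate the idea and is not valid Java. The comparison highlights that each weak field currently costs an extra heap object and an indirection through `get()`.

```java
import java.lang.ref.WeakReference;

// Today: an entry holds its key weakly through a separate Reference object.
final class WeakEntry {
    private final WeakReference<Object> key;   // extra object + indirection per weak field
    // Hypothetical alternative (NOT valid Java today):
    //   private weak Object key;
    final String value;

    WeakEntry(Object key, String value) {
        this.key = new WeakReference<>(key);    // no ReferenceQueue: no GC callback
        this.value = value;
    }

    // Returns null once the GC has cleared the referent.
    Object key() { return key.get(); }

    boolean isStale() { return key.get() == null; }

    public static void main(String[] args) {
        Object k = new Object();
        WeakEntry e = new WeakEntry(k, "cached");
        System.out.println(e.key() == k);   // true while k is strongly reachable
        System.out.println(e.isStale());    // false
        java.lang.ref.Reference.reachabilityFence(k);
    }
}
```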
Requirements
- Must be able to understand and work in Java and C++
- Must be familiar with garbage collection
- Must have a good understanding of low-level development
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.
Java Numerics Tower Exploration
Summary: This project seeks to characterize and explore the different ways of mapping abstract algebra entities to classes in Java.
Description
Work is being done in Java numerics to allow limited operator overloading for numeric types. However, the intention is not to allow operator overloading in general. Rather than exposing operator overloading to anyone who thinks it "looks cooler" than calling a method, it will be used specifically for mathematical code.
To make sure operator overloading is exposed in a careful way, the investigation will involve looking at abstract algebra and mapping the corresponding mathematical entities (such as groups, rings, and fields) to Java entities in the form of type classes. The exact mapping can be done in subtly different ways.
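One way to picture such a mapping is the type-class pattern expressible in today's Java: each algebraic structure becomes an interface, and an instance of that interface is a witness object supplying the operations for some carrier type. The sketch below is one possible encoding under that assumption; the interface names and the choice to have `Ring` extend an additive group are illustrative, and alternatives (for example separate additive and multiplicative monoids, or operations defined on the value type itself) are the kind of subtle differences the project would study.

```java
import java.math.BigInteger;
import java.util.List;

// Illustrative encoding of algebraic structures as Java "type classes":
// each interface describes operations on a carrier type T.
interface AdditiveMonoid<T> {
    T zero();
    T add(T a, T b);
}

interface AdditiveGroup<T> extends AdditiveMonoid<T> {
    T negate(T a);
    default T subtract(T a, T b) { return add(a, negate(b)); }
}

interface Ring<T> extends AdditiveGroup<T> {
    T one();
    T multiply(T a, T b);
}

// Witness object: the integers (as BigInteger) form a ring.
final class BigIntegerRing implements Ring<BigInteger> {
    static final BigIntegerRing INSTANCE = new BigIntegerRing();
    public BigInteger zero() { return BigInteger.ZERO; }
    public BigInteger one() { return BigInteger.ONE; }
    public BigInteger add(BigInteger a, BigInteger b) { return a.add(b); }
    public BigInteger negate(BigInteger a) { return a.negate(); }
    public BigInteger multiply(BigInteger a, BigInteger b) { return a.multiply(b); }
}

final class Algebra {
    // Generic code written once against the structure, usable with any ring:
    // evaluates c0 + c1*x + c2*x^2 + ... at x using Horner's rule.
    static <T> T evalPolynomial(Ring<T> ring, List<T> coefficients, T x) {
        T result = ring.zero();
        for (int i = coefficients.size() - 1; i >= 0; i--) {
            result = ring.add(ring.multiply(result, x), coefficients.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        // 1 + 2x + 3x^2 at x = 2 is 1 + 4 + 12 = 17
        List<BigInteger> coeffs = List.of(BigInteger.ONE, BigInteger.TWO, BigInteger.valueOf(3));
        System.out.println(evalPolynomial(BigIntegerRing.INSTANCE, coeffs, BigInteger.TWO)); // 17
    }
}
```

If operators were later tied to structures like these, an expression such as `result * x + c` might be desugared into the calls above; that connection is an assumption of this sketch.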
Requirements
- Must be able to understand and work in Java and C++
- Must be familiar with basic language design
- Must have a good understanding of abstract algebra
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.
Improving Translation Validation for C2
Summary: The C2 optimizing JIT compiler is crucial for the performance of the HotSpot Java virtual machine. To achieve good performance, C2 applies a large and constantly evolving set of complex optimizations, making it challenging to verify C2's correctness. This thesis project concerns translation validation, an increasingly popular and powerful technique that formally verifies the correctness of individual compilations.
Description
Traditional compiler verification relies on tests to detect crashes or miscompilations, but cannot provide formal correctness guarantees. Translation validation bridges the gap between testing and complete formal verification (e.g., CompCert) by formally proving the correctness of individual compilations. The starting point for this project is the previously completed thesis "Translation Validation for the HotSpot C2 Just-in-Time Compiler" (https://urn.kb.se/resolve?urn=urn:nbn:se:kth:diva-368250), which established a prototype translation validation tool and corresponding random test generator. This project aims to extend and expand that work, with flexibility to pursue different directions depending on the student's interests. Possible topics:
- Extend the scope of the existing validator. For example, validate additional categories of operations in the C2 intermediate representation, or enable validation of programs with loops.
- Design, formalize, implement, and evaluate a new approach that integrates with C2 itself and validates individual transformations and/or phases during compilation.
- Improve the random generation of input programs to the validator.
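The core obligation in translation validation is to show that the code before and after a transformation computes the same result for every input. The toy sketch below states that obligation for rewriting `x * 2` into `x << 1` in a tiny expression language. It is a stand-in under strong assumptions: it enumerates all 8-bit inputs with wrapping arithmetic, which is exhaustive only because the domain is tiny, whereas practical validators typically encode the obligation as a formula over fixed-width bit vectors and discharge it with an SMT solver. The expression classes are hypothetical and unrelated to C2's actual intermediate representation.

```java
// Toy expression language over 8-bit wrapping integers (hypothetical, not C2's IR).
interface Expr {
    int eval(int x); // result is always in 0..255
}

record Var() implements Expr {
    public int eval(int x) { return x & 0xFF; }
}

record Const(int value) implements Expr {
    public int eval(int x) { return value & 0xFF; }
}

record Add(Expr l, Expr r) implements Expr {
    public int eval(int x) { return (l.eval(x) + r.eval(x)) & 0xFF; }
}

record Mul(Expr l, Expr r) implements Expr {
    public int eval(int x) { return (l.eval(x) * r.eval(x)) & 0xFF; }
}

record Shl(Expr l, int amount) implements Expr {
    public int eval(int x) { return (l.eval(x) << amount) & 0xFF; }
}

final class ToyValidator {
    // Returns a counterexample input, or -1 if both expressions agree on all 256 inputs.
    static int findCounterexample(Expr before, Expr after) {
        for (int x = 0; x < 256; x++) {
            if (before.eval(x) != after.eval(x)) return x;
        }
        return -1;
    }

    public static void main(String[] args) {
        Expr before = new Mul(new Var(), new Const(2));
        Expr good = new Shl(new Var(), 1);            // correct rewrite
        Expr bad = new Add(new Var(), new Const(2));  // incorrect "rewrite"
        System.out.println(findCounterexample(before, good)); // -1: validated
        System.out.println(findCounterexample(before, bad));  // 0: x = 0 gives 0 versus 2
    }
}
```

Validating each rewrite as it is applied, rather than only the final compiled code, corresponds to the second topic above.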
Requirements
- Must be able to understand and work in Java, C++, and functional programming languages (e.g., Haskell, OCaml, or Lean)
- Must be familiar with basic compiler design
- Must have an interest in and familiarity with formal methods
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.
Exploring Automated Backporting for OpenJDK
Summary: Sustaining Engineering ensures that long-term-supported releases of products with large codebases, like Java, remain stable and reliable by diagnosing and resolving critical production issues. This thesis explores how automation can support this work by evaluating and extending existing backporting tools to assist in maintaining the HotSpot JVM codebase.
Description
Maintaining stability across long-term-supported versions of a large codebase like the JDK requires a continuous effort to backport bug fixes from newer development branches to older releases. This is a key responsibility of the Sustaining Engineering team. Currently, this process is largely manual and time-consuming, as older code often diverges significantly from the latest development branch. Automating parts of this process could greatly improve efficiency and reduce human error.
This thesis will take an automated patch backporting tool, FixMorph, originally developed for automating backports in the Linux kernel, as a starting point, and adapt it for the C++-based HotSpot JVM codebase. The work will involve evaluating its applicability to HotSpot, improving its accuracy in identifying and aligning changes, and integrating basic automation for generating and testing candidate backports.
The thesis will analyze the tool's effectiveness in reducing manual effort and in improving the reliability and consistency of maintenance operations, offering insights into the potential of automated backporting for large, evolving C++ systems.
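A core difficulty is that the lines a fix touches have often moved or changed in older releases. The sketch below illustrates only the simplest form of alignment, similar in spirit to the offset search of the classic `patch` tool: it looks for a hunk's context lines anywhere in the older file, preferring positions near the expected line. `HunkLocator` is a hypothetical, purely text-based helper, not FixMorph's algorithm, and it deliberately fails when the context itself has been edited, which is the harder case this thesis is concerned with.

```java
import java.util.List;

final class HunkLocator {
    // Returns the index in 'file' where 'context' occurs as a contiguous block,
    // searching outward from 'expectedLine' so the nearest match wins,
    // or -1 if the context does not occur verbatim anywhere.
    static int locate(List<String> file, List<String> context, int expectedLine) {
        int maxStart = file.size() - context.size();
        for (int offset = 0; offset <= file.size(); offset++) {
            for (int start : new int[] {expectedLine - offset, expectedLine + offset}) {
                if (start >= 0 && start <= maxStart && matchesAt(file, context, start)) {
                    return start;
                }
            }
        }
        return -1;
    }

    private static boolean matchesAt(List<String> file, List<String> context, int start) {
        for (int i = 0; i < context.size(); i++) {
            if (!file.get(start + i).equals(context.get(i))) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        // Context lines from a fix in a newer branch (hypothetical code).
        List<String> context = List.of("  if (obj == nullptr) {", "    return false;", "  }");
        // Older branch: same lines, shifted down by two extra lines.
        List<String> older = List.of(
            "bool check(oop obj) {",
            "  // older comment",
            "  assert_valid();",
            "  if (obj == nullptr) {",
            "    return false;",
            "  }",
            "  return true;",
            "}");
        System.out.println(locate(older, context, 1));   // 3
        // Context edited rather than just moved: verbatim matching fails.
        List<String> edited = List.of("  if (obj == NULL) {", "    return false;", "  }");
        System.out.println(locate(edited, context, 0));  // -1
    }
}
```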
Requirements
- Must have a solid understanding of C++ and software debugging
- Must be familiar with Git and version control workflows
- Basic knowledge of Java or the JVM ecosystem is desired
This is not an exhaustive list of requirements. Getting in touch with us early is a good way to identify any knowledge gaps that must be filled prior to project start to avoid delays.