Performance Improvements in JDK 25

In this article, we’ll take a walk through some of the many notable performance improvements and features in JDK 25. This is by no means exhaustive: JDK 25 comes with more than 3,200 issues fixed, of which almost a thousand were enhancements. About a hundred of these have been explicitly labeled as performance-related. We group the improvements into JDK Libraries, Garbage Collectors, Compiler, and Runtime, and list them in no particular order.

Enhancements in JDK Libraries

JEP 506: Scoped Values

The scoped values API was added as a preview feature in JDK 21 and has now been finalized in JDK 25.

Scoped values provide a novel means of passing implicit parameters, via a ScopedValue, to any method called transitively within a bounded scope. This may be used to improve performance and flexibility in applications that use ThreadLocal to similar effect today, as it allows robust and performant sharing of data across a large number of threads. Instead of duplicating state per thread in an error-prone way, the new mechanism allows shared access across any number of threads. This scales better because it has less memory overhead (no per-thread copies) and lower synchronization cost. Scoped values can be especially beneficial when used in conjunction with virtual threads and structured concurrency.
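
To illustrate, here is a minimal sketch (the Request type and the handle and process methods are hypothetical placeholders): a value is bound for the dynamic extent of a call and can be read via USER.get() from any code invoked within that scope, with no per-thread copy and no cleanup step:

    private static final ScopedValue<String> USER = ScopedValue.newInstance();

    void handle(Request request) {
        // USER is bound only while run() executes; it is automatically
        // unbound when the scope exits, even on exceptions.
        ScopedValue.where(USER, request.user())
                   .run(() -> process(request)); // process() and anything it
                                                 // calls can read USER.get()
    }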

Structured concurrency was first previewed in JDK 21 and is still a preview API in JDK 25 (see JEP 505). Among the updates for JDK 25, the API now integrates neatly with scoped values, such that child tasks inherit them:

    private static final ScopedValue<String> NAME = ScopedValue.newInstance();

    ScopedValue.where(NAME, "duke").run(() -> {
        try (var scope = StructuredTaskScope.open()) {

            // each child task can retrieve "duke" from NAME
            scope.fork(() -> childTask1());
            scope.fork(() -> childTask2());
            scope.fork(() -> childTask3());

            scope.join();
            // ...
        }
    });

Nicolai Parlog recently covered this feature and much more in his talk at Devoxx Belgium 2025, Structured Concurrency in Action.

JDK-8354300 Mark String.hash field @Stable

In JDK 25, String was improved to make the String::hashCode function constant foldable. This might lead to significant performance improvements in some common scenarios, such as using String constants as keys in a constant, unmodifiable Map:

private static final Map<String, Foo> MAP = Map.of("constant", value, ...);
        
MAP.get("constant").foo();

In this scenario, constant folding means the JIT can skip the map lookup entirely, replacing it with a direct call to value.foo(), or something even more optimized. In a targeted microbenchmark such as StringHashCodeStatic.nonZero, this yields roughly an 8x speed-up.

In this article you can read more about the benchmark and additional implementation details. In short, the hash code for the string is now stored in a field that is marked with the internal @Stable annotation. This allows the JIT compiler to trust the value and constant fold it once it is no longer the default zero value. While the @Stable annotation is JDK-internal, a general-purpose alternative is on its way in the form of the next feature in our article.

JEP 502: Stable Values (Preview)

Using the StableValue API (previewed in JDK 25), anyone can declare a lazy constant that is implicitly stable, meaning the JVM will treat it as a constant once its value has been lazily computed. Computation is done by a computing function provided at declaration time, and the result is cached behind a Supplier:

class OrderController {
    private final Supplier<Logger> logger = StableValue.supplier(() -> Logger.create(OrderController.class));

    void submitOrder(User user, List<Product> products) {
        logger.get().info("order started");
        ...
        logger.get().info("order submitted");
    }
}

In the example above, the field logger is of type Supplier<Logger>, but, at construction, no constant is yet initialized. When we invoke logger.get() the first time, the underlying computing function, () -> Logger.create(OrderController.class), is evaluated, causing the constant to be initialized.

Once the constant is initialized, the JIT compiler can trust that it will never change and is thus free to omit further reads of it, i.e., constant folding. This technique can provide significant performance improvements, including eliminating entire blocks of code. Effectively, this has the same impact as the @Stable annotation available in internal JDK code, but made available to library and application developers in a safe and enforced manner.
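
Beyond the memoized Supplier shown above, JEP 502 also describes using a StableValue directly; here is a minimal sketch based on the JEP's description (the StableValue.of() and orElseSet names are as previewed):

    class OrderController {
        // Unset at construction; its content can be set at most once.
        private final StableValue<Logger> logger = StableValue.of();

        Logger getLogger() {
            // Returns the content, computing and recording it on first use only.
            return logger.orElseSet(() -> Logger.create(OrderController.class));
        }
    }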

It’s prudent to note that preview APIs are subject to change. At the time of writing, StableValue is set to be renamed to LazyConstant in JDK 26.

JDK-8345687 Improve the Implementation of SegmentFactories::allocateSegment

This Panama Foreign Function and Memory (FFM) enhancement speeds up allocating native memory segments by up to 2x. This is achieved by explicitly aligning memory, avoiding unnecessary merges and object allocations, improving zeroing, and a few other tricks. Better handling of shared memory in the java.lang.foreign component means better performance when interoperating with native libraries. FFM was finalized in JDK 22 with the delivery of JEP 454 and brings a promise of simplifying native integration while outperforming JNI. Per Minborg has blogged about this in more depth here.
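
As a reference point for what this change speeds up, here is a hedged sketch of a typical FFM allocation path (the segment size and values are arbitrary):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;
    import java.lang.foreign.ValueLayout;

    void allocateExample() {
        try (Arena arena = Arena.ofConfined()) {
            // Allocating zero-initialized native memory is the hot path
            // improved by this enhancement.
            MemorySegment s = arena.allocate(ValueLayout.JAVA_LONG, 128);
            s.setAtIndex(ValueLayout.JAVA_LONG, 0, 42L);
        } // the arena frees the segment here
    }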

JDK-8354674 AArch64: Intrinsify Unsafe::setMemory

This enhancement added an intrinsic to speed up Unsafe::setMemory, an API commonly used by the I/O, desktop, and FFM APIs. The supplied microbenchmark showcases a ~2.5x speed-up when writing chunks of data using java.lang.foreign.MemorySegment::fill.
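
For example, bulk-filling a native segment bottoms out in Unsafe::setMemory; a minimal sketch (the fill pattern is arbitrary):

    import java.lang.foreign.Arena;
    import java.lang.foreign.MemorySegment;

    void fillExample() {
        try (Arena arena = Arena.ofConfined()) {
            MemorySegment segment = arena.allocate(4096);
            segment.fill((byte) 0x2A); // bulk write, intrinsified on AArch64
        }
    }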

Added Intrinsics for the ML-KEM and ML-DSA APIs

JDK 24 added the Quantum-Resistant Module-Lattice-Based Key Encapsulation Mechanism, or ML-KEM (JEP 496), and the Quantum-Resistant Module-Lattice-Based Digital Signature Algorithm, or ML-DSA (JEP 497).

In JDK 25, the performance of many of these new APIs has doubled on AArch64 and Intel AVX-512 platforms thanks to specialized intrinsics. This puts OpenJDK roughly on par with OpenSSL for many of these security operations on modern hardware.
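
The intrinsics kick in transparently behind the standard JCA APIs. A minimal sketch of ML-DSA signing (algorithm name per JEP 497; error handling elided):

    import java.security.KeyPair;
    import java.security.KeyPairGenerator;
    import java.security.Signature;

    byte[] signMessage(byte[] message) throws Exception {
        KeyPair kp = KeyPairGenerator.getInstance("ML-DSA").generateKeyPair();
        Signature signer = Signature.getInstance("ML-DSA");
        signer.initSign(kp.getPrivate());
        signer.update(message);
        return signer.sign(); // the hot lattice arithmetic runs in the new intrinsics
    }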

JDK-8350748 VectorAPI: Method “checkMaskFromIndexSize” Should Be Force Inlined

In low-level libraries, fine-tuning how the JIT inlines can help ensure that some critical optimizations occur as expected. This enhancement addresses such an issue in the Vector API by forcing inlining to happen at a critical place, leading to a 14x speed-up on targeted benchmarks.
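
The method in question guards masked memory accesses. Here is a hedged sketch of the kind of masked tail loop that relies on such checks inlining and folding away (requires --add-modules jdk.incubator.vector):

    import jdk.incubator.vector.ByteVector;
    import jdk.incubator.vector.VectorMask;
    import jdk.incubator.vector.VectorSpecies;

    static final VectorSpecies<Byte> SPECIES = ByteVector.SPECIES_PREFERRED;

    // Adds b into a element-wise; the masked loads and stores are guarded by
    // internal index/mask checks such as checkMaskFromIndexSize.
    void addInPlace(byte[] a, byte[] b) {
        for (int i = 0; i < a.length; i += SPECIES.length()) {
            VectorMask<Byte> m = SPECIES.indexInRange(i, a.length);
            ByteVector va = ByteVector.fromArray(SPECIES, a, i, m);
            ByteVector vb = ByteVector.fromArray(SPECIES, b, i, m);
            va.add(vb).intoArray(a, i, m);
        }
    }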

JDK-8350493 Improve Performance of Delayed Task Handling

The java.util.concurrent.ForkJoinPool was updated to implement ScheduledExecutorService to better deal with delayed tasks.

This major overhaul removes some locking-related bottlenecks and notably improves the performance of canceling delayed tasks (such as timeout handlers). It also adds convenience methods such as submitWithTimeout.
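
Since ForkJoinPool now implements ScheduledExecutorService, delayed tasks can be scheduled directly on, say, the common pool. A minimal sketch (the delay and task are arbitrary):

    import java.util.concurrent.ForkJoinPool;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.ScheduledFuture;
    import java.util.concurrent.TimeUnit;

    void scheduleTimeout() {
        ScheduledExecutorService pool = ForkJoinPool.commonPool();
        ScheduledFuture<?> timeout = pool.schedule(
                () -> System.out.println("timed out"), 100, TimeUnit.MILLISECONDS);
        // Canceling delayed tasks is exactly what this overhaul makes cheaper.
        timeout.cancel(false);
    }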

Other JDK Library Performance Enhancements and Bug Fixes

  • JDK-8356709 Avoid redundant String formatting in BigDecimal.valueOf(double)

    Gives a 6-9x speed-up of BigDecimal.valueOf for typical inputs.

  • JDK-8353686 Optimize Math.cbrt for x86 64 bit platforms

    Together with a follow-up fix (JDK-8358179), this x86 enhancement brings a 3x speed-up to Math.cbrt (cube root). A similar change was attempted on AArch64 but failed to show any benefit there.

  • JDK-8357690 Add @Stable and final to java.lang.CharacterDataLatin1 and other CharacterData classes

    While it is currently unclear whether this has any effect in the wild, it could conceivably make some code constant foldable.

Garbage Collection Improvements

JEP 521: Generational Shenandoah

The generational mode of the Shenandoah garbage collector, introduced in JDK 24, has transitioned to a product feature in JDK 25.

JDK-8350441 ZGC: Overhaul Page Allocation

This major ZGC enhancement replaces ZGC's Page Cache with a Mapped Cache, which improves how ZGC manages unused allocated memory. Implemented as a self-balancing binary search tree of contiguous memory ranges, the Mapped Cache merges memory ranges on insertion. Among other things, this reduces heap memory fragmentation. Another consequence of the overhaul is that ZGC no longer uses multi-mapped memory, which means that reported RSS usage will no longer look artificially inflated.
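
To make the idea concrete, here is a conceptual Java sketch (the real implementation is HotSpot C++ and differs in detail) of merging contiguous [start, end) ranges on insertion into a balanced search tree:

    import java.util.NavigableMap;
    import java.util.TreeMap;

    // Conceptual model of a mapped cache: ranges keyed by start address in a
    // red-black tree (TreeMap); inserting a range coalesces any neighbors it
    // touches, keeping the cache defragmented.
    class RangeCache {
        private final NavigableMap<Long, Long> ranges = new TreeMap<>(); // start -> end

        void insert(long start, long end) {
            var prev = ranges.floorEntry(start);
            if (prev != null && prev.getValue() >= start) { // touches predecessor
                start = prev.getKey();
                end = Math.max(end, prev.getValue());
            }
            var next = ranges.ceilingEntry(start);
            while (next != null && next.getKey() <= end) {  // absorb successors
                end = Math.max(end, next.getValue());
                ranges.remove(next.getKey());
                next = ranges.ceilingEntry(start);
            }
            ranges.put(start, end);
        }
    }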

Joel Sikström has authored a comprehensive deep-dive about how ZGC allocates and manages memory with this large overhaul in focus here: How ZGC allocates memory for the Java heap.

JDK-8343782 G1: Use One G1CardSet Instance for Multiple Old Gen Regions

This change allows G1 to merge any old generation region’s remembered set with others, unlocking memory savings. On one GC stress test referenced in the PR, this reduces the peak memory used by remembered sets from 2 GB to 0.75 GB on a JVM with a 64 GB heap, or roughly 2% of the process total:

[Figure: memory reserved by G1 remembered sets before and after the change, showing a roughly 60% decrease at peak.]

JDK-8351405 G1: Collection Set Early Pruning Causes Suboptimal Region Selection

G1 will now better estimate the cost of collecting regions during Mixed GCs and skip those that would significantly impact pause times. The result is reduced pause time spikes, particularly toward the end of a Mixed GC cycle, improving overall application performance. Thomas Schatzl has written in-depth about these and other GC changes on his blog: JDK 25 G1/Parallel/Serial GC changes.

JDK-8357443 ZGC: Optimize Old Page Iteration in Remap Remembered Phase

This enhancement leverages a pre-existing optimization to find all old pages without scanning the entire page table, speeding up major collections when many pages don’t need to be considered for collection.

While likely a modest improvement in typical usage, this can lead to significant speed-ups in some corner cases, such as manually triggering major collections (e.g., via System.gc()) on large but relatively empty heaps. The PR shows speed-ups of up to 20x on a 4 GB heap, and up to 900x on a 16 TB heap for the worst-case setup.

Compiler Improvements

JDK-8343685 C2 SuperWord: Refactor VPointer with MemPointer

This work is part of a larger effort to improve auto-vectorization in C2, a technique where plain Java code is transformed by the JIT to use SIMD instructions, which can allow for large speed-ups.

This enhancement makes more patterns eligible for such optimization, to great effect. In a recent talk at JVMLS, Emanuel Peter casually mentions how this snippet of code runs 33 times faster as a result:

for (int i = 0; i < (int)a.byteSize(); i++) {
    byte v = a.get(ValueLayout.JAVA_BYTE, i + invarL);
    a.set(ValueLayout.JAVA_BYTE, i + invarL, (byte)(v + 1));
}

Emanuel has blogged extensively on auto-vectorization in the C2 compiler and this introduction might be a good starting point for a deep dive.

JDK-8307513 C2: Intrinsify Math.max(long,long) and Math.min(long,long)

This is another auto-vectorization improvement referenced in Emanuel Peter’s JVMLS talk. By giving some special treatment to Math.max and Math.min, those operations can be considered for auto-vectorization by C2.

In a clipping variant (which combines both Math.min and Math.max) in the provided MinMaxVector microbenchmark, we see 3-5x speed-ups on a range of platforms:

@Benchmark
public long[] longClippingRange(RangeState state) {
    for (int i = 0; i < state.size; i++) {
        state.resultLongs[i] = Math.min(Math.max(state.longs[i], state.lowestLong), state.highestLong);
    }
    return state.resultLongs;
}

Improving auto-vectorization allows clearly written, plain Java code to take advantage of SIMD capabilities on modern CPUs.

JDK-8347405 MergeStores with Reverse Bytes Order Value

In JDK 23, C2 was enhanced with a merge store optimization, which efficiently merges byte-by-byte stores into stores of wider primitives. In JDK 25, this is further enhanced so that such merges also happen when bytes are stored in reverse order. For example:

public void patchInt(int offset, int x) {
    byte[] elems = this.elems;
    elems[offset]     = (byte) (x >> 24);
    elems[offset + 1] = (byte) (x >> 16);
    elems[offset + 2] = (byte) (x >> 8);
    elems[offset + 3] = (byte) x;
}

Focused microbenchmarks have seen up to a 4x speed-up on such code.

On typical little-endian systems, such as Intel x64 or AArch64, this means Java may emit data in network order (big-endian) at significantly greater speed than before.

JDK-8346664 C2: Optimize Mask Check with Constant Offset

This enhancement improves mask checks such as ((index + offset) & mask) == 0 when offset is constant. This enables more constant folding of some relatively common low-level expressions. While this was found and implemented to help specific Panama workloads, the optimization is generic and low-level.

Here’s a microbenchmark adapted from the one in the PR to avoid use of java.lang.foreign:

    long address = 4711 << 3L;

    @Benchmark
    public void itsOver9000() {
        for (long i = 0; i < 32768; ++i) {
            if (((address + ((i + 1) << 3L)) & 7L) != 0) {
                throw new IllegalArgumentException();
            }
        }
    }

Since these expressions can now properly constant fold, the JIT goes above and beyond. It sees that the condition can never be true and ends up optimizing away the entire loop. Result: 10,000x faster.

The fastest code will always be the code we don’t have to run at all!

Other Compiler Performance Enhancements and Bug Fixes

  • JDK-8353359 C2: Or(I|L)Node::Ideal is missing AddNode::Ideal call

    This fixes a regression introduced in JDK 21 where expressions such as (a | 3) | 6 no longer constant folded as expected.

  • JDK-8353041 NeverBranchNode causes incorrect block frequency calculation

    Fixes an issue where performance of infinite loops could be adversely affected.

  • JDK-8317976 Optimize SIMD sort for AMD Zen 4

    Makes sure AMD Zen 4 and later processors use appropriate optimized array sort routines.

  • JDK-8351414 C2: MergeStores must happen after RangeCheck smearing

    Improves some cases where the aforementioned merge store optimization interacted poorly with another optimization, by moving merge stores into a separate pass.

Runtime Improvements

JEP 515: Ahead-of-Time Method Profiling

Project Leyden aims to improve startup and warmup of Java applications while imposing as few limitations as possible. This is achieved by recording what an application is doing during a training run and saving that to an Ahead-of-Time cache for subsequent runs.

The AOT cache, introduced via JEP 483 in JDK 24, was extended in JDK 25 by JEP 515 to be able to collect method profiles during training runs. This enables the JVM to generate optimized native code immediately upon application startup, rather than having to wait for profiles to be collected. Hence, the warmup time is improved.

Some example programs start up 15-25% faster with this feature, compared to JDK 24 running a similarly trained application.

With this enhancement and JEP 514: Ahead-of-Time Command-Line Ergonomics delivered in JDK 25, Project Leyden is coming along nicely.
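
In practice, the workflow looks roughly like this (flag spellings per JEP 483 and JEP 514; the application and file names are placeholders):

    # Training run: record an AOT configuration while the app does representative work
    java -XX:AOTMode=record -XX:AOTConfiguration=app.aotconf -cp app.jar com.example.App

    # Assembly: create the AOT cache, now including method profiles (JEP 515)
    java -XX:AOTMode=create -XX:AOTConfiguration=app.aotconf -XX:AOTCache=app.aot -cp app.jar

    # Production runs: start with the cache
    java -XX:AOTCache=app.aot -cp app.jar com.example.App

    # Or, with JEP 514's ergonomics, train and create the cache in one step:
    java -XX:AOTCacheOutput=app.aot -cp app.jar com.example.App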

JEP 519: Compact Object Headers

Compact object headers were added as an experimental feature in JDK 24 and have now been promoted to a product feature.

When enabled with -XX:+UseCompactObjectHeaders, objects on the heap typically shrink by 4 bytes each. This adds up to substantial heap savings, and many benchmarks and real-world applications see significant speed-ups as a result of improved cache locality and reduced GC activity.

Users are strongly encouraged to try this out and provide feedback. We are considering turning it on by default in a future release.

Various Interpreter Improvements

  • JDK-8356946 x86: Optimize Interpreter Profile Updates
  • JDK-8357223 AArch64: Optimize Interpreter Profile Updates
  • JDK-8357434 x86: Simplify Interpreter::profile_taken_branch

Together, these changes significantly improve how the interpreter updates profile counters. During startup and warmup, the bytecode interpreter profiles which methods and branches are used, which informs the JVM about what to compile and how. Updating these counters can be a noticeable cost during application startup and warmup.

In isolation, such optimizations are often lost in the noise, but over a release, they can and do add up. Going from JDK 24 to JDK 25, the time to run a simple “Hello World!” has dropped from ~28.7ms to ~25.5ms. A neat 12% speed-up right off the bat.

While users of Project Leyden may unlock greater wins at scale, we think it’s great to see measurable improvements for free.

Other Runtime Performance Enhancements and Bug Fixes

  • JDK-8355646: Optimize ObjectMonitor::exit

    This synchronization improvement allows us to immediately unpark waiting threads without releasing and reacquiring a lock. This reduces latency on some lightly contended locks.

  • JDK-8348402 PerfDataManager stalls shutdown for 1ms

    Removes a small sleep during JVM shutdown. Good news for all the short-running command-line tools, build systems, etc.

  • JDK-8241678 Remove PerfData sampling via StatSampler

    A nice cleanup that removes a periodically running background task.

  • JDK-8353273 Reduce number of oop map entries in instances
  • JDK-8354560 Exponentially delay subsequent native thread creation in case of EAGAIN
  • JDK-8352075 Perf regression accessing fields

    Fixes an issue from JDK 21 which could cause significant interpreter slowdowns on classes with many fields.

That’s All, Folks!

JDK 25 has been generally available for a while now, so don’t hesitate to try it out. If you would like to learn how JDK 25 compares to JDK 21 in terms of performance, we also recommend the session From JDK 21 to JDK 25 - Java Performance Update 2025 presented at Devoxx Belgium 2025.

As you test and migrate your applications, measure how your application performs on JDK 25 versus your current JDK. Did you notice anything that might have regressed? Get involved in the community and let us know! Join and raise an issue on the relevant mailing list.

We are already looking at a healthy set of improvements in the upcoming JDK 26 release and look forward to writing about those in some detail in spring 2026.

Until then… Stay on the fast path!