A Deep Dive into JVM Start-up

When you start a Java application, it is tempting to believe that the only code being executed is the Java bytecode passed to the JVM, i.e. the .class files that have been compiled by javac. In reality, during start-up the JVM goes through a complex series of steps to create a veritable universe for running a Java application. In this article we will walk through the steps the JVM goes through from $ java to printing Hello World. If you’d prefer video format, you can also watch this video on the Java YouTube channel.

Preamble

To keep this walk-through of JVM start-up from venturing into “boil the ocean” territory, there are a few constraints I will be using to describe the process:

  1. I will be describing the JVM start-up process as it behaves with JDK 23. You can see the JVM specification for Java SE 23 here.
  2. I will be using the HotSpot JVM implementation as my example. This is by far the most widely used JVM implementation, with many popular JDK distributions using the HotSpot JVM or a derivative of it. Alternative JVM implementations might have slightly different internal behaviour.
  3. Finally, the primary code example I will be using to describe the JVM start-up process will be HelloWorld. Even though this is as simple an application as you can write, it still exercises all the key areas of the JVM start-up process.

Despite these constraints, after reading this article you should have a reasonably comprehensive understanding of the processes the JVM goes through during start-up and why they are necessary. This knowledge could be helpful when debugging your application should issues occur during start-up, and in some niche cases for improving start-up performance, though we will cover this a bit more towards the end of the article.

JVM Initialization

When a user executes the java command, this starts the JVM start-up process by calling the JNI (Java Native Interface) function JNI_CreateJavaVM(). You can see the code for this function here. This JNI function performs several important processes itself.

Validating User Input

The first step in the JVM start-up process is validating the user input: JVM arguments, artifact to be executed, and classpath. The below logging output shows this validation process occurring:

[arguments] VM Arguments:
[arguments] jvm_args: -Xlog:all=trace 
[arguments] java_command: HelloWorld
[arguments] java_class_path (initial): .
[arguments] Launcher Type: SUN_STANDARD

💡 Note: You can see this logging yourself by using the JVM arg -Xlog:all=trace.
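The same inputs can also be inspected from inside a running application. The following is a minimal sketch, assuming the standard java.lang.management API is available; the class name StartupInputs and its method names are hypothetical and only used for illustration.

```java
import java.lang.management.ManagementFactory;
import java.util.List;

public class StartupInputs {
    /** Returns the arguments passed to the JVM itself (not to main). */
    static List<String> jvmArguments() {
        return ManagementFactory.getRuntimeMXBean().getInputArguments();
    }

    /** Returns the class path the JVM was started with. */
    static String classPath() {
        return System.getProperty("java.class.path");
    }

    public static void main(String[] args) {
        System.out.println("jvm_args:        " + jvmArguments());
        System.out.println("java_class_path: " + classPath());
    }
}
```

Running it with -Xlog:all=trace should list that argument under jvm_args, mirroring the [arguments] lines in the log above.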

Detecting System Resources

After validating the user input, the next step is to detect available system resources: processors, system memory, and system services that the JVM might use. The availability of system resources could impact decisions the JVM makes based on its internal heuristics. For example, the garbage collector the JVM selects by default will depend on the availability of CPU and system memory. However, many of the JVM’s internal heuristics can be overridden through the use of explicit JVM arguments.

[os       ] Initial active processor count set to 11
[gc,heap  ]   Maximum heap size 9663676416
[gc,heap  ]   Initial heap size 603979776
[gc,heap  ]   Minimum heap size 1363144
[metaspace]  - commit_granule_bytes: 65536.
[metaspace]  - commit_granule_words: 8192.
[metaspace]  - virtual_space_node_default_size: 8388608.
[metaspace]  - enlarge_chunks_in_place: 1.
[os       ] Use of CLOCK_MONOTONIC is supported
[os       ] Use of pthread_condattr_setclock is not supported
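An application can ask the JVM what it concluded about the machine. This is a small sketch using java.lang.Runtime; the class name DetectedResources is hypothetical, and the value returned by maxMemory() should only be read as roughly corresponding to the maximum heap size reported in the log.

```java
public class DetectedResources {
    /** Number of processors the JVM believes it may use. */
    static int processors() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Upper bound, in bytes, on the memory the JVM will attempt to use for the heap. */
    static long maxHeapBytes() {
        return Runtime.getRuntime().maxMemory();
    }

    public static void main(String[] args) {
        System.out.println("Active processors: " + processors());
        System.out.println("Maximum heap size: " + maxHeapBytes() + " bytes");
    }
}
```

Starting the program with an explicit argument such as -Xmx256m is an easy way to see a heuristic being overridden.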

Preparing the Environment

After understanding available system resources, the JVM will begin to prepare the environment. Here the HotSpot JVM implementation generates hsperfdata (HotSpot performance data). This data is used by tools like JConsole and VisualVM to inspect and profile a JVM, and is typically stored in a system’s /tmp directory. The below is just one example of the JVM creating this profiling data; this will continue for some time during start-up, concurrently with other processes.

[perf,datacreation] name = sun.rt._sync_Inflations, dtype = 11, variability = 2, units = 4, dsize = 8, vlen = 0, pad_length = 4, size = 56, on_c_heap = FALSE, address = 0x0000000100c2c020, data address = 0x0000000100c2c050

Choosing the Garbage Collector

An important step in the start-up of the JVM is the selection of the garbage collector (GC). Which GC is being used could have substantial impacts on the performance of an application. By default the JVM has two GCs it will choose from, Serial GC and G1 GC, unless otherwise directed.

As of JDK 23, the JVM will select the G1 GC by default, unless the system has less than 1792 MB of available system memory, and/or only a single processor, in which case Serial GC is selected. Of course other GCs might be available including: Parallel GC, ZGC, and others depending upon the specific JDK version and distribution you are using, each with their distinct performance characteristics and ideal workloads.

[gc           ] Using G1
[gc,heap,coops] Trying to allocate at address 0x00000005c0000000 heap of size 0x240000000
[os,map       ] Reserved [0x00000005c0000000 - 0x0000000800000000), (9663676416 bytes)
[gc,heap,coops] Heap address: 0x00000005c0000000, size: 9216 MB, Compressed Oops mode: Zero based, Oop shift amount: 3
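The default selection rule described above can be modelled in a few lines. This is a simplified sketch of the rule as stated in this article, not HotSpot’s actual source; the class name DefaultGcModel, the method selectDefaultGc, and the sample memory sizes are all hypothetical.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class DefaultGcModel {
    // Threshold taken from the article: 1792 MB of system memory.
    static final long MEMORY_THRESHOLD_BYTES = 1792L * 1024 * 1024;

    /** Models the default choice: G1 unless the machine is small, then Serial. */
    static String selectDefaultGc(int processors, long systemMemoryBytes) {
        boolean enoughResources = processors >= 2 && systemMemoryBytes >= MEMORY_THRESHOLD_BYTES;
        return enoughResources ? "G1" : "Serial";
    }

    public static void main(String[] args) {
        long sixteenGb = 16L * 1024 * 1024 * 1024;
        System.out.println(selectDefaultGc(11, sixteenGb)); // G1
        System.out.println(selectDefaultGc(1, sixteenGb));  // Serial

        // What the running JVM actually selected, as reported by its collectors.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Active collector: " + gc.getName());
        }
    }
}
```

An explicit argument such as -XX:+UseSerialGC bypasses the heuristic entirely, which is the “unless otherwise directed” case.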

CDS

Around this time the JVM will look for the CDS archive. CDS, Cached Data Storage (formerly Class Data Sharing), is an archive of class files that have been pre-processed, which can improve the start-up performance of the JVM. We will cover how CDS improves JVM start-up performance in the Class Linking section. However, don’t commit “CDS” to memory; it’s on its way out, and we will cover why when we look into the future of JVM start-up later.

[cds] trying to map [Java home]/lib/server/classes.jsa
[cds] Opened archive [Java home]/lib/server/classes.jsa.

Creating the Method Area

One of the JVM’s last initialization steps is creating the method area. This is a special off-heap memory location where class data will be stored as the JVM loads it. While the method area is not located in the JVM’s heap, the garbage collector still manages it. Class data stored in the method area is eligible for removal if the class loader associated with it is no longer in scope.

💡 Note: If you are using a HotSpot JVM implementation, the method area is referred to as metaspace.

[metaspace,map] Trying to reserve at an EOR-compatible address
[metaspace,map] Mapped at 0x00001fff00000000
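To see that the method area really is tracked separately from the heap, the memory pools exposed by the JVM can be inspected. This sketch assumes a HotSpot JVM, where the pool is expected to be named "Metaspace"; on other implementations the lookup may simply return nothing. The class name MetaspaceProbe is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.util.Optional;

public class MetaspaceProbe {
    /** Finds the memory pool named "Metaspace" (a HotSpot naming assumption). */
    static Optional<MemoryPoolMXBean> metaspacePool() {
        return ManagementFactory.getMemoryPoolMXBeans().stream()
                .filter(pool -> pool.getName().equals("Metaspace"))
                .findFirst();
    }

    public static void main(String[] args) {
        metaspacePool().ifPresentOrElse(
                pool -> System.out.println(pool.getName() + " is " + pool.getType()
                        + ", used bytes: " + pool.getUsage().getUsed()),
                () -> System.out.println("No pool named Metaspace on this JVM"));
    }
}
```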

Class Loading, Linking, and Initialization

Once the initial housekeeping steps have been completed, the real “meat” of the JVM start-up process begins which involves Class Loading, Linking, and Initialization.

While the JVM Specification describes these processes sequentially (sections 5.3-5.5), for a given class they won’t necessarily occur in that order on a HotSpot JVM. As noted at the bottom of the chart, Resolution, part of Class Linking, could occur at any point from before Verification to after Class Initialization. Some processes, like Class Initialization, might not technically occur at all. We will cover all of this in the upcoming sections.

Class Loading

Class Loading is covered under section 5.3 of the JVM specification. Class Loading is a three-step process: the JVM locates the binary representation of a class or interface, derives the class or interface from it, and loads that information into the JVM method area, which again would be referred to as “metaspace” if you are using a HotSpot JVM implementation.

One of the great powers of the JVM, and why it has become such a widely used platform, is its ability to dynamically load classes, allowing the JVM to load classes that have been generated on-demand during the JVM’s runtime. This ability is used by many popular frameworks and tools, such as Spring and Mockito. Indeed, even the JVM itself performs on-demand code generation when using lambdas, as can be seen in the InnerClassLambdaMetafactory class.
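The on-demand generation behind lambdas can be observed directly. This is a small sketch, assuming a recent HotSpot-based JDK (Class.isHidden() requires JDK 15 or later); the class name LambdaClassDemo is hypothetical.

```java
public class LambdaClassDemo {
    /** Returns the runtime class the JVM generated for a lambda expression. */
    static Class<?> lambdaClass() {
        Runnable task = () -> { };
        return task.getClass();
    }

    public static void main(String[] args) {
        Class<?> generated = lambdaClass();
        System.out.println("Name:      " + generated.getName());
        System.out.println("Hidden:    " + generated.isHidden());
        System.out.println("Synthetic: " + generated.isSynthetic());
    }
}
```

There is no .class file on disk for the printed class name; the class was spun up at runtime and then loaded like any other.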

The JVM allows for two ways of loading classes, either with the bootstrap class loader (5.3.1), or a custom class loader (5.3.2). The latter is a class that extends the java.lang.ClassLoader class. In practice custom class loaders would often be defined in a 3rd party library to support that library’s behavior.
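As an illustration of the second option, here is a minimal sketch of a custom class loader. It assumes the compiled classes are reachable as resources through the application class loader; BytesClassLoader, Greeter, and loadGreeterAgain are hypothetical names, and real libraries typically do considerably more (caching, packages, protection domains).

```java
import java.io.IOException;
import java.io.InputStream;

public class CustomLoaderDemo {
    /** A trivial class that will be loaded a second time through our own loader. */
    public static class Greeter { }

    /** Hypothetical loader that defines classes from bytes found on the class path. */
    static class BytesClassLoader extends ClassLoader {
        BytesClassLoader() {
            // The platform loader cannot see application classes, so requests
            // for them fall through to findClass below.
            super(ClassLoader.getPlatformClassLoader());
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String resource = name.replace('.', '/') + ".class";
            try (InputStream in = CustomLoaderDemo.class.getClassLoader().getResourceAsStream(resource)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                byte[] bytes = in.readAllBytes();
                // Derive the class from its binary representation.
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    static Class<?> loadGreeterAgain() {
        try {
            return new BytesClassLoader().loadClass(Greeter.class.getName());
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Class<?> reloaded = loadGreeterAgain();
        System.out.println("Same name:  " + reloaded.getName().equals(Greeter.class.getName()));
        System.out.println("Same class: " + (reloaded == Greeter.class));
        System.out.println("Defined by: " + reloaded.getClassLoader());
    }
}
```

The two Greeter classes share a name but are distinct to the JVM, because a class is identified by its name together with its defining class loader.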

In this article we will only focus on the bootstrap class loader, which is a special class loader implemented in native code and provided by the JVM. It is instantiated in the latter stages of JNI_CreateJavaVM().
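Because the bootstrap class loader is not a Java object, classes it loads report null as their class loader. A short sketch to illustrate this (the class name BootstrapLoaderDemo is hypothetical):

```java
public class BootstrapLoaderDemo {
    /** Returns the defining loader of a class; null means the bootstrap class loader. */
    static ClassLoader loaderOf(Class<?> type) {
        return type.getClassLoader();
    }

    public static void main(String[] args) {
        System.out.println("java.lang.String loaded by:  " + loaderOf(String.class));
        System.out.println("BootstrapLoaderDemo loaded by: " + loaderOf(BootstrapLoaderDemo.class));
    }
}
```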

To better understand the process of Class Loading, we need to take a look at HelloWorld as the JVM would see it:

public class HelloWorld extends Object {
	public static void main(String[] args){
		System.out.println("Hello World!");
	}
}

All classes, at some point, extend off of java.lang.Object. In order for the JVM to load HelloWorld, it first needs to load all the classes that HelloWorld explicitly and implicitly depends on. Let’s take a look at all the method signatures in java.lang.Object:

public class Object {
    public Object() {}
    public final native Class<?> getClass();
    public native int hashCode();
    public boolean equals(Object obj);
    protected native Object clone() throws CloneNotSupportedException;
    public String toString();
    public final native void notify();
    public final native void notifyAll();
    public final void wait() throws InterruptedException;
    public final void wait(long timeoutMillis) throws InterruptedException;
    public final void wait(long timeoutMillis, int nanos) throws InterruptedException;
    protected void finalize() throws Throwable { }
}

The two important methods are public final native Class<?> getClass() and public String toString(), as both of these methods reference another class: java.lang.Class and java.lang.String respectively.

If we look at java.lang.String, it implements several interfaces:

public final class String
implements java.io.Serializable, Comparable<String>, CharSequence,
Constable, ConstantDesc

To load java.lang.String, all the interfaces it implements must first be loaded. If we look at the logging output, we will see that these interfaces are loaded in the order they are declared, with java.lang.String being loaded last:

[class,load] java.io.Serializable source: jrt:/java.base
[class,load] java.lang.Comparable source: jrt:/java.base
[class,load] java.lang.CharSequence source: jrt:/java.base
[class,load] java.lang.constant.Constable source: jrt:/java.base
[class,load] java.lang.constant.ConstantDesc source: jrt:/java.base
[class,load] java.lang.String source: jrt:/java.base
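The declaration order that the log mirrors can be checked with reflection. This sketch only reports the order in which interfaces are declared on a class, not the order in which the JVM loaded them; the class name DeclaredInterfaces is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

public class DeclaredInterfaces {
    /** Returns the names of the interfaces a class declares, in declaration order. */
    static List<String> interfaceNames(Class<?> type) {
        return Arrays.stream(type.getInterfaces())
                .map(Class::getName)
                .toList();
    }

    public static void main(String[] args) {
        interfaceNames(String.class).forEach(System.out::println);
    }
}
```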

If we move over to java.lang.Class, we see that it too implements several interfaces, including some of the same interfaces as java.lang.String: java.io.Serializable and java.lang.constant.Constable.

public final class Class<T> 
implements java.io.Serializable,GenericDeclaration,Type,AnnotatedElement,
TypeDescriptor.OfField<Class<?>>,Constable

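If we look at the JVM logs, we see that the interfaces are once again loaded in the order they are declared before java.lang.Class is loaded, except for java.io.Serializable and java.lang.constant.Constable, as they had already been loaded while loading java.lang.String. Superinterfaces are loaded first as well: java.lang.reflect.AnnotatedElement appears ahead of GenericDeclaration because GenericDeclaration extends it, and TypeDescriptor appears ahead of TypeDescriptor.OfField for the same reason.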

[class,load] java.lang.reflect.AnnotatedElement source: jrt:/java.base
[class,load] java.lang.reflect.GenericDeclaration source: jrt:/java.base
[class,load] java.lang.reflect.Type source: jrt:/java.base
[class,load] java.lang.invoke.TypeDescriptor source: jrt:/java.base
[class,load] java.lang.invoke.TypeDescriptor$OfField source: jrt:/java.base
[class,load] java.lang.Class source: jrt:/java.base

💡 Note: Generally the JVM follows a lazy strategy for its processes, in this case Class Loading. A class is typically only loaded when it is actively referenced by another class, but as java.lang.Object is the special root class of all Java classes, the JVM will eagerly load java.lang.Class and java.lang.String. If you look at the method signatures for java.lang.Class (JavaDoc) and java.lang.String (JavaDoc), you might notice that many of the classes they reference are not loaded when executing an application like HelloWorld. For example, Optional<String> describeConstable() is never called, and so java.util.Optional is never loaded. This is an example of HotSpot’s lazy strategy in action, or I suppose not in action.

The process of Class Loading will continue through much of the rest of JVM start-up, and in a real world application, during much of the early part of that application’s lifecycle, before eventually settling down. In total the JVM will load some 450 classes in the HelloWorld scenario, and this is why I used the analogy of the JVM creating a universe when starting up, as it’s doing a lot of work.
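The number of loaded classes can be queried at runtime. This is a minimal sketch using the standard ClassLoadingMXBean; the class name LoadedClassCount is hypothetical, and the figure it prints will be higher than for a bare HelloWorld because the management API loads additional classes of its own.

```java
import java.lang.management.ClassLoadingMXBean;
import java.lang.management.ManagementFactory;

public class LoadedClassCount {
    /** Returns how many classes are currently loaded in this JVM. */
    static int currentlyLoaded() {
        ClassLoadingMXBean bean = ManagementFactory.getClassLoadingMXBean();
        return bean.getLoadedClassCount();
    }

    public static void main(String[] args) {
        System.out.println("Classes currently loaded: " + currentlyLoaded());
    }
}
```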

Let’s continue diving into the universe of JVM start-up by looking at Class Linking.

Class Linking

Class Linking, covered under section 5.4 of the JVM specification, is one of the more complex processes as it includes three distinct sub-processes: Verification, Preparation, and Resolution.

There are three other processes within Class Linking: Access Control, Method Overriding, and Method Selection, but these are outside the scope of this article.

Returning to the chart, Verification, Preparation, and Resolution don’t necessarily occur in the order they will be covered in this article. Resolution could happen as early as before Verification, or as late as after Class Initialization.

Verification

Verification (5.4.1) is the process of ensuring the class or interface is structurally correct. This process might kick off the loading of other classes if needed, though classes loaded as a result aren’t required to be verified or prepared themselves.

Returning to the topic of CDS, in most normal situations, JDK classes will not actively go through the Verification step. This is because one of the benefits provided by CDS is that the classes contained within the archive have already been verified, reducing the work the JVM needs to do on start-up, and as a result improving start-up performance.

If you would like to learn more about CDS, check out my Stack Walker video on the subject, our articles on dev.java on CDS, or this inside.java article on how to include your application’s classes in a CDS archive.

One class that does need to be verified is HelloWorld, and we can see the JVM doing so in the logs:

[class,init             ] Start class verification for: HelloWorld
[verification           ] Verifying class HelloWorld with new format
[verification           ] Verifying method HelloWorld.<init>()V
[verification           ] table = { 
[verification           ]  }
[verification           ] bci: @0
[verification           ] flags: { flagThisUninit }
[verification           ] locals: { uninitializedThis }
[verification           ] stack: { }
[verification           ] offset = 0,  opcode = aload_0
[verification           ] bci: @1

Preparation

Preparation (5.4.2) handles the initialization of static fields in a class to their default values.

To better understand this, let’s use this simple example class:

class MyClass {
  static int myStaticInt = 10; //Initialized to 0
  static int myStaticInitializedInt; //Initialized to 0
  int myInstanceInt = 30; //Not initialized
  static {
    myStaticInitializedInt = 20;
  }
} 

MyClass contains three integer fields: myStaticInt, myStaticInitializedInt, and myInstanceInt.

In this example both myStaticInt and myStaticInitializedInt would be initialized to 0, the default for a primitive int type.

myInstanceInt, on the other hand, isn’t initialized during Preparation, as it’s an instance field, not a class field.

We’ll cover a little later when the myStaticInt and myStaticInitializedInt fields are assigned their values of 10 and 20.
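The default value assigned during Preparation can be made visible by reading a static field before its initializer has run. The following is a small sketch of that idea; the class PreparationDemo and its nested Fields class are hypothetical.

```java
public class PreparationDemo {
    static class Fields {
        // Runs first during initialization and reads 'value' before its
        // own initializer has executed, so it observes the prepared default.
        static int seenEarly = peek();
        static int value = 10;

        static int peek() {
            return value;
        }
    }

    public static void main(String[] args) {
        System.out.println("seenEarly = " + Fields.seenEarly); // prints seenEarly = 0
        System.out.println("value     = " + Fields.value);     // prints value     = 10
    }
}
```

seenEarly captures 0 rather than 10 because, at the moment peek() runs, value has only been prepared and not yet assigned.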

Resolution

The goal of Resolution (5.4.3) is to resolve the symbolic references in the Constant Pool of a class for use by JVM instructions.

To better understand this we will use the javap tool. This is a standard JDK command-line tool for disassembling Java .class files. Running it with the -verbose option will give a view into how the JVM interprets the classes it is loading. Let’s run javap on MyClass:

$ javap -verbose MyClass
class MyClass {
  static int myStaticInt = 10; //Initialized to 0
  static int myStaticInitializedInt; //Initialized to 0
  int myInstanceInt = 30; //Not initialized
  static {
    myStaticInitializedInt = 20;
  }
} 

The result of this command is shown below:

Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Fieldref           #8.#9          // MyClass.myInstanceInt:I
   #8 = Class              #10            // MyClass
   #9 = NameAndType        #11:#12        // myInstanceInt:I
  #10 = Utf8               MyClass
  #11 = Utf8               myInstanceInt
  #12 = Utf8               I
  #13 = Fieldref           #8.#14         // MyClass.myStaticInt:I
  #14 = NameAndType        #15:#12        // myStaticInt:I
  #15 = Utf8               myStaticInt
  #16 = Fieldref           #8.#17         // MyClass.myStaticInitializedInt:I
  #17 = NameAndType        #18:#12        // myStaticInitializedInt:I
  #18 = Utf8               myStaticInitializedInt
  #19 = Utf8               Code
  #20 = Utf8               LineNumberTable
  #21 = Utf8               <clinit>
  #22 = Utf8               SourceFile
  #23 = Utf8               MyClass.java
{
  static int myStaticInt;
    descriptor: I
    flags: (0x0008) ACC_STATIC

  static int myStaticInitializedInt;
    descriptor: I
    flags: (0x0008) ACC_STATIC

  int myInstanceInt;
    descriptor: I
    flags: (0x0000)

  MyClass();
    descriptor: ()V
    flags: (0x0000)
    Code:
      stack=2, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: aload_0
         5: bipush        30
         7: putfield      #7                  // Field myInstanceInt:I
        10: return
      LineNumberTable:
        line 1: 0
        line 4: 4

  static {};
    descriptor: ()V
    flags: (0x0008) ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: bipush        10
         2: putstatic     #13                 // Field myStaticInt:I
         5: bipush        20
         7: putstatic     #16                 // Field myStaticInitializedInt:I
        10: return
      LineNumberTable:
        line 2: 0
        line 6: 5
        line 7: 10
}

💡 Note: This output has been slightly truncated to remove metadata that isn’t relevant to this article.

There is a lot here, so let’s break it down and step through what all this means.

The below segment is MyClass’s (automatically generated) default constructor, which starts with the call to the default constructor of MyClass’s parent class java.lang.Object and then sets myInstanceInt to its assigned value of 30.

MyClass();
  descriptor: ()V
  flags: (0x0000)
  Code:
    stack=2, locals=1, args_size=1
       0: aload_0
       1: invokespecial #1 //Method java/lang/Object."<init>":()V
       4: aload_0
       5: bipush        30
       7: putfield      #7 //Field myInstanceInt:I
      10: return
    LineNumberTable:
      line 1: 0
      line 4: 4

💡 Note: No doubt you’ll have noticed the aload_0, invokespecial, bipush, putfield, and so on. These are JVM instructions, the opcodes that the JVM uses to actually perform its work.

To the right of invokespecial and putfield, there are the numbers #1 and #7 respectively. These are references into MyClass’s Constant Pool (4.4). Let’s take a closer look at it:

Constant pool:
   #1 = Methodref          #2.#3          // java/lang/Object."<init>":()V
   #2 = Class              #4             // java/lang/Object
   #3 = NameAndType        #5:#6          // "<init>":()V
   #4 = Utf8               java/lang/Object
   #5 = Utf8               <init>
   #6 = Utf8               ()V
   #7 = Fieldref           #8.#9          // MyClass.myInstanceInt:I
   #8 = Class              #10            // MyClass
   #9 = NameAndType        #11:#12        // myInstanceInt:I
  #10 = Utf8               MyClass
  #11 = Utf8               myInstanceInt
  #12 = Utf8               I
  #13 = Fieldref           #8.#14         // MyClass.myStaticInt:I
  #14 = NameAndType        #15:#12        // myStaticInt:I
  #15 = Utf8               myStaticInt
  #16 = Fieldref           #8.#17         // MyClass.myStaticInitializedInt:I
  #17 = NameAndType        #18:#12        // myStaticInitializedInt:I
  #18 = Utf8               myStaticInitializedInt
  #19 = Utf8               Code
  #20 = Utf8               LineNumberTable
  #21 = Utf8               <clinit>
  #22 = Utf8               SourceFile
  #23 = Utf8               MyClass.java

Contained in the MyClass Constant Pool are all of its symbolic references. For the JVM to execute the invokespecial JVM instruction, it needs to resolve the linkage to the default constructor of java.lang.Object. Referring back to the Constant Pool, entries 1-6 provide the information needed to form this linkage.

💡 Note: <init> is a special method javac automatically generates for each constructor in a class.

This pattern is also repeated with the putfield which references Constant Pool entry 7, which when combined with entries 8-12 provides the necessary information to resolve the linkage for setting myInstanceInt. For more on the Constant Pool check out the section in the JVM specification on it.
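The Constant Pool that javap prints sits right at the start of every .class file, after a fixed header. As a minimal sketch, the header can be read with plain java.io; this assumes the class file is reachable as a resource, and the names ClassFileHeader, Header, and read are hypothetical. It stops at the entry count and does not parse the entries themselves.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class ClassFileHeader {
    record Header(int magic, int minorVersion, int majorVersion, int constantPoolCount) { }

    /** Reads the fixed header that precedes the Constant Pool entries of a class file. */
    static Header read(Class<?> type) {
        String resource = "/" + type.getName().replace('.', '/') + ".class";
        try (InputStream raw = type.getResourceAsStream(resource)) {
            if (raw == null) {
                throw new IllegalArgumentException("No class file found for " + type);
            }
            DataInputStream in = new DataInputStream(raw);
            int magic = in.readInt();           // u4 magic, always 0xCAFEBABE
            int minor = in.readUnsignedShort(); // u2 minor_version
            int major = in.readUnsignedShort(); // u2 major_version
            int count = in.readUnsignedShort(); // u2 constant_pool_count (number of entries plus one)
            return new Header(magic, minor, major, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Header header = read(ClassFileHeader.class);
        System.out.printf("magic=0x%X major=%d constant_pool_count=%d%n",
                header.magic(), header.majorVersion(), header.constantPoolCount());
    }
}
```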

The reason the Resolution process can occur anywhere from before Verification to after Class Initialization is that it’s performed lazily, only when the JVM attempts to execute a JVM instruction in a class. Not all classes that are loaded will have a JVM instruction executed. For example, the java.lang.SecurityManager class is loaded but never touched, as it’s on its way out. It’s also possible that there is nothing to initialize in a class, and it’s automatically marked as initialized by the JVM. Speaking of Class Initialization…

Class Initialization

Finally there is Class Initialization, covered under 5.5 of the JVM specification. Class Initialization involves assigning a ConstantValue to static fields and executing any static initializers in a class if present. It is started when the JVM executes any of the new, getstatic, putstatic, or invokestatic JVM instructions against a class.

The initialization of a class is handled by the special no-args method, void <clinit>, which like <init> is automatically generated by javac. The inclusion of the angle brackets (< >) is deliberate, as they are not valid characters for a method name and thus prevent Java users from writing their own custom <init> or <clinit> methods.

It’s not guaranteed that a <clinit> method is always created, as it’s only needed if there are static initializers or static fields within a class. If a class has neither, then <clinit> is not generated, and the JVM immediately marks the class as initialized if new is called on it, in effect skipping Class Initialization, which is how Resolution can occur after Class Initialization.

As MyClass does have two static fields and a static initializer block, it does have a <clinit> method, which we can see by going back to the output of javap:

  static {};
    descriptor: ()V
    flags: (0x0008) ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: bipush        10
         2: putstatic     #13                 // Field myStaticInt:I
         5: bipush        20
         7: putstatic     #16                 // Field myStaticInitializedInt:I
        10: return
      LineNumberTable:
        line 2: 0
        line 6: 5
        line 7: 10

The structure of <clinit> resembles <init>, but without making a call to a parent class’ constructor and JVM instructions like putstatic are used instead of putfield.
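The point at which <clinit> runs can be observed from Java code. The sketch below records events in a list; it relies on javac inlining compile-time constants so that reading one emits no getstatic instruction. The names InitializationDemo, Holder, and run are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class InitializationDemo {
    static final List<String> LOG = new ArrayList<>();

    static class Holder {
        static final int CONSTANT = 42; // compile-time constant: inlined by javac at use sites
        static int counter = 1;         // ordinary static field: read with getstatic

        static {
            LOG.add("Holder.<clinit> ran");
        }
    }

    static List<String> run() {
        LOG.add("start");
        int a = Holder.CONSTANT; // no getstatic is emitted, so Holder is not initialized
        LOG.add("read constant " + a);
        int b = Holder.counter;  // getstatic triggers Holder's initialization
        LOG.add("read field " + b);
        return List.copyOf(LOG);
    }

    public static void main(String[] args) {
        // Prints [start, read constant 42, Holder.<clinit> ran, read field 1]
        System.out.println(run());
    }
}
```

The static initializer only runs at the first getstatic, which is the lazy behaviour described above.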

Hello World!

Eventually the JVM will have done enough prep work to start executing the user code inside public static void main(), where our Hello World! message is located:

[0.062s][debug][class,resolve] java.io.FileOutputStream
... 
Hello World!

In total the JVM would load some 450 classes, and some subset of that would be linked and initialized as well. On my M4 MacBook Pro, as can be seen in the logs, the entire process took just 62 milliseconds, even while performing VERY heavy logging. You can see the full log on my GitHub here.

Project Leyden

This is actually a very exciting time for start-up on the JVM. While there have been continual refinements to the start-up process with every release of the JDK, starting with JDK 24 the first feature from Project Leyden will be merged into a mainline JDK release.

Project Leyden has the goals of reducing start-up time, time-to-peak performance, and memory footprint, and is building upon and superseding the work done with CDS. As Project Leyden is integrated, CDS will give way to AOT, ahead-of-time. Project Leyden features will work by recording the JVM’s behavior during a training run, storing that information in a cache, and then loading from that cache on subsequent start-ups. If you’d like to learn more about Project Leyden, be sure to check out this video on it.

The lead-off feature for Project Leyden will be JEP 483: Ahead-of-Time Class Loading & Linking. We already covered Class Loading and Linking in this article, so the benefit of performing that work ahead of time, instead of at start-up should be quite clear now.

Conclusions

As covered in this article the JVM start-up process is a complex process. The ability to respond to available system resources, provide the means to inspect and profile the JVM, dynamically load classes and more, does come with a fair amount of complexity overhead.

So what can be taken from this, beyond having a deeper understanding of the JVM? There are two areas worth pointing out, debugging and performance, though their applicability might also be somewhat narrow.

Debugging

The JVM start-up process is very reliable, and when errors do occur, the cause is often an obvious user mistake, or perhaps an issue in a 3rd party library. Hopefully a deeper understanding of what the JVM is attempting to do and why might provide some better guidance on those more persistent or difficult to understand start-up issues.

Performance Improvement

Another potential benefit is that with this knowledge you might find some small opportunities to improve the start-up performance of your application. In particular, with JEP 483 being integrated in JDK 24, performing Class Loading and Linking ahead of time could further improve start-up performance. I would also caution, though, that in most use cases the 1st party code you write is just a very small fraction of the code running on the JVM. Between the libraries, frameworks, and the JDK itself, the code that makes up your application is often just the tip of the iceberg.