Deciphering the stack trace
Joakim Nordström on February 12, 2021In this post we will be looking at crash logs, the hs_err
file, that is generated when the Java Virtual Machine crashes. Trying to find what is going wrong, and which component to blame, is important to understand how to interpret the crash log file. The focus will be on understanding the frames that make up the stack trace.
Photo by Pine Watt
Divide and crash
For this post I’ve created an example consisting of a Java loop, dividing a constant by a decreasing denominator, eventually resulting in a division by zero. The code takes a few twists and turns to get a more interesting stack trace. It consists of two Java classes: Dumpster
and Divider
; Dumpster
loops, and calls Divider.do_div()
for each iteration. Divider
makes the call to the native JNI library libdump.so
that performs the division.
An abbreviated version can be seen below. Full code with build and run instructions are available on my GitHub.
Java classes:
Native libdump library:
Running this program for a few loops will trigger the crash.
Besides the problematic frame, there are some clear text instructions on what has happened: “The crash happened outside the Java Virtual Machine in native code” and “See problematic frame for where to report the bug”.
The crash happened outside the Java Virtual Machine in native code.
More closely investigating the generated hs_err
file, and specifically the stack trace, the first character for each frame tells us what kind of code we are at. There is a hint to their meaning directly in the hs_err
file:
We will go through the stack trace starting at the top, with the problematic frame, trying to make some more sense of the helpful hint.
C=native code
The top frame, the problematic one, is prefixed with a C
. The C
tells us that this is native code. We see it crashes in the C standard library libc.so
, while executing the function div
. The next frame is also native code, executing the awkwardly named Java_jaokim_dumpster_Dumpster_native_1div_1call
-function in our own library libdump.so
.
j=interpreted
Before that, our Java classes Divider
and Dumpster
have been executing. They’ve been executing in interpreted mode, which the lowercase prefix j
tells us.
Vv=VM code
Our Java classes and methods are called after a few frames of internal native code; the ~StubRoutines::call_stub
prefixed with lowercase v
, and the libjvm
frames prefixed with uppercase V
. The lowercase indicates VM generated code, and the uppercase is code from the internal libjvm
library (this distinction is worthy a text of its own; it’s sufficient to understand that v
and V
is internal VM native code).
Execution of our program starts with the two bottom frames. As seen by the C
, this is native code from the libjli
Java launcher library. As the name suggests, this library launches the JVM, and is what the java
command uses to start our Java program. Although being Java specific, it doesn’t count as VM internal code, since it doesn’t execute in the JVM/HotSpot memory space.
J=compiled Java code
Let’s take a look at the uppercase J
frame type, “compiled Java code”. Using the same example, only increasing the number of loops to say a thousand iterations, we get a slightly different stack trace.
The crash log looks pretty much the same, apart from the Java frames for native_div_call
, and do_div_call
, now has an uppercase J
.
When we run our loop this many times, the JVM sees the method do_div
being a hotspot, and instead of interpreting it each time, it compiles it just-in-time for execution in order to save valuable clock cycles. The numbers (56 and 55) are the ID of the task that compiled the method, the number of bytes is the size of the generated code, and the addresses point to the compiled code in memory.
A=aot compiled Java code
In order to get full coverage of the different frame types, we have to add an ahead-of-time compiled step, using the AOT feature of JEP 295. This example shows how we are using a pre-compiled (i.e. ahead-of-time) version of the Divider
class; using the jaotc
command the class is compiled into the AOT-library divider.so
. We unlock experimental mode for Java and add the AOTLibrary
argument.
Looking at that crash, the Divider.do_div()
-frame is now prefixed with an uppercase A
showing we’re using the ahead-of-time compiled method.
Who caused the crash?
In this particular case, we know what happened; we made a division by zero. Even though the problematic frame was in the C standard library, and hs_err
file explicitly tells us to “See problematic frame for where to report the bug”, it’s rather unlikely that the well-used C standard library should crash this easily. So, instead of just blaming libc.so
, one should look at the other frames, and see what is going on. The next suspect on our list is our own native library. And, yes, it could be fixed here, by checking if the denominator
is zero before calling div.
However, instead of actually calling the native code at all, it would be better to just fix this in the Java loop, making sure i
never hits zero, by changing the “i >= 0
” to just “i > 0
”.
Summary
The hs_err
file has its snappy summary of the different stackframe types, and also gives helpful hints on where to report bugs and crashes.
However, we can enhance it a little into the table below, and hopefully, this post has given further background to its meaning.
Char | Description |
---|---|
j |
Java code interpreted by the JVM at runtime |
J |
Java code compiled just in time by the JVM |
A |
Java code compiled ahead-of-time, by jaotc |
C |
Native code from external library, not residing in the JVM |
V |
Native code from library part of the JVM |
v |
Native code genereted by the JVM |
Further reading
The Java troubleshooting guide is great if you need further assistance in troubleshooting a crashing Java application. This guide gives advices on how to determine what crashed, but also tips and possible workarounds.