Analyzing a stuck HotSpot C2 compilation

Vladimir Sitnikov
netcracker
Jul 7, 2018

Once upon a time there was a microservice-based application that ended up consuming CPU for no apparent reason. The system was idle, yet it consumed CPU. The problem was not specific to a particular service, and it reproduced quite often.

Configuration: CentOS 7.4.1708, Oracle JDK 1.8.0_152-b16 running inside Docker inside OpenShift.

Note: this might be a known issue, and it might be solved by just upgrading to 1.8.0_172, however the purpose of the article is to show a way to approach similar issues.

I would like to thank Nitsan Wakart for helpful comments and corrections.

The first step is to ensure there’s a problem, so I just run top:

top - 09:09:57 up 134 days, 3:09, 0 users, load average: 3.36, ...
Threads: 80 total, 4 running, 76 sleeping, 0 stopped, 0 zombie
%Cpu(s): 19.5 us, 3.8 sy, 0.1 ni, 67.0 id, 0.0 wa
PID USER PR VIRT RES SHR S %CPU %MEM TIME+ COMMAND
46 root 20 441124 345740 10512 S 198.0 1.1 5294:27 java
1 root 20 11796 1612 1212 S 0.0 0.0 0:00.03 bash

As top refreshes it becomes apparent that java process 46 is consuming a couple of cores. How do we know which action is consuming the CPU? It has to be caused by one thread or another; however, top does not split CPU consumption per thread by default.

One can use SJK ttop by Alexey Ragozin to identify which threads consume CPU or allocate memory, but in this case top is good enough.

Per-thread reporting can be enabled via top -H (this might depend on the OS):

-H : Threads toggle
Starts top with the last remembered ’H’ state
reversed. When this toggle is On, all individual threads
will be displayed. Otherwise, top displays a summation of
all threads in a process.
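For scripted capture (e.g. recording per-thread CPU over time), the same per-thread view is available non-interactively. This is just a sketch; pid 46 is taken from this example, and ps -L assumes Linux procps:

```shell
# One batch-mode (-b) iteration (-n 1) of top with per-thread rows (-H)
top -H -b -n 1 -p 46 | head -20

# ps can report per-thread CPU as well: tid = thread id, pcpu = %CPU
ps -L -o tid,pcpu,comm -p 46
```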

The new display is as follows:

PID USER PR    VIRT    RES   SHR S %CPU %MEM   TIME+  COMMAND 
68 root 20 441124 345740 10512 R 88.7 1.1 2328:34 java
71 root 20 441124 345740 10512 R 85.0 1.1 2708:41 java
66 root 20 441124 345740 10512 R 7.6 1.1 38:21.01 java

As we see (the results are quite consistent as top refreshes), there are two threads consuming a lot of CPU: 68 and 71. The next step is to identify what the threads are doing. Luckily a thread dump contains thread ids, so we just need to grab one: jstack 46:

...
"C2 CompilerThread3" #8 daemon prio=9 os_prio=0 tid=0xeb0f5c00 nid=0x44 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE
"C2 CompilerThread6" #11 daemon prio=9 os_prio=0 tid=0xeb0fa800 nid=0x47 runnable [0x00000000]
java.lang.Thread.State: RUNNABLE

Note: top shows thread ids in decimal, while a thread dump shows thread ids in hex (0x44 == 68, 0x47 == 71).
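The conversion is easy to script. A sketch, assuming the thread dump was saved as threaddump.txt (e.g. via jstack 46 &gt; threaddump.txt; the file name is a placeholder):

```shell
# Extract the native thread ids (nid=0x..) of the compiler threads from
# a saved thread dump and print them in decimal, so they can be matched
# against the PID column of `top -H`.
grep 'CompilerThread' threaddump.txt \
  | sed 's/.*nid=\(0x[0-9a-f]*\).*/\1/' \
  | while read -r nid; do
      printf '%s = %d\n' "$nid" "$nid"   # printf understands the 0x prefix
    done
```

For the dump above this prints `0x44 = 68` and `0x47 = 71`, matching the two busy threads in top.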

Unfortunately the stacktrace is empty, so it is not a regular “badly looping Java code” problem. It looks like an issue with the JVM itself, as “C2 CompilerThread” belongs to the JIT compiler.

gdb

Let’s go deeper. In order to get a native stacktrace we can use either jstack -m 46 or the pstack utility. It turns out pstack is not installed, so I install gdb first:

yum install gdb

pstack 46 produces the following (the output is pretty much consistent across pstack invocations):

...
Thread 1 (process 68):
#0 0xf6abda90 in MemBarCPUOrderNode::Opcode() const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#1 0xf6e8462c in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#2 0xf6e84afb in LoadNode::Value(PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#3 0xf6f5368a in PhaseIterGVN::transform_old(Node*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#4 0xf6f51084 in PhaseIterGVN::optimize() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#5 0xf6aece24 in Compile::Optimize() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#6 0xf6aeea0a in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#7 0xf6a43a4d in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#8 0xf6af92b1 in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#9 0xf6afa0c7 in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#10 0xf7031f48 in compiler_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#11 0xf703cadf in JavaThread::thread_main_inner() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#12 0xf703cc5b in JavaThread::run() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#13 0xf6f0c179 in java_start(Thread*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#14 0xf77b8b2c in start_thread () from /lib/libpthread.so.0
#15 0xf76cf17e in clone () from /lib/libc.so.6

What does it tell us? The JVM was trying to compile a method via C2Compiler::compile_method and got stuck there.

That gives us the first workaround: disable the C2 compiler completely. We can do that by supplying the -XX:TieredStopAtLevel=1 Java option. This option heals CPU consumption in the idle state, so it validates our hypothesis that C2 causes the issue. However, this workaround might not be valid and/or desirable for a production system, as it disables the sophisticated C2 compiler and thus might result in much lower application performance.

It turns out there are ways to exclude certain methods from being JIT compiled (see -XX:CompileCommand), so we can try to identify the bad method in question and mark it for exclusion.

If we look closer, frame #7 has a ciMethod* argument, which looks like a method:

#7  0xf6a43a4d in C2Compiler::compile_method(ciEnv*, ciMethod*, int)

Let’s figure it out. In order to do that we can attach gdb to the java process.

Core dump

While GDB can be used on a live process, it makes sense to investigate in offline mode, since OpenShift performs health checks of the application and can easily restart the Docker container right in the middle of a debugging session. To go offline, we need to collect a core dump (that is, a snapshot of the application memory):

# gcore 46
...
Saved corefile core.46

Then we export core.46 to a safe location. A core dump is not self-sufficient for the analysis; the original binaries are required as well. In this case we have Docker, so we can just pull the same image and use it for the investigation:

$ docker pull docker_image_name
$ docker run -it --rm --mount type=bind,source="/folder/with/core",target=/u42 docker_image_name bash

This drops us into a bash shell in the same environment, so we can play with GDB:

# gdb /usr/java/jdk1.8.0_152/jre/bin/java /app/core.dump
...
[New LWP 231]
[New LWP 46]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/etc/alternatives/java'.
#0 0xf7737430 in __kernel_vsyscall ()
Missing separate debuginfos, use: debuginfo-install jdk1.8-1.8.0_152-fcs.i586
(gdb)

We need to find a thread and check its variables:

(gdb) info threads
...
20 Thread 0xeee7ab40 (LWP 66) 0xf7737430 in __kernel_vsyscall ()
19 Thread 0xeeefbb40 (LWP 65) 0xf6dc767b in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
18 Thread 0xeef7cb40 (LWP 64) 0xf7737430 in __kernel_vsyscall ()
17 Thread 0xeeffdb40 (LWP 63) 0xf7737430 in __kernel_vsyscall ()

Ah-ha! Here we go:

(gdb) thread 19
[Switching to thread 19 (Thread 0xeeefbb40 (LWP 65))]
#0 0xf6dc767b in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
(gdb) bt
#0 0xf6dc767b in MemNode::can_see_stored_value(Node*, PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#1 0xf6dc7afb in LoadNode::Value(PhaseTransform*) const () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#2 0xf6e9668a in PhaseIterGVN::transform_old(Node*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#3 0xf6e94084 in PhaseIterGVN::optimize() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#4 0xf6a2fe24 in Compile::Optimize() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#5 0xf6a31a0a in Compile::Compile(ciEnv*, C2Compiler*, ciMethod*, int, bool, bool, bool) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#6 0xf6986a4d in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#7 0xf6a3c2b1 in CompileBroker::invoke_compiler_on_method(CompileTask*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#8 0xf6a3d0c7 in CompileBroker::compiler_thread_loop() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#9 0xf6f74f48 in compiler_thread_entry(JavaThread*, Thread*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#10 0xf6f7fadf in JavaThread::thread_main_inner() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#11 0xf6f7fc5b in JavaThread::run() () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#12 0xf6e4f179 in java_start(Thread*) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
#13 0xf76fbb2c in start_thread () from /lib/libpthread.so.0
#14 0xf761217e in clone () from /lib/libc.so.6

Then we just need to select a thread frame and check its arguments. Let’s go with frame #6:

(gdb) frame 6
#6 0xf6986a4d in C2Compiler::compile_method(ciEnv*, ciMethod*, int) () from /usr/java/jdk1.8.0_152/jre/lib/i386/server/libjvm.so
(gdb) info args
No symbol table info available.

What? It looks like gdb can’t get the debug information, so it cannot tell where (in which CPU registers or stack slots) the method arguments are stored.

What do we do with “no symbol table info available”? Well, do you remember the message that gdb printed at the very beginning?

Missing separate debuginfos, use: debuginfo-install jdk1.8-1.8.0_152-fcs.i586

Let’s do it!

Loaded plugins: fastestmirror, ovl
enabling base-debuginfo
Loading mirror speeds from cached hostfile
* base: mirror.yandex.ru
* extras: ftp.nsc.ru
* updates: mirror.yandex.ru
Could not find debuginfo for main pkg: 2000:jdk1.8-1.8.0_152-fcs.i586
No debuginfo packages available to install

Unfortunately, I was not able to make that fly, so we have to manage without debug information.

VMStructs

One of the parts of debug information is struct layout (the sizes of the structures, the types of the fields and their offsets). It turns out the HotSpot JVM has that information by default even in regular release builds. It is called VMStructs, and it is used by tools like jstack. You can find more on VMStructs in Andrei Pangin’s talk “VMStructs: how an application may benefit from knowing JVM internals” and his wonderful helfy tool.

Let’s get our hands dirty. As you remember, we do not know the argument values, however we can assume the arguments are stored either in registers or on the stack. We can just dump both and try to find something that looks like a ciMethod:

(gdb) info registers
eax 0xe7e4a57c -404445828
ecx 0x0 0
edx 0xe7fcae98 -402870632
ebx 0xf70d2f40 -150130880
esp 0xeeefa4f0 0xeeefa4f0
ebp 0xeeefab08 0xeeefab08
esi 0xffffffff -1
edi 0xf70cdf08 -150151416
eip 0xf6986a4d 0xf6986a4d <C2Compiler::compile_method(ciEnv*, ciMethod*, int)+541>
eflags 0x293 [ CF AF SF IF ]
cs 0x23 35
ss 0x2b 43
ds 0x2b 43
es 0x2b 43
fs 0x0 0
gs 0x63 99
(gdb) x/10x $esp
0xeeefa4f0: 0xeeefa55c 0xeeefacf4 0xef2ab878 0x096fe388
0xeeefa500: 0xffffffff 0x00000001 0x00000001 0x00000001
0xeeefa510: 0x0120ba85 0xeeefab20

How do we know which is which? Of course we start the HSDB debugger (or jhsdb in Java 9). For more information on that, you might like Volker Simonis’ talk Analyzing HotSpot Crashes.

This time I’ll use the command-line version of the tool:

# /usr/java/jdk1.8.0_152/bin/java -classpath /usr/java/jdk1.8.0_152/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB
hsdb>

Then we tell it to attach to our core file:

attach /usr/java/jdk1.8.0_152/jre/bin/java /app/core.dump
Opening core file, please wait...
Error: java.lang.RuntimeException: can't load JavaScript engine

Great! We are just missing some JavaScript there. It is not clear why JavaScript was missing from my Docker image, however I was able to get through by just copying the relevant nashorn.jar (the one from 1.8u152) into the /usr/java/jdk1.8.0_152/jre/lib/ext folder:

# /usr/java/jdk1.8.0_152/bin/java -classpath /usr/java/jdk1.8.0_152/lib/sa-jdi.jar sun.jvm.hotspot.CLHSDB
hsdb> attach /usr/java/jdk1.8.0_152/jre/bin/java /app/core.dump
Opening core file, please wait...
hsdb>

Inspecting the core via CLHSDB

It does not take much effort to just brute-force all the values stored in the registers and on the stack:

hsdb> inspect 0xe7e4a57c
Type is ConNode (size of 40)
Node** Node::_in: Node* @ 0xe7e4a5a4
Node** Node::_out: Node* @ null
node_idx_t Node::_cnt: 1
node_idx_t Node::_max: 1
node_idx_t Node::_outcnt: 0
node_idx_t Node::_outmax: 0
node_idx_t Node::_idx: 1
jushort Node::_class_id: 4
jushort Node::_flags: 16
hsdb> inspect 0xe7fcae98
hsdb> inspect 0xf70d2f40
hsdb> inspect 0xeeefa4f0
hsdb> inspect 0xeeefab08
hsdb> inspect 0xf70cdf08
hsdb> inspect 0xeeefa55c
hsdb> inspect 0xeeefacf4
hsdb> inspect 0xef2ab878
hsdb> inspect 0x096fe388
Type is ciMethod (size of 100)
uint ciBaseObject::_ident: 1670
Metadata* ciMetadata::_metadata: Metadata @ 0xedaab218
int ciMethod::_interpreter_invocation_count: 31298
int ciMethod::_interpreter_throwout_count: 0
int ciMethod::_instructions_size: -1

Ah-ha! We have a ciMethod, however it does not look like anything to us. Where is the name?

Let’s check if ciMethod has relevant fields to get the name. Of course we could clone the JDK sources, or we can just Google for ciMethod.hpp and check a random repository:

class ciMethod : public ciMetadata {
...
private: // General method information.
ciFlags _flags;
ciSymbol* _name;
ciInstanceKlass* _holder;
ciSignature* _signature;

It looks like those fields are not in VMStructs, so let’s just brute-force them. As we know, 0x096fe388 points to a ciMethod instance, so we should check its contents:

hsdb> examine 0x096fe388/40
0x096fe388: 0xf70bf108 0x00000686 0xedaab218 0x81000009 0x096fe3f0 0x096fe328
0x096fe3a0: 0x096fe410 0x096fe550 0x096ff248 0x00000031 0x00000005 0x00000003
0x096fe3b8: 0x00000000 0x00000000 0x00007a42 0x00000000 0xffffffff 0x00000002
0x096fe3d0: 0x01010100 0x00000001 0x096ff378 0x096ff3b0 0x09700320 0x096ff200

Then we check each value one by one:

hsdb> inspect 0xf70bf108
hsdb> inspect 0x00000686
hsdb> inspect 0xedaab218
Type is Method (size of 48)
ConstMethod* Method::_constMethod: ConstMethod @ 0xedaab188
MethodData* Method::_method_data: MethodData @ 0xeaa9cef8
MethodCounters* Method::_method_counters: MethodCounters @ 0xed6e5f50
AccessFlags Method::_access_flags: -2130706423
int Method::_vtable_index: -2
u2 Method::_method_size: 12
u1 Method::_intrinsic_id: 0
nmethod* Method::_code: nmethod @ null
address Method::_i2i_entry: address @ 0xedaab234
AdapterHandlerEntry* Method::_adapter: AdapterHandlerEntry @ 0xf656d660
address Method::_from_compiled_entry: address @ 0xedaab23c
address Method::_from_interpreted_entry: address @ 0xedaab244
hsdb> inspect 0x81000009
hsdb> inspect 0x096fe3f0
Type is ciSymbol (size of 16)
Symbol* ciSymbol::_symbol: Symbol @ 0xee1ef060
hsdb> inspect 0x096fe328
Type is ciInstanceKlass (size of 80)
uint ciBaseObject::_ident: 1664
Metadata* ciMetadata::_metadata: Metadata @ 0xedaabb40
BasicType ciType::_basic_type: 12
ciSymbol* ciKlass::_name: ciSymbol @ 0x096fe378
InstanceKlass::ClassState ciInstanceKlass::_init_state: 4
bool ciInstanceKlass::_is_shared: 0

Well, it looks like 0x096fe3f0 points to the method name and 0x096fe328 points to the class. Let’s dump those as well:

hsdb> inspect 0x096fe3f0
Type is ciSymbol (size of 16)
Symbol* ciSymbol::_symbol: Symbol @ 0xee1ef060
hsdb> inspect 0xee1ef060

Bad luck: hsdb does not show meaningful information here. It turns out we need to use the symbol command to inspect symbols:

hsdb> symbol 0xee1ef060
#forMethodParameter
hsdb> inspect 0x096fe378
Type is ciSymbol (size of 16)
Symbol* ciSymbol::_symbol: Symbol @ 0xee0a0e80
hsdb> symbol 0xee0a0e80
#org/springframework/core/ResolvableType

Great. Now we know that the issue was caused by C2 trying to compile the org.springframework.core.ResolvableType#forMethodParameter method. We can add the method to the exclusion list via the following Java option:

-XX:CompileCommand=exclude,org.springframework.core.ResolvableType::forMethodParameter

Note: CompileCommand=exclude excludes the method from JIT compilation completely, so the method will always be interpreted. That might affect the performance of the application, thus CompileCommand should be treated as a workaround.
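To confirm that the exclusion takes effect, one can run with -XX:+PrintCompilation: the CompilerOracle echoes the exclude command at startup, and PrintCompilation output should mark such methods as “made not compilable”. A sketch; app.jar is a placeholder for the real application:

```shell
# Launch with compilation logging and the exclusion in place,
# then watch for lines mentioning the excluded method.
java -XX:+PrintCompilation \
     '-XX:CompileCommand=exclude,org.springframework.core.ResolvableType::forMethodParameter' \
     -jar app.jar 2>&1 | grep -i forMethodParameter
```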

There are lots of forMethodParameter overloads, and we can identify the “bad” one for fun’s sake. The method signature comes right after the ciInstanceKlass*, so 0x096fe410 is the signature:

hsdb> inspect 0x096fe410
hsdb>
hsdb> whatis 0x096fe410
Address 0x096fe410: In unknown location

Here’s the source for ciSignature:

class ciSignature : public ResourceObj {
ciSymbol* _symbol;
ciKlass* _accessing_klass;
GrowableArray<ciType*>* _types;
int _size; // number of stack slots required for arguments
int _count; // number of parameter types in the signature

The dump of 0x096fe410 reveals that the method has 2 arguments (_size and _count are both equal to 2):

hsdb> examine 0x96fe410/5
0x096fe410: 0x096fe400 0x096fe328 0x096fe428 0x00000002 0x00000002
hsdb> inspect 0x096fe400
Type is ciSymbol (size of 16)
Symbol* ciSymbol::_symbol: Symbol @ 0xee1ef2b0
hsdb> symbol 0xee1ef2b0
#(Lorg/springframework/core/MethodParameter;Ljava/lang/reflect/Type;)Lorg/springframework/core/ResolvableType;

Conclusion

Command line debugging is not that much fun while you are doing it, however it is just amazing what capabilities you get with things like VMStructs and HSDB.


System performance engineer at @NetCrackerTech. Looking forward to tuning Java and SQL things.