Is lock contention on your JVM’s performance critical path?

As a general rule, a JVM that spends 5% or more of its clock cycles on voluntary context switches is probably suffering from lock contention.

The “cs” column in vmstat (on Linux) gives the total number of context switches, and pidstat (pidstat -w -I -p <pid>) gives the breakdown: voluntary context switches (cswch/s) and involuntary context switches (nvcswch/s). A thread that cannot acquire a lock eventually gives up the CPU voluntarily, so an application experiencing heavy lock contention exhibits a high number of voluntary context switches. A voluntary context switch is expensive at the processor level, costing about 80,000 clock cycles.
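The 5% rule and the per-switch cost can be combined into a rough back-of-the-envelope estimate. The sketch below is one way to do that arithmetic, not an official formula: it multiplies the observed cswch/s by 80,000 cycles and divides by the cycles available per second (clock rate times number of CPUs). The class name, method name, and the sample figures in main are hypothetical.

```java
public class ContextSwitchCost {
    // Approximate cost of one voluntary context switch, as quoted in the text.
    static final long CYCLES_PER_VOLUNTARY_SWITCH = 80_000L;

    /**
     * Estimates the percentage of available clock cycles spent on
     * voluntary context switches.
     *
     * @param cswchPerSec voluntary context switches per second (pidstat cswch/s)
     * @param clockHz     CPU clock rate in cycles per second
     * @param cpus        number of CPUs available to the JVM
     */
    static double percentCyclesInVoluntarySwitches(double cswchPerSec, double clockHz, int cpus) {
        double cyclesSpentSwitching = cswchPerSec * CYCLES_PER_VOLUNTARY_SWITCH;
        double cyclesAvailable = clockHz * cpus;
        return 100.0 * cyclesSpentSwitching / cyclesAvailable;
    }

    public static void main(String[] args) {
        // Hypothetical sample: 5,000 cswch/s on a 2-CPU machine at 3.0 GHz.
        double pct = percentCyclesInVoluntarySwitches(5_000, 3.0e9, 2);
        System.out.println("Percent of cycles in voluntary context switches: " + pct);
        System.out.println("Above the 5% rule of thumb: " + (pct >= 5.0));
    }
}
```

With these sample figures the estimate comes out above 5%, which by the rule of thumb would be worth investigating further.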

A common way to find contended locks in a Java application is to take thread dumps periodically (jstack <pid> or kill -3 <pid>) and look for threads that are blocked on the same lock across several dumps.
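The same idea can be approximated from inside the JVM with the standard java.lang.management API. The following is a minimal sketch, not a replacement for jstack: it takes a thread snapshot through ThreadMXBean and counts how many BLOCKED threads are waiting on each lock. The class name BlockedThreadSampler, the method name, and the five one-second samples are illustrative assumptions.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.Map;
import java.util.TreeMap;

public class BlockedThreadSampler {

    /** Takes one in-process thread snapshot and counts BLOCKED threads per lock. */
    static Map<String, Integer> blockedThreadsByLock() {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        Map<String, Integer> counts = new TreeMap<>();
        // false, false: skip the (more expensive) owned-monitor and
        // owned-synchronizer details; only state and lock name are needed.
        for (ThreadInfo info : mx.dumpAllThreads(false, false)) {
            if (info != null
                    && info.getThreadState() == Thread.State.BLOCKED
                    && info.getLockName() != null) {
                counts.merge(info.getLockName(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) throws InterruptedException {
        // Sample a few times; a lock that keeps showing up is a contention suspect.
        for (int i = 0; i < 5; i++) {
            System.out.println("sample " + i + ": " + blockedThreadsByLock());
            Thread.sleep(1_000);
        }
    }
}
```

A lock name that appears with a non-trivial count in most samples corresponds to the pattern described above: several threads blocked on the same lock across several dumps.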

A number of graphical tools can help with this process:

samurai: http://samuraism.jp/samurai/
ThreadLogic: https://java.net/projects/threadlogic
IBM Thread and Monitor Analyzer: https://www.ibm.com/developerworks/com...
Spotify Thread Dump Analyzer: https://github.com/spotify/threaddump-analyzer

Note that atomic variables and concurrent data structures rely on compare-and-swap (CAS) operations, which still involve a form of synchronization. Even with concurrent or lock-free data structures, it is not uncommon to see heavy contention on a single atomic variable, which leads to poor performance or scalability.
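To make the point about contended atomic variables concrete, here is a small sketch in which many threads increment one shared AtomicLong, so every update competes for the same memory location. For comparison it runs the same workload with java.util.concurrent.atomic.LongAdder, which the JDK documents as having higher expected throughput under high contention. The comparison is an addition for illustration and not something the text above prescribes. The class name, thread count, and iteration count are hypothetical, and the timings are a crude measurement rather than a proper benchmark.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.LongAdder;

public class CounterContention {

    /** All threads update one AtomicLong, so they contend on a single location. */
    static long countWithAtomicLong(int threads, int incrementsPerThread) {
        AtomicLong counter = new AtomicLong();
        runInParallel(threads, () -> {
            for (int i = 0; i < incrementsPerThread; i++) counter.incrementAndGet();
        });
        return counter.get();
    }

    /** LongAdder spreads updates over internal cells and sums them on read. */
    static long countWithLongAdder(int threads, int incrementsPerThread) {
        LongAdder counter = new LongAdder();
        runInParallel(threads, () -> {
            for (int i = 0; i < incrementsPerThread; i++) counter.increment();
        });
        return counter.sum();
    }

    /** Starts the given number of threads on the same task and waits for all of them. */
    static void runInParallel(int threads, Runnable task) {
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(task);
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
    }

    public static void main(String[] args) {
        int threads = 8;                 // hypothetical thread count
        int increments = 5_000_000;      // hypothetical per-thread workload

        long t0 = System.nanoTime();
        long a = countWithAtomicLong(threads, increments);
        long t1 = System.nanoTime();
        long b = countWithLongAdder(threads, increments);
        long t2 = System.nanoTime();

        System.out.println("AtomicLong: " + a + " in " + (t1 - t0) / 1_000_000 + " ms");
        System.out.println("LongAdder:  " + b + " in " + (t2 - t1) / 1_000_000 + " ms");
    }
}
```

Both variants produce the same total. Any difference in elapsed time depends on the hardware and JVM, so treat the numbers as indicative only.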