Move Cassandra 2.1 to G1 garbage collector

Michał Łowicki
Feb 13, 2016

Since the very beginning of our journey with C* at Opera, we have had proper monitoring of the part that has an enormous impact on how the database behaves: the garbage collector.

It worked relatively well for a long time, but sporadically a node was hit by a long GC pause (up to 10–20 seconds).

We tried to tune GC-related settings. It is a long and painful process, and the internal mechanics still look like a black box.

For some time, users on the mailing list and IRC channel had been sharing success stories of moving to G1. People confirmed that it seemed ready for production use, so I gave it a try. The most promising part was that it is no longer necessary to set the sizes of the different parts of the heap, which we had changed many times and which always required lots of tests. G1 needs a much larger heap, but using 10GB or 16GB more memory wasn’t a problem for us.

What is actually needed to switch to G1?

Modify cassandra-env.sh

1. Remove the calculate_heap_sizes() function
2. Set the new heap size (we started with 16GB but eventually moved to 24GB):
-MAX_HEAP_SIZE="8192M"
-HEAP_NEWSIZE="2048M"
+MAX_HEAP_SIZE="16384M"

3. Remove the code that checks MAX_HEAP_SIZE and the now-removed HEAP_NEWSIZE variable:

-if [ "x$MAX_HEAP_SIZE" = "x" ] && [ "x$HEAP_NEWSIZE" = "x" ]; then
- calculate_heap_sizes
-else
- if [ "x$MAX_HEAP_SIZE" = "x" ] || [ "x$HEAP_NEWSIZE" = "x" ]; then
- echo "please set or unset MAX_HEAP_SIZE and HEAP_NEWSIZE in pairs (see cassandra-env.sh)"
- exit 1
- fi
-fi

4. Stop passing the -Xmn option to the JVM:

-JVM_OPTS="$JVM_OPTS -Xmn${HEAP_NEWSIZE}"

5. Finally, set the new GC options:

-JVM_OPTS="$JVM_OPTS -XX:+UseParNewGC"
-JVM_OPTS="$JVM_OPTS -XX:+UseConcMarkSweepGC"
-JVM_OPTS="$JVM_OPTS -XX:+CMSParallelRemarkEnabled"
-JVM_OPTS="$JVM_OPTS -XX:SurvivorRatio=8"
-JVM_OPTS="$JVM_OPTS -XX:MaxTenuringThreshold=1"
-JVM_OPTS="$JVM_OPTS -XX:CMSInitiatingOccupancyFraction=75"
-JVM_OPTS="$JVM_OPTS -XX:+UseCMSInitiatingOccupancyOnly"
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"
JVM_OPTS="$JVM_OPTS -XX:CompileCommandFile=$CASSANDRA_CONF/hotspot_compiler"
-JVM_OPTS="$JVM_OPTS -XX:CMSWaitDuration=10000"
-# note: bash evals '1.7.x' as > '1.7' so this is really a >= 1.7 jvm check
-if { [ "$JVM_VERSION" \> "1.7" ] && [ "$JVM_VERSION" \< "1.8.0" ] && [ "$JVM_PATCH_VERSION" -ge "60" ]; } || [ "$JVM_VERSION" \> "1.8" ] ; then
- JVM_OPTS="$JVM_OPTS -XX:+CMSParallelInitialMarkEnabled -XX:+CMSEdenChunksRecordAlways -XX:CMSWaitDuration=10000"
-fi
+JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
+JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
+JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
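
Put together, steps 1 to 5 leave the heap and GC part of cassandra-env.sh looking roughly like the sketch below. The -Xms/-Xmx lines are an assumption: they are what the stock 2.1 script uses to pass MAX_HEAP_SIZE to the JVM, and they are not part of the diffs above, so check them against your own copy. The heap value is the 16GB starting point from step 2.

```shell
# Sketch of the resulting heap/GC section of cassandra-env.sh (not a full file).

# Only the total heap size is set; HEAP_NEWSIZE and -Xmn are gone,
# so G1 sizes the young generation on its own.
MAX_HEAP_SIZE="16384M"

# Assumed to be unchanged from the stock script: min and max heap set to the same value.
JVM_OPTS="$JVM_OPTS -Xms${MAX_HEAP_SIZE}"
JVM_OPTS="$JVM_OPTS -Xmx${MAX_HEAP_SIZE}"

# Kept from the original GC section.
JVM_OPTS="$JVM_OPTS -XX:+UseTLAB"

# New G1 options; all CMS/ParNew flags have been removed.
JVM_OPTS="$JVM_OPTS -XX:+UseG1GC"
JVM_OPTS="$JVM_OPTS -XX:MaxGCPauseMillis=500"
JVM_OPTS="$JVM_OPTS -XX:G1RSetUpdatingPauseTimePercent=5"
```

It is worth grepping the final file for leftover CMS flags before restarting a node, since mixing collector flags is an easy mistake to make when editing by hand.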

Adjust Logstash

We use Logstash to parse GC logs and display them later in Kibana. The log format changed after moving to G1, so we needed to update our grok patterns:

-GCTYPE (GC)|(Full GC)
-GCREASON [^)]+
-JVMGCLOG (%{TIMESTAMP_ISO8601:logdate}: )?%{FLOAT}: (#%{INT}: )?\[%{GCTYPE:gc_type} (\(%{GCREASON:gc_reason}\) )?%{INT:gc_memory_before:int}K->%{INT:gc_memory_after:int}K\(%{INT}K\), %{FLOAT:gc_duration:float} secs\]
+GC_TAG [^)]+
+GC_TYPE [a-z-]+
+GC_MEMORY_UNIT K|M|G
+JVMGCLOG %{TIMESTAMP_ISO8601:logdate}: %{FLOAT}: \[GC %{GC_TYPE}( \(%{GC_TAG}\))*( %{INT:gc_memory_before:int}%{GC_MEMORY_UNIT:gc_memory_before_unit}->%{INT:gc_memory_after:int}%{GC_MEMORY_UNIT:gc_memory_after_unit}\(%{INT}%{GC_MEMORY_UNIT}\))?, %{FLOAT:gc_duration:float} secs\]

Re-launched Cassandra will use G1 \0/

Along the way I found two more options worth changing in cassandra-env.sh:

+JVM_OPTS="$JVM_OPTS -XX:+AlwaysPreTouch"
+JVM_OPTS="$JVM_OPTS -XX:-UseBiasedLocking"

This way we’re also more consistent with what is already done on the 3.x branch.

It looks like G1 fits our workload better, and we no longer see multi-second pauses.

Right now the sum of pauses oscillates around 600–1000 ms per minute on a single host (with a 24GB heap). Pauses are retrieved from GC logs after enabling PrintGCApplicationStoppedTime in cassandra-env.sh:

+JVM_OPTS="$JVM_OPTS -XX:+PrintGCApplicationStoppedTime"
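
For a quick look without Logstash, the per-minute sum can be approximated straight from the GC log. This is a minimal sketch, not the pipeline we run. It assumes date stamps are enabled (-XX:+PrintGCDateStamps), so that every line starts with an ISO 8601 timestamp, and that stopped-time lines have the usual "Total time for which application threads were stopped: N seconds" form. The function name sum_pauses_per_minute is made up for this example.

```shell
#!/usr/bin/env bash
# Reads a GC log on stdin and prints the total stopped time per minute in milliseconds.
sum_pauses_per_minute() {
  awk '
    /Total time for which application threads were stopped/ {
      # First 16 characters of the timestamp identify the minute, e.g. 2016-02-13T10:15
      minute = substr($1, 1, 16)
      # The duration in seconds is the field right after "stopped:"
      for (i = 1; i < NF; i++) {
        if ($i == "stopped:") {
          total[minute] += $(i + 1) * 1000   # seconds -> milliseconds
          break
        }
      }
    }
    END {
      for (m in total) printf "%s %.0f ms\n", m, total[m]
    }
  ' | sort
}
```

Usage would be along the lines of `sum_pauses_per_minute < gc.log`, where gc.log is whatever file your -Xloggc option points to.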

Michał Łowicki

Software engineer at Datadog, previously at Facebook and Opera, never satisfied.