Turning Looker Up to 11

Our experience upgrading from JRuby 9.1.17.0/OpenJDK8 to JRuby 9.2.13.0/OpenJDK11

Looker Engineering
Nov 18, 2020

HTTP response time pre/post the JRuby upgrade (DataDog)

By John W. Phillips and Kalen Peterson

Introduction

This article describes Looker’s experience upgrading our JRuby and Java versions. It starts with a description of our software stack, then outlines the phases of the project and the challenges we hit at each point. The genesis of this upgrade project was growing internal and customer pressure to support OpenJDK11, which requires JRuby 9.2.10+. Additionally, we suspected (rightly) that there were significant performance gains to be had, both from the JRuby upgrade itself and from the JDK upgrade.

The Looker environment

Looker’s software stack is JRuby running in a puma/rack web server, using sequel for ORM, with lots of lower-level code built in Java and Kotlin. We have a Jenkins CI/CD system that runs unit tests, builds Looker, and runs integration tests using Cucu and Docker. Diving into the JRuby side in more detail, we use rbenv to manage the JRuby version, bundler for Ruby package management, and warbler to build the Looker artifact, a jar file.
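As a rough illustration (a hypothetical sketch, not Looker’s actual Gemfile), the Ruby side of such a stack looks something like:

    # Hypothetical Gemfile for a stack like the one described;
    # the gem names are real, version pins are deliberately omitted.
    source 'https://rubygems.org'

    gem 'puma'    # web server
    gem 'rack'
    gem 'sequel'  # ORM

    group :development do
      gem 'warbler' # packages the application into a jar
    end

The JRuby version itself is selected by rbenv, not by the Gemfile.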

Looker’s software stack originally used JRuby 1.7, and in 2016 we moved to JRuby 9.1.17.0. After this upgrade project we are now running JRuby 9.2.13.0 with support for Oracle JDK8, OpenJDK8, and OpenJDK11.

The Upgrade Methodology

We predicted up front that the change set required for the upgrade would be large, so one decision we made was to roll any backwards-compatible changes needed for the upgrade directly into the main code branch. In theory you might keep everything in a branch, but in practice many of the changes were code cleanup, package upgrades, and the like, which arguably belonged in the main branch anyway. Also, given that we believed the upgrade project would take a long time, it seemed impractical to maintain a large delta in a branch for the duration of the effort.

Another best practice when doing package upgrades is to land them into the main branch at the beginning of a release cycle. This gives some time to catch and sort out any issues before the changes are released and propagated out to customers.

Lastly, if you haven’t done so already, add logging of the JRuby and Java versions you’re running. When switching back and forth between versions it’s easy to lose track of what is running where.
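For example, a minimal sketch of such startup logging, assuming a plain Ruby Logger (JRUBY_VERSION and ENV_JAVA are JRuby built-ins):

    # Log the running JRuby and Java versions at startup.
    # ENV_JAVA exposes Java system properties under JRuby.
    require 'logger'

    logger = Logger.new($stdout)
    logger.info("JRuby #{JRUBY_VERSION} (Ruby #{RUBY_VERSION} compatible), " \
                "Java #{ENV_JAVA['java.version']} (#{ENV_JAVA['java.vm.name']})")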

The Ruby 2.3 -> 2.5 Upgrade

The first set of issues in the JRuby version upgrade stemmed from the fact that in moving from JRuby 9.1.x to JRuby 9.2.x we were also moving from Ruby 2.3 to Ruby 2.5. This mostly meant converting all uses of Bignum and Fixnum in our codebase to Integer, along with a few other changes. Also, most of the Ruby gems we depend on have minimum release versions where this conversion has been done, so we started to pick those up. Most of this work was straightforward. Of course, having a well-built regression test system reduces the risk of these upgrades.
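The conversion itself was mechanical. An illustrative (not verbatim) example of the kind of change involved:

    # Ruby 2.4+ unifies Fixnum and Bignum into Integer and deprecates
    # the old constants.
    value = 42

    # Before (Ruby 2.3 / JRuby 9.1.x):
    #   valid = value.is_a?(Fixnum) || value.is_a?(Bignum)

    # After (Ruby 2.5 / JRuby 9.2.x):
    valid = value.is_a?(Integer)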

Even so, we encountered a handful of issues where these gem upgrades didn’t go smoothly. When we upgraded our mail gem to v2.7.0 to pick up the Ruby 2.5 updates, it introduced a change in MIME type handling that was missed by our tests and caused an outage on our internal production instance. After backing out the change and debugging the issue, we are now on v2.6.6.

In any package/version upgrade project there are secondary packages and functional areas that come with simply being a web application, and primary areas that are intricately connected to your product’s core value. In our case the sequel gem and our database code paths are that primary area. We had been running a customized fork of Sequel v4.9 from around 2014, and after some discussion we decided to target Sequel v4.49 rather than move to v5.x. Moving to v4.49 brought in all of the Ruby 2.5 fixes, and there was only one Java 11 fix in the v5.x branch, which we back-ported to our v4.49 fork.

JRuby

As the package upgrade work was tailing off, we began doing test runs with different JRuby 9.2.x versions to sort out any initial issues. Technically this was very easy: make a branch, change the version in the rbenv .ruby-version file, and run it through Jenkins.
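The whole version switch is a one-line change to the file rbenv reads:

    # .ruby-version (read by rbenv to select the interpreter)
    jruby-9.2.11.1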

The first layer of issues with changing JRuby versions fell into two broad classes: JRuby bugs we had come to depend on which were now fixed, and new JRuby issues or security improvements which required changes in our code. One interesting example of the former is a bug in the Ruby squiggly heredoc which got fixed somewhere in JRuby 9.2.x. The fix changed how indentation of multi-line strings works. We didn’t have many multi-line strings in our production code, but we used them all over the place in our tests to inline YAML, and many of them became unparseable with the different indentation. After some debugging we found that MRI and JRuby 9.2.x agreed on squiggly heredoc indentation and JRuby 9.1.x was the outlier, so we concluded that it was a JRuby bug fix and rolled changes into our tests that worked with both JRuby versions.
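A minimal illustration of the behavior in question (our example, not one of our actual tests):

    # With MRI and JRuby 9.2.x, <<~ strips the indentation of the
    # least-indented body line, so this yields valid YAML:
    yaml = <<~YAML
      user:
        name: alice
    YAML
    # => "user:\n  name: alice\n"

    # JRuby 9.1.x computed the amount of indentation to strip differently
    # in some cases, shifting the nesting and breaking the YAML parse.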

One case of a JRuby improvement that caused an issue was Mutex no longer being marshallable. The fact that we were marshaling objects containing Mutexes into caches is arguably bad design, and with JRuby 9.2.x it stopped working. Here we chose a shortcut and built a MarshallableMutex, which unlocks on serialization, rather than doing a deeper analysis and redesign of our data structures.
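A sketch of the idea in its simplest form (our assumption, not Looker’s actual implementation): serialize no lock state, and recreate a fresh, unlocked Mutex on load.

    # Sketch of a mutex wrapper that survives Marshal round-trips by
    # dropping its lock state and coming back unlocked.
    class MarshallableMutex
      def initialize
        @mutex = Mutex.new
      end

      def synchronize(&block)
        @mutex.synchronize(&block)
      end

      # Marshal hooks: dump nothing, rebuild a fresh Mutex on load.
      def marshal_dump
        nil
      end

      def marshal_load(_data)
        @mutex = Mutex.new
      end
    end

    m = MarshallableMutex.new
    copy = Marshal.load(Marshal.dump(m)) # a fresh, unlocked mutex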

In total we worked through around 15 unique issues while running our unit test suite against the JRuby upgrade. After root-cause analysis, most could be fixed in the main branch in a way that worked with both JRuby versions.

Integration Testing

We began our JRuby 9.2.x testing with the latest release at that time, JRuby 9.2.11.1. Once we had worked through most of the issues in our unit tests, we began to bump up against known JRuby issues. With JRuby 9.2.11.1 we hit a blocking multithreading bug. Once we realized what was blocking us, we tried moving back to JRuby 9.2.8.0. In retrospect we could have saved some time by reading the JRuby release notes and changelog, because we next hit a known issue with puma. Since the multithreading issue affected JRuby 9.2.10.0 onwards and the puma issue affected versions up to JRuby 9.2.8.0, by process of elimination we moved to JRuby 9.2.9.0. Success!

A Big Win

The JRuby 9.1.17.0 -> 9.2.9.0 upgrade was a big win. NewRelic metrics on our internal production instance showed GC latency down slightly and GC frequency ~60% lower. NewRelic also showed web transaction time ~50% lower, and DataDog showed HTTP response time ~80% lower. NewRelic data is collected with a rack shim, so it’s possible that the DataDog number reflected additional improvement in the puma layer, outside of the NewRelic measurement universe. However, the one bitter outcome at this point was that we had landed on JRuby 9.2.9.0, and the first JRuby version to support OpenJDK11 is JRuby 9.2.10.0.

The JRuby developers had indicated that JRuby 9.2.11.1 was going to be the last JRuby release before JRuby 9.3.x, so internally at Looker we started bouncing around ideas for how to proceed. Members of our team had previously contributed to several JRuby projects, and a few had been in contact with the core JRuby development team. Mostly due to the lack of other options, we decided to try to make a case for another JRuby release with a fix for the blocking issue.

GC Frequency pre/post the JRuby upgrade (NewRelic)

The Last Mile

When asking for a bug fix from a team maintaining open-source software there are plenty of “don’ts”. Everyone is busy, so it’s important to frame your feedback as a technical argument rather than as a complaint.

Looker is a heavily multi-threaded JRuby application, and we were blocked from JRuby 9.2.11.1, and thus from OpenJDK11, by a JRuby multithreading bug. Part of JRuby’s core value proposition over MRI is that it runs in the JVM with true parallel threads, making it a version of Ruby suitable for applications such as ours. Putting this together into a technical argument, we quickly found common ground with the JRuby development team when we explained it to them on the #jruby IRC channel. They agreed to do a JRuby 9.2.12.0 release with a fix for our issue.

In parallel with this effort we began to rethink how we upgrade to new JRuby releases at Looker. Our basic methodology had always been to try moving forward to newer JRuby releases and see what the issues were. Thinking about where we were and what we had seen up to this point, it occurred to us that we had it backwards: we should be testing pre-release versions of JRuby and giving feedback, so that we could use each release as soon as it came out.

As the JRuby development team was working on the 9.2.12.0 release, we began to experiment with how to plug pre-release versions of JRuby into our stack and run them through our Jenkins CI flow. Even though the JRuby team does publish nightly builds, it turned out to be a difficult problem: between rbenv, bundler, and warbler we could find no automated way to pick one up and use it.

Our second bitter setback came when the JRuby team released JRuby 9.2.12.0, our “Looker version”, and we hit a second multithreading bug. We returned to the JRuby development team with our results, and they agreed to do JRuby 9.2.13.0 with a fix for our second issue.

However, by this point we had developed a flow where we could manually build our Looker warbler jar artifact and run it through our Jenkins Cucu-based integration test suite. With this new ability to “flip the paradigm”, we could test JRuby 9.2.13.0 in parallel as it was being developed. As the JRuby development team worked, we did several iterations of building the JRuby 9.2.13.0 branch, building our Looker jar using the custom JRuby, and running the Cucu tests. At the end, success! We were able to move to JRuby 9.2.13.0.

OpenJDK11

We needed a few more code changes to support OpenJDK11, but they were straightforward. After changing how we instantiate Nashorn and fixing an issue with casting classloaders, Looker was running on OpenJDK11. When our internal production instance moved from JDK8 to OpenJDK11 we saw further performance improvements: CPU utilization down from ~25% to ~10%, HTTP response time ~35% better, and JVM heap usage down ~10%. During our interaction with the JRuby development team we learned that crediting these improvements entirely to OpenJDK11 is not quite fair, since JRuby itself has many JDK11-specific improvements that were enabled by the switch.

HTTP response time pre/post the OpenJDK8 -> OpenJDK11 upgrade (DataDog)
Host CPU utilization pre/post the OpenJDK8 -> OpenJDK11 upgrade (DataDog)
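For flavor, here is a sketch (our illustration, not Looker’s actual change) of instantiating a Nashorn engine from JRuby explicitly via its public factory class, which behaves the same on JDK8 and JDK11 (where Nashorn is deprecated but still present):

    require 'java'

    # Create the Nashorn engine through its public factory class,
    # jdk.nashorn.api.scripting.NashornScriptEngineFactory, which ships
    # with JDK8 through JDK14.
    java_import 'jdk.nashorn.api.scripting.NashornScriptEngineFactory'

    factory = NashornScriptEngineFactory.new
    engine  = factory.getScriptEngine
    puts engine.eval('21 * 2') # => 42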

Next Steps

After this Java/JRuby upgrade project at Looker we have plenty of new ideas for how to make it smoother next time. We may build a Jenkins flow that builds JRuby from source and runs our CI flow against it, making it easier for us to move forward to new JRuby versions upon release. This could also be run with OpenJDK14 to set us up for that upgrade when we decide to take it on.

Conclusion

Even though it took over nine months from start to finish, the JRuby/Java upgrade was a much-needed modernization of the Looker software stack. Hopefully the project phases and issues described here will illuminate the work for anyone taking on a similar project in the future.
