Ruby 3.0 JIT and beyond

TL;DR

Ruby 3.0 JIT is the fastest JIT ever for MRI. However, despite Ruby 3.0's big improvement in reducing i-cache misses, it's still not ready for optimizing Rails applications. Stay tuned for Ruby 3.1.

"Is Ruby 3 Actually Three Times Faster?"

Ruby 3 is indeed 3x faster than Ruby 2.0 in a Ruby 3x3 benchmark, Optcarrot:

https://www.ruby-lang.org/en/news/2020/12/25/ruby-3-0-0-released/

As proposed by Eregon, we measured Optcarrot after the first 3000 frames so that the numbers reflect peak performance.
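To illustrate what "measuring after warm-up" means, here is a minimal sketch. It is not Optcarrot's actual benchmark harness: `peak_iterations_per_second` and `render_frame` are hypothetical names, and the workload is a stand-in for rendering one frame. The `defined?` guard is there because `RubyVM::MJIT` does not exist on every Ruby version.

```ruby
# Hypothetical helper: run `warmup` untimed iterations so the JIT can compile
# hot methods, then time only the following `measured` iterations.
def peak_iterations_per_second(warmup:, measured:)
  warmup.times { yield }
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  measured.times { yield }
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
  measured / elapsed
end

# Stand-in workload (assumption); Optcarrot would render one frame here.
def render_frame
  (1..200).sum { |i| i * i }
end

# RubyVM::MJIT is only defined on Rubies that ship MJIT, hence the guard.
jit_enabled = defined?(RubyVM::MJIT) && RubyVM::MJIT.enabled? ? true : false

rate = peak_iterations_per_second(warmup: 3000, measured: 1000) { render_frame }
puts "JIT enabled: #{jit_enabled}, peak rate: #{rate.round(1)} iterations/s"
```

Running the same script with and without `ruby --jit` gives a rough feeling for the difference, although a toy loop like this says little about real applications.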

Note that "MJIT", Ruby's approach of using a C compiler as the JIT backend, was originally implemented by Vladimir Makarov. The JIT of Ruby 2.6 / 2.7 was slower than his implementation because we merged only part of his work and replaced the compiler to mitigate risks.

I've implemented lots of optimizations in Ruby 3, and finally it performs better than his original version. Ruby 3 JIT is the fastest JIT which MRI has ever had, which of course couldn't be achieved without his great work.

Measured at my local Linux x86-64 / GCC 9.3.0 environment

However, Optcarrot is just one of the two official Ruby 3 benchmarks. For details, please see "Is Ruby 3 Actually Three Times Faster?" by Noah Gibbs, the author of the other benchmark, rails_ruby_bench.

Should I enable JIT on my Rails applications?

Not yet.

Let me quote what I wrote in the Ruby 3.0.0 release note:

As of Ruby 3.0, JIT is supposed to give performance improvements in limited workloads, such as games (Optcarrot), AI (Rubykon), or whatever application that spends majority of time in calling a few methods many times.

Although Ruby 3.0 significantly decreased a size of JIT-ed code, it is still not ready for optimizing workloads like Rails, which often spend time on so many methods and therefore suffer from i-cache misses exacerbated by JIT. Stay tuned for Ruby 3.1 for further improvements on this issue.

The JIT compiler generates native code, and Ruby has to run that JIT-ed code in addition to the existing VM instructions. Rails apparently exercises more of the VM than Optcarrot does (for example, Optcarrot doesn't trigger GC), so the CPU's i-cache has little room left for the extra code.

As explained above, Ruby 3 reduced the generated code size, for example from 1.3MB to 260KB for 100 methods in a Sinatra benchmark. That improved the performance of the Sinatra benchmark, but it still was not enough.
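To put those numbers in perspective, here is a back-of-the-envelope calculation. The 32KB L1 instruction cache size is my assumption for illustration (a common size on x86-64 CPUs), not something measured for this post, and in reality the VM's own code competes for the same cache.

```ruby
before_kb = 1300.0  # 1.3MB of JIT-ed code for 100 methods (from the text above)
after_kb  = 260.0   # 260KB for the same 100 methods in Ruby 3.0
methods   = 100
l1i_kb    = 32.0    # ASSUMPTION: typical L1 i-cache size, not from this post

reduction      = before_kb / after_kb             # how many times smaller
per_method_kb  = after_kb / methods               # average JIT-ed size per method
methods_in_l1i = (l1i_kb / per_method_kb).floor   # methods that fit at once

puts "code size reduction: #{reduction}x"
puts "average per method:  #{per_method_kb}KB"
puts "methods fitting in a #{l1i_kb.to_i}KB i-cache: #{methods_in_l1i}"
```

Under that assumption only about a dozen JIT-ed methods fit in the i-cache at the same time, which is fine for Optcarrot but far too few for a request that walks through a large number of Rails methods.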

Ruby 3.1 JIT

I still haven't given up on using JIT to optimize web applications built with Rails or any other framework, because that's where many people, myself included, use Ruby in production.

I know some people are working on lightweight JIT compiler projects, such as MIR. Even if we introduce such a lightweight JIT as another tier of the MRI JIT, a heavyweight JIT should still generate ideal native code for any method in order to get the best performance out of long-running applications like Rails.

Given Ruby 3.0, I have some battle plans for the Ruby 3.1 JIT development.

Ractor-based JIT worker

The MJIT worker thread is implemented in C, because Ruby threads share the GVL and we didn't want MJIT's work to block them.

Now we have Ractor. We can rewrite the MJIT worker with Ractor and develop more complicated optimizations in Ruby. For example, if a method is annotated as side-effect free, we could call it at compile time to calculate the result beforehand. We didn't want to do that before, because calling a Ruby method in an MJIT worker meant acquiring the GVL.
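Here is a toy sketch of that idea, not the actual MJIT worker design. `PureMath`, `SIDE_EFFECT_FREE`, and the `[:constant, ...]` / `[:call, ...]` result shapes are all hypothetical names I made up for illustration. The "worker" Ractor receives a call site and, if the method is annotated as side-effect free, calls it right away and returns a constant instead.

```ruby
# Hypothetical module whose methods are promised to be side-effect free.
module PureMath
  def self.square(x)
    x * x
  end
end

# Hypothetical annotation table. A frozen array of symbols is shareable,
# so a non-main Ractor is allowed to read this constant.
SIDE_EFFECT_FREE = [:square].freeze

# Stand-in for the JIT worker: it runs Ruby code in its own Ractor and
# either folds a call site into a constant or leaves it as a normal call.
worker = Ractor.new do
  receiver, name, args = Ractor.receive
  if SIDE_EFFECT_FREE.include?(name)
    [:constant, receiver.public_send(name, *args)]
  else
    [:call, name, args]
  end
end

worker.send([PureMath, :square, [12]])

# Ruby 3.0 reads a Ractor's result with Ractor#take; the check below is an
# assumption-friendly fallback for versions that expose Ractor#value instead.
folded = worker.respond_to?(:value) ? worker.value : worker.take
p folded  # => [:constant, 144]
```

Ruby 3.0 prints a warning that Ractor is experimental when this runs. The point of the sketch is only that the worker can call a Ruby method while compiling without going through the main Ractor.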

On-Stack Replacement

On-Stack Replacement is a technique to replace code that is currently running on the stack. It is useful for eliminating various checks and branches in JIT-ed code, because over-optimized code can be abandoned from outside that code, for example when a method is redefined.

I know this is very hard to implement, and it may be impossible as long as we use a C compiler for MJIT. Still, I'm convinced we need it to optimize Ruby 3 further. For instance, because MRI has the debug_inspector API, which allows users to fetch any local variable in any frame from any method, MJIT currently doesn't optimize Ruby's local variables at all. If we could lazily move values from the native stack to Ruby's local variables when the debug_inspector API is called, we could optimize them.
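The method redefinition case can be modeled with a toy example. This is only a sketch under heavy assumptions: `ToyJit`, `Calculator`, and the lambdas standing in for JIT-ed code are hypothetical, and swapping a lambda only affects future calls. Real On-Stack Replacement also has to move frames that are already running the optimized code, which this toy does not attempt.

```ruby
class Calculator
  def double(x)
    x + x
  end
end

# Toy "code cache": the lambda stored here plays the role of JIT-ed code.
class ToyJit
  attr_reader :quadruple

  def initialize
    # Speculates that Calculator#double is `x + x` and inlines both calls.
    # Note that the fast path itself contains no check or branch.
    @quadruple = ->(_calc, x) { x * 4 }
  end

  # Abandon the optimized code from the outside and fall back to normal
  # method dispatch.
  def deoptimize
    @quadruple = ->(calc, x) { calc.double(calc.double(x)) }
  end
end

JIT = ToyJit.new

class Calculator
  # Invalidate the speculation when the inlined method is redefined.
  def self.method_added(name)
    JIT.deoptimize if name == :double
  end
end

calc = Calculator.new
before = JIT.quadruple.call(calc, 3)  # optimized path

class Calculator
  def double(x)  # redefinition triggers method_added, hence deoptimize
    x + x + 1
  end
end

after = JIT.quadruple.call(calc, 3)   # fallback path sees the new definition
puts "before: #{before}, after: #{after}"
```

Without a way to abandon code from the outside, the fast path would need its own "has `double` been redefined?" check on every call, which is exactly the kind of branch On-Stack Replacement lets us remove.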

Reducing code size further

Once the JIT worker is written in Ruby and we have On-Stack Replacement, we should be able to reduce the size of JIT-ed code further. I believe this effort will allow us to optimize a wider range of applications like Rails.

I'll also look for parts of the VM implementation that touch many cache lines and try to reduce their code size.

Sponsors

I can't thank my wife enough for supporting my development of Ruby. Ruby 3x3 couldn't be achieved without her help.

Also, to my GitHub Sponsors: thank you for sponsoring my OSS development! Your support keeps me even more motivated.

https://github.com/sponsors/k0kubun
