Ruby 2.6 JIT - Progress and Future
In early 2018, the Ruby core team merged a JIT compiler infrastructure called “MJIT”. It was created by Vladimir Makarov, uses a C compiler to generate native code, and dynamically loads the resulting shared object file. It was merged along with my JIT compiler, which is implemented on top of MJIT and the Ruby 2.5-compatible virtual machine.
Fortunately my effort has not been reverted, so it will be officially released next week!
Over the past 10 months, Ruby’s JIT has evolved with the following focuses:
Let me show you what we’ve achieved for each of them in Ruby 2.6.
At least for Ruby, JIT was introduced to improve performance, specifically speed as I understand it. So, if it doesn’t make your application faster, it would be useless for you, right? Given the whole year, I believed I would be able to achieve a small but meaningful performance improvement on real-world applications in 2.6. However, I failed to make it.
In the early stage of 2.6, some micro benchmarks made good progress.
trunk+JIT: 493.8 i/s
2.6.0-preview1+JIT: 246.4 i/s - 2.00x slower
2.6.0-preview1: 86.8 i/s - 5.69x slower
trunk: 86.2 i/s - 5.73x slower
2.5.0: 80.9 i/s - 6.11x slower
2.0.0: 78.4 i/s - 6.30x slower
2.6.0-preview3+JIT: 86.6 fps
2.6.0-preview2+JIT: 73.9 fps - 1.17x slower
2.6.0-preview1+JIT: 59.2 fps - 1.46x slower
2.6.0-preview3: 54.6 fps - 1.59x slower
2.6.0-preview2: 53.3 fps - 1.62x slower
2.6.0-preview1: 53.0 fps - 1.63x slower
2.5.3: 48.5 fps - 1.78x slower
2.0.0: 34.6 fps - 2.50x slower
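The "x slower" column compares each row against the fastest row. To read off what JIT alone buys on a given build, divide the +JIT score by the score of the same build without JIT. A minimal sketch using the numbers listed above (`jit_speedup` is a name made up for this illustration, and the inputs are the rounded figures as printed, so the last digit can differ from what the benchmark tool computes from raw data):

```ruby
# Ratio of the JIT-enabled score to the score of the same build without JIT.
# Both scores must use the same unit (i/s or fps).
def jit_speedup(with_jit, without_jit)
  (with_jit / without_jit).round(2)
end

puts jit_speedup(86.6, 54.6)   # Optcarrot on 2.6.0-preview3: fps with JIT vs. without
puts jit_speedup(493.8, 86.2)  # first benchmark on trunk: i/s with JIT vs. without
```

With these inputs, JIT is worth roughly 1.59x on Optcarrot and 5.73x on the micro benchmark.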
This looks great, and it's going to be released as is in Ruby 2.6. So far, it doesn't look so bad.
Guess what? It turned out that the NES emulator Optcarrot doesn't reflect a typical real-world Ruby workload, such as a Rails application, very well. In Optcarrot, 2 or 3 methods that can be JIT-ed in a very short time are called very frequently, and the busiest method is large enough to be optimized well with MJIT's current optimization strategy.
In a typical Rails application, there's no single busy CPU-intensive method that can be optimized well by the current MJIT. Noah Gibbs has written articles about MJIT's performance on Rails, and the latest article showed that the current JIT does not make Rails faster.
After preview3, I called for Ruby benchmarks that are made slow by JIT:
I collected them to experiment with an easy fix that could be done within the month before the final 2.6 release, but I seem to have underestimated the difficulty. Time flew, and I suddenly realized that this can't be fixed by a small change that can be rushed in after the 2.6.0-rcX releases.
Therefore, none of them has been improved yet, and apparently JIT on Ruby 2.6 failed to be ready for production. The only improvements we made in rc1 and rc2 were MJIT support in forked child processes and a bug fix for JIT on bootsnap.
So, it succeeded in becoming an interesting toy for experimenting with a possibly great future, but I failed to provide meaningful performance improvements as of Ruby 2.6. (Skip to the last "Future" section if you're interested only in performance.)
As written in the release notes of Ruby 2.6-rc2, Ruby aims to support JIT when it's built with the following C compilers on non-EOL platforms.
- Microsoft Visual C++
When the MJIT infrastructure was published by Vladimir Makarov, it supported only GCC or Clang running on Unix environments, and it actually had some issues on non-Linux platforms like macOS or Windows MinGW.
After a lot of investment in debugging Windows environments, I'm pleased to say that the MJIT infrastructure is working fine on MinGW and Microsoft Visual C++. Each of them had its own difficult challenges. For example, we rely on the pre-processor-only mode of GCC and Clang, but VC++ doesn't have that, so I ended up creating a completely different build system. Who cares?
With help from Nobuyoshi Nakada and Usa Nakamura, MJIT now runs with the above compilers, at least on the platforms listed in RubyCI.
Another piece of good news is that Heroku now supports MJIT. Thank you so much for the effort on it!
We've invested in MJIT's testing environment as well. To prevent bugs, we have:
ci.rvm.jp is a continuous testing environment maintained by Sasada Koichi. It runs around the clock, even when no commits are being made, and it has revealed bugs that occur only rarely.
On that, we have 3 types of MJIT-related builds. trunk-mjit-wait runs all tests with "--jit-wait", which blocks when a method is queued to be JIT-ed and synchronously waits for JIT compilation, so that it exposes VM or JIT compiler-related bugs. trunk-mjit runs with "--jit", because synchronous compilation can't catch SEGVs or deadlocks caused by race conditions. trunk-no-mjit tests the configure flag --disable-mjit-support.
We also configured Wercker CI, which is dedicated to MJIT testing at this repository. It runs both --jit and --jit-wait testing. It's integrated with GitHub, and I think it's convenient for those who mainly watch GitHub for Ruby core development, including me.
These CI environments have reported many bugs which were not easy to fix. Have you ever considered what happens if a Ruby script waits for all child processes, including MJIT’s C compiler process? Can you imagine what could happen in a multi-threaded environment without the GVL (for the MJIT worker thread) that forks and execs on each thread and needs to deal with asynchronous signal handlers? I can't count how many deadlocks, SEGVs, and esoteric test failures we've fixed. But now it's stable enough, at least on the above CI environments, with a lot of help from Eric Wong.
We created the following continuous benchmark environments as well so that we can prevent unexpected performance regression:
RubyBench has existed for a long time. This year the original RubyBench developers added support for measuring --jit, and I joined the team and maintained the --jit testing and benchmark driver part.
benchmark-driver.github.io was originally created for testing my benchmark tool benchmark_driver.gem at RubyGrant 2017, but it now exists to cover more test cases alongside RubyBench. Benchmarking is hard because we have to keep using the same machine without any parallelism, and increasing the number of test cases in one benchmark system can make measuring a single commit take a lot of time.
I succeeded in catching some unexpected performance regressions of MJIT at https://benchmark-driver.github.io/benchmarks/optcarrot/commits.html, and sometimes it helped expose a random failure on a specific benchmark program.
Ruby’s JIT compiler translates a Ruby method to C code in /tmp, invokes a C compiler on it, and dynamically loads the generated object file. Doesn’t it sound dangerous?
We can’t make everything safe. For example, if attackers already have root access, they could do almost anything. So we want the JIT to be reasonably secure within those limits.
As a user, I would like to be able to prevent at least the following situations:
1. The C compiler is replaced with an arbitrary binary
2. C header files or libraries are replaced
3. Another user’s program on the same server overwrites C code to be JIT-ed
For 1, the Ruby binary remembers the full path of the C compiler that was used to build the Ruby interpreter, and Ruby’s JIT uses it at runtime. If the C compiler is securely installed, then unless an attacker is able to replace the original C compiler or can somehow introduce an unexpected chroot on start, it would be hard to tamper with it.
For 2, Ruby’s JIT uses only one header file in Ruby’s install directory and just links to the usual libraries, like the ones used for normal C extensions. An attacker would need access to Ruby’s install directory or the C libraries to break it.
For 3, we create the C source file with O_EXCL|O_CREAT to prevent any other program from changing the permission by creating a file with the same name beforehand. This part has been fine since I wrote my previous article.
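To show what O_EXCL|O_CREAT buys, here is a minimal Ruby sketch of exclusive file creation. MJIT itself does this in C; the helper name `create_exclusively`, the file name, and the 0600 mode are assumptions made up for this illustration.

```ruby
require 'tmpdir'

# Creates `path` only if nothing exists there yet.
# With File::EXCL, opening an already existing path raises Errno::EEXIST
# instead of silently reusing a file someone else prepared beforehand.
def create_exclusively(path, content)
  File.open(path, File::WRONLY | File::CREAT | File::EXCL, 0o600) do |f|
    f.write(content)
  end
end

Dir.mktmpdir do |dir|
  path = File.join(dir, 'jit_unit.c') # hypothetical file name
  create_exclusively(path, "int answer(void) { return 42; }\n")

  begin
    # A second creation attempt for the same name is refused.
    create_exclusively(path, "/* someone else's code */\n")
  rescue Errno::EEXIST
    puts 'second create refused: file already exists'
  end
end
```

The check and the creation happen in one system call, so there is no window between "does the file exist?" and "create it" for another process to slip a file in.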
Actually, these parts were mostly improved by Nobuyoshi Nakada after the initial merge. Kudos to @nobu for a lot of improvements to the build system!
While I failed to make Ruby's JIT production-ready in 2.6, I don't regret what we've done this year. I needed to develop or lead all of the above things for Ruby 2.6, but I believe I can invest my time mainly on performance improvement in Ruby 2.7.
I assume the main reason for the slowdown on real-world applications is that the overhead of loading JIT-ed code, plus the increased number of times the Virtual Machine is newly invoked on top of a JIT-ed frame, becomes larger than what we can optimize now.
I'm thinking about taking the following 2 strategies to approach this:
- Inline Ruby methods as far as possible to avoid loading code or creating a VM frame many times
- Optimize object allocations, which we haven't invested in much in the JIT so far
The former is relatively easy for me because I have experimented with method inlining and have a patch for 1-level inlining. I just need to change it to support multi-level inlining. I'm also planning to take time to learn other JIT implementations' strategies to decide what kind of unit to compile.
The latter might require implementing escape analysis for allocating objects on the stack, which is known to be difficult to implement. But I'm sure that there's room for improvement there, and real-world applications allocate more objects than Optcarrot.
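To make the allocation point concrete, here is a minimal sketch of an object that never "escapes" its method. It is not MJIT code: `Point`, `allocations`, and both `distance_sq_*` methods are hypothetical names, and the scalar version is written by hand only to show the kind of result escape analysis could aim for.

```ruby
# Counts objects allocated while the block runs.
def allocations
  before = GC.stat(:total_allocated_objects)
  yield
  GC.stat(:total_allocated_objects) - before
end

Point = Struct.new(:x, :y)

# Allocates a temporary Point that is never stored or returned,
# so it does not escape this method.
def distance_sq_with_temp(x, y)
  pt = Point.new(x, y)
  pt.x * pt.x + pt.y * pt.y
end

# Hand-written equivalent without the temporary heap object.
def distance_sq_scalar(x, y)
  x * x + y * y
end

with_temp = allocations { 1000.times { distance_sq_with_temp(3, 4) } }
scalar    = allocations { 1000.times { distance_sq_scalar(3, 4) } }
puts "with temporary object: #{with_temp} allocations"
puts "scalar version:        #{scalar} allocations"
```

Both methods return the same value, but the first allocates at least one object per call. Proving automatically that such an object can be dropped or kept off the heap is the hard part.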
Also there may be an ongoing experiment for MRI JIT by another person and it might be published for Ruby 2.7 or 3.0.
I hope that we'll be able to say "MJIT makes Rails X times faster" in the near future, before we hit version 3.0.