Benchmark Driver Designed for Ruby 3x3

Mar 27, 2018 · 9 min read

My benchmark tool development was accepted as a Ruby Association Grant 2017 project and completed on March 21st. I’ll show you the details of the project.

benchmark_driver.gem's example usage that outputs a graph:

[Image: graph output generated by benchmark_driver.gem]

Project Summary

This project aims to improve benchmark_driver.gem, which is built to compare the performance of different Ruby binaries easily and precisely. It also aims to increase the number of benchmark test cases to cover more Ruby core features, and to make it easier to make Ruby 3x faster by preparing an environment that continuously runs those benchmarks with the tool.

I also created some detailed milestones for it and achieved all of them during the project period.

What’s benchmark_driver.gem?

The original benchmark/driver.rb hard-codes the loop count of the while loop whose overhead is subtracted for some specific benchmark name prefixes. benchmark_driver.gem instead exposes the count as loop_count by design, so you can adjust the loop count as you like. It can also automatically calculate a loop count that is expected to run for 3s (changeable with the --run-duration option).
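As a rough illustration of the automatic calculation, a loop count can be derived from a short warm-up run: measure how many iterations per second the script achieves, then scale that to the target duration. This is only a minimal sketch, not the gem's actual implementation; `estimate_loop_count` and its parameters are hypothetical names.

```ruby
# Hypothetical helper (not part of benchmark_driver.gem): estimate how many
# iterations fill `run_duration` seconds, given that a warm-up executed
# `warmup_count` iterations in `warmup_seconds`.
def estimate_loop_count(warmup_count, warmup_seconds, run_duration: 3.0)
  iterations_per_second = warmup_count / warmup_seconds.to_f
  (iterations_per_second * run_duration).floor
end

# A warm-up that ran 1,000,000 iterations in 0.5s suggests 2,000,000 i/s,
# so a 3s run needs about 6,000,000 iterations.
puts estimate_loop_count(1_000_000, 0.5)
```

Passing a different `run_duration:` plays the role that `--run-duration` plays on the command line.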

One great part of benchmark_driver.gem is its variety of built-in output formats and measurable metrics, and their extensibility. benchmark_driver.gem can not only measure execution time, i/s and memory, but can also integrate custom metrics of existing Ruby benchmarks, like Optcarrot's fps. You can output the results as a neat comparison similar to benchmark-ips, as markdown, or as a graph image. Both metrics and outputs are fully pluggable.

Basic usage: Ruby interface

Let’s see the difference between an ordinary benchmark tool and benchmark_driver.gem.

This is a benchmark example with benchmark-ips.gem:

require 'benchmark/ips'

class Array
  alias_method :blank?, :empty?
end

Benchmark.ips do |x|
  array = []
  x.report('Array#empty?') { array.empty? }
  x.report('Array#blank?') { array.blank? }
  x.compare!
end

With Ruby 2.5, it outputs a result like:

Warming up --------------------------------------
Array#empty? 524.659k i/100ms
Array#blank? 495.794k i/100ms
Calculating -------------------------------------
Array#empty? 15.497M (± 2.5%) i/s - 77.650M in 5.013969s
Array#blank? 14.171M (± 2.1%) i/s - 70.899M in 5.005282s
Array#empty?: 15497274.4 i/s
Array#blank?: 14171303.0 i/s - 1.09x slower

From this result, you might conclude that ActiveSupport’s `Array#blank?` is not much slower than Ruby’s built-in `Array#empty?`. Let’s try measuring the same thing with benchmark_driver.gem.

Here is an example of benchmark_driver.gem usage with the Ruby interface:

require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude %{
    class Array
      alias_method :blank?, :empty?
    end
    array = []
  }
  x.report 'Array#empty?', %{ array.empty? }
  x.report 'Array#blank?', %{ array.blank? }
end

When you run the above script with Ruby 2.5.0 (on Linux to see clocks/i), you will get an output like:

Warming up --------------------------------------
Array#empty? 56.340M i/s
Array#blank? 42.795M i/s
Calculating -------------------------------------
Array#empty? 181.135M i/s - 169.019M times in 0.933111s (5.52ns/i, 23clocks/i)
Array#blank? 99.275M i/s - 128.386M times in 1.293235s (10.07ns/i, 44clocks/i)
Array#empty?: 181134991.8 i/s
Array#blank?: 99275008.9 i/s - 1.82x slower

So this result shows `Array#empty?` is actually about 1.8x faster than `Array#blank?`. The result is reasonable because Ruby’s optimized instruction for `Array#empty?` (opt_empty_p) is applied only if the method name is "empty?".
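One way to check which instruction the VM picks is to disassemble both calls. The sketch below assumes CRuby, since `RubyVM::InstructionSequence` is specific to it, and the exact listing varies between Ruby versions.

```ruby
# RubyVM::InstructionSequence is CRuby-specific; other implementations may
# not provide it. Compiling does not execute the code, so Array#blank? does
# not need to be defined here.
empty_insns = RubyVM::InstructionSequence.compile('[].empty?').disasm
blank_insns = RubyVM::InstructionSequence.compile('[].blank?').disasm

puts empty_insns
puts blank_insns

# On CRuby versions that have the specialized instruction, only the first
# listing is expected to mention opt_empty_p.
puts empty_insns.include?('opt_empty_p')
puts blank_insns.include?('opt_empty_p')
```

The choice is made by method name at compile time, which is why an alias of `empty?` falls back to a generic method call instruction.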

Why do the results differ? There are 2 reasons:

  • Calling a block has a large overhead compared to just running the body of a while loop. benchmark_driver.gem takes a benchmark definition as a string to dynamically generate such a loop script, instead of taking a script as a block.
  • As I said before, it subtracts the overhead of the while loop itself.
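The two points above can be sketched in a few lines. This is a simplified illustration, not the gem's implementation: it evaluates the generated script in the current process with `eval`, and `build_loop_script` and `measure` are hypothetical helpers.

```ruby
# Hypothetical helper: wrap a benchmark snippet in a generated while loop,
# so that no block call happens per iteration.
def build_loop_script(prelude, script, loop_count)
  <<~RUBY
    #{prelude}
    __i = 0
    while __i < #{loop_count}
      #{script}
      __i += 1
    end
  RUBY
end

# Time one evaluation of a generated script with a monotonic clock.
def measure(source)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  eval(source)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

loop_count = 1_000_000
with_script = measure(build_loop_script('array = []', 'array.empty?', loop_count))
empty_loop  = measure(build_loop_script('array = []', '', loop_count))

# Subtract the cost of the bare loop to approximate the snippet alone.
# The difference can be noisy (even negative) for very cheap snippets.
script_only = with_script - empty_loop
puts format('%.2f ns/i', script_only / loop_count * 1e9)
```

Because the benchmark is a string, the prelude's local variable `array` is visible to the measured snippet without any block or method call in between.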

You may claim that benchmark-ips.gem also accepts a string instead of a block for the measured script, which is exactly the same interface as benchmark_driver.gem’s. But it doesn’t have `x.prelude`, so you can’t have a predefined local variable. I assume that it’s not designed for such usage.

Like `Array#empty?`, there are some methods which can run even faster than the overhead of calling a block. As such methods are optimized because they are frequently used, it’s important to measure performance of such methods accurately.

Comparing multiple Ruby binaries

Here is an example that compares performance between multiple Ruby implementations. In `x.rbenv`, you can specify Ruby binaries managed by rbenv as "[shown name]::[rbenv name],[arg1],[arg2]…".

require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude %{
    def script
      i = 0
      while i < 1000_000
        i += 1
      end
    end
  }
  x.report 'while', %{ script }
  x.loop_count 2000
end

And here is the output:

2.0.0: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
2.5.0: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
2.6.0-dev: ruby 2.6.0dev (2018-03-21 trunk 62870) [x86_64-linux]
2.6.0-dev+JIT: ruby 2.6.0dev (2018-03-21 trunk 62870) +JIT [x86_64-linux]
Calculating -------------------------------------
2.0.0 2.5.0 2.6.0-dev 2.6.0-dev+JIT
while 77.952 80.325 87.239 491.907 i/s - 2.000k times in 25.656691s 24.898879s 22.925498s 4.065807s
2.6.0-dev+JIT: 491.9 i/s
2.6.0-dev: 87.2 i/s - 5.64x slower
2.5.0: 80.3 i/s - 6.12x slower
2.0.0: 78.0 i/s - 6.31x slower
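The "x slower" figures in the comparison are simply the best i/s divided by each entry's i/s. The following sketch reproduces them from the numbers above; it is an illustration of the arithmetic, not the gem's own output code.

```ruby
# i/s values copied from the output above.
results = {
  '2.0.0'         => 77.952,
  '2.5.0'         => 80.325,
  '2.6.0-dev'     => 87.239,
  '2.6.0-dev+JIT' => 491.907,
}

# Sort fastest first, then express every other entry relative to the fastest.
sorted = results.sort_by { |_name, ips| -ips }
_best_name, best_ips = sorted.first

comparison = sorted.map do |name, ips|
  ratio = best_ips / ips
  suffix = ratio > 1 ? format(' - %.2fx slower', ratio) : ''
  format('%s: %.1f i/s%s', name, ips, suffix)
end

puts comparison
# 2.6.0-dev+JIT: 491.9 i/s
# 2.6.0-dev: 87.2 i/s - 5.64x slower
# 2.5.0: 80.3 i/s - 6.12x slower
# 2.0.0: 78.0 i/s - 6.31x slower
```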

This tool makes it easy to answer the question: how many times faster is the current Ruby than Ruby 2.0? Please try comparing performance with JRuby, Rubinius and truffleruby.

Besides plain text output like this, you can also visualize the result as a graph using a plugin (explained later).
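To make the "[shown name]::[rbenv name],[arg1],[arg2]…" format used by `x.rbenv` more concrete, here is a small parser sketch. `parse_rbenv_spec` is a hypothetical helper, the spec strings are made-up examples, and the handling of a spec without "::" is an assumption rather than documented behaviour of the gem.

```ruby
# Hypothetical parser for a "[shown name]::[rbenv name],[arg1],[arg2]…" spec.
# Assumption: when "::" is omitted, the spec itself is reused as the name.
def parse_rbenv_spec(spec)
  shown_name, rest = spec.split('::', 2)
  rest ||= shown_name
  rbenv_name, *args = rest.split(',')
  { name: shown_name, version: rbenv_name, args: args }
end

# Made-up specs: a trunk build shown as "2.6.0-dev+JIT" and run with --jit,
# and a plain version without a separate shown name.
p parse_rbenv_spec('2.6.0-dev+JIT::trunk,--jit')
p parse_rbenv_spec('2.5.0')
```

Separating the shown name from the rbenv name lets the same binary appear twice in one comparison, for example with and without a command-line flag.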

Advanced usage: YAML and CLI

$ benchmark-driver -h
Usage: benchmark-driver [options] [YAML]
-r, --runner [TYPE]
-o, --output [TYPE]
-e, --executables [EXECS]
--rbenv [VERSIONS]
--repeat-count [NUM]
--filter [REGEXP]
--verbose [LEVEL]
--run-duration [SECONDS]

It takes YAML files as its arguments.

Here is an example of a benchmark definition in YAML.

prelude: |
  large_a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
  large_b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"
  small_a = "Hello"
  small_b = "World"
benchmark:
  large: '"#{large_a}, #{large_b}!"'
  small: '"#{small_a}, #{small_b}!"'

If you save it as “benchmark.yml”, you can run it like:

$ benchmark-driver benchmark.yml --rbenv '2.4.3;2.5.0'
Warming up --------------------------------------
large 3.693M i/s
small 9.913M i/s
Calculating -------------------------------------
2.4.3 2.5.0
large 3.895M 5.485M i/s - 11.079M times in 2.844249s 2.019943s
small 11.755M 11.103M i/s - 29.740M times in 2.529966s 2.678612s
2.5.0: 5484705.8 i/s
2.4.3: 3895155.8 i/s - 1.41x slower
2.4.3: 11755264.9 i/s
2.5.0: 11102923.0 i/s - 1.06x slower

This shows characteristics of Ruby 2.5’s string interpolation performance improvement.

With the YAML interface, the format keeps the definition simple, and you can specify Ruby executables without modifying code. Please use whichever interface fits your use case.
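Since a YAML definition is just data, it is easy to see how a prelude and the benchmark scripts fit together. The sketch below loads a shortened definition with Ruby's standard YAML library and evaluates each script after the prelude. It assumes the benchmarks are nested under a `benchmark` key, and it is an illustration only: the real tool generates a separate script per Ruby executable instead of calling `eval` in-process.

```ruby
require 'yaml'

# A shortened copy of the definition, kept inline so the sketch is
# self-contained. The quoted heredoc prevents Ruby from interpolating #{...}.
definition = YAML.safe_load(<<~'YAML')
  prelude: |
    small_a = "Hello"
    small_b = "World"
  benchmark:
    small: '"#{small_a}, #{small_b}!"'
YAML

prelude = definition['prelude']
jobs = definition['benchmark']

jobs.each do |name, script|
  # Prelude and script are plain Ruby source, so they can be joined and
  # evaluated together; the prelude's local variables are visible to the script.
  result = eval(prelude + script)
  puts "#{name}: #{result}"
end
```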

Plugin System

The plugin interface has evolved dramatically and is not yet stable, which is the only reason I haven’t released it as 1.x. You'll be able to continue using the same interface unless you're a plugin developer.

To see which plugins are available, refer to benchmark_driver's README.

You may be interested in a graph output, markdown output, memory runner or the "command_stdout" runner to integrate any existing Ruby benchmarks without building a runner plugin.

The goals of this project included “preparing an environment to continuously benchmark the tests with the tool”. I wish I could have used RubyBench or its temporary fork with benchmark_driver.gem, but the server I could borrow for benchmarking can’t be reached from the Internet.

To make sure my goal could be achieved in a short period using that server, instead of contributing to RubyBench, I decided to create a system that generates a static page to be published on GitHub Pages. As we can easily create a plugin to integrate benchmark_driver.gem with any service, we’ll be able to merge my work into the RubyBench project later. After that, the site might become obsolete. The major difference between RubyBench and the site for now is that it includes JIT results.

If you are interested in how it works, you may want to see:

In this section, I’ll mainly describe the benchmark sets that are included in the site.

Ruby Core

These are the benchmarks in the Ruby repository, converted to YAML format to abstract away the while loop that I mentioned above.

While the benchmark results are also available on RubyBench, you can see JIT-ed results on the site. But note that current Ruby’s JIT is a method JIT, and many of those benchmarks don’t define a method to be JIT-ed. In such a situation, the JIT results for these benchmarks might be useless and you may want to see “MJIT benchmarks” instead.

Ruby Method

This was originally created by @Watson1978. He has actively improved Ruby’s performance by measuring many Ruby core features with this benchmark set and finding parts to work on.

Having good coverage of benchmarked features helps catch performance regressions. You may be able to find Ruby core features to be improved in the results.


The benchmark set was created by Vladimir Makarov. As I said above, a benchmark hotspot needs to be a method to measure method JIT’s performance. At the same time, we want to omit the overhead of the method call as much as possible. The benchmark set seems to be designed for measuring method JIT performance correctly.

Once benchmarking is finished for recent revisions, you’ll be able to see the latest JIT performance improvements.


It was created by @mame, and the repository only adds benchmark.yml to it.
You may want to see for the details of the amazing program for benchmark.

Future works

Sophisticating plugin interface

module BenchmarkDriver
  Metrics = ::BenchmarkDriver::Struct.new(
    :value,      # @param [Float]
    :executable, # @param [BenchmarkDriver::Config::Executable]
    :duration,   # @param [Float,nil]
  )

  Metrics::Type = ::BenchmarkDriver::Struct.new(
    :unit,          # @param [String]
    :larger_better, # @param [TrueClass,FalseClass]
    :worse_word,    # @param [String]
    defaults: { larger_better: true, worse_word: 'slower' },
  )
end

This was good enough to express all of execution time, memory consumption as max resident set, and Optcarrot fps, and to make a comparison or a graph from it.

But what if we want to show real time and user/system CPU time at the same time? What about showing the 50/75/90/99 percentiles in one graph?

To support such use cases, I’m planning to change the single Float value to a Hash of { Symbol => Float }. One Symbol should express the main metric, and I’ll define some well-known Symbols to enable special outputs that only some plugins support.
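To illustrate the idea, here is a minimal sketch of a metrics object that carries several named values plus a marker for the main one. The struct layout, the `main` field, the `percentile` helper and the sample data are all hypothetical; the actual interface may end up looking quite different.

```ruby
# Hypothetical shape for the planned change; names are illustrative only.
Metrics = Struct.new(:values, :main, keyword_init: true) do
  # The value a plain comparison output would use.
  def main_value
    values.fetch(main)
  end
end

# Nearest-rank percentile over a list of samples.
def percentile(samples, pct)
  sorted = samples.sort
  rank = (pct * sorted.size / 100.0).ceil
  sorted[[rank, 1].max - 1]
end

# Made-up response times in milliseconds.
samples = [12.0, 15.0, 11.0, 30.0, 14.0, 13.0, 18.0, 22.0, 16.0, 17.0]
metrics = Metrics.new(
  values: {
    p50: percentile(samples, 50),
    p75: percentile(samples, 75),
    p90: percentile(samples, 90),
    p99: percentile(samples, 99),
  },
  main: :p50,
)

puts metrics.main_value  # 15.0
puts metrics.values[:p99] # 30.0
```

An output plugin that only understands a single number can keep using `main_value`, while a graph plugin that knows the percentile Symbols can draw all four.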

Add more benchmarks to

Anyway, I’ll show some future candidates to be added.


Discourse is a popular Rails application and has script/bench.rb to measure its performance. Noah Gibbs created an improved version of it as rails_ruby_bench.

Since Rails is one of Ruby’s major use cases, we definitely need to measure Ruby performance with Rails. For now, this is blocked by the “showing 50/75/90/99 percentile” issue and a disk consumption problem. Rails has too many dependencies compared to Optcarrot…


Fluentd is a log collector written in Ruby, which is used by many large-scale services. Fluentd might be a good real-world Ruby use case whose performance is very important.
Disclaimer: Note that I’m an employee of Treasure Data, a company building Fluentd.

I modified the one_forward benchmark so that we can use it from benchmark_driver.gem. Currently it’s not on the site because the benchmark result does not look affected by Ruby’s performance. I’m not sure why, and we need to investigate the cause to help Fluentd become faster.

Integrating derailed_benchmarks.gem

Recently @schneems kindly gave me a commit bit for the gem, so I’ll be able to improve the situation from both sides.


