Benchmark Driver Designed for Ruby 3x3

My benchmark tool development was accepted as a Ruby Association Grant 2017 project and completed on March 21st. I’ll show you the details of the project.

(Image: benchmark_driver.gem example usage that outputs a graph)

Project Summary

Here is a copy of the project summary which I sent to Ruby Association:

This project aims to improve benchmark_driver.gem, which is built to compare performance of different Ruby binaries easily and precisely. It also aims to increase the number of benchmark test cases to cover more Ruby core features, and make it easier to optimize Ruby 3x faster by preparing an environment to continuously benchmark the tests with the tool.

I also created some detailed milestones for it and achieved all of them during the project period.

What’s benchmark_driver.gem?

benchmark_driver.gem was originally designed as a successor to benchmark/driver.rb in the Ruby language repository, which can measure a Ruby script’s performance accurately by subtracting the time taken by a bare while loop from that of the looped script being measured, and which can benchmark multiple Ruby binaries at the same time and compare them.

While the original benchmark/driver.rb hard-codes the while-loop count to be subtracted for some specific benchmark name prefixes, benchmark_driver.gem exposes the count as loop_count by design, so you can adjust it as you like. It can also automatically calculate a loop count that is expected to run for 3s (changeable by the --run-duration option).
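For example, assuming the `loop_count` key of the YAML format explained later (key names per benchmark_driver's README), a sketch of pinning the count explicitly instead of relying on the automatic calculation:

```yaml
# A sketch: pin the subtracted while-loop count instead of auto-calculating it
loop_count: 1000000
prelude: |
  array = []
benchmark:
  empty: array.empty?
```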

One great part of benchmark_driver.gem is its variety of built-in output formats and measurable metrics, and their extensibility.
benchmark_driver.gem can measure not only execution time, i/s and memory, but can also integrate custom metrics of existing Ruby benchmarks,
like Optcarrot's fps. And you can output the results in a sexy comparison format similar to benchmark-ips, in markdown, or as a graph image. Both metrics and outputs are fully pluggable.

Basic usage: Ruby interface

benchmark_driver.gem has 2 benchmark definition formats: Ruby and YAML.
The Ruby interface is designed for a usual Ruby benchmark or for generating a benchmark target script dynamically, while the YAML format is good for minimizing the effort of writing benchmark definitions and for flexibly changing parameters on the fly without modifying code.

Let’s see the difference between an ordinary benchmark tool and benchmark_driver.gem.

This is a benchmark example with benchmark-ips.gem:

require 'benchmark/ips'

class Array
  alias_method :blank?, :empty?
end

Benchmark.ips do |x|
  array = []
  x.report('Array#empty?') { array.empty? }
  x.report('Array#blank?') { array.blank? }
  x.compare!
end

With Ruby 2.5, it outputs a result like:

Warming up --------------------------------------
        Array#empty?   524.659k i/100ms
        Array#blank?   495.794k i/100ms
Calculating -------------------------------------
        Array#empty?     15.497M (± 2.5%) i/s -     77.650M in   5.013969s
        Array#blank?     14.171M (± 2.1%) i/s -     70.899M in   5.005282s

Comparison:
        Array#empty?: 15497274.4 i/s
        Array#blank?: 14171303.0 i/s - 1.09x slower

From this result, you might conclude that ActiveSupport’s `Array#blank?` is not so slow compared to Ruby’s built-in `Array#empty?`. Let’s try measuring the same thing with benchmark_driver.gem.

Here is the example of benchmark_driver.gem usage with Ruby interface:

require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude %{
    class Array
      alias_method :blank?, :empty?
    end
    array = []
  }
  x.report 'Array#empty?', %{ array.empty? }
  x.report 'Array#blank?', %{ array.blank? }
end

When you run the above script with Ruby 2.5.0 (on Linux, to see clocks/i), you will get output like:

Warming up --------------------------------------
        Array#empty?    56.340M i/s
        Array#blank?    42.795M i/s
Calculating -------------------------------------
        Array#empty?   181.135M i/s -    169.019M times in 0.933111s (5.52ns/i, 23clocks/i)
        Array#blank?    99.275M i/s -    128.386M times in 1.293235s (10.07ns/i, 44clocks/i)

Comparison:
        Array#empty?: 181134991.8 i/s
        Array#blank?:  99275008.9 i/s - 1.82x slower

So this result shows `Array#empty?` is actually about 1.8x faster than `Array#blank?`. This is reasonable because Ruby’s optimized instruction for `Array#empty?` (opt_empty_p) is applied only when the method name is "empty?".

Why does this difference happen? There are 2 reasons:

  • Calling a block has a large overhead compared to running part of a while loop. benchmark_driver.gem takes the benchmark definition as a string and dynamically generates such a loop script, instead of taking the script as a block.
  • As mentioned above, it subtracts the while loop overhead.
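The subtraction idea can be sketched by hand with plain Ruby (a minimal illustration, not benchmark_driver's actual implementation):

```ruby
require 'benchmark'

LOOP_COUNT = 5_000_000

# 1. Time a bare while loop to measure its own overhead
overhead = Benchmark.realtime do
  i = 0
  while i < LOOP_COUNT
    i += 1
  end
end

# 2. Time the same loop with the target expression inlined into its body
array = []
total = Benchmark.realtime do
  i = 0
  while i < LOOP_COUNT
    array.empty?
    i += 1
  end
end

# Subtracting the loop overhead isolates the cost of `array.empty?` itself
ns_per_call = (total - overhead) / LOOP_COUNT * 1_000_000_000
puts format('%.2fns/i', ns_per_call)
```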

You may point out that benchmark-ips.gem can also take a string instead of a block for the measured script, which is exactly the same interface as benchmark_driver.gem’s `x.report`. However, it has no `x.prelude`, so you can’t use a predefined local variable; I assume it’s simply not designed for such usage.

Like `Array#empty?`, some methods run even faster than the overhead of calling a block. Since such methods are optimized precisely because they are frequently used, it’s important to measure their performance accurately.

Comparing multiple Ruby binaries

Another reason why benchmark_driver.gem is good for measuring Ruby’s performance is that it can run any Ruby binaries at the same time and compare the benchmark results.

Here is an example that compares the performance of multiple Ruby implementations. In `x.rbenv`, you can specify Ruby binaries managed by rbenv as "[shown name]::[rbenv name],[arg1],[arg2]…".

require 'benchmark_driver'

Benchmark.driver do |x|
  x.prelude %{
    def script
      i = 0
      while i < 1_000_000
        i += 1
      end
      i
    end
  }
  x.report 'while', %{ script }
  x.loop_count 2000
  x.rbenv(
    '2.0.0::2.0.0-p0',
    '2.5.0',
    '2.6.0-dev',
    '2.6.0-dev+JIT::2.6.0-dev,--jit',
  )
  x.verbose
end

And here is the output:

2.0.0: ruby 2.0.0p0 (2013-02-24 revision 39474) [x86_64-linux]
2.5.0: ruby 2.5.0p0 (2017-12-25 revision 61468) [x86_64-linux]
2.6.0-dev: ruby 2.6.0dev (2018-03-21 trunk 62870) [x86_64-linux]
2.6.0-dev+JIT: ruby 2.6.0dev (2018-03-21 trunk 62870) +JIT [x86_64-linux]
Calculating -------------------------------------
                  2.0.0       2.5.0   2.6.0-dev  2.6.0-dev+JIT
 while           77.952      80.325      87.239        491.907 i/s - 2.000k times in 25.656691s 24.898879s 22.925498s 4.065807s

Comparison:
              while
 2.6.0-dev+JIT: 491.9 i/s
     2.6.0-dev:  87.2 i/s - 5.64x slower
         2.5.0:  80.3 i/s - 6.12x slower
         2.0.0:  78.0 i/s - 6.31x slower

Obviously, this tool is convenient for answering: how many times faster is the current Ruby than Ruby 2.0? Please also try comparing performance with JRuby, Rubinius and truffleruby.

You can output not only plain text like this, but also visualize the results as a graph using a plugin (explained later).

Advanced usage: YAML and CLI

benchmark_driver.gem has a CLI like this (descriptions are omitted because Medium’s column width is too small...):

$ benchmark-driver -h
Usage: benchmark-driver [options] [YAML]
    -r, --runner [TYPE]
    -o, --output [TYPE]
    -e, --executables [EXECS]
        --rbenv [VERSIONS]
        --repeat-count [NUM]
        --bundler
        --filter [REGEXP]
        --verbose [LEVEL]
        --run-duration [SECONDS]

It takes YAML files as its arguments.

Here is an example of a benchmark definition in YAML.

prelude: |
  large_a = "Hellooooooooooooooooooooooooooooooooooooooooooooooooooo"
  large_b = "Wooooooooooooooooooooooooooooooooooooooooooooooooooorld"
  small_a = "Hello"
  small_b = "World"
benchmark:
  large: '"#{large_a}, #{large_b}!"'
  small: '"#{small_a}, #{small_b}!"'

If you save it as “benchmark.yml”, you can run it like:

$ benchmark-driver benchmark.yml --rbenv '2.4.3;2.5.0'
Warming up --------------------------------------
               large     3.693M i/s
               small     9.913M i/s
Calculating -------------------------------------
                          2.4.3       2.5.0
               large     3.895M      5.485M i/s - 11.079M times in 2.844249s 2.019943s
               small    11.755M     11.103M i/s - 29.740M times in 2.529966s 2.678612s

Comparison:
large
         2.5.0:  5484705.8 i/s
         2.4.3:  3895155.8 i/s - 1.41x slower

small
         2.4.3: 11755264.9 i/s
         2.5.0: 11102923.0 i/s - 1.06x slower

This result reflects the characteristics of Ruby 2.5’s string interpolation performance improvement.

With the YAML format, the definition can be simpler, and the CLI lets you specify Ruby executables without modifying code. Please use either interface depending on your use case.

Plugin System

You can customize the metrics to be measured and the output format as you like,
and build your own plugins if you want.

The plugin interface has evolved dramatically and is not so stable yet,
which is the only reason I haven’t released this as 1.x. But unless you’re a plugin developer, you’ll be able to keep using the same interface.

To know which plugins are available, refer to benchmark_driver's README: https://github.com/k0kubun/benchmark_driver#output-options

You may be interested in the graph output, the markdown output, the memory runner, or the "command_stdout" runner, which integrates any existing Ruby benchmark without building a runner plugin.

benchmark-driver.github.io

The goals of this project included “preparing an environment to continuously benchmark the tests with the tool”. I wish I could have used https://rubybench.org or a temporary fork of it with benchmark_driver.gem, but the server I could borrow for benchmarking can’t be reached from the Internet.

To make sure the goal was achieved in a short period using that server, I decided to create a system that generates a static page published on GitHub Pages, instead of contributing to RubyBench. As we can easily create a plugin
to integrate benchmark_driver.gem with any service, we’ll be able to merge this work into the RubyBench project later; after that, benchmark-driver.github.io might be obsoleted. For now, the major difference between RubyBench and this site is that the site includes JIT results.

If you are interested in how it works, you may want to see:
https://github.com/benchmark-driver/skybench
https://github.com/benchmark-driver/benchmark-driver.github.io

In this section, I’ll mainly describe the benchmark sets included in the site.

Ruby Core

These are the benchmarks in the Ruby repository, converted to YAML format to abstract away the while loop mentioned above.

While these benchmark results are also available on RubyBench, you can see JIT-ed results on benchmark-driver.github.io. But note that current Ruby’s JIT is a method JIT, and many of those benchmarks don’t define a method to be JIT-ed. In such a situation, the JIT results of these benchmarks might be useless, and you may want to see the “MJIT benchmarks” instead.

Ruby Method

This set was originally created by @Watson1978 at https://github.com/Watson1978/ruby-method-benchmarks. He has actively improved Ruby’s performance by measuring many Ruby core features and finding parts to work on, using this benchmark set.

Good coverage of benchmarked features helps catch performance regressions. You may be able to find Ruby core features worth improving in the results.

MJIT

This benchmark set was created by Vladimir Makarov. As I said above, a benchmark hotspot needs to be a method in order to measure a method JIT’s performance. At the same time, we want to eliminate method call overhead as much as possible. The set seems to be designed to measure method JIT performance correctly.

Once benchmarking finishes for recent revisions, you’ll be able to see the latest JIT performance improvements.

Optcarrot

Optcarrot was created by @mame, and our repository only adds benchmark.yml to it.
See https://www.slideshare.net/mametter/optcarrot-a-pureruby-nes-emulator for the details of this amazing program for benchmarking.

Future works

Beyond the scope originally proposed for this project, there are several things to work on to make the best use of the outcome.

Sophisticating plugin interface

Right now the interface between Runner and Output is like this:

module BenchmarkDriver
  Metrics = ::BenchmarkDriver::Struct.new(
    :value,      # @param [Float]
    :executable, # @param [BenchmarkDriver::Config::Executable]
    :duration,   # @param [Float,nil]
  )

  Metrics::Type = ::BenchmarkDriver::Struct.new(
    :unit,          # @param [String]
    :larger_better, # @param [TrueClass,FalseClass]
    :worse_word,    # @param [String]
    defaults: { larger_better: true, worse_word: 'slower' },
  )
end

This was good enough to express execution time, memory consumption as max resident set size, and Optcarrot fps, and to make a comparison or a graph from them.

But what if we want to show real time and user/system CPU time at the same time? What about showing the 50/75/90/99 percentiles in one graph?

To achieve such use cases, I’m planning to change the single Float value to a Hash of { Symbol => Float }. It will have a Symbol expressing the main metric, and I’ll define some well-known Symbols to enable special outputs for certain plugins.
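A hypothetical sketch of that change using a plain Ruby Struct (the field names follow the struct above, but the `:real`/`:utime`/`:stime` Symbols and the main-metric convention are my assumptions, not the final design):

```ruby
# Hypothetical future shape: Metrics#value as { Symbol => Float } instead of
# a single Float, so one measurement can carry several related numbers.
Metrics = Struct.new(:value, :executable, :duration, keyword_init: true)

metrics = Metrics.new(
  value: { real: 1.23, utime: 1.10, stime: 0.08 }, # real + user/system CPU time
  executable: 'ruby 2.6.0dev',
  duration: 1.23,
)

MAIN_METRIC = :real # a well-known Symbol marking the main metric
puts metrics.value[MAIN_METRIC] # an Output plugin would pick this for comparison
```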

Add more benchmarks to benchmark-driver.github.io

Measuring only micro benchmarks and NES emulator performance doesn’t sound like enough to achieve Ruby 3x3. But adding benchmarks is a little challenging because it already takes a lot of time to run all benchmarks for one revision. Also, installing many gems for each revision will cause a disk consumption problem if we keep many Ruby revisions around for future benchmark additions.

Anyway, here are some future candidates to be added to benchmark-driver.github.io.

Discourse

https://github.com/noahgibbs/rails_ruby_bench
https://github.com/discourse/discourse

Discourse is a popular Rails application and has script/bench.rb to measure its performance. Noah Gibbs created an improved version of it as rails_ruby_bench.

Since Rails is one of Ruby’s major use cases, we definitely need to measure Ruby’s performance with Rails. For now this is blocked by the “showing 50/75/90/99 percentiles” issue and the disk consumption problem. Rails has too many dependencies compared to Optcarrot…

Fluentd

https://github.com/benchmark-driver/fluentd-benchmark

Fluentd is a log collector written in Ruby, used by many large-scale services (https://www.fluentd.org). So Fluentd might be a good real-world Ruby use case whose performance is very important.
Disclaimer: Note that I’m an employee of Treasure Data, a company building Fluentd.

I modified the one_forward benchmark in https://github.com/fluent/fluentd-benchmark so that we can use it from benchmark_driver.gem. Currently it’s not on benchmark-driver.github.io because the benchmark result doesn’t appear to be affected by Ruby’s performance. I’m not sure why, and we need to investigate the cause to help make Fluentd faster.

Integrating derailed_benchmarks.gem

You may want to use your own Rails application as a benchmark test case for Ruby 3x3. If I integrate derailed_benchmarks.gem with benchmark_driver.gem, we may be able to easily compare Ruby’s performance using such Rails applications.

Recently, @schneems kindly gave me a commit bit for the gem, so I’ll be able to improve the situation from both sides.

Conclusion

benchmark_driver.gem is good for many Ruby benchmarking use cases.
Please try it and give me feedback to help achieve Ruby 3x3.

Acknowledgements

The success of this project would have been impossible without help from Ruby Association. In particular, Koichi Sasada, the mentor of this project, gave me many good ideas to improve it. Thank you so much.