Benchmarking and Profiling

Shreyansh Gupta · Published in Nerd For Tech · 5 min read · May 22, 2024

This guide is applicable to all languages and frameworks. Some suggestions are tailored for Ruby.

Benchmark

A benchmark is a SYNTHETIC MEASUREMENT of the RESOURCES or TIME consumed by a defined PROCEDURE.

Synthetic measurement — It does not exactly replicate production conditions. It just aims to give a ballpark estimate.

Resources or time, procedure — We can get information like how much time it took to run a method, how many iterations were run per second, or how many objects were created, all for a defined procedure such as a method or a controller action.

There are 3 kinds of benchmarks, with different levels of scope —

  1. Micro — These benchmarks test small functionality, and a lot of iterations are run, e.g. a single line of code run 1 million times. That line of code might itself call other methods. Optimizations to this single LOC might not be useful if the LOC is not being called in many places.
  2. Macro — These benchmarks test comparatively bigger functionality, and comparatively fewer iterations are run, e.g. controller actions or service objects executed 1000s of times. Optimizations here might be useful, and their effect is usually visible in the benchmarking results.
  3. App — These benchmarks test much larger functionality, and far fewer iterations are run, e.g. entire features run 100s of times. Optimizations here might be useful. However, the effect of the optimization might not be visible in the benchmarking results since so much code is being executed.

Some benchmarking gems in the Ruby world are — benchmark-ips and benchmark-ipsa.
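
For example, with benchmark-ips a micro benchmark might look like the sketch below (the two string-building variants are just illustrative stand-ins for the code you actually care about):

```ruby
require "benchmark/ips"

Benchmark.ips do |x|
  # Each report block is run as many times as possible within the benchmark time,
  # and the result is reported as iterations per second.
  x.report("concatenation") { "foo" + "bar" }
  x.report("interpolation") { "#{'foo'}bar" }

  # Summarizes how the variants compare, e.g. "1.20x slower".
  x.compare!
end
```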

Some benchmarking parameters

  1. Warm up time — It is needed to allow the code to reach a steady state by loading data and filling caches before benchmarking. Make sure to warm up the code long enough to get stable results. e.g. CRuby reaches steady state much faster than JRuby or TruffleRuby, so the latter might need a longer warm up time.
  2. Benchmark time — It determines how long the actual benchmark is run for. It should be long enough to keep the variance low. If the variance is high, increasing the benchmark time might help. Both parameters can be set explicitly, as shown in the sketch after this list.
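
With benchmark-ips, a minimal sketch of setting both parameters looks like this (the numbers are only illustrative):

```ruby
require "benchmark/ips"

Benchmark.ips do |x|
  # warmup: seconds spent warming up before measurement starts.
  # time:   seconds the measured benchmark actually runs.
  # A longer warmup matters more on JRuby/TruffleRuby than on CRuby.
  x.config(warmup: 10, time: 20)

  x.report("sort") { Array.new(100) { rand }.sort }
end
```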

Tip: Only measure the things you actually care about. Make sure you know the code that you're benchmarking, and keep all the unneeded code outside the benchmarking block.

Tip: It's good to check in all the benchmarks that you write, in something like a /benchmarks folder, so that they're readily accessible.

Profiling

A profile is a STEP BY STEP ACCOUNTING of the RELATIVE CONSUMPTION of RESOURCES by a procedure’s many SUBROUTINES.

Step by step accounting — Instead of getting a single cumulative value at the end, like how many iterations were run per second, we get a lot of information. e.g. we spent 10% of our time on line 1, 5% on line 2 and so on. Depending on the tool, we can get this information at an extremely granular level.

Relative consumption — It does not give an absolute number like "we spent 100ms here" the way a benchmark would. It gives us relative measurements like percentages.

Resources — We can profile different resources like memory or time.

Subroutines — We are going to profile a procedure's subroutines, which in a language like Ruby usually means its methods. e.g. how much time was spent in each method within a procedure.

Profiling answers — "What's slow?". It tries to give you very granular details, which inevitably makes the app even slower. So production profilers like New Relic, Scout and Skylight make tradeoffs to give you just enough data without impacting the app too much.

To get in-depth results, we need a profiling environment that works almost identically to production, especially with regard to the amount of data present. There, we can enable in-depth profiling.

Below are some settings to get production-like behavior out of the development environment in Ruby on Rails. You might want to put these behind a conditional, e.g. only apply these changes if RAILS_ENV == "PROFILE".

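A sketch of the kind of settings meant here, assuming a dedicated config/environments/profile.rb file (the exact option names vary a little between Rails versions):

```ruby
# config/environments/profile.rb -- a hypothetical profiling environment
Rails.application.configure do
  # Behave like production: no code reloading, eager load the whole app.
  config.cache_classes = true
  config.eager_load    = true

  # Production-style error pages and caching.
  config.consider_all_requests_local       = false
  config.action_controller.perform_caching = true

  # Less noisy logging, closer to production.
  config.log_level = :info
end
```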

There are two modes for measuring time in almost all profilers —

  1. CPU time — This mode measures time based on CPU cycles. Things like waiting on IO and sleeping don't show up in these profiles, because no CPU cycles are consumed by the process while it waits.
  2. WALL time — This is based on the wall clock, so everything shows up, including waiting on IO and sleeping.

Depending upon your use case, you might want CPU time in some situations and WALL time in others.
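
The difference is easy to see with Ruby's Process.clock_gettime:

```ruby
wall_before = Process.clock_gettime(Process::CLOCK_MONOTONIC)
cpu_before  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID)

sleep 1  # the process waits: wall time advances, CPU time barely moves

wall_elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_before
cpu_elapsed  = Process.clock_gettime(Process::CLOCK_PROCESS_CPUTIME_ID) - cpu_before

puts "wall: #{wall_elapsed.round(2)}s, cpu: #{cpu_elapsed.round(4)}s"
# wall is ~1s; cpu is close to 0 because sleeping consumes no CPU cycles
```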

There are 2 kinds of profilers —

  1. Statistical — They sample a percentage of the available stack frames. They interrupt the process execution every X milliseconds to take a snapshot of the stack, then allow the process to continue. All of those stack frames then get aggregated into a profile.
  2. Tracing — They hook into the language. Every time a method is called, the profiler increments the relevant counters and records that the method was executed.

So tracing profilers can record everything that happens, while statistical profilers might capture only, say, 1 percent of the total data.

There are 2 main time-profilers in the Ruby world that we use —

  1. Stackprof — This is statistical.
  2. Ruby-prof — This is tracing.
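
A minimal stackprof sketch (the mode, interval and file name are just illustrative choices):

```ruby
require "stackprof"

# Sample the call stack every 1000 microseconds while the block runs.
# mode: :wall includes time spent waiting on IO or sleeping;
# mode: :cpu counts CPU time only.
StackProf.run(mode: :wall, interval: 1000, out: "stackprof.dump") do
  1_000.times { Array.new(1_000) { rand }.sort }
end

# Inspect the dump with the bundled CLI, e.g.: stackprof stackprof.dump --text
```

ruby-prof has a roughly similar block form (result = RubyProf.profile { ... }) plus a set of printers for different report formats.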

Profile memory in Ruby world using memory_profiler.
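
A minimal memory_profiler sketch:

```ruby
require "memory_profiler"

report = MemoryProfiler.report do
  100.times { "some string" * 100 }
end

# Prints allocated and retained objects/memory, grouped by gem, file and location.
report.pretty_print
```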

In the Ruby world, the best way to interact with memory_profiler and stackprof is using rack-mini-profiler.
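
In a Rails app this is usually just a matter of adding the gems; assuming the standard rack-mini-profiler setup, the extra profilers are then triggered through pp= query parameters:

```ruby
# Gemfile (a sketch of the usual development setup)
gem "rack-mini-profiler"
gem "stackprof"        # enables ?pp=flamegraph on any page
gem "memory_profiler"  # enables ?pp=profile-memory on any page
```

With these in place, appending ?pp=help to any URL lists the available profiling commands.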

Write performance tests in Rails using rails-perftest.
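
A sketch of what such a test looks like, assuming the setup from the rails-perftest README:

```ruby
# test/performance/browsing_test.rb
require "test_helper"
require "rails/performance_test_help"

class BrowsingTest < ActionDispatch::PerformanceTest
  def test_homepage
    get "/"
  end
end
```

These tests are then run with tasks along the lines of rake test:benchmark and rake test:profile.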

No matter how well you measure, at the end of the day you simply cannot replicate every production scenario outside of production.

Follow these steps to measure performance in production —

  1. Read production metrics — e.g. upon investigation, you find out that search is too slow.
  2. Profile to find hotspots — Profile to find where exactly the problem is, i.e. which actions are the slowest. e.g. say the search method is the slowest.
  3. Create benchmark — Take those slow parts and create benchmarks for them. e.g. try to figure out how long it takes to render each line of the results when searching for something.
  4. Iterate — Once we benchmark and find the problem, it's pretty straightforward to implement a solution. Iterate on the slowest areas and verify the improvements. e.g. say earlier you were rendering 1000 results per second; after implementing the solution, the benchmark shows 10000 results per second.
  5. Deploy — Deploy the solution.

These notes have been prepared from Nate Berkopec’s talk. You can watch the talk here —

RailsConf 2019 — Profiling and Benchmarking 101 by Nate Berkopec
