While I was at Square, I helped debug a lot of Ruby apps and learned a few tricks for narrowing down what was wrong by looking at the Ruby VM.
This is the first of two posts; it covers the Global Interpreter Lock, thread contention, and why app metrics can lie to you. The second part will cover how Ruby’s Garbage Collector works, and why your app can slowly fragment memory even when your code has no memory leaks.
Everything covered here was tested against MRI 2.4.2; I don’t know of any changes in MRI 2.5.0 that would invalidate it. If you’re using JRuby, none of this applies to you, since JRuby runs on the JVM and doesn’t have a GVL.
Global Interpreter/VM Lock
The Global Interpreter Lock (GIL), known within Ruby as the Global VM Lock (GVL), forces the threads of a Ruby process to run concurrently rather than in parallel: only one thread can execute Ruby code at a time. Ruby uses a GVL because it simplifies the VM internals, since they don’t have to be made thread-safe, and because it allows greater compatibility with C libraries that aren’t thread-safe.
As a refresher on concurrency versus parallelism: concurrency means multiple threads are alive, but only one of them is executing at any given moment.
Parallelism, by contrast, means multiple threads can execute simultaneously.
Internally, Ruby acquires the GVL, runs Thread #1 until the scheduler decides to pause it, releases the GVL, then reacquires it to run Thread #2, and so on. Whenever a thread blocks, whether on IO (for example in IO.select) or in sleep, the GVL is released and another thread can run.
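To see the difference in practice, here’s a small sketch, assuming MRI (timings vary by machine): it runs four threads that only sleep, then four threads doing pure-Ruby arithmetic. The helper names cpu_work and run_in_threads are hypothetical, made up for this illustration.

```ruby
require "benchmark"

# Pure-Ruby loop that never blocks, so it only gives up the GVL
# when the VM forces a thread switch.
def cpu_work
  x = 0
  2_000_000.times { |i| x += i }
  x
end

# Runs the given block on `count` threads and returns wall-clock seconds.
def run_in_threads(count, &work)
  Benchmark.realtime do
    Array.new(count) { Thread.new(&work) }.each(&:join)
  end
end

# Sleeping releases the GVL, so the four 0.5s sleeps overlap.
sleep_time = run_in_threads(4) { sleep 0.5 }

# CPU-bound work holds the GVL, so four threads take about as long
# as doing the work four times back to back.
serial_cpu = Benchmark.realtime { 4.times { cpu_work } }
threaded_cpu = run_in_threads(4) { cpu_work }

puts format("4 threads sleeping 0.5s each: %.2fs", sleep_time)
puts format("CPU work, serial:             %.2fs", serial_cpu)
puts format("CPU work, 4 threads:          %.2fs", threaded_cpu)
```

On MRI you should see the sleeping threads finish in roughly half a second in total, while the threaded CPU work takes about as long as the serial version.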
Ruby 3 has a proposal for adding true parallelism through Guilds, but Ruby 3 is still a few years out. Olivier Lacan explains Guilds in more detail if you want to learn more.
Because of the GVL, threads have to compete for execution time within a single Ruby process. When using a threaded web server such as Puma, this can lead to unpredictable performance. For example, imagine two threads in the same process: Thread #1 is stuck in a tight, CPU-bound loop, while Thread #2 is serving an ordinary request.
When monitoring response times, you would see Thread #2 start taking longer to respond, when in reality Thread #1 is causing the problem. Because Thread #1 is a tight loop that never blocks on IO, it never gives up the GVL voluntarily; it only yields when the VM forces a switch at the end of its time slice. As a result, Thread #2 takes longer to respond even though nothing is wrong with its code.
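Here’s a hypothetical reproduction of that scenario, assuming MRI: the main thread plays Thread #2 and times a cheap fake request (handle_request is made up for this sketch), first on its own and then while a second thread spins in a tight loop. Exact numbers depend on the machine and Ruby version.

```ruby
require "benchmark"

# Hypothetical stand-in for a cheap request: a short blocking call
# (which releases the GVL) plus a little CPU work.
def handle_request
  sleep 0.001
  (1..1_000).sum
end

# Worst-case wall-clock latency over a number of fake requests.
def request_latency(samples = 20)
  Array.new(samples) { Benchmark.realtime { handle_request } }.max
end

baseline = request_latency

# Thread #1: a tight loop that never blocks on IO.
stop = false
hog = Thread.new do
  x = 0
  x += 1 until stop
end

# Thread #2 (the main thread) handles the same fake requests again.
contended = request_latency
stop = true
hog.join

puts format("worst latency alone:          %.1fms", baseline * 1000)
puts format("worst latency with busy loop: %.1fms", contended * 1000)
```

The short sleep inside handle_request hands the GVL to the busy loop, so each request then has to wait for the VM to preempt that loop before it can finish. Worst-case latency jumps even though handle_request itself never changed.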
Short of modifying the Ruby VM itself, I’m not aware of a blessed way to measure thread contention. I settled on using a thread that sleeps for 1 second, and then measures how much longer than 1 second it actually slept.
The extra time is how much was “stolen” from our thread while it waited to run again. If it’s only a few milliseconds, there’s nothing to worry about, but if it’s spiking into the hundreds of milliseconds, then we know something is overloading the Ruby VM.
In general, thread contention issues come down to doing too much in Ruby land within a single process. The only fix is to do less, whether by running more processes with fewer threads each, or by optimizing your code.
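A minimal sketch of that monitor, assuming MRI. ContentionMonitor and its report block are hypothetical names; in a real app you’d send the value to your metrics system instead of collecting it in Ruby. The measured overshoot also includes ordinary OS scheduling and timer slop, which is why a few milliseconds is just noise.

```ruby
# Hypothetical sleep-based contention monitor.
class ContentionMonitor
  def initialize(interval: 1.0, &report)
    @interval = interval
    @report = report
  end

  def start
    @thread = Thread.new do
      loop do
        started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
        sleep @interval
        elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
        # Time beyond the requested interval is time this thread spent
        # waiting to run again (mostly waiting for the GVL) after waking.
        stolen_ms = ((elapsed - @interval) * 1000).round(1)
        @report.call(stolen_ms)
      end
    end
    self
  end

  def stop
    @thread&.kill
    @thread&.join
  end
end

# Example (hypothetical reporting):
# ContentionMonitor.new { |ms| puts "stolen: #{ms}ms" }.start
```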
One memorable case: one of the gems Square used for reporting exceptions had some performance issues. Exceptions were infrequent enough that it didn’t cause a problem at first.
One day, a bug made it to production that raised an exception frequently enough to cause thread contention. After that, the app went into a death spiral, with requests queuing up that it could never process in time, and it had to be restarted.
It’s also a good example of why you should make sure your app servers quickly reject requests when they’re overloaded.
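One way to reject requests quickly is to drop any request that has already waited too long in the queue before reaching Ruby. Below is a hypothetical Rack-style middleware (RejectStaleRequests is a made-up name) sketching that idea. It assumes your proxy stamps each request with an X-Request-Start header of the form t=<epoch seconds>, as nginx does with proxy_set_header X-Request-Start "t=${msec}"; other proxies use other formats, so treat the parsing as an assumption to adapt.

```ruby
# Hypothetical middleware: fail fast on requests that queued too long.
class RejectStaleRequests
  def initialize(app, max_queue_seconds:, clock: -> { Time.now.to_f })
    @app = app
    @max_queue_seconds = max_queue_seconds
    @clock = clock
  end

  def call(env)
    queued_at = parse_request_start(env["HTTP_X_REQUEST_START"])
    if queued_at && @clock.call - queued_at > @max_queue_seconds
      # The client has probably given up already; doing the work would
      # only make the backlog worse.
      return [503, { "content-type" => "text/plain" }, ["Service overloaded\n"]]
    end
    @app.call(env)
  end

  private

  # Assumes "t=<epoch seconds, fractional>"; returns nil if absent or unparsable.
  def parse_request_start(value)
    return nil if value.nil?
    Float(value.sub(/\At=/, ""))
  rescue ArgumentError, TypeError
    nil
  end
end
```

Returning a 503 costs almost nothing, so the process can work through its backlog instead of spending time on requests whose clients have likely timed out.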
Lies and Metrics
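Monitoring tools such as New Relic or Datadog instrument your Ruby app by recording the time before a piece of code runs and again after it finishes, then reporting the difference.
Because Ruby threads run concurrently, timing metrics like these will be thrown off whenever thread contention happens, since the measured duration also includes the time the thread spent waiting for the GVL. Here’s an annotated example of a timing metric being thrown off: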
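A minimal sketch of that pattern, using a hypothetical instrument helper and an in-memory stats hash rather than any vendor’s actual API:

```ruby
# Hypothetical timing wrapper in the style of an APM agent.
def instrument(name, stats)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  begin
    yield
  ensure
    # Wall-clock time: keeps running while this thread waits for the GVL.
    elapsed_ms = (Process.clock_gettime(Process::CLOCK_MONOTONIC) - started) * 1000
    (stats[name] ||= []) << elapsed_ms
  end
end
```

Nothing here can tell "running the block" apart from "waiting to run", so every duration recorded this way includes any contention time.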
It’s important to always check metrics from multiple sources, rather than relying solely on what the app is reporting.
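Inside the app, one cheap cross-check is to record the thread’s CPU time next to the wall-clock time. A large gap means the thread spent that time not running: blocked on IO, sleeping, or waiting for the GVL. This is a sketch; the timed helper is hypothetical, and it assumes your OS supports Process::CLOCK_THREAD_CPUTIME_ID (Linux does).

```ruby
# Hypothetical helper: wall-clock vs. per-thread CPU time for a block.
def timed(&block)
  wall_start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  cpu_start = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID)
  result = block.call
  wall = Process.clock_gettime(Process::CLOCK_MONOTONIC) - wall_start
  cpu = Process.clock_gettime(Process::CLOCK_THREAD_CPUTIME_ID) - cpu_start
  # wall_ms - cpu_ms approximates time this thread spent not executing.
  { result: result, wall_ms: wall * 1000, cpu_ms: cpu * 1000 }
end
```

If wall_ms for a CPU-bound section is much larger than cpu_ms, the thread was mostly waiting rather than working, and the culprit is likely somewhere else in the process.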
Why Use Threads?
Your typical app is a business-logic wrapper around a database and a set of web services. If you have 30 single-threaded processes and half of each request is spent waiting on IO, then all 30 processes spend half of their time sleeping.
With threads, you could run 15 processes with 2 threads each and roughly halve your memory usage while still handling 30 requests at once. Or you could run 5 processes with 6 threads each to reduce memory usage further.
Since threads compete for the GVL, you can’t just configure your app to serve requests with 100 threads; under heavy load, that would slow your app down. The right ratio of processes to threads varies by app, and you can only find it by experimenting.
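The arithmetic behind those options, as a back-of-the-envelope sketch. The 300 MB per process figure is a made-up placeholder, and the sketch ignores the memory each extra thread costs, so real savings will be somewhat smaller.

```ruby
MB_PER_PROCESS = 300 # hypothetical placeholder, not a measured value

Config = Struct.new(:processes, :threads) do
  # Requests the configuration can work on at the same time.
  def concurrent_requests
    processes * threads
  end

  # Ignores per-thread memory overhead.
  def approx_memory_mb
    processes * MB_PER_PROCESS
  end
end

[Config.new(30, 1), Config.new(15, 2), Config.new(5, 6)].each do |c|
  puts format("%2d processes x %d threads: %d requests at once, ~%d MB",
              c.processes, c.threads, c.concurrent_requests, c.approx_memory_mb)
end
```

With Puma, the process count corresponds to the workers setting and the thread count to the threads setting in config/puma.rb, for example workers 5 and threads 6, 6.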
Next up, Garbage Collection
In the second and final part, we’re going to cover how Ruby’s Incremental Generational Garbage Collector works, and why it requires so many words to describe.