“Moore’s Law is dead.”
It’s an end-of-days sort of declaration, trumpeted in the streets by tech blog and editorial alike. The vaunted Intel oracle, Gordon Moore, has finally been proven wrong, and the rest of us will have to soldier on without his guiding hand.
Moore’s famous proclamation concerning the exponential growth of transistor density really boils down to one thing: efficiency. The more components you can cram onto a chip, the better its performance. But as these transistors get smaller and smaller, they get more difficult (and expensive) to work with. Eventually it becomes economically infeasible to increase the component count of the chip. And that “eventually” is soon. The exact timeline is still under debate, but many experts predict that Moore’s Law will meet its functional end by 2025.
Barring significant advances in quantum computing or supercooling, it seems we’ve reached the limits of how small and how fast our computers can get. The next innovations in efficiency won’t come from chip manufacturers, but from the software itself.
Multiple Cores, Multiple Threads
When you shop for a computer nowadays, Moore’s Law may seem hopelessly obsolete. Who cares how powerful each processor is? There are dual-core, quad-core, even 8-core chips!
The difference between multi-core and multi-processor computers is worthy of its own article, but the question still has merit. With multiple processors available, why should we worry about the limits of a single processor?
Modern computers are pretty smart machines. The operating system’s scheduler takes the threads and processes that programs create and distributes them among the available cores, trying to keep every core busy. Many processors also support simultaneous multithreading (Intel’s version is called Hyper-Threading), which lets each physical core run two hardware threads at once. The goal is to have each of your cores sharing an equal part of the computing load.
However, the multi-core chip has its limitations. Users (meaning people, rather than the automated servers that handle web traffic, for example) tend to focus on a single program at a time. Where a server can assign equal importance to all of its incoming computing requests, a person wants their current program to be the computer’s highest priority. (Computers that don’t prioritize the user’s needs tend to find themselves thrown out of windows.)
The problem is, most of the programs people want to use aren’t optimized for multi-core processing. It’s difficult for a software designer to consider running multiple tasks — or threads — in parallel, because the conventional style of programming is a sequential process. Access memory A, run function B, provide the user with result C.
Computationally intensive programs, like those for gaming or video processing, are designed to maximize the efficiency of the CPU (or GPU). These are incredibly complex pieces of software that require entire departments to design and maintain, and many of the companies behind them are at the forefront of multi-threading research. Smaller companies, and especially individual coders, just don’t have the same resources to throw at the problem.
Does that mean efficiently using a multi-core processor is impossible for the average programmer? No. Far from it. As Moore’s Law settles into the history books, there’s growing community support (across all programming languages) for multi-threading projects that help alleviate some of the barriers to entry.
Ruby and Threading
Since version 1.9, released in late 2007, Ruby has had the capacity for multi-threading. Even so, the answer to the question “does Ruby have multi-threading?” is a resounding “sometimes.”
Without getting too deep into the structure of the language, Ruby relies on an interpreter to actually run its code. The creator of Ruby, Yukihiro Matsumoto, provided the first interpreter with its initial launch. Commonly called MRI (Matz’s Ruby Interpreter), it ran every Ruby thread inside a single operating-system thread, so no two threads could ever execute at the same time. Its successor, YARV, which was released alongside Ruby 1.9 and has been the core of MRI ever since, switched to native threads but added a mechanism with the same effect on Ruby code: the global interpreter lock (GIL). So even though 1.9 would otherwise have supported true multi-threading, Ruby threads still couldn’t run in parallel!
What is the global interpreter lock?
In an ideal world, a complicated Ruby program would break itself into independent subtasks (defined by the developer) and provide those subtasks to the computer’s processors. The computer would divide the work as evenly as possible amongst its cores, then return the results to the program as they finished. These subtasks would all run in parallel, so theoretically a computer with four cores would process the code four times as fast.
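Here is a minimal sketch of that idea in Ruby, assuming a made-up CPU-bound subtask (`sum_of_squares` is a hypothetical helper, not part of any library). It splits the work into one chunk per core, as reported by `Etc.nprocessors`, hands each chunk to its own `Thread`, and combines the partial results. The answer is correct on any interpreter; whether the threads actually run at the same time, and therefore whether this is any faster, depends on the interpreter.

```ruby
require "etc"

# Hypothetical CPU-bound subtask: the sum of the squares of some integers.
def sum_of_squares(numbers)
  numbers.sum { |n| n * n }
end

numbers = (1..1_000_000)
cores   = Etc.nprocessors

# Split the work into one roughly equal chunk per core.
chunk_size = (numbers.size / cores.to_f).ceil
chunks     = numbers.each_slice(chunk_size).to_a

# Give each chunk its own thread. Thread#value waits for the thread to
# finish and returns the result of its block.
threads  = chunks.map { |chunk| Thread.new { sum_of_squares(chunk) } }
parallel = threads.sum(&:value)

# The combined answer matches a plain sequential run.
sequential = sum_of_squares(numbers)
puts parallel == sequential  # => true
```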
With the global interpreter lock, no matter how many threads exist, the interpreter will only let one of them execute Ruby code at a time (even with multiple cores available to process them).
If that seems to defeat the point of multi-core processing, you’re right — it does. But the inclusion of this lock was very intentional.
I hope to see Ruby help every programmer in the world to be productive, and to enjoy programming, and to be happy. That is the primary purpose of Ruby language.
Yukihiro Matsumoto, 2008
First and foremost, Matz created Ruby as a programming language for humans. He intended it to be clear and predictable, and avoided design patterns that favored computational efficiency over readability. And, as mentioned before, multi-threading is not an intuitive process. It may be extremely computationally efficient in a multi-core environment, but it’s difficult for humans to follow the parallel logic.
Second, Ruby supports extensions written in the C programming language. MRI itself was written in C, and gems occasionally use C code when performance is crucial. However, many C extensions aren’t thread-safe. When their code runs on several threads at once, it may not properly synchronize access to shared data, resulting in dreaded race conditions: depending on which thread executes first, the program may return different (non-deterministic) results, which is very undesirable.
Finally, removing the global interpreter lock would make Ruby slower when running a single-threaded application. When the interpreter can safely assume that only one thread will be running at a time, it can make small optimizations to improve performance. Since most Ruby programs use only a single thread, removing the lock would mean an overall loss of efficiency for the language.
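The same hazard exists in plain Ruby code whenever threads share data. Below is a minimal sketch of the standard remedy: wrapping the shared update in a `Mutex` so that only one thread can perform it at a time. The counter is purely illustrative. MRI’s lock happens to make lost updates rare in a case like this, but it doesn’t promise that a compound operation like `+=` is atomic, and on JRuby unsynchronized updates genuinely can be lost.

```ruby
# A counter shared by several threads.
counter = 0
lock    = Mutex.new

threads = 4.times.map do
  Thread.new do
    10_000.times do
      # "Read counter, add 1, write counter" is three steps. Without the
      # mutex, two threads could interleave those steps and lose updates.
      lock.synchronize { counter += 1 }
    end
  end
end
threads.each(&:join)

puts counter  # => 40000
```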
Alternatives to Multi-Threading
Even with the global interpreter lock in place, there are workarounds in MRI that allow programs to run concurrently, or even in parallel.
In particular, Ruby concurrency is when two tasks can start, run, and complete in overlapping time periods. It doesn’t necessarily mean, though, that they’ll ever both be running at the same instant (e.g., multiple threads on a single-core machine). In contrast, parallelism is when two tasks literally run at the same time (e.g., multiple threads on a multicore processor).
Eqbal Quran, Ruby Concurrency and Parallelism: A Practical Tutorial
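Threads in MRI still pay off for this kind of concurrency when the work is mostly waiting. The sketch below uses `sleep` as a stand-in for I/O such as a network request or disk read. MRI releases the interpreter lock while a thread is blocked like this, so the four waits overlap instead of running back to back. Exact timings will vary by machine.

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 4.times.map do |i|
  Thread.new do
    sleep 0.5          # simulated I/O wait; the lock is released meanwhile
    "task #{i} done"
  end
end
results = threads.map(&:value)

elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts results
# Typically close to 0.5 seconds in total, rather than 4 x 0.5 = 2 seconds.
puts format("elapsed: %.2fs", elapsed)
```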
The simplest way to get real parallelism in MRI is to spawn multiple processes. The fork method creates a subprocess with its own interpreter (and its own global interpreter lock), which escapes the tyranny of the parent’s lock and allows the operating system’s scheduler to assign that process to a separate core.
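A minimal sketch using `Process.fork` with a block: each child runs the block in its own process and then exits, while the parent waits for both with `Process.wait2`. The busywork inside the block is an arbitrary placeholder. Note that fork is only available on Unix-like systems such as Linux and macOS, not on Windows.

```ruby
pids = 2.times.map do |i|
  Process.fork do
    # This block runs in a separate child process with its own interpreter
    # and its own lock, so the OS is free to schedule it on another core.
    total = (1..2_000_000).sum { |n| n % 7 }
    puts "child #{i} (pid #{Process.pid}) finished: #{total}"
  end
end

# The parent waits for each child and collects its exit status.
statuses = pids.map { |pid| Process.wait2(pid).last }
puts statuses.all?(&:success?)  # => true
```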
Great! That seems to be an easy workaround. MRI may not have true multi-threading within a single process, but multiple processes are basically the same thing, right?
Well… no. As with everything in programming, there is a trade-off.
When a program is forked, it creates a subprocess known as a child. Multiple forks will generate multiple children, all with duplicates of the original data. Each child process will, ideally, operate on its own version of the data and then return the result to its parent when finished.
These children may run on different processors in parallel, but they are very memory-hungry. Remember, each subprocess gets its own copy of the original data. Modern operating systems soften this with copy-on-write, physically copying a memory page only once a process modifies it, but as the number of children increases, the amount of memory needed for operation still tends to grow roughly linearly.
In addition, processes are not very good at communicating with one another. They share no memory, so any data the children need to share has to be passed explicitly (through pipes, sockets, or files). This creates a processing bottleneck, which translates to idle cores and inefficiency.
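The copying is easy to observe. In this sketch, the child appends to an array, but because it is working on its own copy, the parent’s array is untouched after the child exits. (As above, this requires a platform that supports fork.)

```ruby
data = [1, 2, 3]

pid = Process.fork do
  # The child modifies its own copy of the parent's memory...
  data << 4
  puts "child sees:  #{data.inspect}"   # child sees:  [1, 2, 3, 4]
end
Process.wait(pid)

# ...so the parent's array is unchanged.
puts "parent sees: #{data.inspect}"     # parent sees: [1, 2, 3]
```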
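Since the processes share no memory, results have to travel back explicitly. This sketch gives each child the write end of an `IO.pipe`, has it serialize its partial sum with `Marshal`, and lets the parent read and combine the pieces. The four hard-coded ranges are an arbitrary example, and fork support is assumed.

```ruby
ranges = [1..250_000, 250_001..500_000, 500_001..750_000, 750_001..1_000_000]

readers = ranges.map do |range|
  reader, writer = IO.pipe
  Process.fork do
    reader.close                      # the child only writes
    Marshal.dump(range.sum, writer)   # serialize the partial result
    writer.close
  end
  writer.close                        # the parent only reads
  reader
end

# Marshal.load blocks until the corresponding child has written its result.
partials = readers.map do |reader|
  result = Marshal.load(reader)
  reader.close
  result
end
Process.waitall

total = partials.sum
puts total  # => 500000500000
```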
There are some optimizations for multi-process concurrency in Ruby — Matz himself has praised the model of the Rack server Unicorn.
Unicorn utilizes a system where subtasks are fed one-by-one into a queue. At the end of the queue are a number of parallel subprocesses, which are called workers. Each subtask, as it reaches the front of the queue, is assigned to the next available worker, which then executes it and provides the result to the original program. The model is similar to a grocery store checkout line, where customers are the subtasks and the workers are the checkout clerks.
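Here is a rough sketch of the checkout-line model using Ruby’s built-in `Queue`. For brevity it uses threads as the workers, whereas Unicorn’s actual workers are forked processes, so treat this as an illustration of the scheduling pattern rather than of Unicorn’s implementation. The `:done` marker is an assumption of this sketch that tells each worker the line is empty.

```ruby
jobs    = Queue.new
results = Queue.new
worker_count = 3

(1..10).each { |n| jobs << n }           # customers join the line
worker_count.times { jobs << :done }     # one stop signal per worker

workers = worker_count.times.map do |id|
  Thread.new do
    # Each idle worker takes the next job from the front of the line.
    while (job = jobs.pop) != :done
      results << [id, job, job * job]
    end
  end
end
workers.each(&:join)

answers = Array.new(results.size) { results.pop }
puts answers.map(&:last).sort.inspect  # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```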
The result is still relatively memory-expensive, but the operating system’s scheduler balances the workers across cores, and because each worker is a separate single-threaded process, the model is entirely thread-safe.
Then there are green threads. To put it simply, green threads aren’t real threads.
Prior to Ruby version 1.9, the only threading the language supported was green threading: threads that the interpreter schedules itself, all inside a single operating-system thread in a single process. The operating system doesn’t see green threads as separate entities, and thus doesn’t schedule them amongst its cores.
In Ruby 1.8, the Thread class was an example of green threading: tasks gained the programming benefits of threading (interleaved timing and synchronization) without reaping the benefits of multiple processors.
Version 1.9 introduced native threads to Ruby, which are kernel (or operating system) threads. The system scheduler recognizes native threads as separate entities and can schedule them on different processors. However, the global interpreter lock means that only one of them can execute Ruby code at any given moment, so they still don’t run Ruby code in parallel.
But Rubyists, rejoice: these are not the only Ruby interpreters.
Real Multi-Threading in Ruby
I mentioned previously that the original Ruby interpreter, MRI, was written in the C programming language. It supports C extensions and all of the thread-unsafe code they introduce to Ruby. Perhaps with good reason, there are currently no plans to remove the global interpreter lock from MRI.
But for real multi-threading applications, there is JRuby.
JRuby is an implementation of Ruby that runs inside the Java Virtual Machine. As such, it loses some of the native C extensions of ‘vanilla’ Ruby, but gains access to Java types and objects. Most importantly, it can also take advantage of Java Threads, which are designed for full parallel execution.
There are some excellent Ruby libraries which interface well with JRuby, allowing applications to run on multiple threads with just a few lines of code.
Puma is a Rack-based web server with both Sinatra and Rails integration. It was designed specifically for concurrent applications, and supports full multi-threading when run on a fully-threaded Ruby interpreter.
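For a sense of what this looks like in practice, here is a minimal sketch of a Puma configuration file (conventionally `config/puma.rb`). The `threads`, `workers`, `preload_app!`, and `port` settings belong to Puma’s configuration DSL; the specific numbers are arbitrary assumptions, not recommendations. Each worker is a forked process with its own thread pool, so on MRI the workers supply the parallelism while the threads mostly help with I/O-bound requests.

```ruby
# config/puma.rb (a sketch; tune the numbers for your own application)

# Each worker process runs a pool of between 1 and 5 threads.
threads 1, 5

# Clustered mode: fork 2 worker processes. This relies on fork, so it
# isn't available on JRuby, where the thread pool alone provides parallelism.
workers 2

# Load the application before forking so workers can share memory pages.
preload_app!

port 3000
```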
Celluloid is a framework that essentially reimplements Ruby’s object model around concurrency. Objects are replaced with “actors,” which are objects tied to their own threads. Celluloid manages the method calls between actors in a way designed to keep your program from deadlocking, a common error in multi-threaded applications.
Sidekiq is a background job processor that applies a threaded version of the Unicorn supermarket model described earlier. Jobs are added to a queue (stored in Redis) and executed by worker threads in the background; on a fully-threaded interpreter like JRuby, those threads can spread across all of your available processors. Sidekiq is designed to optimize Rails applications for scalability and ease of use.
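The actor idea can be sketched without Celluloid at all. The hypothetical `CounterActor` below owns its state inside a single thread and processes messages from a `Queue` mailbox one at a time, so the state never needs a lock. This is not Celluloid’s API, which does considerably more; it only shows the core idea.

```ruby
class CounterActor
  def initialize
    @count   = 0
    @mailbox = Queue.new
    @thread  = Thread.new { run }   # the only thread that touches @count
  end

  # Asynchronous message: enqueue it and return immediately.
  def increment
    @mailbox << [:increment, nil]
    nil
  end

  # Synchronous message: wait for the actor's reply.
  def value
    reply = Queue.new
    @mailbox << [:value, reply]
    reply.pop
  end

  def stop
    @mailbox << [:stop, nil]
    @thread.join
  end

  private

  # Messages are handled strictly one at a time, in arrival order.
  def run
    loop do
      message, reply = @mailbox.pop
      case message
      when :increment then @count += 1
      when :value     then reply << @count
      when :stop      then break
      end
    end
  end
end

actor   = CounterActor.new
senders = 4.times.map { Thread.new { 1_000.times { actor.increment } } }
senders.each(&:join)

final_count = actor.value
puts final_count  # => 4000
actor.stop
```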
Obviously the loss of C extensions means JRuby is not a cookie-cutter solution for every Ruby concurrency problem. But, in specific applications — typically relating to web servers — it does have significant advantages over its globally-locked peers.
Your own implementation will depend on your need for parallel tasks, performance, and memory efficiency. Multi-threading may not be necessary for your application, and Ruby itself may not be the best language for your work.
That being said, Ruby does have ways to achieve real parallel processing, and they are intuitive enough for even a true beginner.
With Moore’s Law slowly winding down, taking advantage of multiple processors will become a software development standard. It’s time to start teaching ourselves to embrace this parallel model.