Behind the Scenes: Rake Multitask

Introduction

At Optimizely, we use the open source tool Rake quite heavily. While trying to figure out how to execute blocks of code in parallel, I ended up reading into the Rake::ThreadPool class, which is pretty interesting.

What I came up with was this: https://github.com/optimizely/rake-opt-parallel. Check out the README to see what it does!

But, let’s get back to the interesting part.

Rake Tasks & Multitasks

First, let’s take a look at some rake code.

task main: [:sub1, :sub2]
task :sub1 do
  puts 'Subtask 1'
end
task :sub2 do
  puts 'Subtask 2'
end

Here, there’s a task called “main” that declares two prerequisites to be run: sub1 and sub2. When calling “rake main”, the output should look like this:

Subtask 1
Subtask 2

But instead of using the #task method to define main, you can also use #multitask, like so:

multitask main: [:sub1, :sub2]

This would invoke the prerequisites concurrently instead of one after another.

Sidenote: since they are running concurrently, the output could be jumbled.
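To try this without writing a Rakefile, here is a minimal sketch that defines the same tasks from plain Ruby. It assumes the rake gem is installed, and it calls Rake::Task.define_task and Rake::MultiTask.define_task directly (the calls that #task and #multitask wrap). The `lines` queue is a hypothetical stand-in for puts, used only so the output can be inspected afterwards.

```ruby
require "rake"

# Thread-safe collector standing in for puts (hypothetical helper).
lines = Queue.new

Rake::Task.define_task(:sub1) { lines << "Subtask 1" }
Rake::Task.define_task(:sub2) { lines << "Subtask 2" }

# Equivalent to `multitask main: [:sub1, :sub2]` in a Rakefile.
Rake::MultiTask.define_task(main: [:sub1, :sub2])

# invoke returns only after every prerequisite has finished.
Rake::Task[:main].invoke

collected = []
collected << lines.pop until lines.empty?

# Arrival order is not guaranteed, so sort before printing or comparing.
puts collected.sort
```

Sorting before comparing is the important part: with a multitask, either subtask may finish first.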

To understand how this works, let’s talk about Ruby’s Thread, which is what is used behind the scenes. Time to dive in.

Ruby Threading

threads = 3.times.map do |i|
  Thread.new do
    1 + 1 # Just some random calculation
    puts "Thread #{i} done"
  end
end
threads.each do |t|
  t.join
end

Each thread is a pthread managed by Ruby. Here we create 3 of them and save them all into an array. Afterwards, we go through the array and call Thread#join on each one, which waits until that thread has finished execution. This blocks the main code from continuing and exiting until all the threads have finished.

We should see something like the following:

Thread 0 done
Thread 2 done
Thread 1 done

There are two important things to note here:

  1. Although there are only 3 blocks of work to be done, we have a total of 4 threads: the 3 we created, plus the main thread.
  2. The main thread is essentially unused while the other threads are executing.

Rake does a few tricks to get around these issues.

Rake::ThreadPool

Rake creates a ThreadPool at initialization, which creates and manages threads. The number of threads it creates is based on the number of CPUs available, but can be overridden with the -j (--jobs) flag.

# Rake source
@thread_pool ||= ThreadPool.new(options.thread_pool_size || Rake.suggested_thread_count-1)

The threads are created at initialization and never disposed of, just reused. Also, you can see here that Rake subtracts one from the thread count when sizing the ThreadPool. For example, if the application requests 8 threads, it actually creates a ThreadPool with a size of 7.

This is possible because of Rake’s idea of “Promises”.

Rake::Promise

A promise is a block of work to be done. When we created threads in the previous example, we told each thread what to do at creation time. Rake instead creates threads that sit idle, waiting for work. It treats the work to be done and the thread that actually does it as separate entities.
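The idea of idle workers waiting for work can be sketched with nothing but Ruby's Queue. This is not Rake's actual implementation, just a minimal illustration; the names `work`, `results`, and `workers`, and the use of nil as a shutdown signal, are all assumptions made for the example.

```ruby
work = Queue.new     # blocks of work waiting for a thread
results = Queue.new  # thread-safe place to put the answers

# Two idle workers. Queue#pop blocks while the queue is empty,
# so each thread simply sleeps until a job shows up.
workers = 2.times.map do
  Thread.new do
    # A nil job is our (assumed) signal to shut down.
    while (job = work.pop)
      results << job.call
    end
  end
end

# The work is created separately from the threads that will run it.
3.times { |i| work << -> { i * i } }

# One shutdown signal per worker, then wait for them to drain the queue.
workers.size.times { work << nil }
workers.each(&:join)

squares = []
squares << results.pop until results.empty?
squares.sort!
puts squares.inspect
```

Note that three jobs were handled by two threads: the number of workers no longer has to match the number of blocks of work.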

To see how this works, we’ll take a look at the Rake::Task definition.

209 # Invoke all the prerequisites of a task.
210 def invoke_prerequisites(task_args, invocation_chain) # :nodoc:
211   if application.options.always_multitask
212     invoke_prerequisites_concurrently(task_args, invocation_chain)
213   else
214     prerequisite_tasks.each { |p|
215       prereq_args = task_args.new_scope(p.arg_names)
216       p.invoke_with_call_chain(prereq_args, invocation_chain)
217     }
218   end
219 end
220
221 # Invoke all the prerequisites of a task in parallel.
222 def invoke_prerequisites_concurrently(task_args, invocation_chain) # :nodoc:
223   futures = prerequisite_tasks.map do |p|
224     prereq_args = task_args.new_scope(p.arg_names)
225     application.thread_pool.future(p) do |r|
226       r.invoke_with_call_chain(prereq_args, invocation_chain)
227     end
228   end
229   futures.each(&:value)
230 end

When a task is invoked, its prerequisites are invoked first. In the case of a normal task, you can see on line 214 that it just iterates through the prerequisites and invokes each one sequentially. A multitask, however, defers to #invoke_prerequisites_concurrently, which maps each prerequisite to a “future” using the thread_pool. A future is an instance of a Promise; Rake uses the two names interchangeably (I’m not sure why they use two names, but trust me, they’re the same thing).

If you compare this to the Thread code we wrote, it looks very similar: map each block of work to something, then iterate through the results. The key difference is that when we iterated through the threads we called #join, whereas with the futures/promises we call #value.

Recalling what Thread#join does:

  1. If the thread has not finished executing: wait for it to finish.
  2. If the thread has finished executing: return right away (the block’s return value is then available through Thread#value).
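One detail worth checking in plain Ruby: Thread#join hands back the thread itself, while Thread#value is the call that returns the block's result (joining first if it has to). A minimal sketch, with `worker` as an illustrative name:

```ruby
worker = Thread.new { 6 * 7 }

joined = worker.join   # waits for the thread, then returns the Thread object
result = worker.value  # returns what the block evaluated to

puts joined.equal?(worker)
puts result
```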

When we use Promise#value, there are 3 paths instead:

  1. If the promise has not finished executing and a thread is currently executing it: wait for the promise to finish.
  2. If the promise has not finished executing and no thread is currently executing it: the calling thread picks up the work itself.
  3. If the promise has finished executing: receive the return value.

Separating the work from the worker opens up a new path: the thread that is calling futures.each(&:value) can pick up the work itself, instead of just waiting for each piece of work to finish.
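The three paths can be sketched with a mutex and a flag. TinyPromise below is a hypothetical, stripped-down class written for this post, not Rake::Promise itself; it ignores error handling, and every name in it is an assumption.

```ruby
# Hypothetical stand-in for the idea behind a promise: the work lives
# in the object, and whichever thread gets the lock first runs it.
class TinyPromise
  def initialize(&block)
    @block = block
    @mutex = Mutex.new
    @done = false
    @result = nil
  end

  # Run the block at most once. Called by a worker or by #value.
  def work
    @mutex.synchronize do
      unless @done
        @result = @block.call
        @done = true
      end
    end
  end

  # Path 1: another thread holds the lock, so we block until it is done.
  # Path 2: nobody has started yet, so we run the block ourselves.
  # Path 3: already done, so we just return the stored result.
  def value
    work
    @result
  end
end

caller_thread = Thread.current
runs = 0
ran_on_caller = false

promise = TinyPromise.new do
  runs += 1
  ran_on_caller = Thread.current.equal?(caller_thread)
  6 * 7
end

answer = promise.value # no worker picked it up, so it runs right here
again = promise.value  # already finished, so the block is not run again

# A second promise that a separate worker thread completes first.
handed_off = TinyPromise.new { Thread.current.equal?(caller_thread) }
Thread.new { handed_off.work }.join
ran_elsewhere = !handed_off.value
```

The calling thread never sits idle unless some other thread is already in the middle of that exact piece of work.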

Why is this useful?

I’m sure you’re thinking, “Having one extra thread isn’t really a noticeable performance gain”. Let me tell you, you’re absolutely right. But it’s pretty cool, right?

Just kidding, Promises are actually very useful.

One thing to consider is what happens when Rake executes a tree of nested tasks.

multitask main: [:a1, :a2]
multitask a1: [:b1, :b2]
multitask a2: [:b3, :b4]
multitask b1: [:c1, :c2]
multitask b2: [:c3, :c4]
multitask b3: [:c5, :c6]
multitask b4: [:c7, :c8]
# Define the leaf tasks c1..c8; the puts is a placeholder for real work
(1..8).each do |i|
  task "c#{i}" do
    puts "Running c#{i}"
  end
end

Here we’re executing a task that has a tree of prerequisites, most of which just call other tasks. If we wanted tasks c1..c8 to execute at the same time using normal Thread#join logic, each multitask would have its own thread dedicated to just waiting for its children to finish. That’s a total of 15 threads for 8 tasks! Nearly half of them would do nothing but wait.
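The thread count is easy to verify with a small sketch that walks the same tree using one thread per child and Thread#join. TREE and run_with_join are hypothetical names for this illustration, and the leaf tasks do no real work.

```ruby
require "set"

# The same prerequisite tree as above, as plain data.
TREE = {
  main: [:a1, :a2],
  a1: [:b1, :b2], a2: [:b3, :b4],
  b1: [:c1, :c2], b2: [:c3, :c4],
  b3: [:c5, :c6], b4: [:c7, :c8]
}.freeze

# Record the current thread, then run every child on its own thread
# and wait for all of them, exactly like the Thread#join example.
def run_with_join(name, seen, lock)
  lock.synchronize { seen << Thread.current }
  TREE.fetch(name, [])
      .map { |child| Thread.new { run_with_join(child, seen, lock) } }
      .each(&:join)
end

seen = Set.new
lock = Mutex.new
run_with_join(:main, seen, lock)

thread_count = seen.size
puts thread_count # 15: 14 spawned threads plus the one that ran :main
```

Only the 8 leaf threads do any work here; the other 7 spend their whole life inside #join.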

Using Rake::Promise, since a thread doesn’t need to be used for waiting on its children, we would only need the 8 threads for the 8 tasks.

Another thing to consider, even if both approaches used the same number of threads, is the cost of thread context switching. If a parent thread calls #value on a promise, it has the chance of picking up the work instead of waiting, which means it can keep going instead of handing the CPU over to another thread. Each thread context switch is CPU time that is wasted. While this may not matter much with large workloads, it can make a noticeable difference when running many small workloads. I’ll be talking more on this in a later blog post!

Anyways, I hope you enjoyed going on this adventure with me. If you have any questions, feel free to contact me at jsm@optimizely.com

Also, we’re hiring! Check out our careers page for openings.