RxJava2 Schedulers #1: Demystifying the Computation Scheduler

For those who have used RxJava/RxJava2 for concurrency, you may have come across the Schedulers class: a brilliantly crafted abstraction over the Java threading primitives that lets you move work off the main thread onto a set of predefined Scheduler threads.

So what is the Scheduler class all about?

The Scheduler abstraction introduced by RxJava is essentially a set of rules that the underlying Java threading primitive (e.g. an ExecutorService or ScheduledThreadPoolExecutor) must adhere to. Each RxJava scheduler places a different set of rules on these primitives, and the outcome is the characteristic schedulers we have all grown to know and love:

Schedulers.computation();
Schedulers.io();
Schedulers.newThread();
Schedulers.trampoline();
Schedulers.single();
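
To make this concrete, here is a minimal sketch (the class and the expensiveCalculation helper are hypothetical, purely for illustration) of how a Scheduler is typically applied: subscribeOn() decides where the work runs, and observeOn() decides where the results are delivered.

import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;

public class SchedulerUsageSketch {

    public static void main(String[] args) throws InterruptedException {
        Observable.fromCallable(() -> expensiveCalculation(1_000_000))
                .subscribeOn(Schedulers.computation()) // do the work on a computation thread
                .observeOn(Schedulers.single())        // deliver the result on the single scheduler
                .subscribe(result -> System.out.println("Result: " + result));

        Thread.sleep(1000); // keep the JVM alive long enough to see the output
    }

    // Hypothetical stand-in for CPU-heavy work.
    private static long expensiveCalculation(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += i;
        }
        return sum;
    }
}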

The Schedulers implementation is quite extensive so to narrow down the scope, we shall pay attention to the computation scheduler.

So what is the computation scheduler? What makes it unique? If all these schedulers are backed by the same threading primitives, why should there be any difference between Schedulers.io() and Schedulers.computation()? Aren’t all threads created equal? What makes one preferable for IO and the other for computational tasks?

The goal of this article is to hopefully answer these questions. So let’s dive right in!

Computation & The Computation Scheduler

Before we begin with the scheduler, let’s define what a computational task is. After scouring the internet, Wikipedia gave the most comprehensive answer: a computational task is “any type of calculation that includes both arithmetical and non-arithmetical steps and follows a well-defined model, e.g. an algorithm.” So when it comes to computation, think algorithms and math. CPUs are brilliant at solving mathematical/structured problems that involve complex logic and multiple recurring steps. Peter Dunn notes in an article for the MIT School of Engineering that complex mathematical problems are broken down into basic operations such as addition, subtraction, multiplication, and division, which are considered “bread-and-butter tasks” for a microprocessor.

The computation scheduler, or rather the Java threading primitive backing it, is uniquely good at handling computational tasks. Now that we know what computation is, we need to ask why the computation scheduler is best suited to it.

How the Computation Scheduler works

According to the RxJava2 documentation, the default computation scheduler “has a backing pool of single-threaded ScheduledExecutorService instances equal to the number of available processors to the Java VM”. So if your device has 4 processors, the computation scheduler spools up 4 threads, roughly one per core, to deal with the computation.
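
To make the sizing rule concrete, here is a rough sketch of that idea only, not RxJava's actual ComputationScheduler (which also handles worker rotation and lifecycle): a pool backed by one single-threaded ScheduledExecutorService per available processor.

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;

public class ComputationPoolSketch {

    private final ScheduledExecutorService[] pool;
    private int next = 0;

    public ComputationPoolSketch() {
        int cpuCount = Runtime.getRuntime().availableProcessors();
        pool = new ScheduledExecutorService[cpuCount];
        for (int i = 0; i < cpuCount; i++) {
            // one single-threaded ScheduledExecutorService per available processor
            pool[i] = Executors.newSingleThreadScheduledExecutor();
        }
    }

    // Hand out workers round-robin so work is spread across the fixed pool.
    public synchronized ScheduledExecutorService nextWorker() {
        ScheduledExecutorService worker = pool[next];
        next = (next + 1) % pool.length;
        return worker;
    }
}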

At this point you’re probably wondering why you can’t get more threads. With other Java threading primitives, e.g. the ThreadPoolExecutor, you can spool up as many threads as you want! Won’t more threads make the computation faster?

The general answer is no, and the reason can be found in an article by Arch Robison on why too many threads hurt performance. In short, there are two kinds of threads: hardware threads and software threads. When designing CPUs, manufacturers dedicate a set of threads to be the workhorses of the CPU (sometimes there is more than one per core, e.g. with Intel Hyper-Threading Technology). These threads are actual hardware resources and are referred to as hardware threads. Software threads, on the other hand, are created by programs and vary based on a program’s needs; these are the threads we as developers almost always deal with. Robison notes that when the number of software threads exceeds the number of hardware threads, the CPU scheduling system has to step in and time-slice the software threads across the hardware threads. Each software thread gets an allocated amount of time on a hardware thread; once that time expires, the software thread is suspended and the slot is handed to the next waiting thread. This constant starting and stopping of threads creates overhead that would otherwise be spent computing the task without interruption.

Therefore, when a fixed amount of work (e.g. finding a solution to the Travelling Salesman Problem) is distributed among too many threads, the overhead of starting and stopping all those threads can outweigh the useful work actually being done. Think of:

  • The memory used to create the extra threads.
  • The memory and compute needed to schedule the created threads.
  • The compute needed to synchronize the several threads’ access to shared memory.

As a result, throwing more threads at the problem does not help when it comes to computation.

Okay, I understand that more threads mean more overhead! But why a single thread per core?

CPU Bound Work

To understand the nuances behind the computation scheduler, one must understand what CPU bound work is. CPU bound work is any type of work whose efficient completion is limited by the hardware characteristics of your CPU; any bottleneck in processing such work comes solely from the CPU’s compute capabilities. Jesse Storimer notes that computational work split across 4 cores, each running a single thread, ends up utilizing all your cores to the max without incurring any overhead. Having more than one thread per core will still let you push your cores to the max, but you will begin to incur the overhead mentioned above.

So what does all this mean? Should I restrict computational work to the computation scheduler? The short answer is yes, that would be the recommended step (see RxJava2 Computation Docs). However, it is important to remember that the code we write tends to be dynamic, often mixing computational work with IO or something else entirely. As a developer it is important to plan how you will structure your code or, better yet, profile it against a variety of scheduling mechanisms, tweaking factors like thread count and keep-alive time, to find the right mix for that kind of work (a sketch of how such a mix might be split across schedulers follows below). As with most programming techniques, there is no silver bullet.
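
As an illustration of such a mix, here is a hypothetical sketch (readReportFromDisk and crunchNumbers are made up for this example) that keeps blocking IO on Schedulers.io() and hops over to Schedulers.computation() for the CPU-bound part:

import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;

public class MixedWorkSketch {

    public static void main(String[] args) throws InterruptedException {
        Observable.fromCallable(MixedWorkSketch::readReportFromDisk) // blocking IO...
                .subscribeOn(Schedulers.io())                        // ...so start on the io scheduler
                .observeOn(Schedulers.computation())                 // switch for the CPU-bound part
                .map(MixedWorkSketch::crunchNumbers)
                .subscribe(total -> System.out.println("Total: " + total));

        Thread.sleep(1000); // keep the JVM alive long enough to see the output
    }

    // Hypothetical helpers purely for illustration.
    private static String readReportFromDisk() {
        return "1,2,3,4,5"; // stands in for a real file read
    }

    private static int crunchNumbers(String csv) {
        int sum = 0;
        for (String value : csv.split(",")) {
            sum += Integer.parseInt(value);
        }
        return sum;
    }
}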

RxJava Operators that default to the computation scheduler

Now that we know what the computation scheduler is and why it functions the way it does, we can conclude by identifying the RxJava operators that execute on the computation scheduler by default. Examples of these are:

Static: interval(), timer(), timeout()
Instance: buffer(), delay(), take(), skip(), takeWhile(), skipWhile(), timeout() and window().


Some of these operators often give an extra overloaded method that takes in the specified Scheduler you would prefer to use. You are therefore not limited to the computation scheduler (see Figure 1).

Figure 1: Extra overloaded method to specify the Scheduler you want.
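
The original figure is not reproduced here, but a sketch along those lines, using delay() as an example, might look like this:

import java.util.concurrent.TimeUnit;

import io.reactivex.Observable;
import io.reactivex.schedulers.Schedulers;

public class OverloadSketch {

    public static void main(String[] args) {
        // No Scheduler specified: delay() defaults to Schedulers.computation().
        Observable<Long> defaultDelay = Observable.just(1L)
                .delay(2, TimeUnit.SECONDS);

        // The overloaded variant lets you pass whichever Scheduler you prefer.
        Observable<Long> customDelay = Observable.just(1L)
                .delay(2, TimeUnit.SECONDS, Schedulers.io());
    }
}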

Here are some examples of the above operators in use.

Delay Operator

The delay operator using the computation scheduler by default
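
The original embedded snippet is not reproduced here; a minimal sketch of delay() falling back to the computation scheduler might look like this (the printed thread name should be something like RxComputationThreadPool-1):

import java.util.concurrent.TimeUnit;

import io.reactivex.Observable;

public class DelaySketch {

    public static void main(String[] args) throws InterruptedException {
        Observable.just("delayed hello")
                .delay(1, TimeUnit.SECONDS) // no Scheduler given, so computation is used
                .subscribe(value ->
                        System.out.println(value + " on " + Thread.currentThread().getName()));

        Thread.sleep(2000); // wait for the delayed emission before the JVM exits
    }
}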

This is what RxJava2 is doing with the delay operator.

(Extracted from RxJava2 docs)
@CheckReturnValue
@SchedulerSupport(SchedulerSupport.COMPUTATION)
public final Observable<T> delay(long delay, TimeUnit unit) {
    return delay(delay, unit, Schedulers.computation(), false);
}

Notice the direct call to an overloaded delay() that explicitly calls Schedulers.computation().

Timer Operator

The timer operator using the computation scheduler by default
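
Again, the original snippet is not reproduced here; a minimal sketch of timer() on its default scheduler might be:

import java.util.concurrent.TimeUnit;

import io.reactivex.Observable;

public class TimerSketch {

    public static void main(String[] args) throws InterruptedException {
        // timer() emits a single 0L after the given delay, on the computation scheduler by default.
        Observable.timer(1, TimeUnit.SECONDS)
                .subscribe(tick ->
                        System.out.println("Tick " + tick + " on " + Thread.currentThread().getName()));

        Thread.sleep(2000); // wait for the emission before the JVM exits
    }
}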

Interval Operator

The interval operator using the computation scheduler by default
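
A similar minimal sketch for interval():

import java.util.concurrent.TimeUnit;

import io.reactivex.Observable;

public class IntervalSketch {

    public static void main(String[] args) throws InterruptedException {
        // interval() emits an increasing count every period, on the computation scheduler by default.
        Observable.interval(500, TimeUnit.MILLISECONDS)
                .take(4) // stop after four emissions
                .subscribe(count ->
                        System.out.println("Count " + count + " on " + Thread.currentThread().getName()));

        Thread.sleep(3000); // let the four emissions complete before the JVM exits
    }
}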

Observers

These are the observers used in the above examples
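
The original observer snippet is not reproduced here, but a minimal, hypothetical Observer for the examples above might look like this:

import io.reactivex.Observer;
import io.reactivex.disposables.Disposable;

// Logs each callback along with the thread it was invoked on, which is handy
// for seeing the computation threads at work.
public class LoggingObserver<T> implements Observer<T> {

    @Override
    public void onSubscribe(Disposable d) {
        System.out.println("onSubscribe on " + Thread.currentThread().getName());
    }

    @Override
    public void onNext(T value) {
        System.out.println("onNext(" + value + ") on " + Thread.currentThread().getName());
    }

    @Override
    public void onError(Throwable e) {
        System.out.println("onError: " + e);
    }

    @Override
    public void onComplete() {
        System.out.println("onComplete on " + Thread.currentThread().getName());
    }
}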

Summary

  • The Computation scheduler is an abstraction over Java threading primitives such as the ExecutorService. This scheduler has a specified set of rules that sets it apart from its other scheduler counterparts, e.g. Schedulers.io() or Schedulers.newThread().
  • The Computation scheduler uses a limited number of threads. This limit is tightly associated with the number of CPU cores you have.
  • Too many threads create scheduling overhead that interrupts the useful computational work done by the processor; as a result, the computation scheduler uses only one thread per CPU core to avoid that overhead.
  • For most tasks that do not involve IO or several blocking operations, the computation scheduler should suffice.
  • Certain Observable operators such as delay(), interval(), timer() will utilize the computation scheduler by default.

Thanks again for reading, catch me next time as I dive into IO bound work and the IO scheduler!

Twitter: @martinomburajr

Email: info@martinomburajr.com