Concurrency on the JVM is complicated

Wiem Zine
8 min read · Apr 19, 2019


Fast programs have the ability to perform many tasks concurrently, which provides high performance and responsiveness. For instance, a GUI task could run at the same time as background tasks that interact with the database, send messages to different services, and perform calculations, all at once.

As programmers, we’d love to write high-performance, responsive programs that deliver a good user experience. To get there, we have to solve hard concurrency challenges in our daily programming life.

In this blog post, you will learn the reason why concurrency is complicated.

First of all, let’s understand the difference between Parallelism and Concurrency.

Concurrency

“Concurrency refers to the ability of different parts or units of a program, algorithm, or problem to be executed out-of-order or in partial order, without affecting the final outcome.” 😎 — Wikipedia

But

Concurrency happens whenever an application executes multiple services, tasks, or components at the same time, and access to shared resources needs to be coordinated to get well-defined behavior. 😭

Concurrency provides responsiveness.

Parallelism

Parallelism is about doing many things in parallel, at the same time. Parallelism is the way to use the full capabilities of multi-core hardware.

In order to use parallelism, we can use Threads, a Thread Pool, an Executor Service… and to be useful they should run on more than one CPU core.
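
For instance, here is a minimal sketch (the task bodies are just placeholders) of running tasks on a fixed-size thread pool:

import java.util.concurrent.Executors

// Size the pool to the number of available CPU cores.
val pool = Executors.newFixedThreadPool(Runtime.getRuntime.availableProcessors())
pool.submit(new Runnable { def run(): Unit = println("task 1") })
pool.submit(new Runnable { def run(): Unit = println("task 2") })
pool.shutdown() // stop accepting new tasks; submitted ones still finish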

Parallelism provides high performance.

→ Concurrent programs take advantage of parallel hardware.

Concurrent programs can execute tasks in parallel, where each task is executed separately in different threads.

A thread, meanwhile, is a linear flow of execution that can be managed independently by a Scheduler, which is a part of the Operating System that decides which process runs at a certain point in time. The time each thread receives is non-deterministic, which makes concurrency tricky.

Let’s understand that by example.

Example: Assuming we have 2 iterations and 2 Threads

def task(n: Int) = new Thread(() =>
  (1 to 2).foreach(i => println(s"Iteration: $i ** Task-$n")))

task(1).start()
task(2).start()

When we execute the code:

  • First time
Iteration: 1 ** Task-1
Iteration: 1 ** Task-2
Iteration: 2 ** Task-2
Iteration: 2 ** Task-1
  • Second time
Iteration: 1 ** Task-1
Iteration: 2 ** Task-1
Iteration: 1 ** Task-2
Iteration: 2 ** Task-2

As we see, every time we run the code, we get a different execution order.

We can count the number of instructions and the number of threads in our program (but keep in mind that every statement is translated into many CPU instructions). Then we can compute the number of possible execution paths.

For T threads each executing N instructions, there are N * T steps in total, and a context switch between the threads can happen at any of them.

The number of possible execution paths is given by this formula from the Clean Code book (page 322):

(N * T)! / (N!)^T

In this example we have 2 instructions and 2 threads, so we can have (2 * 2)! / (2!)^2 = 24 / 4 = 6 possible orderings, where IiTn denotes instruction i of thread n: I1T1 I2T1 I1T2 I2T2, I1T1 I1T2 I2T1 I2T2, I1T1 I1T2 I2T2 I2T1, I1T2 I1T1 I2T1 I2T2, I1T2 I1T1 I2T2 I2T1, I1T2 I2T2 I1T1 I2T1
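
As a quick sanity check, we can compute the formula directly (orderings is a hypothetical helper, not from the book):

// Number of possible execution paths for T threads of N instructions each.
def orderings(n: Int, t: Int): BigInt = {
  def fact(x: Int): BigInt = (1 to x).foldLeft(BigInt(1))(_ * _)
  fact(n * t) / fact(n).pow(t)
}

orderings(2, 2) // 6
orderings(3, 3) // 1680 -- the number of paths explodes quickly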

If the concurrent tasks are independent of each other, the out-of-order execution will not affect the final result.

But imagine if we deal with mutable shared variables! The coordination of simultaneous instructions that read and update shared mutable state is not guaranteed to have a well-defined result.

“The devil is in mutable state” 😈

Possible issues

Visibility

Here’s a simple example of a visibility problem:

var loop: Boolean = true

val load = new Thread(() => while (loop) println("loading...."))
val stop = new Thread(() => {
  println("stop")
  loop = false
})

load.start()
stop.start()

Execution

When we run the program, we can see that it takes some time for the load thread to see the change made by the stop thread:

loading....
loading....
stop
loading....

This is because of the visibility problem:

Each core has its own local cache, and loads and stores are how information is exchanged between those caches. Stores are more expensive than loads: the result has to be propagated to the other cores, and that takes time. In our case the program can even loop forever, because the load thread is not yet able to see the update made by the stop thread, so it keeps running.

As we see, multiple threads don’t necessarily have the same view of memory.

Solution: 👀

Use the @volatile annotation, which will commit the store operation before executing any further loads. So now we would have:

@volatile var loop: Boolean = true

Any update done by one thread will now be visible to the other threads when they read the value, because the shared cache is refreshed with the new value, and the local caches of the other threads load it from the shared cache.
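
Putting it together, here is the earlier example with the fix applied (note that @volatile only works on fields, such as members of an object or top-level vars in a script, not on method-local variables):

@volatile var loop: Boolean = true

val load = new Thread(() => while (loop) println("loading...."))
val stop = new Thread(() => {
  println("stop")
  loop = false // this write is now promptly visible to the load thread
})

load.start()
stop.start()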

Race conditions

“A race condition is a special condition that may occur inside a critical section. A critical section is a section of code that is executed by multiple threads and where the sequence of execution for the threads makes a difference in the result of the concurrent execution of the critical section.” — from http://tutorials.jenkov.com/java-concurrency

Here is a simple example:

object JVMHotel {
  var nbRequests = 0
  var nbRooms = 300

  def request() = {
    nbRequests += 1
    book()
  }
  // book() is defined below
}

Every request for booking a room in the JVM Hotel will be performed concurrently, and the JVM will schedule the threads. It’s possible that the execution of the instructions of those concurrent threads is interleaved, which could result in a wrong number of requests.

The first part of this simple concurrent code already has a problem:

Here we have two requests and, therefore, two tasks that access the shared variable nbRequests. The expected number of requests is 2, but if you run it, you may find that we got 1! 🤷‍♀️ (In fact, you can get the correct value, but only if you are lucky!)
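
You can reproduce this with a small driver like the following (hypothetical test code, not part of the example; it assumes book() is implemented, as shown below):

val t1 = new Thread(() => JVMHotel.request())
val t2 = new Thread(() => JVMHotel.request())
t1.start(); t2.start()
t1.join(); t2.join()
println(JVMHotel.nbRequests) // expected 2, but occasionally prints 1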

The @volatile modifier will not solve the problem in this case: both Thread 1 and Thread 2 can read the current value (which is 0), compute the new value as 1 plus the old value (both getting 1), and then separately write it back, so one of them will clobber (overwrite) the other.
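
To see why, note that nbRequests += 1 is not one atomic step; conceptually it desugars into three, and the scheduler can interleave another thread between any of them:

val tmp = nbRequests // 1. read  (Thread 1 reads 0, Thread 2 reads 0)
val inc = tmp + 1    // 2. add   (both compute 1)
nbRequests = inc     // 3. write (both write 1: one increment is lost)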

Solution: 🔁

This is a synchronization problem. The critical section in the first part of request() is the increment of nbRequests. To solve this problem, we can make our counter an AtomicInteger, which provides thread-safe atomic operations.

import java.util.concurrent.atomic.AtomicInteger

object JVMHotel {
  val nbRequests: AtomicInteger = new AtomicInteger(0)
  var nbRooms = 300

  def request(): Unit = {
    nbRequests.incrementAndGet()
    book()
  }
}

Here incrementAndGet will atomically increment the counter by one and return the updated value.

→ If the shared mutable state is not a counter, you can use AtomicReference.
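
For example, here is a minimal sketch (the log variable and its contents are hypothetical) of atomically updating a shared list:

import java.util.concurrent.atomic.AtomicReference

val log = new AtomicReference[List[String]](Nil)
// updateAndGet retries the update function until its compare-and-set succeeds.
log.updateAndGet(entries => "request received" :: entries)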

Now let’s see the problem in book()

def book(): Either[String, String] =
  if (nbRooms > 0) {
    nbRooms -= 1
    Right("Success!")
  } else Left("Sorry! no rooms in the JVM")

As I mentioned, it is possible that the scheduler interleaves the threads. In this case, the critical section starts at the condition if (nbRooms > 0). Imagine we have 2 requests and there is only one available room! Both requests might get a successful result, because the condition in the second thread can be evaluated before the first thread updates the number of available rooms.

Solutions:

  • make nbRooms an AtomicInteger and use compareAndSet in a loop: 🌀
def book(): Either[String, String] = {
  var result: Either[String, String] = Left("Sorry! no rooms in the JVM")
  var loop = true
  while (loop) {
    val current = nbRooms.get()
    if (current > 0) {
      if (nbRooms.compareAndSet(current, current - 1)) {
        result = Right("Success!")
        loop = false
      } // else another thread changed nbRooms first: retry 🌀
    } else loop = false // no rooms left: give up with the Left result
  }
  result
}
  • use synchronized (see the sketch after this list)
  • use locks
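
For comparison, here is a minimal sketch of the synchronized option, assuming nbRooms is back to a plain var:

def book(): Either[String, String] = this.synchronized {
  // Only one thread at a time can run this block, so the check and the
  // decrement happen atomically with respect to other callers.
  if (nbRooms > 0) {
    nbRooms -= 1
    Right("Success!")
  } else Left("Sorry! no rooms in the JVM")
}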

In the current implementation, when there is no room the request simply fails. What if we want to keep retrying for some duration? That would be trickier. 😔

As I said, concurrency is complicated.

Deadlock

We run into a deadlock if two or more threads are waiting on each other for some action or resource. This can happen when we have more than one lock and different threads each acquire one lock and then wait on another.
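
Here is a minimal sketch (lockA and lockB are hypothetical) where each thread grabs one lock and then waits forever for the other:

val lockA, lockB = new Object

val t1 = new Thread(() => lockA.synchronized {
  Thread.sleep(100) // give t2 time to acquire lockB
  lockB.synchronized(println("t1 got both locks"))
})
val t2 = new Thread(() => lockB.synchronized {
  Thread.sleep(100) // give t1 time to acquire lockA
  lockA.synchronized(println("t2 got both locks"))
})

t1.start(); t2.start() // t1 holds lockA waiting for lockB; t2 the opposite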

Summary

There are many other concurrency challenges we can face, but it’s important to understand their causes. I tried to keep the examples simple to show clearly how we can end up with critical sections in our concurrent code, and how the interleaved execution of instructions from different threads can affect the computation.

If you’re using Scala and you have started worrying about how to deal with deadlocks and race conditions, you can use the ZIO STM data type. ZIO STM enables you to atomically perform a series of reads and writes on shared mutable state with more confidence, with very simple code that you can reason about. This lets you focus on the business problem instead of on the tricky mistakes that can happen during concurrent execution.

And this is how we can implement the request system using ZIO STM.

If there is no available room, the transaction will wait and retry whenever the number of available rooms changes; once the condition is satisfied, the number of rooms is decremented.

for {
  nbRooms    <- TRef.makeCommit(300)
  nbRequests <- TRef.makeCommit(0)
  _          <- (handleRequest *> request(nbRequests, nbRooms).commit).fork
} yield ()

def request(tRequests: TRef[Int], tRooms: TRef[Int]) =
  for {
    _       <- tRequests.update(_ + 1)
    nbRooms <- tRooms.get
    _       <- STM.check(nbRooms > 0)
    _       <- tRooms.update(_ - 1)
  } yield ()

And if you want to specify a timeout for the retry, you can add this:

handleRequest *> request(nbRequests, nbRooms).commit.timeout(3.minutes)

And you can do more!

If you’re interested in learning more about concurrency and ZIO STM, you can watch this talk by John De Goes and me at Scalar Conference 2019. 👋 ❤️

A huge thanks to John De Goes for his time and his effort.
