Using Immutable Objects to Safeguard Multithreaded Applications

Baris Sayil
ELP-2018

--

An immutable object is an object whose state cannot be modified after it’s created. This means, to “modify” the data an object contains, a new instance must be created and referenced by the original object.

To give a basic example, in Java the instances of the String class are immutable:

The method toLowerCase() does not change the data “INSA” that s contains. Instead, a new String object is instantiated and given the data “insa” during its construction. A reference to this String object is returned by the toLowerCase() method. To make the String s contain the data “insa”, a different approach is needed:

Now the String s references a new String object that contains “insa”.

So, what are the advantage of using immutable objects over mutable ones? The biggest one is that an immutable object is inherently thread-safe. Let’s start with the basics and build an intuitive understanding for that sentence.

A Thread is a software term for the basic ordered sequence of instructions that can be passed through or processed by a single CPU core. Today, with the advent of multi-core processors, many computer programs use several threads to perform more efficiently. Let’s examine the single and then the multi-threaded version of a basic program:

In Python, if we want to calculate 1+1 and 1+2 and output the results we may do this:

Or, to gain some speed, we may want to take advantage of our computer’s multi-core processor and do those in different threads:

Here, we use two threads to execute concurrently. In this case it’s not a good idea since if we measure the time, it actually takes more time since the add() function is such a basic function that the process of creating two threads takes way more time than the time gained by executing it twice in parallel. Now imagine that instead of adding two single digit numbers, we multiply two 100x100 matrices. In this example using n threads would effectively divide the time needed to execute the program by n.

So what’s the problem then? Why is using immutable objects of any use? Let’s see what might happen if we concurrently update a variable from several threads. Here count is a shared variable that is shared between the threads A and B:

At first glance, it is not obvious that there is a problem here. There are only two execution paths, and they yield the same result. So if initially count was 0, after the execution of these threads, we’d expect it to be 2. You might be surprised to see it equal to 1 instead.

The problem is that these operations are translated into machine language before execution, and in machine language the update takes two steps, a read and a write. The problem is more obvious if we rewrite the code with a temporary variable: temp.

If the execution path is, for example, a1 < a2 < b1 < b2, there are no problems and count is 2 in the end.

Now consider the following execution path: a1 < b1 < b2 < a2. Because both threads read the same initial value, they write the same value. Thus, count is only incremented once, which is probably not what the programmer had in mind.

In concurrent programming the execution path is never deterministic, so even if the first hundred times the execution path is the first one and everything works, the hundred and first time it might be the second one and a bug that seemingly came out of nowhere would appear.

Python version

The classic solution to this problem is by using mutual exclusion so that only one thread can modify or read the shared variable at the same time, another solution is to make count immutable. In this case, threads A and B do not modify count but save its value to a mutable variable. They then iterate the latter and return the difference between count and the new variable to the main thread. Finally, a new immutable variable is created by summing up all three of the values: initial count, the first difference, and the second difference:

Here, i and j should be thought of as immutable.

While this is possible to implement using a mutable object as well, it is “safer” to use an immutable one to make sure the problem described doesn’t occur.

The example with add() and increment() are only symbolic. Imagine having a function with two inputs that makes a very complicated and data driven statistical analysis, the first input being the original data and the second one some parameter. Now imagine that, for our desired input, this function takes around 30 minutes to complete the operation regardless of the parameter chosen. If we want to perform this function with 4 different parameters, it would take us two hours with a single thread, or around half an hour with multiple threads. Knowing this, we’ll be tempted to opt for the multi-threaded option. Now, if the input data is mutable, there’s a good chance, even if we were very careful in coding our function, one of the threads will modify it at a bad moment and it will be corrupted like the example above. Here, the use of an immutable data structure will fix this problem and we can use our multi-threaded program in peace.

To conclude, immutable objects can be a useful tool to face up to the novel challenges that concurrent programming has given rise to and should therefore be present in every programmer’s toolset.

--

--