Concurrency is hard: it’s not obvious when to worry about it. There is no shortcut other than thinking hard about how the code will be run. What is certain is that concurrency problems are possible whenever a race condition can occur, that is, whenever two or more threads access the same data at the same time and at least one of the accesses is a write.
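A minimal Ruby sketch of such a race, assuming a hypothetical `racy_increment` helper: every thread reads a shared counter, then writes it back incremented. The `Thread.pass` between the read and the write is an assumption added only to widen the race window so lost updates show up more often.

```ruby
# Each thread performs an unsynchronized read-modify-write on `count`.
# Two threads that read the same value both write value + 1, so one
# increment is silently lost.
def racy_increment(number_of_threads)
  count = 0
  threads = Array.new(number_of_threads) do
    Thread.new do
      current = count     # read the shared value
      Thread.pass         # let the scheduler switch threads here (widens the window)
      count = current + 1 # write back, possibly overwriting another thread's update
    end
  end
  threads.each(&:join)
  count
end

puts racy_increment(100) # may print less than 100: some increments were lost
```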
The problem with concurrency is that thread scheduling is nondeterministic. Whenever threads are spawned, there’s no way to know in which order the scheduler will run them. That’s why concurrency issues show up at seemingly random times, and they always pick the wrong moment to do so.
If we set the randomness aside, a concurrency issue is like any other bug, so we can follow the same steps to fix it.
1. Write a (consistently) failing test
First of all, write a failing test. It should give consistent feedback by taking the randomness out of the picture. One easy way to achieve this is to run the check multiple times within the test. Another is to spawn a large number of threads, which makes it more likely to trigger the issue.
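Both techniques can be combined in a small harness. This is a hedged sketch: `repeat_check` and the example check are hypothetical names, and the check passed in is any block that returns `true` when the system behaved correctly for a given number of threads.

```ruby
# Runs the same concurrency check `attempts` times and counts failures,
# so a single lucky interleaving cannot make a buggy system look correct.
def repeat_check(attempts:, number_of_threads:)
  failures = 0
  attempts.times do
    failures += 1 unless yield(number_of_threads)
  end
  failures
end

# Example check (hypothetical): n threads increment a shared counter with no
# synchronization; the check passes only if no increment was lost.
unsafe_counter_check = lambda do |n|
  count = 0
  Array.new(n) { Thread.new { c = count; Thread.pass; count = c + 1 } }.each(&:join)
  count == n
end

failures = repeat_check(attempts: 20, number_of_threads: 200, &unsafe_counter_check)
puts "#{failures} of 20 attempts failed"
```

Raising `attempts` and `number_of_threads` trades test run time for a higher chance of hitting the bad interleaving.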
I’ve recently used a Ruby script to check a web application. It relies on two pieces: `concurrent_post`, which spawns `number_of_threads` threads to send POST requests concurrently, and `read_url`, which just exposes the state of the system so that the behaviour can be checked.
For example, let’s say we are testing a web application that keeps a counter: each POST request increments it by 1. Therefore, after `number_of_threads` requests the counter should be equal to `number_of_threads`. If that’s not the case, there’s a concurrency issue.
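A minimal sketch of the same idea, assuming a block-based `concurrent_post` (this signature is made up, not the original script’s) and a hypothetical in-process `FakeCounterApp` standing in for the web application, with `read` playing the role of `read_url`:

```ruby
# Spawns `number_of_threads` threads that each perform one request, then
# waits for all of them. Against a real server the block could be, e.g.,
#   concurrent_post(50) { Net::HTTP.post(URI(write_url), "") }
# (requires "net/http"; `write_url` is a hypothetical name).
def concurrent_post(number_of_threads, &request)
  threads = Array.new(number_of_threads) { Thread.new(&request) }
  threads.each(&:join)
end

# Hypothetical stand-in for the application: each POST increments a counter.
class FakeCounterApp
  def initialize
    @count = 0
  end

  # Naive handler: unsynchronized read-modify-write.
  def post
    current = @count
    Thread.pass # widen the race window
    @count = current + 1
  end

  # Exposes the state so the behaviour can be checked.
  def read
    @count
  end
end

number_of_threads = 50
app = FakeCounterApp.new
concurrent_post(number_of_threads) { app.post }
counter = app.read
puts counter == number_of_threads ? "OK" : "concurrency issue: counter is #{counter}"
```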
2. Write some code
Now that we can rely on feedback from the test, it’s time to experiment with a fix. The first step should be separating business logic from concurrency logic: they are two different responsibilities and should live in different places. This way, each one exposes the right abstraction without accidental complexity creeping in.
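One way to keep the two apart, sketched with hypothetical class names: `Counter` holds only the business rule and knows nothing about threads, while `SynchronizedCounter` adds only the locking, using Ruby’s `Mutex`.

```ruby
# Business logic only: no threads, no locks, trivial to unit test.
class Counter
  attr_reader :value

  def initialize
    @value = 0
  end

  def increment
    @value += 1
  end
end

# Concurrency logic only: serializes access to a Counter with a Mutex.
class SynchronizedCounter
  def initialize(counter = Counter.new)
    @counter = counter
    @lock = Mutex.new
  end

  def increment
    @lock.synchronize { @counter.increment }
  end

  def value
    @lock.synchronize { @counter.value }
  end
end

counter = SynchronizedCounter.new
Array.new(100) { Thread.new { counter.increment } }.each(&:join)
puts counter.value # => 100
```

Swapping the locking strategy then means replacing only the wrapper, while `Counter` and its tests stay untouched.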
Second, apply some form of locking or another concurrency model in an attempt to fix the issue.
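Locking is not the only option. As a hedged sketch of a different concurrency model (the `CounterActor` name and message format are assumptions), a single worker thread can own the state and receive requests through a thread-safe `Queue`, so only one thread ever touches the counter:

```ruby
# Actor-style counter: only the worker thread reads or writes @value.
# Other threads communicate with it by pushing messages onto @inbox.
class CounterActor
  def initialize
    @inbox = Queue.new
    @value = 0
    @worker = Thread.new { run }
  end

  def increment
    @inbox << [:increment, nil]
  end

  # Sends a request with a reply queue and blocks until the worker answers.
  def value
    reply = Queue.new
    @inbox << [:value, reply]
    reply.pop
  end

  def stop
    @inbox << [:stop, nil]
    @worker.join
  end

  private

  # Messages are processed one at a time, in the order they were queued.
  def run
    loop do
      message, reply = @inbox.pop
      case message
      when :increment then @value += 1
      when :value then reply << @value
      when :stop then break
      end
    end
  end
end

actor = CounterActor.new
Array.new(100) { Thread.new { actor.increment } }.each(&:join)
puts actor.value # => 100
actor.stop
```

Because the queue is FIFO, the `:value` request is handled only after every increment queued before it.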
3. Test again
Lastly, run the test again. If it fails, go back to step 2. If it passes, pat yourself on the back, but watch out: being in the green only means this test passes, not that all problems are solved. There could be other issues your test didn’t highlight. I told you: concurrency is hard!
I believe it can be beneficial to see more examples of concurrency issues, so that you are more receptive to them when investigating possible bugs. Here are a few scenarios:
- exclusive read: each thread must get a unique item from a collection of items;
- exclusive create: if more than one thread tries to create the same resource, only one must succeed;
- last update wins: if more than one thread tries to update multiple parts of the system, only the updates made by the last one should be kept (i.e. the other threads’ updates should be discarded);
- read and write via GET and POST: if more than one thread requests a resource (i.e. GET) in order to update it afterwards, the first write (i.e. POST) request must succeed and the subsequent ones must fail;
- read and write via POST: if more than one thread reads a resource and then writes it within a single POST request, all of them should succeed (e.g. the counter from before).
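Two of these scenarios, sketched in plain Ruby under stated assumptions. For exclusive read, a `Queue` hands each item to exactly one caller. For read and write via GET and POST, one common approach (swapped in here, not necessarily what the application above does) is an optimistic version check: `VersionedResource` is a hypothetical class that accepts a write only if it carries the version the client read.

```ruby
# Exclusive read: threads drain a shared Queue; each item goes to one thread.
# Queue#pop(true) is non-blocking and raises ThreadError once the queue is empty.
def exclusive_reads(items, number_of_threads)
  queue = Queue.new
  items.each { |item| queue << item }
  taken = Queue.new
  threads = Array.new(number_of_threads) do
    Thread.new do
      begin
        loop { taken << queue.pop(true) }
      rescue ThreadError
        # queue drained: nothing left for this thread
      end
    end
  end
  threads.each(&:join)
  Array.new(taken.size) { taken.pop }
end

# Read and write via GET and POST: a POST succeeds only if it carries the
# version returned by the GET, so the first write wins and stale ones fail.
class VersionedResource
  Snapshot = Struct.new(:data, :version)

  def initialize(data)
    @data = data
    @version = 0
    @lock = Mutex.new
  end

  def get
    @lock.synchronize { Snapshot.new(@data, @version) }
  end

  # Returns true if the update was applied, false if the version was stale.
  def post(new_data, expected_version)
    @lock.synchronize do
      return false unless expected_version == @version
      @data = new_data
      @version += 1
      true
    end
  end
end

resource = VersionedResource.new("draft")
snapshot = resource.get # every thread updates based on this same GET
results = Array.new(10) { |i| Thread.new { resource.post("edit #{i}", snapshot.version) } }.map(&:value)
puts results.count(true) # => 1
```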