Concurrency Visualized — Part 3: Pitfalls and Conclusion

Besher Al Maleh
Jan 29, 2020

Thanks to Pablo Stanley for these amazing illustrations!

This is part 3 of my concurrency series. Check out Part 1 and Part 2 if you missed them.

In my earlier discussion of sync, async, serial, and concurrent, I alluded to some pitfalls that you might encounter while working with concurrency. That’s our main topic for this article. Afterwards, I will wrap up this series with a summary and some general advice.

Pitfalls

Priority Inversion and Quality of Service

Priority inversion happens when a high priority task is prevented from running by a lower priority task, effectively inverting their relative priorities.

This situation often occurs when a high QoS queue shares a resource with a low QoS queue, and the low QoS queue gets a lock on that resource.

But I wish to cover a different scenario that is more relevant to our discussion — it’s when you submit tasks to a low QoS serial queue, then submit a high QoS task to that same queue. This scenario also results in priority inversion, because the high QoS task has to wait on the lower QoS tasks to finish.

GCD resolves priority inversion by temporarily raising the QoS of the entire queue that contains the low priority tasks which are ‘ahead’ of, or blocking, your high priority task. It’s kind of like having cars stuck in front of an ambulance: suddenly they’re allowed to cross the red light just so that the ambulance can move. (In reality the cars move to the side, but imagine a narrow, serial street and you get the point :-P)

To illustrate the inversion problem, let’s start with this code:
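A minimal sketch of that setup is shown here. The queue labels, the `printCircles` helper, the circle count, and the `DispatchGroup` (which only keeps a command-line run alive until the output is done) are assumptions made for illustration:

```swift
import Dispatch

// Queue labels and the circle count are placeholders chosen for this sketch.
let starterQueue = DispatchQueue(label: "com.example.starter", qos: .userInteractive)
let utilityQueue = DispatchQueue(label: "com.example.utility", qos: .utility)
let backgroundQueue = DispatchQueue(label: "com.example.background", qos: .background)
let circlesPerTask = 10

// Only needed so a command-line run waits for the output;
// in an app or playground you could drop the group.
let group = DispatchGroup()

func printCircles(_ circle: String) {
    for _ in 0..<circlesPerTask { print(circle) }
}

starterQueue.async(group: group) {
    // Two tasks per queue, all submitted asynchronously.
    backgroundQueue.async(group: group) { printCircles("⚪️") }
    backgroundQueue.async(group: group) { printCircles("⚪️") }
    utilityQueue.async(group: group) { printCircles("🔵") }
    utilityQueue.async(group: group) { printCircles("🔵") }
}

// Wait (with a generous timeout) until every task has finished printing.
let outcome = group.wait(timeout: .now() + 10)
```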

We create a starter queue (where we submit the tasks from) as well as two queues with different QoS. Then we dispatch tasks to each of these two queues, with each task printing an equal number of circles of a specific colour (the utility queue is blue, the background queue is white).

Because these tasks are submitted asynchronously, every time you run the app, you’re going to see slightly different results. However, as you would expect, the queue with the lower QoS (background) almost always finishes last. In fact, the last 10–15 circles are usually all white.

No surprises there

But watch what happens when we submit a sync task to the background queue after the last async statement. You don’t even need to print anything inside the sync statement, just adding this line is enough:
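Here is a self-contained sketch with that extra line in place. As before, the labels, the helper, and the counts are assumptions; the line that matters is the empty `sync` call at the end of the starter block:

```swift
import Dispatch

let starterQueue = DispatchQueue(label: "com.example.starter", qos: .userInteractive)
let utilityQueue = DispatchQueue(label: "com.example.utility", qos: .utility)
let backgroundQueue = DispatchQueue(label: "com.example.background", qos: .background)
let group = DispatchGroup() // keeps a command-line run alive until the output is done

func printCircles(_ circle: String, count: Int = 10) {
    for _ in 0..<count { print(circle) }
}

starterQueue.async(group: group) {
    backgroundQueue.async(group: group) { printCircles("⚪️") }
    backgroundQueue.async(group: group) { printCircles("⚪️") }
    utilityQueue.async(group: group) { printCircles("🔵") }
    utilityQueue.async(group: group) { printCircles("🔵") }

    // The extra line: an empty synchronous task submitted from the
    // user-interactive starter queue to the background queue.
    backgroundQueue.sync {}
}

let outcome = group.wait(timeout: .now() + 10)
```

The exact interleaving still varies from run to run, so treat the printed order as an observation, not a guarantee.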

Priority inversion

The results in the console have flipped! Now, the higher priority queue (utility) always finishes last, and the last 10–15 circles are blue.

To understand why that happens, we need to revisit the fact that synchronous work is executed on the caller thread (unless you’re submitting to the main queue.) In our example above, the caller (starterQueue) has the top QoS (userInteractive.) Therefore, that seemingly innocuous sync task is not only blocking the starter queue, but it’s also running on the starter’s high QoS thread. The task therefore runs with high QoS, but there are two other tasks ahead of it on the same background queue that have background QoS. Priority inversion detected!

As expected, GCD resolves this inversion by raising the QoS of the entire queue to temporarily match the high QoS task; consequently, all the tasks on the background queue end up running at user interactive QoS, which is higher than the utility QoS. And that’s why the utility tasks finish last!

Side-note: If you remove the starter queue from that example and submit from the main queue instead, you will get similar results, as the main queue also has user interactive QoS.

To avoid priority inversion in this example, we need to avoid blocking the starter queue with the sync statement. Using async would solve that problem.

Although it’s not always ideal, you can minimize priority inversions by sticking to the default QoS when creating private queues or dispatching to the global concurrent queue.
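As a small sketch of what that looks like in practice (the label and the work inside the closures are made up for illustration), you simply leave out the `qos` argument:

```swift
import Dispatch

// No qos argument: the private queue is created without an explicit QoS class.
let workQueue = DispatchQueue(label: "com.example.work")

// global() with no argument returns the global concurrent queue
// for the default QoS class.
let globalQueue = DispatchQueue.global()

// Placeholder work, just to show both queues in use.
let answer = workQueue.sync { 21 * 2 }

let group = DispatchGroup()
globalQueue.async(group: group) {
    print("Running on the default global queue")
}
group.wait()
print(answer)
```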

Thread explosion

When you use a concurrent queue, you run the risk of thread explosion if you’re not careful. This can happen when you try to submit tasks to a concurrent queue that is currently blocked (e.g. with a semaphore, sync, or some other way.) Your tasks will run, but the system will likely end up spinning up new threads to accommodate these new tasks, and threads aren’t cheap.
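One rough way to observe this is to count how many blocked tasks are in flight at the same moment, since each of them is holding a thread. The sketch below is an illustration under assumptions: `InFlightCounter`, the task count, and the use of `Thread.sleep` as a stand-in for blocking work are all hypothetical choices, and the peak you see depends on your platform and hardware.

```swift
import Foundation

// Tracks how many tasks are running (and therefore occupying a thread) at once.
final class InFlightCounter: @unchecked Sendable {
    private let lock = NSLock()
    private var current = 0
    private(set) var peak = 0

    func enter() {
        lock.lock(); defer { lock.unlock() }
        current += 1
        peak = max(peak, current)
    }

    func leave() {
        lock.lock(); defer { lock.unlock() }
        current -= 1
    }
}

let counter = InFlightCounter()
let concurrentQueue = DispatchQueue(label: "com.example.concurrent", attributes: .concurrent)
let group = DispatchGroup()
let taskCount = 32

for _ in 0..<taskCount {
    concurrentQueue.async(group: group) {
        counter.enter()
        // Stand-in for blocking work (a lock, a sync call, blocking I/O).
        Thread.sleep(forTimeInterval: 0.2)
        counter.leave()
    }
}

group.wait()
print("Peak number of blocked tasks in flight: \(counter.peak) of \(taskCount)")
```

Submitting the same tasks to a serial queue would keep the peak at one, at the cost of running them back to back.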

This is likely why Apple suggests starting with a serial queue per subsystem in your app, as each serial queue can only use one thread at a time. Remember that serial queues are concurrent in relation to other queues, so you still get a performance benefit when you offload your work to a queue, even if it isn’t concurrent.

Race conditions

Swift Arrays, Dictionaries, Structs, and other value types are not thread-safe by default. For example, when you have multiple threads trying to access and modify the same array, you will start running into trouble.

There are different solutions to the readers-writers problem, such as using locks or semaphores, but the relevant solution I wish to discuss here is the use of an isolation queue.

Let’s say we have an array of integers, and we want to submit asynchronous work that references this array. As long as our work only reads the array and does not modify it, we are safe. But as soon as we try to modify the array in one of our asynchronous tasks, we will introduce instability in our app.

It’s a tricky problem because your app can run 10 times without issues, and then it crashes on the 11th time. One very handy tool for this situation is the Thread Sanitizer in Xcode. Enabling this option will help you identify potential race conditions in your app.

For iOS apps, this option is only available when running on the simulator

To demonstrate the problem, let’s take this (admittedly contrived) example:
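A sketch along those lines might look like this. `UnsafeStore`, `runRacyDemo`, the iteration count, and the `--race` opt-in argument are all hypothetical; the demo is deliberately unsafe, so it only runs when you ask for it.

```swift
import Foundation

// Deliberately NOT thread-safe: the array is shared without any synchronization.
final class UnsafeStore {
    var array = [1, 2, 3]
}

func runRacyDemo(iterations: Int = 10_000) {
    let store = UnsafeStore()
    let queue = DispatchQueue(label: "com.example.racy", attributes: .concurrent)
    let group = DispatchGroup()

    // Reader: only reads the array.
    queue.async(group: group) {
        for _ in 0..<iterations { _ = store.array.last }
    }

    // Writer: appends while the reader may be mid-read. This is the data race.
    queue.async(group: group) {
        for i in 0..<iterations { store.array.append(i) }
    }

    group.wait()
    print("Survived this run with \(store.array.count) elements")
}

// Opt-in only, because the race may crash the process.
if CommandLine.arguments.contains("--race") {
    runRacyDemo()
}
```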

One of the async tasks is modifying the array by appending values. If you try running this on your simulator, you might not crash. But run it enough times (or increase the number of loop iterations), and you will eventually crash. If you enable the thread sanitizer, you will get a warning every time you run the app.

To deal with this race condition, we are going to add an isolation queue that uses the barrier flag. This flag allows any outstanding tasks on the queue to finish, but blocks any further tasks from executing until the barrier task is completed.

Think of the barrier like a janitor cleaning a public restroom (shared resource.) There are multiple (concurrent) stalls inside the restroom that people can use. Upon arrival, the janitor places a cleaning sign (barrier) blocking any newcomers from entering until the cleaning is done, but the janitor does not start cleaning until all the people inside have finished their business. Once they all leave, the janitor proceeds to clean the public restroom in isolation. When finally done, the janitor removes the sign (barrier) so that the people who are queued up outside can finally enter.

Here’s what that looks like in code:
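The following is a minimal sketch of an isolation queue, not a definitive implementation. The class name `IsolatedIntArray`, the queue label, and the driver loop are assumptions. The `append` method is also an addition of this sketch: going through the getter and then the setter would be two separate steps, so a read-modify-write is done inside a single barrier block instead.

```swift
import Dispatch

final class IsolatedIntArray: @unchecked Sendable {
    // Concurrent, so reads can overlap; writes are submitted with a barrier.
    private let isolationQueue = DispatchQueue(label: "com.example.isolation",
                                               attributes: .concurrent)
    private var storage: [Int]

    init(_ initial: [Int]) { storage = initial }

    var values: [Int] {
        // sync: the caller needs the value back before it can continue.
        get { isolationQueue.sync { storage } }
        // async + barrier: waits for in-flight reads, then writes in isolation.
        set { isolationQueue.async(flags: .barrier) { self.storage = newValue } }
    }

    // Mutates inside one barrier block, so the append cannot interleave
    // with another writer or with a reader.
    func append(_ value: Int) {
        isolationQueue.async(flags: .barrier) { self.storage.append(value) }
    }
}

let shared = IsolatedIntArray([1, 2, 3])

// Append and read from several threads at once.
DispatchQueue.concurrentPerform(iterations: 100) { i in
    shared.append(i)
    _ = shared.values.count
}

// This read is queued behind every barrier write submitted above,
// so it sees the 3 initial values plus the 100 appended ones.
print(shared.values.count)
```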

We have added a new isolation queue, and restricted access to the private array using a getter and setter that will place a barrier when modifying the array.

The getter needs to be sync in order to directly return a value. The setter can be async, as we don’t need to block the caller while the write is taking place.

We could have used a serial queue without a barrier to solve the race condition, but then we would lose the advantage of having concurrent read access to the array. Perhaps that trade-off makes sense in your case; you get to decide.

Conclusion

Thank you so much for reading this series! I hope you learned something new along the way. I will leave you with a summary and some general advice.

Summary

  • Queues always start their tasks in FIFO order
  • Queues are always concurrent relative to other queues
  • Sync vs Async concerns the source
  • Serial vs Concurrent concerns the destination
  • Sync is synonymous with ‘blocking’
  • Async immediately returns control to caller
  • Serial uses one thread at a time, and guarantees order of execution
  • Concurrent uses multiple threads, and risks thread explosion
  • Think about concurrency early in your design cycle
  • Synchronous code is easier to reason about and debug
  • Avoid relying on global concurrent queues if possible
  • Consider starting with a serial queue per subsystem
  • Switch to concurrent queue only if you see a measurable performance benefit
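A few of these points (FIFO ordering, async returning immediately, sync blocking the caller) can be checked with a short sketch. The `Log` class and the queue label are hypothetical names used only for this illustration:

```swift
import Dispatch

// All access to `entries` goes through `serialQueue`, which is what keeps it safe.
final class Log: @unchecked Sendable {
    var entries: [Int] = []
}

let serialQueue = DispatchQueue(label: "com.example.serial")
let log = Log()

// async returns immediately; the five tasks are queued and run in FIFO order.
for i in 0..<5 {
    serialQueue.async { log.entries.append(i) }
}

// sync blocks the caller until this task has run. Because the queue is serial
// and FIFO, the five async tasks ahead of it have all finished by then.
let snapshot = serialQueue.sync { log.entries }

print(snapshot) // [0, 1, 2, 3, 4]
```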

I like the metaphor from the Swift Concurrency Manifesto of having an ‘island of serialization in a sea of concurrency’. This sentiment was also shared in this tweet by Matt Diephouse:

When you apply concurrency with that philosophy in mind, I think it will help you achieve concurrent code that can be reasoned about without getting lost in a mess of callbacks.

If you have any questions or comments, feel free to reach out to me on Twitter.
