Concurrent Sudoku Solver: Part 2 - Using Cats Effect Ref + Deferred + IO.race

15 min readMay 8, 2022

Introduction

In Part 1, we introduced the Single Candidate technique and defined some general classes and methods to model our Sudoku domain. In this post, we will use Ref, Deferred, and IO.race from Cats Effect to implement our first concurrent Sudoku Solver implementation. Don’t worry if you don’t know what each of these do yet, we’ll introduce them as we go. Same as before, you can find all the code we’ll be writing today in the concurrent-sudoku-solver repo. Without further ado, let’s get started!

Cell abstraction

First, let’s give our problem some structure by defining what a Cell is in abstract terms:

Here we’ve defined Cell as a data structure consisting of 4 members:

coord which models the co-ordinate the Cell represents.
deferredValue which is typed as a Deferred[IO, Value]. What exactly is a Deferred? If you’ve ever used something like a Promise in another language/framework, you’ll find Deferred a familiar concept. In the Cats Effect documentation, Deferred is described as:

“A purely functional synchronization primitive which represents a single value which may not yet be available. When created, a Deferred is empty. It can then be completed exactly once, and never be made empty again.”

In our case, we are mostly interested in using this data type to model the idea that a Cell will eventually have a Value, once we’re able to logically deduce it. This deferredValue is going to play a big role in enabling ourCells to “react” whenever one of their peers are solved, and we’ll talk more about this shortly.

One thing you might have noticed about our deferredValue is we added a protected[this] access modifier. This ensures other Cells are not able to directly access the deferredValue. Specifically, we want to to ensure that no other Cell is able to .complete the deferredValue. We’ll see later in the solve method that each Cell is responsible for completing its own deferredValue.
However we do want other Cells to be able to read each other’s deferredValue so this line of code exposes a specific getValueAPI to enable exactly that. All we’re doing here is calling .get on the underlying Deferred. This is a blocking action; any fibers attempting to read the Value inside the Deferred will be blocked until the Deferred is completed. The term “blocking” might set off alarm bells in your head but there’s nothing to worry about here, .get is only semantically blocking in the sense that no actual threads are blocked so you’re not wasting compute power unnecessarily. This behaviour is similar to IO.sleep and other such async operations in Cats Effect.
Next we have the deduceSingleCandidate method which is expected to return an IO that models the process of deducing the Value of this Cell. To that end, this method accepts one further piece of information; allCells which is a reference to all 81 Cells in the Sudoku board which we’ll need to be able to get the Value of each of our peers (via their .getValue methods!)
Finally we have the solve method which is expected to do two things:
a) Return this Cell’s Value and
b) Complete this Cell’s deferredValue. Completing each Cell’s deferredValue is an important part of our program. If we don’t complete the deferredValue, it will remain empty forever, and any parts of our program that have called .get on that deferredValue will be blocked forever. In contrast, completing the deferredValue immediately unblocks any fibers attempting to read the deferredValue. This is why Deferred is often referred to as a “synchronization” primitive; it allows us to synchronize the timing of the execution of multiple fibers.

We can already fully define the solve method for all implementations, as follows.
We first check if the coord of this Cell is in the givens, in which case we already know it’s value and we can simply .complete this Cell’s deferredValue using it.
Otherwise, we use the deduceSingleCandidate method to determine the Cell’s Value and use the result of that process to .complete the deferredValue.

As you can see, a Cell as we’ve defined it isn’t a particularly complicated data structure. It’s mostly just a wrapper around Deferred. To write a concrete implementation of Cell, the bulk of the work is going to be in writing the deduceSingleCandidate method, which we will do in a moment. First however, let’s see how we will actually use this Cell abstraction inside the solve method of our Solver implementation which we’re going to unimaginatively call CatsEffectDeferredRefRaceSolver:

We create an instance of Cell for each coordinate on the Sudoku board (assume for now that we have a working Cell.make method that effectfully returns a concrete implementation of our Cell trait for this purpose).
This is just a simple transformation of our givens from a List into a Map structure, which we will need for the next step.
Now, we call .solve on each Cell we created in step 1, and we perform this action on each Cell concurrently using the parTraverse combinator. .parTraverse is one of several IO combinators in the Cats Effect toolkit that allow us to express control-flow semantics (and we will see another one in action later). parTraverse takes a list of IO and runs them all concurrently, terminating only when all the IO in the list have terminated. This particular .parTraverse call is what makes our Sudoku Solver a Concurrent Sudoku Solver, and while we’re far from done yet, that is the whole point of this blogpost!

Notice we are deliberately saying concurrently, rather than in parallel, as there is in fact a distinction between the two and we’re only really guaranteed the former. Our use of .parTraverse is merely declaring to the underlying runtime that these IO are independent and can be evaluated in parallel. Whether or not they are actually executed in parallel is up to the runtime (for example, Cats Effect won’t run anything in parallel if it’s being executed in a single-threaded language like JavaScript via Scala.js because it obviously can’t).
Finally, we return our list of solved Values.

Hopefully at this point you understand the big picture of what we’re trying to do. Now all we need to do is figure out how to actually write a valid implementation of our Cell trait and have the Cell.make method return that.

Cell implementation

We start by creating an instance of a Deferred. Note that this is an effectful action, and this is why our make constructor method returns an IO[Cell] instead of just a Cell.
We assign to coord the _coord constructor argument.
We assign deferredValue the instance we created in step 1.
Now we get to deduceSingleCandidate. We start by creating a Ref. This is our first encounter with this concurrency primitive so let’s introduce it. Cats Effect documentation describes it as a construct that “provides safe concurrent access and modification of its content.” For our use-case, we’ll be using a Ref to store our state of available candidates for each Cell. Over the course of the solving process, we will “mutate” this state as we eliminate available candidates using the Single Candidate technique. As we’ll see shortly, we will actually have multiple processes accessing and modifying this Ref instance during the solving process so the “safe concurrent access” aspect of a Ref is also important to us.

To create a Ref, we need to give it an initial value. In our case, we can provide this initial value using the Candidate.initial method we defined in Part 1. This returns a Candidate.Multiple(Set(1, 2, 3, 4, 5, 6, 7, 8, 9)). Note that, similar to Deferred, creating a Ref is also an effectful action. Our task now is to refine the Candidate.Multiple inside our Ref into a Candidate.Single which brings us to the next line of code.
We filter allCells for only cells that are peers of this cell. This isn’t strictly necessary but it’s a small optimization we can make as we know that the Value of each cell is determined solely by it’s peers when using the Single Candidate technique.
Then we take each peer and create an IO using a refineToSingleCandidateOrNever function. We’ll implement this function shortly but let’s at least define the behaviour we expect from this function here. This function will attempt to use each peer’s value to refine the Ref of available candidates. After refinement, if we discover that only one candidate is left, this function will return that single candidate. If however, the peer’s Value does not refine the number of candidates down to just a single candidate, the function will simply never return anything i.e. it will never terminate. This is why we’ve named the function refineToSingleCandidateOrNever.

So what you will have at the end of this .map operation is a List consisting of 20 IO[Candidate.Single] values (since each cell has exactly 20 peers) but only one of these IO will actually eventually return the Candidate.Single Value of this Cell. The remaining 19 IO will simply never terminate. The question for us is how do we convert this List[IO[Candidate.Single]] into an IO[Candidate.Single]which is guaranteed to terminate and return a Candidate.Single Value? This brings us to the next line of code.
You might be tempted to use parTraverse again which will run each IO[Candidate.Single] in our list concurrently. However, since all but one of the IO in the List will never terminate, the resulting IO from .parTraverse itself will never terminate either. We could do something hack-y like add .timeout calls but this is obviously not optimal.

Our problem here really is that parTraverse is the wrong combinator to use here. The behaviour we actually want is to run each IO[Candidate.Single] in our list concurrently, and return the result of the first IO in our list to complete successfully while cancelling the remaining IO.

This is almost the behaviour offered by the aptly named IO.race combinator, the only difference is that IO.race takes only a pair of IO at a time and returns the first to finish in an Either type. Luckily, it’s easy enough to write a small helper function raceMany that just builds on top of IO.race with our desired type-signature and semantics (takes a list of IO, returns the result of first successful IO, and cancels the rest).
Finally, we can just return the result of applying our raceMany function to the listOfSingleCandidateOrNever i.e. singleCandidate and that concludes the definition of our deduceSingleCandidate function.

At this point, you might be concerned about the number of concurrent processes we would be creating using this approach. For each Cell we call deduceSingleCandidate on, we end up creating 20 concurrent processes (one for each peer). So, if a Sudoku puzzle has 50 missing Cells, we would end up creating at least 1000 (50 * 20) concurrent processes. Now, that might sound like a lot but actually, there is nothing to be concerned about here. The fundamental abstraction in Cats Effect is a Fiber, and this is ultimately what each of our concurrent processes will map back to. And unlike physical threads, which are extremely scarce, you can create literally millions of fibers as they’re quite lightweight and the IO runtime will take care of scheduling them in an efficient and fair manner. If you’re still unconvinced, here’s a short excerpt from the Cats Effect documentation:

Fibers are the fundamental abstraction in Cats Effect. The terminology is intentionally reminiscent of Threads since fibers are literally lightweight threads (often referred to as "green threads" or "coroutines"). Much like threads, they represent a sequence of actions which will ultimately be evaluated in that order by the underlying hardware. Where fibers diverge from threads is in their footprint and level of abstraction.
Fibers are very lightweight. The Cats Effect IO runtime implements fibers in roughly 150 bytes per fiber, meaning that you can literally create tens of millions of fibers within the same process without a problem, and your primary limiting factor will simply be memory.

The last piece we need to figure out is how to write the refineToSingleCandidateOrNever function which bring us to our last snippet of code:

We call .getValue on the peerCell. Recall that this is a blocking action; it will return a Value eventually, and this is absolutely fine! We want this IO to wait until the peerCell has been solved and has a Value. A peerCell that hasn’t been solved isn’t any use to us.
Once we have the peer’s Value, we can proceed to refining the Ref[IO, Candidate]. Ref offers a number of methods for modifying their contents such as .set, .update, .modify etc. Depending on the semantics you’re looking for, one or another method will make the most sense to use. In our case, we will use the .modify method. This is an extremely powerful function and a little hard to explain with just words. To make things easier, let’s look at a stripped down definition of Ref and the .modify method:

class Ref[A] { def modify[B](f: A => (A, B)): IO[B] }

As you can see from this type-signature, .modify is a higher-order function. The idea is that we give .modify a function (f) that can take in the existing value (A) of the Ref and return a tuple consisting of the next value (A) to set inside the Ref as well as another value (B) that .modify itself will return effectfully (IO[B]). .modify will attempt to run our function to completion but if at any point during the running of our function, the value (A) already inside the Ref changes (i.e. a concurrent modification occurs), .modify will re-run our function using the updated value in the Ref. Note that it may need to do this several times if there are a lot of concurrent modifications happening to the Ref. Under the hood, Ref is mostly just a functional wrapper around Java’s AtomicReference and will use a compare-and-swap (also known as a CAS) loop to make concurrent modifications safely, hence why f might be run multiple times. As a result, we should ensure our function f is pure (deterministic and free of side-effects) because .modify might run f multiple times, otherwise we risk getting unexpected results.

So, our next course of action is to actually write our f which we will do as an anonymous function consisting of a series of case statements handling each of the states our Ref could be in.
Figuring out the next value (A) of the Ref based on the peerValue is the easy part for. If the value inside the Ref is a Candidate.Multiple, we can call the refine method on it. If this returns a Candidate.Single, it means we’ve successfully found the solution to this Cell! We can therefore set the next value of the Ref to this Candidate.Single.

In addition, recall that .modify expects f to return a tuple of not just the next value (A) of the Ref but also the value (B) we want .modify itself to return. In our case, we can return the same Candidate.Single value we’ve obtained except we’ll wrap it in an IO using IO.pure. Why are we wrapping this in an IO? Why not just return the plain Candidate.Single value? The reason for this will become evident once we go through the other case statements.
Even if we aren’t able to refine the Candidate.Multiple into a Candidate.Single, we may have still managed to eliminate a candidate! For example, if the original state of our Ref is Candidate.Multiple(Set(1, 2, 3)) and we see the peerValue.value is 3, we can still refine our state to Candidate.Multiple(Set(1, 2))! Therefore, we can set the next value of the Ref to the potentially refined multiple. However, in this case, we return an IO.never as our B. IO.never describes a program that never terminates. At first glance, that probably seems like a stupid and wasteful thing to do but remember our deduceSingleCandidate function is prepared to handle this behaviour via the raceMany function i.e. it is expecting only one IO to return a Candidate.Single value (the first case statement), and the rest of the IO to simply never terminate.
Finally, in the case where the Ref is already a Candidate.Single, we simply set that as the next value (i.e. this is effectively a no-op) and return an IO.never as our B, similarly to the second case. We could have returned an IO.pure(single) here as well and it wouldn’t make any difference to the correctness of our program but this is not strictly necessary, as the only way to end up in this case is if a previous .modify of Ref already refined the Candidate.Multiple into a Candidate.Single for us, and this would have already returned an IO.pure(single) to the caller.

And that’s it for our f function. .modify will run this function, potentially several times, and it will return an IO[IO[Candidate.Single]]. You might wonder, why do we have an IO nested in an IO? Glad you asked.
So, our f returns an IO[Candidate.Single] as it’s B and .modify will return this value to us but .modify itself is an effectful function meaning it will return this value to us effectfully as an IO[B] , or in our case IO[IO[Candidate.Single]]. Most of the time when you end up with an IO nested inside another IO, you’ve likely done something wrong like used .map instead of .flatMap. In this case however, there is no .flatModify variant of .modify so we’re going to have to just call .flatten ourselves to get back an IO[Candidate.Single]. While it’s not super important, it can be helpful to understand what exactly an IO[IO[Candidate.Single]] represents for us.

First of all, recall that an IO is simply a description of a program. So IO[IO[Candidate.Single]] is a description of a program that describes how to generate a description of another program. In our case, the outer IO program describes how .modify will update the value of the Ref and return an IO[Candidate.Single] program. The IO[Candidate.Single] program itself is a description of a program that returns a Candidate.Single. More specifically, the inner IO (i.e. IO[Candidate.Single]) program, the way we’ve defined it, will return to us a Candidate.Single value if and only if we were able to refine our Ref into a Candidate.Single, otherwise it will never terminate. Therefore, if we want to run the inner program, we need to .flatten our IO[IO[Candidate.Single]] into a IO[Candidate.Single]

This pattern of having f return an IO as the B and then calling .flatten on the resulting nested IO from .modify allows us to essentially run some code if and only if we’ve successfully updated the Ref. There’s other ways to write this that are perhaps less cryptic but this is probably the most concise way, and once you understand it, it’s pretty neat.
Finally, we return the singleCandidate we’ve deduced for this Cell, if ever.

And with that, we’ve written all the code we need. If we run this Solver against an easy Sudoku puzzle, it will actually work! Below is the input and output from one of the unit tests in the repo:

Reflections

Phew, that was a lot of content to cover. Before we part ways, lets recap some of the interesting features of Cats Effect we encountered in this post:

Deferred provides a nice way of synchronizing multiple IO. We didn’t have to write any logic to poll for the state of other cells. Instead we were able to just block any dependent IO via a Deferred.get call.
Ref on the other hand is great for sharing and modifying state between multiple IO safely. We not only get referential-transparency but also don’t have to deal with tricky locks and/or blocking of threads which is often the traditional solution.
Lastly, we saw how we can use combinators like parTraverse and race to compose concurrent operations. The key word to highlight here is that we compose IO values. Just like flatMap, parTraverse and race combine IO values and give back IO values. And these IO values we get back are still just a value describing a program. In this way, we’re able to iteratively build up complex IO programs simply by combining smaller IO programs, which is pretty neat.

On the flip side though, it’s hard to deny our Solver implementation is complicated, and in particular the deduceSingleCandidate method. It’s not that we wrote a particularly large amount of code (in fact, this was just under 60 lines of code), it’s just the code is quite “dense.” While this is partly due to the concurrent nature of the program, we’ve also made our life a bit difficult by using relatively low-level primitives from the Cats Effect standard library such as Ref and Deferred. We have much more “structured” APIs available in Cats Effect for writing concurrent programs and we will use one of these to solve this same problem in the next post in a much simpler fashion. Join us in Part 3 to see how we can solve this same problem using Cats Effect Queue!

Concurrent Sudoku Solver: Part 2 - Using Cats Effect Ref + Deferred + IO.race

Introduction

Cell abstraction

Cell implementation

Reflections

Written by Farooq Qaiser