Snapshot Isolation in Scala

Kirill Polunin
Nov 4 · 6 min read

In this article, I'll try to describe the optimistic concurrency algorithm called Snapshot Isolation and implement it in Scala.

Disclaimer

I believe Scala is not the best instrument for building such algorithms, because they require low-level memory management and non-idiomatic Scala constructs to be a little more efficient. Most likely C/C++ would suit this purpose better, but this article is just a proof of concept of Snapshot Isolation.

Why do we need it?

In general, you have several ways to control concurrency over the shared state, for instance in databases. Consider a simple example:

Two processes (abstract processes, not OS processes) are trying to increment a simple memory cell. Both of them are running simultaneously.

P1: reads 0 by key X

P2: reads 0 by key X

P1: writes 1

P2: writes 1

We have a "lost update" problem here, so some kind of atomicity is needed at this point. How could we accomplish that? As I said before, there are several ways to solve this. The first is "pessimistic locking": you lock all the cells you need before starting your operation, and nobody can read or write your keys until the operation ends. Obviously, there is no parallelism this way. The second is "optimistic concurrency", which does not use locks! Honestly, that's only half of the truth, but it sounds hopeful!
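To make the two strategies concrete, here is a minimal sketch with plain JVM threads (the class names are mine, purely for illustration): a pessimistic counter that takes a lock on every increment, and an optimistic one that reads, computes, and retries a compare-and-set when it detects a conflict.

```scala
import java.util.concurrent.atomic.AtomicInteger

// Pessimistic: every increment takes the monitor lock; no parallelism inside.
class LockedCounter {
  private var value = 0
  def increment(): Unit = synchronized { value += 1 }
  def get: Int = synchronized(value)
}

// Optimistic: read, compute, then compare-and-set; on conflict, retry.
class CasCounter {
  private val value = new AtomicInteger(0)
  def increment(): Unit = {
    var done = false
    while (!done) {
      val seen = value.get()                     // read the current value
      done = value.compareAndSet(seen, seen + 1) // "merge"; false means a conflict
    }
  }
  def get: Int = value.get()
}

object Race {
  // Run `perThread` increments from each of `threads` threads and wait for all.
  def hammer(inc: () => Unit, threads: Int, perThread: Int): Unit = {
    val ts = (1 to threads).map(_ => new Thread(() => (1 to perThread).foreach(_ => inc())))
    ts.foreach(_.start())
    ts.foreach(_.join())
  }
}
```

Both counters end up with the correct total; the unsynchronized `var value = 0; value += 1` version from the P1/P2 trace above would lose updates.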

Main concept

Here is the point. Let's introduce the idea of a transaction. A transaction works with its own "copy" of the entire memory storage; the initial memory stays immutable until a certain point. The transaction only changes its own copy: when it updates a record by key, it creates a copy of that record and then updates the copy. When the transaction ends, it tries to update the initial memory.
This phase is called a "merge". In effect, the transaction creates its own version of a record. This approach is very similar to the Git version control system.
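As a tiny illustration of copy-update-merge (a Map-based toy of my own, not the repo's code; real Snapshot Isolation detects conflicts per record, while this toy uses one whole-store version counter):

```scala
// Toy copy-update-merge over an immutable Map, with a version counter to
// detect whether anyone merged behind our back. All names are illustrative.
class Store[K, V](initial: Map[K, V]) {
  private var data = initial
  private var version = 0L

  // Take a snapshot: the transaction gets its own "copy"
  // (cheap, since the Map is immutable).
  def begin(): (Long, Map[K, V]) = synchronized((version, data))

  // Merge: succeeds only if nobody committed since our snapshot was taken.
  def merge(seenVersion: Long, updated: Map[K, V]): Boolean = synchronized {
    if (seenVersion != version) false // conflict: abort, caller restarts
    else { data = updated; version += 1; true }
  }

  def current: Map[K, V] = synchronized(data)
}
```

A transaction that began before someone else's merge will fail its own merge and has to restart on a fresh snapshot.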

Firstly copy-update, then merge. Not so complicated so far, right? But there are pitfalls. Let's introduce the concept of a conflict. A conflict occurs when two or more processes access one memory cell and at least one of these accesses is a write operation. So what do we do when two processes are trying to increment X? We have a conflict.

How do we resolve and merge it? We will abort the second transaction and restart it. Let’s see.
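The abort-and-restart loop itself is simple; here is a hedged sketch (hypothetical names): the transaction body throws an abort exception when its merge detects a conflict, and we simply run it again from scratch on a fresh snapshot.

```scala
// Illustrative abort-and-restart loop. A real implementation would also
// cap the number of retries or add backoff.
object Tx {
  case class Aborted() extends RuntimeException

  def retry[A](body: () => A): A = {
    var result: Option[A] = None
    while (result.isEmpty) {
      try result = Some(body())
      catch { case _: Aborted => () } // conflict detected: restart
    }
    result.get
  }
}
```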

The naive way

The “merge phase” in this particular algorithm needs a way to linearize transactions. We need to figure out which Tx is first and which one is second. In other words, some kind of ordering between transactions is needed: Tx1 => Tx2.

One particular way to do it on a real machine is to create a critical section (reminder: only one process, or in this case one CPU thread, can run in this section at a time). Let's write some code.

I implemented some simple domains called Cats and Dogs for this demonstration. Cats have legs, dogs have tails (don't ask me why ;) ). All the code, with 2 branches, is available at https://github.com/scarymrgrey/Isolation.Snapshot.poc

Our storage is represented as a simple case class with two properties. We also need nodes, which encapsulate our records together with all the metadata.
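The repository has the real definitions; a plausible shape, with names guessed from the description (the actual code may differ), might be:

```scala
// Hypothetical reconstruction of the domain and node metadata.
case class Cat(name: String, legs: Int)
case class Dog(name: String, tail: Int)

// A node wraps one record plus versioning metadata.
class Node[A](val value: A) {
  @volatile var next: Node[A] = null // a newer merged version, if any
  var branchFrom: Node[A] = null     // the node this copy was branched from
}

// Storage is a simple case class with two properties.
case class Storage(cats: List[Node[Cat]], dogs: List[Node[Dog]])
```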

To track different versions of a record I am using the "next" property. The "branchFrom" property is self-descriptive: it tracks the initial node during the copy-update phase. All we need to do here is detect a conflict: run the merge code in a critical section and check whether somebody has already updated the "node.next" property. If they have, we have to abort our transaction and restart it from the beginning. That is done in the commit() method. Storage.synchronized { … } creates a critical section for us. And that's it!
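In sketch form, the naive commit could look like this (hypothetical names; a minimal Node is repeated here for self-containment, and the repo's actual commit() may differ):

```scala
class Node[A](val value: A) {
  @volatile var next: Node[A] = null // a newer merged version, if any
  var branchFrom: Node[A] = null     // the node this copy was branched from
}

case class TxAborted() extends RuntimeException

// Stands in for the article's global storage object, used as the lock.
object Storage {
  def commit[A](written: List[Node[A]]): Unit = this.synchronized {
    // conflict check: somebody already merged a newer version
    // of a node we branched from
    if (written.exists(n => n.branchFrom != null && n.branchFrom.next != null))
      throw TxAborted()
    // merge: publish each copy as the newest version of its origin
    written.foreach(n => if (n.branchFrom != null) n.branchFrom.next = n)
  }
}
```

A second transaction that branched from the same node finds `branchFrom.next` already set, aborts, and restarts.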

But one can see a problem in this code. On the one hand, we didn't use locks; on the other hand, we have a critical section, which kills our parallelism. So maybe there is a better way?

A little bit more comprehensive solution

The main point of the improved algorithm is to use fine-grained locks in the merge phase. That is why I said earlier it was only half of the truth that optimistic concurrency does not use locks at all. But these locks are drastically more efficient, because they are taken for a short period of time and only on pre-selected nodes. This approach is called 2PC (two-phase commit).
The first phase is called "prepare", and in it we take the locks; in the second phase we do the "commit". Actually, these are not "locks" in the familiar sense; they look more like "marks". Using this approach, we try to lock a certain cell.
If the lock is not held by another transaction, we merge our version; otherwise, we abort the Tx. So what if we update more than one record? In the "prepare" phase we take all the locks we need, and only when all of them are acquired can we execute the commit (or merge) phase.
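A sketch of the idea (my own illustrative names): a "lock" is just a compare-and-set mark from null to the owning transaction, and the prepare phase must win every mark before the commit phase runs.

```scala
import java.util.concurrent.atomic.AtomicReference

class Node[A](val value: A) {
  // The "mark": null means free; otherwise it holds the owning transaction.
  val lock = new AtomicReference[AnyRef](null)
  def tryLock(owner: AnyRef): Boolean = lock.compareAndSet(null, owner)
  def unlock(owner: AnyRef): Unit = lock.compareAndSet(owner, null)
}

object TwoPhase {
  // Phase 1 (prepare): try to mark every node we intend to merge.
  // Phase 2 (commit): only if *all* marks were taken, apply the merge;
  // otherwise release what we took and report failure so the caller restarts.
  def commit[A](owner: AnyRef, nodes: List[Node[A]])(merge: => Unit): Boolean = {
    val taken = nodes.takeWhile(_.tryLock(owner))
    if (taken.size == nodes.size) {
      try merge
      finally taken.foreach(_.unlock(owner))
      true
    } else {
      taken.foreach(_.unlock(owner)) // roll back the partial prepare
      false
    }
  }
}
```

Releasing the partially taken marks on failure is what prevents two transactions from deadlocking on each other's nodes.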

Time for tests!

Let's implement a simple insertion and update test.

I'm using the Future monad with an execution pool to run transactions in parallel. Each transaction just inserts one new cat and one new dog. Finally, the collections of cats and dogs should be equal in size.
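The shape of the test is roughly this (a sketch of the idea, not the repo's actual test; here the "transactions" are optimistic compare-and-set retries on two shared lists):

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object ParallelInsertTest {
  // Optimistic insert: read the list, prepend, compare-and-set; retry on conflict.
  private def insert(ref: AtomicReference[List[String]], x: String): Unit = {
    var done = false
    while (!done) {
      val old = ref.get()
      done = ref.compareAndSet(old, x :: old)
    }
  }

  // Runs n parallel "transactions", each inserting one cat and one dog.
  // Returns (number of cats, number of dogs).
  def run(n: Int): (Int, Int) = {
    val cats = new AtomicReference(List.empty[String])
    val dogs = new AtomicReference(List.empty[String])
    val txs = (1 to n).map { i =>
      Future { insert(cats, s"cat-$i"); insert(dogs, s"dog-$i") }
    }
    Await.result(Future.sequence(txs), 30.seconds)
    (cats.get().size, dogs.get().size)
  }
}
```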

It works! Let's have a closer look at how the tryLock() method works.

The test finishes in 6.124 sec on my 2018 MacBook Pro 13. Can we improve this time?

Instead of java.util.concurrent.locks.ReentrantReadWriteLock, I used a synchronized critical section.

Twice as fast. StampedLock showed 4 sec. So does this mean you should use "synchronized {}" instead of regular RW locks? Of course not, but in this particular case it did the trick.

So, do we finally have a perfect algorithm? Actually no, because instead of locking we are constantly restarting one of the conflicting transactions. In addition, there is a drawback called the "write skew" anomaly. Example (starting from X = 0, Y = 0):

Tx1: reads Y (sees 0), writes X = Y + 1

Tx2: reads X (sees 0), writes Y = X + 1

The write sets are disjoint, so neither transaction detects a conflict and both commit. Eventually, we will get X = 1, Y = 1, whereas any serial order would produce 1 and 2. This means there is no serializable execution; in other words, we cannot say that Tx1 executed before Tx2 or vice versa. This problem is solved by the "Serializable Snapshot Isolation" mechanism implemented by many modern databases. It uses a conflict-graph detection algorithm, and that is a story for another article.

Conclusion

At the Snapshot Isolation level, each transaction uses its own copy of the storage and changes data in its own buffer until commit. It locks certain keys during the merge phase, but these locks are short-lived and cheap. In real databases, timestamps are used for versioning on reads instead of direct memory references (node.next) in RAM, but this article reflects the core ideas of Snapshot Isolation.
The field of such optimistic concurrent algorithms is wide. Even CPUs use similar transactions on their cache lines to optimize several algorithms!

Thanks for reading.
