Designing Data Intensive Applications: Strong Isolation using Serializability

Adrian Booth · Published in The Startup · May 15, 2020

In the last two articles on transaction isolation levels (here and here) we discussed the varying degrees to which a database isolates the data you’re operating on. We saw that read committed isolation protects our application against dirty reads and dirty writes, and that snapshot isolation protects us against non-repeatable reads and read skew. We also saw how phantoms can infiltrate our transactions and lead us to execute write requests that shouldn’t be allowed.

None of these problems would exist if users would just behave and wait for someone else to finish their request before submitting theirs. Of course we can’t expect our users to be omniscient, which is why we have to protect our data by isolating it to varying degrees based on our use case.

We saw that a non-repeatable read (in the case of the user transferring money between two bank accounts) might not be a huge deal for certain applications. There’s little point over-engineering a solution when users are forgiving of transient inconsistencies.

For certain types of applications however, particularly ones in the finance realm, we need very strong levels of isolation. This brings us to serializable isolation.

This is regarded as the strongest form of isolation, and guarantees that even though transactions may execute in parallel, the end result will be the same as if they had executed one after another, serially, without any concurrency.

There are three methods of achieving serializable isolation, which I’ll describe below:

Actual Serial Execution

If you want to remove race conditions entirely, then just stop allowing concurrency altogether. This is the simple idea behind actual serial execution, in that we literally execute one transaction at a time, in serial order, on a single thread.
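As a rough illustration, here is a minimal sketch of the idea in Python. The class and function names are my own invention, not from any particular database: every transaction is just a function handed to a single worker thread, so no two transactions can ever interleave.

```python
import queue
import threading

class SerialExecutor:
    """Runs transactions one at a time on a single worker thread."""

    def __init__(self, initial_data=None):
        self._data = dict(initial_data or {})
        self._inbox = queue.Queue()
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, transaction):
        """Queue a transaction (a function taking the data dict) and return
        a one-item queue the caller can block on for the result."""
        result = queue.Queue(maxsize=1)
        self._inbox.put((transaction, result))
        return result

    def _run(self):
        while True:
            transaction, result = self._inbox.get()
            # No locks needed: this loop is the only code that ever touches
            # self._data, and it runs exactly one transaction at a time.
            result.put(transaction(self._data))

# Usage: a transfer between two accounts cannot interleave with anything else.
executor = SerialExecutor({"alice": 100, "bob": 0})

def transfer(data):
    data["alice"] -= 30
    data["bob"] += 30
    return data["alice"], data["bob"]

print(executor.submit(transfer).get())  # (70, 30)
```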

Immediately you might be thinking about the negative performance implications of this kind of method. For decades it’s been known that if you want to increase performance, multi-threaded concurrency is a good place to start. However, there are situations where the performance hit from executing transactions serially might not be as bad as you’d expect.

Firstly, OLTP (Online Transaction Processing) queries are typically short, fast, and only return a small number of rows. If each transaction is fast enough to execute, then spinning up additional threads will not make a huge difference.

Secondly, if the write throughput is low then we likely won’t encounter bottlenecks as a result of only having a single thread. In the opposite scenario, where we’re managing an application with very high write throughput, a single thread might well become a point of stress. A solution to this would be to partition the data so it can be scaled across multiple nodes and CPU cores. With this architecture you could scale your transaction throughput linearly with the number of CPU cores.
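To make that scaling idea concrete, here is a hypothetical sketch of the same approach with partitioning: hash the key to pick a partition, and give each partition its own single-threaded worker and its own private data. Python threads here merely stand in for what would really be separate CPU cores or nodes.

```python
import queue
import threading

class PartitionedSerialExecutor:
    """One single-threaded worker (with its own data dict) per partition.
    Transactions that touch a single partition run serially within it but
    in parallel across partitions."""

    def __init__(self, num_partitions):
        self._inboxes = []
        for _ in range(num_partitions):
            inbox = queue.Queue()
            threading.Thread(target=self._run, args=(inbox, {}), daemon=True).start()
            self._inboxes.append(inbox)

    def _run(self, inbox, data):
        while True:
            transaction, result = inbox.get()
            result.put(transaction(data))  # serial within this partition

    def submit(self, key, transaction):
        # Hash-route the transaction to the partition that owns this key.
        result = queue.Queue(maxsize=1)
        self._inboxes[hash(key) % len(self._inboxes)].put((transaction, result))
        return result
```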

This isn’t ideal for applications that need a transaction to access data across multiple partitions. There’s additional coordination overhead involved that greatly reduces the overall write throughput, which has severe implications when it comes to scaling. For this reason, actual serial execution isn’t the ideal solution in all cases.

Two Phase Locking

Two Phase Locking (2PL) is one of the oldest and most widely used algorithms for achieving serializability.

Before we continue, I’d like to expand on the topic of database locks. There are two types of lock a transaction can obtain that I’d like to discuss: the shared lock and the exclusive lock. They’re also known as a read lock and a write lock respectively.

An exclusive lock gives a single transaction exclusive access for writing to a specific part of the database. When this lock is in place, no other transaction can modify or read the data being operated on.

A shared lock is one that can be held by multiple transactions at once, and it prohibits any transaction from acquiring an exclusive lock whilst those transactions are in the process of reading the data.
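The two lock modes can be sketched in a few lines of Python using only the standard library. This is a toy readers-writer lock for illustration, not how any particular database implements its lock manager:

```python
import threading

class SharedExclusiveLock:
    """Many holders of the shared (read) lock may coexist; the exclusive
    (write) lock requires that nobody else holds the lock in any mode."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0      # current holders of the shared lock
        self._writer = False   # is the exclusive lock currently held?

    def acquire_shared(self):
        with self._cond:
            while self._writer:            # readers wait out a writer
                self._cond.wait()
            self._readers += 1

    def release_shared(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()    # a waiting writer can now proceed

    def acquire_exclusive(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()          # writers wait for everyone
            self._writer = True

    def release_exclusive(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()        # wake waiting readers and writers
```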

As an analogy, imagine a classroom containing many students with a teacher standing in front, writing notes on a blackboard. In this situation, the blackboard is the set of data that can be locked by various transactions. The teacher is the writer and the students in the classroom are the readers.

The teacher obtains an exclusive lock on the blackboard and starts writing on it. During this time, no student can read what she’s writing because she has blocked the blackboard as she’s writing to it.

Once she’s done, the students can start reading, thereby obtaining shared locks on the notes that have just been written. Whilst the students are reading and comprehending what’s on the board, the teacher cannot obtain a new exclusive lock.

Once they’re finished (the reading operations are over), the teacher can take back the exclusive lock by blocking the board and erasing what’s on there.

This is exactly how two phase locking works in practice within database transactions. After a transaction has acquired a lock, it must continue holding it until the transaction commits or aborts. This is where the name two phase locking comes from: the first phase is acquiring the locks (either shared or exclusive), and the second phase is releasing them once the transaction is complete. During the second phase (when the students stop reading), other transactions can start to obtain exclusive locks (the teacher can start writing to the blackboard again) once every shared lock has been released.
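Here is a minimal sketch of the two phases, again in illustrative Python rather than anything a real database ships. For brevity it uses a plain exclusive lock for every access (a real lock manager would grant shared locks for reads, as described above) and it omits deadlock detection entirely.

```python
import threading
from collections import defaultdict

class LockManager:
    """Hands out one lock per key (exclusive locks only, for brevity)."""

    def __init__(self):
        self._guard = threading.Lock()
        self._locks = defaultdict(threading.Lock)

    def lock_for(self, key):
        with self._guard:
            return self._locks[key]

class Transaction:
    """Two-phase locking: the growing phase acquires locks as keys are
    touched; the shrinking phase releases them all at commit or abort."""

    def __init__(self, lock_manager, store):
        self._lm = lock_manager
        self._store = store      # shared dict holding committed data
        self._held = {}          # key -> lock we are currently holding
        self._writes = {}        # writes buffered until commit

    def _acquire(self, key):
        if key not in self._held:
            lock = self._lm.lock_for(key)
            lock.acquire()                 # growing phase: take and keep
            self._held[key] = lock

    def read(self, key):
        self._acquire(key)
        return self._writes.get(key, self._store.get(key))

    def write(self, key, value):
        self._acquire(key)
        self._writes[key] = value

    def commit(self):
        self._store.update(self._writes)
        self._release_all()                # shrinking phase

    def abort(self):
        self._writes.clear()               # discard buffered writes
        self._release_all()

    def _release_all(self):
        for lock in self._held.values():
            lock.release()
        self._held.clear()
```

Notice that read and write both block until the relevant lock is free; that waiting is where the performance cost discussed next comes from.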

Whilst two phase locking yields fantastic isolation benefits, it comes with a significant performance cost. The overhead of obtaining and releasing locks (on top of reduced concurrency) can be devastating for response times, and it’s the reason why 2PL hasn’t been adopted quite so heavily for applications that are performance sensitive.

Unstable latencies are also a problem with 2PL if there’s too much contention (many transactions attempting to acquire locks on the same dataset). All it takes is one slow transaction to cause a large queue of others waiting to operate on the data. Imagine if just one student in the classroom is a very slow reader and takes twice as long as the other students to read and process the teacher’s notes; the teacher will have to wait much longer to take an exclusive lock on the blackboard again and get writing.

Serializable Snapshot Isolation

So far, all of this has been rather depressing. We seem to be caught in an unenviable position of having to sacrifice performance for data integrity. Whilst solutions like Two Phase Locking will yield tremendous isolation benefits, it’s clear from the above example that you pay for that isolation with performance.

On the other hand we have isolation levels like read committed and snapshot isolation that allow our application to meet the performance expectations of our users, but risk race conditions like read skew and phantoms (see here) which sacrifice the integrity of the data within our system.

However, it’s not all doom and gloom. There is a relatively new isolation level (first described in 2008) that provides full serializability at a tiny performance cost. This is known as serializable snapshot isolation.

This level of isolation is based on the premise of optimistic locking (also known as optimistic concurrency control).

To contrast the difference between optimistic and pessimistic locking, let’s revisit Two Phase Locking (2PL).

2PL takes a pessimistic approach to locking, in that it assumes bad things will happen (users will try to overwrite each other’s data in simultaneous transactions). As a result, it takes a full lock on the dataset (via an exclusive lock) and prevents any reads and writes to that data until the transaction is complete.

The actual serial execution method discussed above is an extreme form of pessimistic locking. Rather than taking locks on a narrow dataset (like 2PL), it essentially holds a lock on the entire database, because everything executes on a single thread and no other transaction can run until the current one is complete. The “lock” only needs to be held for a very short time, however, as we aim to make each transaction very fast to execute. For systems with longer running transactions, actual serial execution would not be a sensible choice if high performance were required.

Optimistic locking, however, means that instead of blocking other transactions, we just go ahead anyway and hope for the best, assuming that nothing bad happens (no other transaction interferes with ours). When our transaction wants to commit, it checks whether isolation has been violated (another transaction interfered with our data) and, if it has, it aborts.
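In sketch form, the optimistic idea looks something like the Python below. To be clear, this is only the general optimistic-concurrency pattern, not the actual SSI algorithm (which tracks read-write dependencies between snapshots): each transaction remembers the version of everything it read, and at commit time it aborts if any of those versions have since changed.

```python
import threading

class ConflictAbort(Exception):
    """Raised at commit time when another transaction interfered."""

class VersionedStore:
    """A toy key-value store where every committed write bumps a version."""

    def __init__(self):
        self.commit_lock = threading.Lock()  # makes validate-then-apply atomic
        self.data = {}       # key -> value
        self.versions = {}   # key -> version number

class OptimisticTransaction:
    def __init__(self, store):
        self._store = store
        self._read_versions = {}   # versions observed by our reads
        self._writes = {}          # writes buffered until commit

    def read(self, key):
        if key in self._writes:
            return self._writes[key]
        self._read_versions[key] = self._store.versions.get(key, 0)
        return self._store.data.get(key)

    def write(self, key, value):
        self._writes[key] = value

    def commit(self):
        with self._store.commit_lock:
            # Validation: if anything we read has been changed by a
            # transaction that committed in the meantime, abort.
            for key, seen in self._read_versions.items():
                if self._store.versions.get(key, 0) != seen:
                    raise ConflictAbort(key)
            for key, value in self._writes.items():
                self._store.data[key] = value
                self._store.versions[key] = self._store.versions.get(key, 0) + 1
```

An application using this approach needs to be prepared to retry: an abort here doesn’t mean anything went permanently wrong, only that the transaction should be run again.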

Serializable snapshot isolation and optimistic locking tend not to perform very well when there’s a high level of contention (that is, many transactions attempting to mutate the same data at the same time), but many types of applications don’t have this problem; they can therefore take full advantage of an isolation level that ensures data integrity without sacrificing the performance of the application.
