Designing Data-Intensive Applications: Transactions with Weak Isolation

Adrian Booth
6 min read · May 8, 2020


One of the finest books I’ve had the pleasure of reading recently is Designing Data-Intensive Applications by Martin Kleppmann. The book has become renowned for its comprehensive review of distributed systems and its interrogation of the trade-offs involved in various design choices. It took me three months to finish, and a good deal longer to soak up what it taught. In the following series of posts I’d like to summarise some of the key insights I’ve learnt from each chapter, starting with my favourite of them all: database transactions.

As we know, many things can go wrong with databases.

  • The database software or hardware can fail at any time (including in the middle of a write)
  • The application may crash at any time (including halfway through a series of writes)
  • Interruptions in the network may cut the application off from the database
  • Several clients may write to the database at once, overwriting each other’s changes
  • A client may read data that doesn’t make sense because it has only been partially updated
  • Race conditions between clients can cause surprising bugs

Transactions are a way for databases to shield applications from these kinds of concerns: they group several read/write operations into a single logical unit. If an API call writes to four tables, the database guarantees an “all-or-nothing” outcome to avoid data inconsistencies. The application developer only has to be concerned with two outcomes: 1) success, or 2) failure.

Without this, we’d be left with a whole string of permutations to manage in application code: “What if 3 database writes succeed but 1 fails? What if 2 of the writes succeed and 2 fail?”, and so on.
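To make this concrete, here’s a minimal sketch of a transaction in SQL. The orders and payments tables are hypothetical, invented purely for illustration:

```sql
BEGIN;

-- Two logically related writes: either both take effect, or neither does.
INSERT INTO orders (id, item) VALUES (42, 'book');
INSERT INTO payments (order_id, amount) VALUES (42, 12.50);

-- If anything fails before this point, the database rolls back the whole
-- transaction, and the application sees a single failure it can safely retry.
COMMIT;
```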

One key topic throughout this chapter is isolation. This is the “I” in ACID (Atomicity, Consistency, Isolation, Durability). All this means is that in an ACID-compliant database, the client application can be sure the operations it performs on the data will not be interfered with by other clients and their requests. A typical race condition, sketched below, involves two users trying to increment the same counter: due to the timing of the requests it looks as though only one increment occurred, and the data doesn’t reflect the “true” state it would have reached had the requests run one after the other.
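Here’s a minimal sketch of that race, assuming a hypothetical counters table with a single row whose value starts at 42, and two sessions whose statements interleave in the order shown:

```sql
-- Session A
BEGIN;
SELECT value FROM counters WHERE id = 1;      -- reads 42

-- Session B
BEGIN;
SELECT value FROM counters WHERE id = 1;      -- also reads 42

-- Session A
UPDATE counters SET value = 43 WHERE id = 1;  -- writes 42 + 1
COMMIT;

-- Session B
UPDATE counters SET value = 43 WHERE id = 1;  -- also writes 42 + 1
COMMIT;

-- The final value is 43, not 44: one of the two increments has been lost.
```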

Transactions come with different isolation levels, each with its own performance implications, and developers should be aware of these trade-offs when choosing one. Let’s dive into some of the ones Kleppmann discusses.

Throughout the chapter, three levels of isolation are reviewed in detail:

  • Read committed isolation
  • Snapshot isolation (also known as repeatable read)
  • Serializable isolation

The first two of these are known as “weak isolation levels”. Kleppmann warns readers that even though popular database systems advertise their “ACID-compliant” transactions, the isolation levels actually implemented are not always the strongest and can still permit race conditions. We’ll see why now.

Read committed

This is the most basic level of transaction isolation, and it makes two guarantees:

  1. When reading from the database, you will only see data that has been committed (no dirty reads)
  2. When writing to the database, you will only overwrite data that has been committed (no dirty writes)

The main reason to prevent dirty reads (reads of uncommitted data) is that, if the transaction is rolled back, the client that read the data isn’t left in a state of confusion. Reading a constantly incrementing counter at 5 in one request (where the 5 hasn’t yet been committed) and then at 4 in a subsequent request could cause all sorts of subtle bugs, and lead the client to make incorrect decisions based on invalid data.

Below is an example of read committed isolation in practice. Without this isolation level, User 2 would see x = 3 before User 1 had committed it.
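Here’s a minimal sketch of the guarantee, assuming a hypothetical table t with a single row where x starts at 2:

```sql
-- Session 1 (User 1)
BEGIN;
UPDATE t SET x = 3 WHERE id = 1;  -- not yet committed

-- Session 2 (User 2)
SELECT x FROM t WHERE id = 1;     -- returns 2: the uncommitted 3 is invisible

-- Session 1 (User 1)
COMMIT;

-- Session 2 (User 2)
SELECT x FROM t WHERE id = 1;     -- now returns 3
```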

An example of the dangers caused by dirty writes can be seen below. Imagine a scenario where two users are trying to buy the same car. Two write operations must occur when the purchase is executed: the first updates the listings, and the second updates the invoices. Without read committed isolation we could end up with users overwriting data that has yet to be committed. In our car purchase case, the system would record the car as sold to Bob while the invoice for the purchase was sent to Alice.
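Here’s a sketch of how that interleaving could play out without dirty-write protection, again with hypothetical listings and invoices tables:

```sql
-- Session: Alice
BEGIN;
UPDATE listings SET buyer = 'Alice' WHERE car_id = 1234;

-- Session: Bob
BEGIN;
UPDATE listings SET buyer = 'Bob' WHERE car_id = 1234;       -- dirty write over Alice's uncommitted row
UPDATE invoices SET recipient = 'Bob' WHERE car_id = 1234;

-- Session: Alice
UPDATE invoices SET recipient = 'Alice' WHERE car_id = 1234; -- dirty write over Bob's uncommitted row

-- Session: Bob
COMMIT;

-- Session: Alice
COMMIT;

-- Result: the listing says the car went to Bob, but the invoice went to Alice.
-- Under read committed, the second writer to each row would instead block
-- until the first transaction commits or aborts.
```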

Read committed is the default isolation level in many popular databases, from Postgres to Oracle.

Snapshot Isolation and Repeatable Reads

It’s easy to get complacent with read committed isolation and think that’s all there is to it. Unfortunately, race conditions can sneak up on applications even when the database prevents concurrent writes from stepping on each other’s toes.

Below is an example that Kleppmann demonstrates can occur even with read committed isolation: an anomaly called read skew.

Our user, Alice, has $1,000 split across two accounts, with $500 in each. She transfers $100 from one account to the other, yet she’s unfortunate enough to read both balances at the same time as the transfer is being processed, leading her to believe she only has $900 across the two accounts.
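Here’s a sketch of the unlucky timing, assuming a hypothetical accounts table where accounts 1 and 2 each start at $500, and the transfer moves $100 from account 2 to account 1:

```sql
-- Session: transfer
BEGIN;
UPDATE accounts SET balance = balance + 100 WHERE id = 1;  -- account 1: 500 -> 600

-- Session: Alice checks her balances
SELECT balance FROM accounts WHERE id = 1;  -- 500 (the 600 isn't committed yet)

-- Session: transfer
UPDATE accounts SET balance = balance - 100 WHERE id = 2;  -- account 2: 500 -> 400
COMMIT;

-- Session: Alice
SELECT balance FROM accounts WHERE id = 2;  -- 400 (committed by now)

-- Alice saw 500 + 400 = $900, even though every committed state summed to $1,000.
```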

This is an example of what’s called a non-repeatable read, and it can still occur under read committed isolation. It’s considered acceptable in many cases because the problem doesn’t persist: if the user reloads the page, they will likely see the correct balance again, and many applications can tolerate temporary inconsistency like this. But not all.

The most common solution to this problem is snapshot isolation. The idea is that each transaction views the data as it was committed at a given point in time: whilst you’re inside a transaction, you will only see data that was committed before the transaction started. Any subsequent modifications by other transactions will not be visible within it. This is especially useful for long-running analytical queries that must somehow “freeze” the dataset in order to build an accurate aggregation of the existing data, without new writes interfering with the process. As Kleppmann writes, “It’s very hard to reason about the meaning of a query if the data on which it operates is changing at the same time the query is executing”.
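In PostgreSQL, for instance, snapshot isolation is exposed under the name REPEATABLE READ. A minimal sketch, reusing the hypothetical accounts table from above:

```sql
BEGIN ISOLATION LEVEL REPEATABLE READ;

-- Both aggregates run against the same consistent snapshot of the data.
SELECT sum(balance) FROM accounts;

-- Even if other transactions commit new writes in the meantime, they are
-- invisible here, so repeating the query returns the same answer.
SELECT sum(balance) FROM accounts;

COMMIT;
```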

In this post I’ve walked you through two of the most common forms of weak isolation found in database systems: read committed isolation and snapshot isolation. In summary, read committed ensures we get no dirty reads or writes, so that clients will only read and overwrite data that has been committed, not data that is currently being modified within a transaction.

We also discussed some of the problems with read committed isolation, and how it can still lead to read skew, as was the case with Alice’s bank transfer example. To solve this we use snapshot isolation, which forces the database to show only data that was committed before the transaction started. The sketch a few paragraphs above shows how this kind of read skew can occur.

In the next post I’ll walk you through some of the problems with these forms of isolation, and why stricter levels might be necessary depending on your use case.
