Conflicts and Concurrency in Distributed Systems — Part 2
In the previous article, we talked about handling conflicts and concurrency in a “transaction-less” world. In this article, we will look at those two from the perspective of transactions.
What is a transaction?
A transaction is a group of reads and writes that executes as one unit: either the entire transaction succeeds (commit) or the entire transaction fails (abort). You cannot afford to have a financial transaction that is 50% complete, right? It should be either 100% or 0%.
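This commit-or-abort behaviour can be sketched with SQLite (the `accounts` schema here is a toy, assumed for illustration):

```python
import sqlite3

# Toy schema: two accounts holding USD 500 each.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
db.executemany("INSERT INTO accounts VALUES (?, ?)", [("A", 500), ("B", 500)])
db.commit()

try:
    db.execute("UPDATE accounts SET balance = balance - 100 WHERE name = 'A'")
    # Simulate a crash halfway through the transfer:
    raise RuntimeError("crash between debit and credit")
    db.execute("UPDATE accounts SET balance = balance + 100 WHERE name = 'B'")
    db.commit()    # 100%: both writes would become visible together
except RuntimeError:
    db.rollback()  # 0%: the debit is undone as well

print(dict(db.execute("SELECT name, balance FROM accounts")))  # {'A': 500, 'B': 500}
```

Because the debit and the simulated crash happen inside one transaction, the rollback leaves both balances untouched.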
Isolation
Transactions are not spared from concurrency problems either. Isolation is the property that ensures concurrent transactions cannot interfere with each other — only one transaction can wear the crown at any point. In the ideal case, the database ensures that the outcome of concurrent transactions is the same as if they had run serially (one after another). But how does the database guarantee that? In practice, databases offer a spectrum of guarantees, categorised as Weak Isolation Guarantees (Read Committed and Snapshot Isolation) and Strong Isolation Guarantees (Serializability).
Read Committed
It makes two guarantees:
No dirty reads — a user will only see data that has been committed.
- Suppose, a transaction Txn1 wants to change the value of key x from 1 (the last committed value) to 2.
- There is a universal stopwatch which is shared by all the transactions.
- The update starts at 10ms (the start time of the transaction is 10ms) and finishes by 20ms. Then, the transaction starts executing some other updates (a transaction is a group of reads and writes). By 40ms, Txn1 commits.
- Now, in another part of the world, there is another transaction, Txn2, that wants to read the value of x. The read starts at 20ms (the start time of the transaction is 20ms), and by 30ms it finishes.
- If Txn2 reads the value of x as 2 (remember, the update of x by Txn1 had finished by 20ms but was not yet committed), then this read is called a Dirty Read. However, under the Read Committed isolation guarantee, Txn2 reads the value of x as 1, since for Txn2 the last committed value of x is 1.
- Interestingly, if Txn2 performs another read of x at 40ms, then according to the guarantee, it reads x as 2, because by this time Txn1 has already committed.
No dirty writes — a user will only overwrite data that has been committed.
- Suppose, there is another transaction Txn3, that also wants to modify the value of x.
- Txn3 begins at 20ms, same as Txn2.
- If Txn3 successfully updates the value of x (say, by 30ms), then this write is called a Dirty Write (remember, Txn1 had not yet committed). But if Read Committed isolation is guaranteed, then Txn3 will wait till the watch hits the 40ms mark (Txn1 will have committed by 40ms), after which it can update.
In summary, under read-committed isolation, writes can block other writes, but reads can continue (using the last committed values). Let’s look at the implementation now.
Implementation of Read Committed Isolation:
- Databases prevent Dirty Writes by locking a particular row/record. If a transaction wants to modify an object, it must first acquire a lock on that object. The transaction should then hold the lock until it commits or aborts. Only after the lock is released can another transaction acquire the lock on the same object. As a rule, only one transaction can hold the lock on any given object.
- To prevent Dirty Reads, the same lock could be acquired by the transaction that wants to read. But this can lead to bottlenecks: imagine 100 read-only transactions waiting for a long-running write transaction to commit. Ideally, writers should not block readers, so this approach is not preferred. Instead, the database remembers both the old committed value and the new uncommitted value. While the write transaction has not yet committed, the database simply gives the last committed value to all ongoing reads; once the write transaction has committed, all subsequent reads are given the new value.
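Both ideas — row locks for writers, last-committed values for readers — fit in a minimal Python sketch (a toy in-memory store, not a real database; real systems lock at commit granularity, not per key):

```python
import threading

class ReadCommittedStore:
    """Toy sketch: one committed value per key, plus at most one
    uncommitted value held by whichever writer owns the row lock."""
    def __init__(self):
        self.committed = {}    # key -> last committed value
        self.uncommitted = {}  # key -> value written by the lock holder
        self.locks = {}        # key -> lock that prevents dirty writes

    def read(self, key):
        # Readers never wait: they always get the last committed value.
        return self.committed[key]

    def write(self, key, value):
        # Writers block one another: acquire the row lock, hold until commit.
        self.locks.setdefault(key, threading.Lock()).acquire()
        self.uncommitted[key] = value

    def commit(self, key):
        self.committed[key] = self.uncommitted.pop(key)
        self.locks[key].release()

store = ReadCommittedStore()
store.committed["x"] = 1
store.write("x", 2)     # Txn1 updates x and holds the lock
print(store.read("x"))  # another transaction still reads 1 — no dirty read
store.commit("x")
print(store.read("x"))  # now reads 2
```

A second writer calling `write("x", …)` before the commit would block on the lock, which is exactly the dirty-write prevention described above.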
Snapshot Isolation
Consider the following scenario:
- Suppose, a company has two accounts, say A and B, each having USD 500.
- Txn1 begins at 10ms. It makes a request to read the balance in account A. The read finishes by 20ms. The database returns the last committed value (as per Read Committed Isolation) USD 500 to this read.
- Txn2 begins at 20ms. It makes a request to transfer USD 100 from account A to B. The updates finish by 40ms and by 50ms the transaction commits. The new committed values are USD 400 (account A) and USD 600 (account B).
- Txn1 then makes another read request at 50ms (remember, Txn2 had committed by now) to read the balance in account B. The database gives the last committed value from the previous step, USD 600 (again as per Read Committed Isolation), to this read.
Txn1 now believes the company holds USD 1,100 in total — as if it made a quick profit of USD 100 out of thin air. Don’t you think you need an isolation level stronger than Read Committed for the scenario we just saw? Snapshot Isolation is something that could save the day here. It makes sure that readers never block writers and writers never block readers, and that a transaction always sees a consistent snapshot of the database throughout its execution. To achieve this, the database maintains multiple committed versions of an object, because in-progress transactions may have started at different points in time. Garbage collection happens in the background to delete versions that are no longer required by any current transaction. This technique is called MVCC (Multi-Version Concurrency Control).
In the scenario mentioned above, according to Snapshot Isolation, Txn1 will always see the following version until it commits or aborts.
[
A: 500,
B: 500,
]
Any writes made by Txn2, which started after Txn1, are ignored even though Txn2 committed before Txn1, as shown in the example above. Similarly, writes made by transactions that were still in progress when Txn1 started are also ignored, even if those transactions subsequently commit.
A new transaction Txn3 that begins at 50ms will see a different snapshot or version as mentioned below.
[
A: 400,
B: 600,
]
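The version-picking rule behind MVCC can be sketched in a few lines of Python (a toy store; real databases key versions by transaction IDs rather than a shared stopwatch):

```python
class MVCCStore:
    """Toy MVCC sketch: every commit appends a (commit_time, value) version;
    a transaction reads the newest version committed at or before its own
    start time, so its snapshot never changes while it runs."""
    def __init__(self):
        self.versions = {}  # key -> list of (commit_time, value)

    def commit(self, commit_time, updates):
        for key, value in updates.items():
            self.versions.setdefault(key, []).append((commit_time, value))

    def read(self, txn_start_time, key):
        visible = [(t, v) for t, v in self.versions[key] if t <= txn_start_time]
        return max(visible)[1]  # newest version visible to this transaction

store = MVCCStore()
store.commit(0, {"A": 500, "B": 500})   # initial balances
store.commit(50, {"A": 400, "B": 600})  # Txn2's transfer commits at 50ms

# Txn1 started at 10ms: it keeps seeing the 0ms snapshot, even after 50ms.
print(store.read(10, "A"), store.read(10, "B"))  # 500 500
# Txn3 starts at 50ms: it sees Txn2's committed transfer.
print(store.read(50, "A"), store.read(50, "B"))  # 400 600
```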
Race Conditions
Snapshot Isolation sometimes leads to race conditions which can result in lost updates, phantoms and write skews. In this article, we will only cover lost updates; otherwise, it would just be too much. In the next article, we will cover write skews, phantoms and serializability.
Consider the following scenario:
- Suppose there are two transactions Txn1 and Txn2 that are trying to concurrently increment the same counter i.
- Txn1 begins at 10ms, reads the current value of i (suppose, according to Snapshot Isolation the value is 9), increments it (9+1=10), and writes it back (a typical read-modify-write cycle). Txn1 commits by 50ms. The committed value of i is now 10.
- Txn2 begins at 20ms, reads the current value of i (also 9, thanks to Snapshot Isolation), increments it (9+1=10), and performs the update at 50ms. Note that Txn1 had already committed by now, so the lock acquired by Txn1 has been released. Txn2 then commits by 80ms. The committed value of i is still 10.
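The lost update can be reproduced with SQLite (the `counters` table mirrors the SQL later in the article; the statements run sequentially here, in the same order as the timeline above):

```python
import sqlite3, tempfile, os

path = os.path.join(tempfile.mkdtemp(), "demo.db")
db = sqlite3.connect(path)
db.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES ('i', 9)")
db.commit()
db.close()

txn1 = sqlite3.connect(path)
txn2 = sqlite3.connect(path)

# Both transactions read i = 9 before either writes.
v1 = txn1.execute("SELECT value FROM counters WHERE key = 'i'").fetchone()[0]
v2 = txn2.execute("SELECT value FROM counters WHERE key = 'i'").fetchone()[0]

txn1.execute("UPDATE counters SET value = ? WHERE key = 'i'", (v1 + 1,))
txn1.commit()  # committed value is now 10

txn2.execute("UPDATE counters SET value = ? WHERE key = 'i'", (v2 + 1,))
txn2.commit()  # still 10 — Txn1's increment is lost

final = txn2.execute("SELECT value FROM counters WHERE key = 'i'").fetchone()[0]
print(final)  # 10, not the 11 two increments should produce
```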
In the example above, did you notice that the increment made by Txn1 vanished into thin air? How do we prevent this from happening? Well, some implementations of Snapshot Isolation automatically detect lost updates. In the example above, Txn2 could have been aborted and retried, because the database detected that committing it would clobber Txn1’s update. Since lost-update detection happens automatically, it is less error-prone than solutions the application has to remember to apply.
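The first-committer-wins rule that such detection relies on can be sketched as follows (a toy model; real databases track full write sets and transaction IDs rather than millisecond timestamps):

```python
class LostUpdateDetector:
    """Toy sketch of first-committer-wins: abort a commit if any key in its
    write set was committed by another transaction after this one started."""
    def __init__(self):
        self.last_commit_time = {}  # key -> time of its latest commit

    def commit(self, txn_start, commit_time, writes):
        if any(self.last_commit_time.get(k, -1) > txn_start for k in writes):
            return "aborted"        # a concurrent update would be lost: retry
        for k in writes:
            self.last_commit_time[k] = commit_time
        return "committed"

db = LostUpdateDetector()
first = db.commit(txn_start=10, commit_time=50, writes=["i"])   # Txn1
second = db.commit(txn_start=20, commit_time=80, writes=["i"])  # Txn2
print(first, second)  # committed aborted
```

Txn2 is aborted because `i` was committed (at 50ms) after Txn2 started (at 20ms); the application retries Txn2 against the new value.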
There are other solutions as well to the above anomaly.
Atomic write operations: Replace read-modify-write cycles with atomic update operations. For example,
UPDATE counters SET value = value + 1 WHERE key = 'i';
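A quick SQLite sketch of why this works: the read-modify-write now happens inside the database under its own lock, so repeated increments all take effect (shown sequentially here for simplicity):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES ('i', 9)")
db.commit()

# Two transactions each run the atomic increment. There is no
# read-modify-write cycle in application code, so no increment can be lost.
for _ in range(2):
    db.execute("UPDATE counters SET value = value + 1 WHERE key = 'i'")
    db.commit()

result = db.execute("SELECT value FROM counters WHERE key = 'i'").fetchone()[0]
print(result)  # 11 — both increments counted, unlike the lost-update example
```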
Explicit Locking: For example,
BEGIN TRANSACTION;
SELECT * FROM counters WHERE key = 'i' FOR UPDATE;
UPDATE counters SET value = value + 1 WHERE key = 'i';
COMMIT;
The FOR UPDATE clause locks all the rows returned by the SELECT query.
Compare-and-set: Allow the update only if the value hasn’t changed since you last read it. For example,
UPDATE counters SET value = value + 1 WHERE key = 'i' AND value = 9;
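An application would wrap this statement in a retry loop, checking how many rows the update touched (SQLite is used for illustration; `increment_cas` is a hypothetical helper, not a library API):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE counters (key TEXT PRIMARY KEY, value INTEGER)")
db.execute("INSERT INTO counters VALUES ('i', 9)")
db.commit()

def increment_cas(db, key, retries=3):
    """Read the value, then update only if it is still unchanged; retry on conflict."""
    for _ in range(retries):
        old = db.execute(
            "SELECT value FROM counters WHERE key = ?", (key,)).fetchone()[0]
        cur = db.execute(
            "UPDATE counters SET value = ? WHERE key = ? AND value = ?",
            (old + 1, key, old))
        db.commit()
        if cur.rowcount == 1:  # the compare succeeded, so the set took effect
            return old + 1
    raise RuntimeError("too much contention")

a = increment_cas(db, "i")
b = increment_cas(db, "i")
print(a, b)  # 10 11
```

One caveat: if the database evaluates the WHERE clause against an old snapshot, the check can pass even though the value has since changed, so whether compare-and-set actually prevents lost updates depends on the implementation.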
Hope you found this article useful. As I mentioned above, in the next article, we will talk about write skews, phantoms and serializability.