Transaction and Locks
The moment we heard about the transaction, we start thinking about the database. However, the concept of a transaction is not limited only to DB but also to non-DB implementations like OS operations.
Consider this hypothetical example where two friends are trying to buy an item from the same amazon account at the same point of time whose wallet already has 100$ and checkout item cost 20$ each. What do you think should be the updated value of the wallet amount now? should it be 80$, 60$, error, or LOCKED??
If the final value of the wallet after the transaction is 60$ then everything is fine. However, if it is anything besides 60$ then our DB is highly unreliable. This is all happening because of “Transaction”.
How “Transaction” works?
In the above mentioned hypothetical situation, despite both the requests from A and B landed at the same time on DB, DB picks anyone request let’s say ‘A’ here, put a lock on the designated row,20$ gets deducted from the wallet and row value is updated to 80$. Until ‘A’ finishes the job,’ B’ is not allowed to touch/update data from the database. Once ‘A’ finishes the job, ‘B’ also updates the row and the ultimate balance is 60$.
What is “Transaction”?
A transaction is a single unit of logic or work, sometimes comprises of multiple operations and it should either happen in full or nothing at all.
Let’s understand this with yet another example. Alice and Bob are two best friends. Alice has 20$ in her account and Bob has 0$. Alice wants to send bob 10$.
While transferring the money, the above steps will get executed one by one i.e first 10$ will get withdrawn from Alice's account and get deposited into bob’s account. This is an ideal scenario.
Now, let’s consider a scenario where Alice wants to transfer 10$ to bob, withdraw() operation gets executed successfully, however, something fishy happens at the deposit(). Alice now has 10$ left in her account and Bob hasn’t received any money. To prevent this situation above two steps should happen under a single transaction. we treat the transaction as a FULL or none i.e now if the deposit operation fails, the withdrawal operation gets reverted and Alice ends up having 20$ in her account. If everything succeeds the transaction gets committed in the database.
Consider a concurrent transaction scenario i.e Alice is now trying to send 10$ to bob as well as to Caleb at the same point in time. If these steps are not protected by transaction, bob and Caleb will receive 10$ each and Alice still would have 10$ in her account which is a really buggy thing. If these operations are protected by transaction then, Even though Alice has trigged both the response at some point in time, only one will get picked and executed. Post-transaction Alice will have 0$ in her account.
Adavantage of transaction:
- Reliable unit of work. we have the correct recovery procedure and keep data consistent.
- Isolation from concurrent access.
There are various ways to implement this transaction method, of which the most common one is the write-ahead log.
Let’s say we have a transaction. As shown in the above image, when we commit this transaction to DB, they end up in WAL(in memory). Before making any changes in DB, a lock is being acquired on that particular row which basically ensures no other transaction should update the row. There’s various way to acquire lock which includes read uncommitted, read committed, etc.
So if the isolation level is set to read uncommitted, it let read to happen from the locked row i.e other transactions can still read the row but can’t modify the row. Once the transaction is successful, all the information is being committed to DB.
- Pessimistic Lock:
The most common type we use, where we basically acquire the lock on the particular row, do some work(read, write, update) on it, and then release the lock. (As we discussed in the above example).
2. Optimistic Lock:
Optimistic locking is a strategy where we are confident about the future. In this strategy, we read a record, take note of a version number or timestamps or checksums/hashes and check that the version hasn’t changed before you write the record back.
Consider a case where Alice is trying to read the first row and update it. Bob also happens to read and modify the same record. Let’s say that at time t1 Alice trigger’s the request to read the first row, before Alice modifies the result, bob also picked row 1 and modifies the value of the amount to 22, at this point version will also get set to 2. Now Alice finishes its processing and tries to update row 1, which will get failed as the version has now been changed to version 2.unlike pessimistic lock, we are not locking the data from the time of reading to the time we update it back.
Use case: Wikipedia article editing, Editing an article takes a lot of time say 5 mins(small changes) to 1 hour or even more. if we lock an article then no one would be able to edit this article until person x finishes all his work and publishes the changes.
RDBMS uses pessimistic lock.
Performance in the distribution system:
Pessimistic lock tends to hold up resources so that no other resource can update that particular document/resource.
Optimistic allows other actors to modify the resource and is preferred in a distributed environment.
The selection of locks depends highly on our use-case.
If there are very few conflicts in record updation then go for an optimistic lock. Let’s consider a hypothetical situation where at a time let's say 1000 people are trying to update the same article then optimistic lock won’t give better performance. If there are more conflicts, the pessimistic lock works well. The pessimistic lock won't work in the case of Wikipedia, we can’t lock the article for the time until editing is done. Also, we need to ensure there is no deadlock in the pessimistic system. Also, we need to make sure there is always a timeout.
I hope this article gives you a brief about Transaction and locks. In the upcoming articles, I will be writing about distributed transactions and distributed locks.
Till then stay tune!!!