A-Z of Database Transactions (Part-1)

Sandeep Verma
7 min readFeb 24, 2019

--

Why A.C.I.D (Atomicity, Consistency, Isolation and Durabililty) safety guarantees by different databases should be taken with grain of salt

Life of data is much larger than the software application used for its creation. Keeping static data safe is one challenge, maintaining its integrity during a system failure, concurrent modifications take it to another level. A datastore should handle fault gracefully and must be reliable. Innumerable things can go wrong for a data system like hardware crash in the middle of write event, database software failure, same data overwritten by different clients at the same time, network partition during data replication or partial data update. Transactions have been the actual solution to these issues since last three decades.

Transaction management done by different databases is less understood possibly because they have taken away headache the of handling a variety of error scenarios, concurrency issues away from application programming model. Even after that application developer should have a sound understanding of safety guarantees that transactions provide for the following reasons :

1. Not all vendors provide same safety guarantees (ACID safety guarantee varies from vendor to vendor, hence they should not be accepted blindly)

2. An application is supposed to take critical decisions, whether to retry or discard changes when database abort a transaction either due to some transient error (for which retry make sense) or when the error is due to some integrity violation (for which undoing changes is a must)

3. Even with ACID safety guarantees, it’s the responsibility of the application to maintain consistency of data. A database can prevent from a foreign key constraint violation or unique constraint violation but it won’t prevent application to write defective data. The application defines what is valid or invalid

I will be covering ACID (Atomicity, Consistency, Isolation, and durability) safety guarantees in length and will go in depth how it’s done at the database level. Down the line, I will discuss various issues with concurrency, and how different databases handle them. Let’s first unravel A.C.I.D guarantees

What is A.C.I.D?

Nowadays it has become a fashion to label a database as “ACID Compliant”, however, what guarantees comes up with this is unclear and ambiguous. Since, there are no standards, being ‘ACID Compliant’ could mean a particular database vendor guaranteeing isolated behavior during highly concurrent environment, thus preventing any dirty reads/writes or it could also mean just preventing unwanted/dirty reads but allowing dirty writes. That is why it’s so important to have a good understanding of what is there under the hood. Let’s deep dive into definitions of atomicity, consistency, isolation, and durability, and improve our understanding of transactions.

Atomicity

Atomic in simple terms means the smallest unit that is indivisible and irreducible. Atomicity in the world of transactions means all operations that are defined under a transaction to form a group and they either proceed together or in case some of them fail to proceed, then they abort together. In other words, either each entity that forms group succeeds or each of them fails, thus coming back to a state from where they started.

ACID atomicity defines that if a client request for several writes under a transaction then either all of them gets processed or if any of it is failed (Because of a network interruption, the disk is full or process crash), then none gets committed and the transaction is aborted. Database undoes all writes that it has done so far in that transaction.

For example, let’s say a user wants to add ₹ 10000 to his fixed deposit account. Now, this amount should be deducted from his main account (Let’s say it’s A-1 and with the current balance ₹ 20000) and move to his fixed deposit account (say ‘F-1’ with current balance ₹ 50000).

Figure 1. Snapshot of table before fund movement
Figure 2. Expected snapshot of the table after fund movement

This fund movement between A-1 and F-1 account must occur atomically, else there will be inconsistency in states which is unacceptable and can result in financial loss. If debit from A-1 succeeds but credit to F-1 fails then the transaction should undo all partial writes, which in turn means to abort the transaction

Figure 3. Snapshot of the table if the transaction is not atomic

Atomicity with this definition sounds more like revertability or abortibility and can be better pictured with abortibility than atomicity

Abort

Consistency

ACID Consistency guarantee by a database implies that the database remains in a good shape all the time. The database is expected to maintain data invariants. In the above example, if the debit is successful but credit fails and the transaction doesn’t abort, then database invariants becomes invalid and writes during transaction fails to preserve the data validity.

Though Consistency in ACID is perceived to be provided by the database as a safety guarantee, a closer look reveals it to be more closely associated with the application and should be called a property of the application. It’s the application’s responsibility to define when data is inconsistent state and when it’s not. A database will never stop the application from writing invalid data and not preserving data invariants. It can at-most check unique constraints or foreign key constraints and prevent its violation, however, at the end of day database only stores data.

In practice, Atomicity, Isolation, and Durability are properties of database and Consistency belongs to the application. An application may rely on the database’s atomicity and isolation properties in order to achieve consistency, but it’s not the sole responsibility of the database. Hence, letter C as Consistency loosely relates to C in ACID

Isolation

Transactions are not isolated from each other. Often they are executed concurrently. Several users will be reading or writing at the same times to different or the same records in a database. If transactions are not isolated properly, then one transaction can step into the work of another one and make things indeterminate and inconsistent.

Transaction isolation is apprehended as something which guarantees serial execution even in a concurrent environment. If a database provides Isolation guarantee as serializable then each transaction can assume it to be the only running transaction in the database.

In practice, rarely any database vendor provides serializable safety guarantee as it degrades the performance of the database. Oracle 11g, though publicizes isolation level as serializable, but in reality, its something called as snapshot isolation which is weaker safety guarantee as compared serialization.

I have shared my views on snapshot isolation and what guarantees are provided by different databases vendors in Part-2. This is why it is necessary to understand what happens under the hood and what isolation guarantees are provided by database vendors. Blindly choosing any vendor or, not handling them properly in the application model can result in data inconsistencies and bugs which are difficult to solve especially during highly concurrent environment.

Let’s see why it’s important to have a strong isolation guarantee. In figure 4, ‘client 1’ modifies the balance of user_id=123 to ₹ 30000 from ₹ 20000 by crediting ₹ 10000.

Another client which reads balance just before ‘client 1’ updates it and finds balance as ₹ 20000. Now if ‘client 2’ updates user balance by crediting ₹ 10000 then he will be overwriting the update done by ‘client 1’. This result in a lost update (discussed in Part-3)

This is one of the possible issues with concurrent modification. In Part-2 and Part-3 this series, we will see what are other issues with concurrent read/writes and how different databased handle them.

Figure 4. Lost Update, client 2 overwrite values committed by client 1

Durablity

ACID durability guarantee promises that once a transaction has been committed the data then it will never be lost even if there are hardware crashes, database software failure or database fault. RDMS databases provide this guaranteed by using write-ahead log or something similar and restore B-Tree data structure using these logs when the server restarts. In a replicated system, durability is ensured by writing it to several replica nodes before reporting the successfull transaction.

In practice, perfect durability doesn’t exist as hard disks having write-ahead log can get corrupted or backups can get destroyed.

Summary

  • Atomicity: This guarantees either whole transaction proceeds as a unit or it everything gets reverted. No partial writes occur
  • Consistency: Database is always in a good state
  • Isolation: Isolation guarantee ensure no two transaction steps in the work of each other and in a concurrent environment, multiple transactions are executed as if they are in the serial order of execution
  • Durability: This guarantee ensures once data is written to a database it’s never lost

In the Second Part (Part-2)of this series, we will discuss issues, pertaining to concurrent reads and in Part-3 we will share write anomalies in a concurrent environment

Don’t forget to clap if you enjoyed reading this article!

--

--