Achieving Peak Data Consistency: Understanding Serializability in Database Transactions

Priya Patidar
The Developer’s Diary
7 min read · May 19, 2024

In this series, we have explored various weak isolation levels and common concurrency issues. While these isolation levels provide a balance between performance and data consistency, there are still issues that cannot be completely resolved without a higher level of isolation. This brings us to serializability, the highest level of isolation in database systems.

What is Serializability?

Serializability is the isolation level that ensures the outcome of executing transactions concurrently is the same as if the transactions were executed sequentially, one after the other. This guarantees the highest level of data consistency and eliminates concurrency anomalies such as lost updates, write skew, and phantom reads.

Achieving Serializability

There are multiple ways to achieve serializability in database systems, including actual serial execution, two-phase locking, and optimistic concurrency control techniques like serializable snapshot isolation.

Actual Serial Execution

What is Actual Serial Execution?

Actual serial execution involves running transactions one at a time in a strict sequence, so that no two transactions ever overlap. Because there is no concurrency at all, the execution order is itself a serial order, and serializability is achieved by construction.

How is it Done?

One way to implement actual serial execution is by encapsulating transactions in stored procedures and executing them one by one. This can be managed by a scheduling system that ensures each transaction completes before the next one begins. Here’s a simplified example of how it can be implemented:

  1. Encapsulate Transactions in Stored Procedures: Each transaction is defined as a stored procedure in the database.

CREATE PROCEDURE TransferMoney (IN from_account INT, IN to_account INT, IN amount DECIMAL(10,2))
BEGIN
    START TRANSACTION;
    -- Debit the source account, then credit the destination account
    UPDATE accounts SET balance = balance - amount WHERE account_id = from_account;
    UPDATE accounts SET balance = balance + amount WHERE account_id = to_account;
    COMMIT;
END;

  2. Sequential Execution: A scheduling mechanism ensures that these stored procedures are executed one at a time.

CALL TransferMoney(1, 2, 100.00);
CALL TransferMoney(2, 3, 50.00);
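
One minimal way to approximate such a scheduler at the SQL level is an application-managed advisory lock. The sketch below is hypothetical and assumes MySQL, whose GET_LOCK()/RELEASE_LOCK() functions provide named locks; a production system would more likely route all transaction calls through a single execution thread instead.

-- Hypothetical sketch (MySQL): every client wraps its call in the same
-- named advisory lock, so only one stored procedure runs at a time.
SELECT GET_LOCK('serial_executor', 10);      -- wait up to 10 seconds for the lock
CALL TransferMoney(1, 2, 100.00);
SELECT RELEASE_LOCK('serial_executor');      -- allow the next queued call to run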

Pros and Cons

Pros:

  • Simplicity: The implementation is straightforward as transactions are executed one after another, eliminating the need for complex concurrency control mechanisms.
  • Guaranteed Consistency: Since transactions do not overlap, there is no risk of concurrency issues such as lost updates, write skew, or phantom reads.

Cons:

  • Performance Bottleneck: Running transactions sequentially can lead to significant performance issues, especially in high-transaction environments, as each transaction must wait for the previous one to complete.
  • Low Throughput: The system’s overall throughput is limited by the speed at which individual transactions are processed, making this approach impractical for most real-world applications.

Actual serial execution ensures the highest level of data consistency but at the cost of performance and scalability. For many applications, this trade-off is unacceptable, leading to the use of more sophisticated techniques like two-phase locking and optimistic concurrency control to achieve serializability without sacrificing performance.

Use Cases for Actual Serial Execution

Actual serial execution can be beneficial in specific contexts:

  • Small Transactions: Suitable for environments where transactions are small and quick, minimizing performance impact.
  • In-Memory Databases: Effective when the entire dataset fits in memory, reducing I/O overhead and speeding up transaction processing.
  • Low Write Throughput: Ideal for systems with low write throughput, where transactions are infrequent, and performance bottlenecks are less of a concern.
  • Critical Consistency: Useful for applications requiring strict data consistency and integrity, such as high-value financial transactions or critical data updates.

Two-Phase Locking (2PL)

Two-Phase Locking (2PL) is a common concurrency control method used to achieve serializability in database transactions. It involves two distinct phases for each transaction: the growing phase and the shrinking phase.

  1. Growing Phase: In this phase, a transaction can acquire locks but cannot release any locks.
  2. Shrinking Phase: Once the transaction releases its first lock, it enters the shrinking phase, during which it can release locks but cannot acquire any new ones.

This protocol guarantees serializability because every transaction acquires all the locks it needs before it releases any of them, so conflicting transactions are effectively ordered by the point at which each holds its full set of locks. In practice, most databases use strict two-phase locking, holding all locks until the transaction commits or aborts, which also prevents other transactions from seeing uncommitted changes.

Example of Two-Phase Locking

Transaction A:

  1. Begins and acquires a lock on Resource 1.
  2. Acquires a lock on Resource 2.
  3. Performs its operations.
  4. Releases the lock on Resource 1.
  5. Releases the lock on Resource 2.
  • Transaction A cannot acquire any more locks once it starts releasing them.

Transaction B:

  1. Begins and attempts to acquire a lock on Resource 1.
  • Transaction B must wait until Transaction A releases the lock on Resource 1.
  2. Acquires a lock on Resource 1 once Transaction A releases it.
  3. Attempts to acquire a lock on Resource 2.
  • Transaction B must wait until Transaction A releases the lock on Resource 2.
  4. Acquires a lock on Resource 2 once Transaction A releases it.
  5. Performs its operations.
  6. Releases the locks on Resource 1 and Resource 2.
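
The same pattern can be seen directly in SQL with explicit locking reads. Below is a minimal sketch, reusing a hypothetical accounts table and assuming a database that holds row locks until commit (strict two-phase locking, as in MySQL/InnoDB locking reads); Session B blocks on the row locks exactly as Transaction B does above.

-- Session A: growing phase, acquires row locks and keeps them until COMMIT
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;  -- lock "Resource 1"
SELECT balance FROM accounts WHERE account_id = 2 FOR UPDATE;  -- lock "Resource 2"
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
UPDATE accounts SET balance = balance + 100 WHERE account_id = 2;
COMMIT;  -- shrinking phase: all locks are released together

-- Session B: blocks here until Session A commits and releases its locks
START TRANSACTION;
SELECT balance FROM accounts WHERE account_id = 1 FOR UPDATE;
UPDATE accounts SET balance = balance - 50 WHERE account_id = 1;
COMMIT;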

Performance Considerations of Two-Phase Locking

  1. Lock Overhead: Acquiring and releasing locks involves computational overhead. The more locks a transaction requires, the higher the overhead, potentially reducing overall system performance.
  2. Wait Time for Locks: Transactions must wait for locks held by other transactions to be released. If a transaction holds a lock for an extended period, it can delay other transactions, leading to increased wait times and reduced throughput.
  3. Deadlocks: Two-phase locking can lead to deadlocks, where two or more transactions wait indefinitely for locks held by each other. Deadlock detection and resolution mechanisms are necessary to handle this issue; a typical deadlock scenario is sketched below.
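
A deadlock arises, for instance, when two transactions lock the same rows in opposite order. A minimal sketch, reusing the hypothetical accounts table from the earlier examples:

-- Session A, step 1: lock the row for account 1
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE account_id = 1;

-- Session B, step 2: lock the row for account 2
START TRANSACTION;
UPDATE accounts SET balance = balance - 10 WHERE account_id = 2;

-- Session A, step 3: now needs account 2, so it waits for Session B
UPDATE accounts SET balance = balance + 10 WHERE account_id = 2;

-- Session B, step 4: now needs account 1, so it waits for Session A.
-- Neither can proceed; the database's deadlock detector aborts one of
-- the two transactions, and the application should retry it.
UPDATE accounts SET balance = balance + 10 WHERE account_id = 1;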

Predicate Locks and Index Range Locks

To address specific types of conflicts, two-phase locking can utilize predicate locks and index range locks:

  1. Predicate Locks: These locks are based on logical conditions rather than specific data items. For example, a predicate lock might be placed on all rows where age > 30. This ensures that any transaction modifying rows matching this condition must wait for the lock to be released.
  2. Index Range Locks: These locks cover a range of index values, ensuring that any transaction attempting to modify data within that range must wait for the lock to be released. This is particularly useful for preventing phantom reads, where new rows matching a query’s condition are inserted by another transaction; a sketch follows below.
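
Here is a minimal sketch of index range locking blocking a phantom insert. It assumes a hypothetical people table with an index on age and MySQL/InnoDB-style next-key (range) locking on locking reads:

-- Session A: the locking read takes range locks on the index for age > 30
START TRANSACTION;
SELECT * FROM people WHERE age > 30 FOR UPDATE;

-- Session B: tries to insert a "phantom" row matching Session A's predicate;
-- the insert blocks until Session A commits or rolls back.
INSERT INTO people (name, age) VALUES ('Sam', 42);

-- Session A
COMMIT;  -- Session B's insert can now proceed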

Serializable Snapshot Isolation (SSI)

So far, we have explored two-phase locking, which can suffer from lock contention and deadlocks, and actual serial execution, which does not scale. We have also examined weaker isolation levels that are prone to various anomalies. In this section, we delve into Serializable Snapshot Isolation (SSI), a technique that combines the benefits of snapshot isolation with serializability, providing a robust solution for maintaining data consistency in concurrent transaction environments.

What is Serializable Snapshot Isolation?

Serializable Snapshot Isolation (SSI) is a concurrency control mechanism that ensures transactions execute as if they were run sequentially, one after the other, while still allowing for high concurrency and performance. SSI builds on the concept of snapshot isolation, which allows transactions to operate on a consistent snapshot of the database taken at the start of the transaction, and adds mechanisms to detect and resolve conflicts that could lead to anomalies.
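
In most SQL databases you opt in to this isolation level per transaction. A minimal sketch, assuming PostgreSQL syntax (PostgreSQL implements its SERIALIZABLE level using SSI) and the hypothetical accounts table from earlier:

-- Run this transaction under serializable snapshot isolation
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT balance FROM accounts WHERE account_id = 1;                 -- reads from a consistent snapshot
UPDATE accounts SET balance = balance - 100 WHERE account_id = 1;
COMMIT;  -- may be aborted with a serialization error if a conflict was detected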

Pessimistic vs. Optimistic Concurrency Control

Concurrency control can be broadly categorized into pessimistic and optimistic approaches:

Pessimistic Concurrency Control:

  • Two-Phase Locking (2PL): Two-phase locking is a pessimistic concurrency control method based on the assumption that conflicts are likely to occur. It involves acquiring locks on data before reading or writing, ensuring that transactions wait if there is a risk of conflict.
  • Actual Serial Execution: This is an extreme form of pessimistic control where transactions are executed one at a time, eliminating any possibility of conflict but resulting in poor performance and scalability.

Optimistic Concurrency Control:

  • Serializable Snapshot Isolation (SSI): In contrast to pessimistic methods, SSI is an optimistic concurrency control technique. It operates under the assumption that conflicts are rare and allows transactions to execute without immediate locking. Instead, SSI detects conflicts during transaction execution and resolves them, typically by aborting and retrying the conflicting transactions.

Decisions Based on an Outdated Premise

A key challenge in Serializable Snapshot Isolation (SSI) is ensuring that a transaction’s decisions are not based on an outdated premise, i.e., that the results of a query it relied on have not been changed by a concurrent transaction. SSI handles this by detecting two kinds of situations: reads of stale MVCC data, and writes that affect prior reads.

Detecting Stale MVCC Reads: The database uses Multi-Version Concurrency Control (MVCC) to keep track of multiple versions of data items. When a transaction reads data, it sees a snapshot of the database as it existed at the start of the transaction. To ensure the data is not stale, the database checks whether any newer versions of the data it read have been committed by other transactions since the snapshot was taken. If a transaction tries to commit changes based on such outdated data, the database aborts it so that it can be retried, ensuring consistency.

Detecting Writes that Affect Prior Reads: To detect writes that might affect prior reads, the database monitors dependencies between transactions. If a transaction writes to a data item that was read by an earlier transaction, the system identifies this as a potential conflict. The database then ensures that these conflicting transactions are serialized correctly, typically by aborting and retrying the later transaction. This mechanism prevents anomalies such as write skew and ensures that all transactions reflect a consistent state of the database.
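
The classic write-skew scenario makes this concrete. A sketch assuming PostgreSQL (whose SERIALIZABLE level is implemented with SSI) and a hypothetical doctors table where at least one doctor must remain on call:

-- Session A, step 1: check how many doctors are on call (sees 2)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;

-- Session B, step 2: performs the same check on its own snapshot (also sees 2)
BEGIN ISOLATION LEVEL SERIALIZABLE;
SELECT count(*) FROM doctors WHERE on_call;

-- Session A, step 3: based on its premise, Alice goes off call
UPDATE doctors SET on_call = false WHERE name = 'Alice';
COMMIT;  -- succeeds

-- Session B, step 4: its premise is now outdated; SSI detects the read/write
-- dependency and aborts the transaction (at this statement or at COMMIT)
-- with a serialization failure, so the application must retry it.
UPDATE doctors SET on_call = false WHERE name = 'Bob';
COMMIT;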

Performance of Serializable Snapshot Isolation

Serializable Snapshot Isolation (SSI) offers significant performance advantages by allowing high concurrency without the need for extensive locking. By leveraging Multi-Version Concurrency Control (MVCC), SSI enables transactions to read from consistent snapshots, reducing lock contention and improving transaction throughput. However, the conflict detection and resolution mechanisms can introduce overhead, especially in write-intensive workloads where transactions frequently update the same data. Despite this, SSI often provides a good balance between strong consistency and high performance, making it suitable for many real-world applications.

Conclusion

This concludes our series on database transactions, spanning from weak to strong isolation levels. We’ve explored various isolation levels, including Read Committed, Snapshot Isolation, Repeatable Read, and the highest level of isolation, Serializable. Each level provides different trade-offs between performance and data consistency. Serializable Snapshot Isolation (SSI) stands out for its ability to combine high concurrency with strong consistency guarantees. By understanding and applying these concepts, you can design robust and efficient database systems tailored to your application’s needs. Thank you for following along, and we hope this series has provided valuable insights into managing concurrency in database transactions.

Ref: This article was inspired by the book “Designing Data-Intensive Applications” by Martin Kleppmann, which provides in-depth insights into the complexities of database design and transaction management.
