Snapshot Isolation: Enhancing Consistency in Concurrent Transactions

Priya Patidar
The Developer’s Diary
6 min read · May 19, 2024

Why Read Committed Might Not Be Useful in Certain Scenarios

In our previous article, we discussed Read Committed Isolation and how it prevents dirty reads and dirty writes to maintain data consistency. While Read Committed is effective in many scenarios, it has limitations that can lead to issues like non-repeatable reads and phantom reads. To address these shortcomings, databases can employ a stronger isolation level known as Snapshot Isolation.

Read Committed Isolation ensures that transactions only see committed data, preventing dirty reads and dirty writes. However, it does not fully protect against all types of concurrency issues. Here are some scenarios where Read Committed might fall short:

  1. Non-Repeatable Reads: If a transaction reads the same data multiple times, it might see different values each time if another transaction modifies and commits changes to the data in between the reads. This can lead to inconsistencies and unreliable results.
  • Example: In a financial application, suppose a transaction reads a customer’s account balance twice during its execution. If another transaction updates the balance and commits in between the two reads, the first transaction will see two different balances, causing potential inconsistencies.

  2. Phantom Reads: When a transaction re-executes a query that retrieves a set of rows matching certain criteria, it might see a different set of rows if other transactions insert or delete matching rows in the meantime.

  • Example: In an e-commerce application, a transaction might query for all orders placed by a customer. If another transaction adds a new order for the same customer and commits, re-running the query will return a different set of orders, leading to phantom reads.

  3. Backup and Analytical Queries: Long-running operations, such as backups and analytical queries, need a consistent view of the database. Read Committed Isolation might not be sufficient because ongoing transactions can alter data mid-operation, leading to inconsistent snapshots and unreliable analytical results.

  • Example: Consider a data warehouse performing a long-running analytical query to generate a report. If the underlying transactional database is continuously updated, the analytical query might see inconsistent data, resulting in inaccurate reports.

To handle these issues more effectively, a more robust isolation level is needed. Snapshot Isolation offers a solution by providing a consistent view of the database at the start of a transaction, ensuring that all reads during the transaction see a stable snapshot of the data.
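The non-repeatable read problem, and how a snapshot fixes it, can be sketched as a toy simulation in Python. This is an illustration only, not any real database's implementation: the `ReadCommitted` and `SnapshotIsolated` classes are hypothetical names, and a dictionary stands in for the committed state of the database.

```python
class ReadCommitted:
    """Every read sees the latest committed value (toy model)."""
    def __init__(self, store):
        self.store = store          # shared, live view of committed data

    def read(self, key):
        return self.store[key]


class SnapshotIsolated:
    """Every read sees the values committed when the transaction began."""
    def __init__(self, store):
        self.snapshot = dict(store)  # consistent snapshot taken at txn start

    def read(self, key):
        return self.snapshot[key]


committed = {"balance": 100}

rc = ReadCommitted(committed)
si = SnapshotIsolated(committed)

first_rc, first_si = rc.read("balance"), si.read("balance")
committed["balance"] = 60            # another transaction commits an update
second_rc, second_si = rc.read("balance"), si.read("balance")

print(first_rc, second_rc)   # 100 60  -> non-repeatable read
print(first_si, second_si)   # 100 100 -> stable under snapshot isolation
```

The Read Committed reader observes two different balances within one transaction; the snapshot reader sees the same value both times because it never looks past its snapshot.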

Introducing Snapshot Isolation

Snapshot Isolation (SI) is an isolation level that provides a stronger guarantee of consistency compared to Read Committed. Under Snapshot Isolation, each transaction operates on a consistent snapshot of the database as it existed at the start of the transaction. This approach ensures that transactions do not interfere with each other, preventing issues like non-repeatable reads and phantom reads.

How is Snapshot Isolation Implemented?

Snapshot Isolation is implemented using a technique called Multi-Version Concurrency Control (MVCC). Like Read Committed Isolation, Snapshot Isolation uses write locks to ensure that only one transaction can modify a data item at a time. However, reads do not require any locks. The key principle of Snapshot Isolation is that readers do not block writers, and writers do not block readers. Here’s how it works:

  1. Multi-Version Concurrency Control (MVCC): To implement Snapshot Isolation, the database keeps multiple committed versions of each object. This allows different transactions to see the database as it was at different points in time. Each transaction operates on a snapshot of the database taken at the start of the transaction.
  2. Consistent Snapshots: When a transaction begins, it gets a consistent snapshot of the database. Any changes made by other transactions that commit after the snapshot was taken are not visible to this transaction.
  3. Version Tracking: The database maintains a version history for each object. Each version is associated with a specific transaction that created it. When a transaction reads an object, it reads the version that was committed at the time the transaction’s snapshot was taken.
  4. Write Locks: When a transaction modifies an object, it creates a new version of the object and places a write lock on it. Other transactions can continue to read the old version until the modifying transaction commits. This prevents write-write conflicts without blocking reads.
  5. Garbage Collection: When rows are deleted or updated, old versions of the data are eventually no longer needed. The database uses garbage collection to remove these obsolete versions, freeing up space and ensuring efficient storage. This process typically runs in the background, identifying and cleaning up data versions that are no longer visible to any active transaction.

Visibility Rules for Observing Consistent Snapshots

When transactions read from the database, transaction IDs are used to define what can be seen by whom. The following rules ensure that each transaction observes a consistent snapshot of the database:

  1. In-Progress Transactions: At the start of a transaction, the database makes a list of in-progress transactions (those not yet committed). Any writes made by these transactions are ignored, even if they commit subsequently.
  2. Aborted Transactions: Any writes made by transactions that have been aborted are ignored.
  3. Later Transactions: Any writes made by transactions with later transaction IDs, whether they have committed or not, are ignored.
  4. Visible Writes: All other writes made by transactions that committed before the current transaction started are visible.

These rules ensure that each transaction sees a stable and consistent view of the database, reflecting the state of the database as it was at the start of the transaction.
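The four rules above can be collapsed into a single visibility predicate. The sketch below is a simplification (the function name and parameters are hypothetical), and it assumes transaction IDs are assigned in monotonically increasing order:

```python
def is_visible(version_txid, reader_txid, in_progress_at_start,
               aborted, committed):
    # Rule 1: ignore writes by transactions that were still in progress
    # when the reader took its snapshot, even if they committed later.
    if version_txid in in_progress_at_start:
        return False
    # Rule 2: ignore writes by aborted transactions.
    if version_txid in aborted:
        return False
    # Rule 3: ignore writes by transactions with a later transaction ID.
    if version_txid > reader_txid:
        return False
    # Rule 4: anything left that committed before the reader started is visible.
    return version_txid in committed


# Reader txn 7 took its snapshot while txn 4 was still running:
print(is_visible(3, 7, in_progress_at_start={4}, aborted={6},
                 committed={1, 2, 3, 4}))   # True
print(is_visible(4, 7, in_progress_at_start={4}, aborted={6},
                 committed={1, 2, 3, 4}))   # False (rule 1, despite commit)
print(is_visible(9, 7, in_progress_at_start={4}, aborted={6},
                 committed={1, 2, 3, 4, 9}))  # False (rule 3)
```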

Indexed Access and Snapshot Isolation

Snapshot Isolation works seamlessly with indexed access, ensuring efficient and consistent query performance. Here’s how it integrates:

  1. Index Consistency: Indexes maintain multiple versions of entries, similar to data rows, ensuring that each transaction accesses the correct version corresponding to its snapshot. This prevents inconsistencies when querying indexed columns.
  2. Efficient Lookups: When a transaction reads data through an index, it retrieves the version of the index entry that was committed before the transaction started. This allows for efficient lookups while maintaining a consistent view of the data.
  3. Isolation from Concurrent Writes: Since Snapshot Isolation ensures that transactions only see data committed before their snapshot was taken, index lookups are isolated from concurrent writes. This prevents issues where an ongoing transaction might alter indexed data, leading to inconsistent results.
  4. Performance Benefits: By avoiding locks on reads, Snapshot Isolation with indexed access provides high performance and concurrency. Transactions can read data without waiting for other transactions to complete, significantly improving throughput and reducing latency.

Overall, Snapshot Isolation ensures that transactions can efficiently access data through indexes while maintaining a consistent and isolated view of the database, enhancing both performance and data integrity.
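A versioned index lookup can be sketched the same way as versioned rows. This is a hypothetical illustration, not a real engine's index format: each index key maps to a list of (creating transaction ID, row ID) entries, and a lookup filters by the reader's snapshot.

```python
# Index entries carry the ID of the transaction that created them,
# so lookups can filter to the reader's snapshot (toy structure).
index = {
    "alice": [(1, "row-1")],                 # (creating txid, row id)
    "bob":   [(2, "row-2"), (5, "row-7")],   # a second order added by txn 5
}

def lookup(key, snapshot_txid, committed):
    """Return row IDs whose index entries are visible to this snapshot."""
    return [row for txid, row in index.get(key, [])
            if txid in committed and txid <= snapshot_txid]


committed = {1, 2, 5}
print(lookup("bob", snapshot_txid=3, committed=committed))  # ['row-2']
print(lookup("bob", snapshot_txid=6, committed=committed))  # ['row-2', 'row-7']
```

A transaction whose snapshot predates transaction 5 never sees the new order in the index, which is exactly how Snapshot Isolation prevents the phantom-read scenario from the e-commerce example above.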

Conclusion

Snapshot Isolation provides a robust solution to the limitations of weaker isolation levels like Read Committed by ensuring a consistent view of the database for each transaction. By using Multi-Version Concurrency Control (MVCC), it allows transactions to operate on stable snapshots, effectively preventing issues such as non-repeatable reads and phantom reads. Snapshot Isolation ensures that readers do not block writers and writers do not block readers, maintaining high concurrency and performance.

Snapshot Isolation effectively supports repeatable reads, meaning that a transaction will always see the same data when it re-reads previously read rows, regardless of concurrent transactions. This isolation level, while not as strict as Serializable, strikes a balance between data consistency and system performance, making it suitable for many applications.

In our next article, we will explore common concurrency issues, focusing on how to prevent problems such as lost updates, write skew, and phantom reads. Stay tuned to learn how to safeguard your database against these common concurrency challenges.

Ref: This article was inspired by the book “Designing Data-Intensive Applications” by Martin Kleppmann, which provides in-depth insights into the complexities of database design and transaction management.
