An Overview of Databases — Part 5: Concurrency Control

Saeed Vayghani
5 min read · Jul 24, 2024


Part 1: DBMS Flow
Part 2: Non-Relational DB vs Relational
Part 3: CAP and BASE Theorem
Part 4: How to choose a Database?
Part 5: Different Solutions for Different Problems
Part 6: Concurrency Control
Part 7: Distributed DBMS
>> Part 7.1: Distributed DBMS (Apache Spark, Parquet + Pyspark + Node.js)
Part 8: Clocks
>> Part 8.1: Clocks (Causal Consistency With MongoDB)
>> Part 8.2: Clocks (MongoDB Replica and Causal Consistency)
Part 9: DB Design Mastery
Part 10: Vector DB
Part 11: An interesting case, coming soon!

Concurrency

Concurrency Control refers to the management of concurrent operations on a database without causing inconsistencies or conflicts. It is a critical aspect of Database Management Systems (DBMS), ensuring that multiple transactions can occur simultaneously while maintaining data integrity and consistency.

Importance

  • Data Integrity: Prevents anomalies and ensures that database operations do not compromise the correctness of the data.
  • Performance: Enhances the performance and throughput of a system by allowing multiple transactions to execute concurrently.
  • User Experience: Provides a seamless user experience by allowing multiple users to interact with the system simultaneously.
  • System Efficiency: Utilizes system resources effectively, reducing the time transactions spend waiting and increasing system efficiency.

Motivation

  1. Lost Update Problem (Concurrency): How can we avoid race conditions when updating records at the same time?
  2. Durability Problem (Recovery): How can we ensure the correct state in case of a disk/power failure?
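The lost update problem above can be made concrete with a small sketch. Two "transactions" both read a balance, then both write back a value computed from their stale read, so one deposit is silently lost. The barrier is only there to force the problematic interleaving deterministically; the record layout is illustrative, not any real DBMS API.

```python
import threading

# A toy "database" with one record. Two transactions read the balance,
# compute a new value, and write it back. Without concurrency control,
# one update overwrites the other: the lost update problem.
balance = {"value": 100}
barrier = threading.Barrier(2)  # force both reads to happen before either write

def deposit(amount):
    read_value = balance["value"]           # step 1: read (both see 100)
    barrier.wait()                          # both transactions have now read
    balance["value"] = read_value + amount  # step 2: write back a stale result

t1 = threading.Thread(target=deposit, args=(10,))
t2 = threading.Thread(target=deposit, args=(20,))
t1.start(); t2.start()
t1.join(); t2.join()

# Expected 130 if the deposits were serialized, but one update is lost:
print(balance["value"])  # prints 110 or 120, never 130
```

Which deposit survives depends on thread scheduling, but the combined result is never correct. Concurrency control protocols exist to rule out exactly this interleaving.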

How do we meet these motivations?

ACID (The criteria used to ensure the correctness of a database)

  1. Atomicity: Either all actions in the transaction happen, or none do.
  2. Consistency: The database is guaranteed to be consistent when the transaction completes.
  3. Isolation: Data used by one in-flight transaction cannot be used by another transaction until the first one completes.
  4. Durability: The transaction’s results will not be lost in a failure.

Let’s take a closer look at each of the four aspects of ACID:

Atomicity and Logging: The DBMS logs all actions so that it can undo them if a transaction aborts. It maintains undo records both in memory and on disk.
Logging is used by almost all modern systems, for audit and efficiency reasons as well.
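The visible effect of undo logging is that an aborted transaction leaves no trace. A minimal sketch using Python's built-in sqlite3 (the accounts table and the simulated failure are illustrative):

```python
import sqlite3

# Either all statements in the transaction take effect, or none do.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES ('alice', 100), ('bob', 50)")
conn.commit()

try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 70 WHERE name = 'alice'")
        # Simulate a failure mid-transaction (e.g. a violated business rule):
        raise RuntimeError("transfer aborted")
except RuntimeError:
    pass

# The partial debit was undone; both balances are unchanged.
print(dict(conn.execute("SELECT name, balance FROM accounts")))
# prints {'alice': 100, 'bob': 50}
```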

Atomicity and Shadow Paging: The DBMS makes copies of the pages modified by a transaction, and the transaction applies its changes to those copies. Only when the transaction commits do the pages become visible.
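The shadow paging idea can be sketched in a few lines: a transaction edits a private copy of the page table, and commit is a single pointer swap that makes all its changes visible at once. All names here are illustrative, not a real DBMS API.

```python
# Toy sketch of shadow paging. Real systems copy pages lazily
# (copy-on-write) and swap a root pointer on disk atomically.

class ShadowPagedStore:
    def __init__(self):
        self.pages = {}  # page_id -> page contents (the "current" page table)

    def begin(self):
        # The transaction gets a private copy of the page table.
        return dict(self.pages)

    def write(self, shadow, page_id, data):
        shadow[page_id] = data  # modifies only the shadow copy

    def commit(self, shadow):
        # One pointer swap publishes every change at once.
        self.pages = shadow

store = ShadowPagedStore()
store.pages = {1: "old"}

txn = store.begin()
store.write(txn, 1, "new")
print(store.pages[1])  # still "old": uncommitted changes are invisible
store.commit(txn)
print(store.pages[1])  # "new"
```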

Database Consistency: The database accurately represents the real-world entities it is modeling and follows its integrity constraints.

Transaction Consistency: If the database is consistent before the transaction starts, it will also be consistent after.

Note: concurrency control matters most when it comes to transaction isolation:

Isolation and Pessimistic Concurrency Control: The DBMS assumes that transactions will conflict, so it doesn’t let problems arise in the first place.

Isolation and Optimistic Concurrency Control: The DBMS assumes that conflicts between transactions are rare, so it chooses to deal with conflicts when they happen after the transactions commit.
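A common way to implement the optimistic approach is to stamp each record with a version number: a write succeeds only if the version is unchanged since it was read, and otherwise the transaction retries. A single-threaded sketch (all names are illustrative; in a real system the validate-and-install step must itself be atomic):

```python
# Optimistic concurrency control via version checking.
records = {"x": {"value": 10, "version": 1}}

def optimistic_update(key, compute):
    while True:
        snapshot = dict(records[key])           # read value + version
        new_value = compute(snapshot["value"])  # do the work without locks
        current = records[key]
        if current["version"] == snapshot["version"]:  # validate at "commit"
            records[key] = {"value": new_value,
                            "version": current["version"] + 1}
            return
        # Conflict detected: another writer got there first, so retry.

optimistic_update("x", lambda v: v + 5)
print(records["x"])  # {'value': 15, 'version': 2}
```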

The order in which the DBMS executes operations is called an execution schedule. The goal of a concurrency control protocol is to generate an execution schedule that is equivalent to some serial execution.

There are some scheduling protocols that are out of the scope of this post; you can read more about them on your own. (I don’t know much about them either.)

Durability: All of the changes of committed transactions must be durable after a crash or restart. The DBMS can either use logging or shadow paging to ensure that all changes are durable.

Transaction Locks and MVCC

There are two basic types of locks:

1. Shared Lock (S-LOCK): A shared lock allows multiple transactions to read the same object at the same time. If one transaction holds a shared lock, then another transaction can also acquire that same shared lock.

2. Exclusive Lock (X-LOCK): An exclusive lock allows a transaction to modify an object. This lock prevents other transactions from taking any other lock (S-LOCK or X-LOCK) on the object.

The lock manager updates its internal lock-table with information about which transactions hold which locks and which transactions are waiting to acquire locks.

The DBMS’s lock-table does not need to be durable since any transaction that is active when the DBMS crashes is automatically aborted.
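The compatibility rules above can be sketched as a toy lock table: shared (S) locks are compatible with other S locks, while an exclusive (X) lock is compatible with nothing. This sketch only checks compatibility; a real lock manager also queues waiters, supports lock upgrades, and detects deadlocks. All names are illustrative.

```python
class LockTable:
    def __init__(self):
        self.locks = {}  # object -> list of (txn_id, mode) currently held

    def acquire(self, txn, obj, mode):
        held = self.locks.setdefault(obj, [])
        if mode == "S" and all(m == "S" for _, m in held):
            held.append((txn, "S"))  # S is compatible with other S locks
            return True
        if mode == "X" and not held:
            held.append((txn, "X"))  # X requires the object to be free
            return True
        return False  # caller must wait (or abort)

lt = LockTable()
print(lt.acquire("T1", "row42", "S"))  # True: first reader
print(lt.acquire("T2", "row42", "S"))  # True: shared locks coexist
print(lt.acquire("T3", "row42", "X"))  # False: X conflicts with held S locks
```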

Multi-Version Concurrency Control (MVCC) follows three rules:

  1. Writers do not block readers.
  2. Readers do not block writers.
  3. Easily support time-travel queries.

How MVCC Works

  • Versioning: Each data item has multiple versions, each labeled with a unique timestamp or transaction ID. MVCC creates duplicate copies of records so that data can be safely read and updated at the same time.
  • Visibility: A transaction sees a consistent snapshot based on the versions that were committed before the transaction started. When a transaction reads an object, it reads the newest version that existed when the transaction started.
  • Garbage Collection: Older versions of data that are no longer needed are periodically cleaned up to free storage space.
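The three mechanics above can be sketched as a minimal versioned store: every write appends a new version stamped with a monotonically increasing commit timestamp, and a reader sees the newest version committed at or before its snapshot timestamp. All names are illustrative; real systems use transaction IDs and garbage-collect old versions.

```python
import itertools

clock = itertools.count(1)  # monotonically increasing commit timestamps
versions = {}               # key -> list of (commit_ts, value)

def write(key, value):
    ts = next(clock)
    versions.setdefault(key, []).append((ts, value))  # append, never overwrite
    return ts

def read(key, snapshot_ts):
    # Newest version visible at snapshot_ts; older snapshots stay readable,
    # which is also what makes time-travel queries easy.
    visible = [(ts, v) for ts, v in versions.get(key, []) if ts <= snapshot_ts]
    return max(visible)[1] if visible else None

write("x", "v1")            # committed at ts=1
snapshot = 1                # a reader takes its snapshot here
write("x", "v2")            # committed at ts=2, after the snapshot

print(read("x", snapshot))  # 'v1': the reader's snapshot is unaffected
print(read("x", 2))         # 'v2': a later snapshot sees the new version
```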

Drawbacks of MVCC:

  • Concurrent update control methods are difficult to implement.
  • The database grows in size and becomes bloated with multiple versions of records. PostgreSQL uses the VACUUM process to identify and delete outdated, unneeded row versions created by MVCC.
[Figure: MVCC (image from researchgate.net)]

Application of MVCC

1. Transactions:
MVCC provides transaction isolation by creating a snapshot of the database at the start of each transaction. This allows transactions to read consistent data without being affected by other concurrent transactions.

MVCC enables high concurrency by allowing readers to access the database without waiting for writers to complete their operations. Writers can also proceed without waiting for readers.

2. Data Consistency:
MVCC ensures that readers see a consistent view of the database, reflecting only the committed changes that were present at the start of their transaction. Uncommitted changes made by other transactions are invisible to them.

3. Version Management:
When a row is updated, PostgreSQL creates a new version of the row (tuple) instead of overwriting the old one. Each version has metadata, including transaction IDs, to track its visibility.
Old versions of tuples are eventually removed by the VACUUM process, freeing up space and maintaining database performance.

4. Read and Write Operations:
MVCC allows multiple read operations to occur without blocking, as each transaction can read from its snapshot of the database.

When a write operation occurs, it creates a new version of the data. Other transactions that started before the write operation will not see the new version until the transaction is committed.

5. Locks:
MVCC minimizes the need for locking, especially read locks. While write locks are still necessary to prevent conflicts, readers can proceed without waiting for locks to be released.
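Snapshot-style reads are easy to observe in practice. The sketch below uses Python's built-in sqlite3 in WAL mode, where a reader that started before a write keeps seeing the old state and never blocks the writer; SQLite's internals differ from PostgreSQL's MVCC, but the visible behavior is similar. File name and table are illustrative.

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "demo.db")
writer = sqlite3.connect(path, isolation_level=None)  # autocommit mode
writer.execute("PRAGMA journal_mode=WAL")
writer.execute("CREATE TABLE t (v INTEGER)")

reader = sqlite3.connect(path, isolation_level=None)
reader.execute("BEGIN")  # deferred: the snapshot is taken at the first read
before = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]

writer.execute("INSERT INTO t VALUES (1)")  # commits; reader does not block it

during = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # old snapshot
reader.execute("COMMIT")  # end the read transaction

after = reader.execute("SELECT COUNT(*) FROM t").fetchone()[0]  # fresh snapshot
print(before, during, after)  # 0 0 1
```

The reader's in-flight transaction never observes the concurrent insert, and the writer never waits for the reader, which is exactly the "readers don't block writers, writers don't block readers" property described above.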

Note: this post does not cover every detail of concurrency control protocols; it covers the most important parts to help you understand the ideas behind them. Learning more from the reference books is up to you.


Saeed Vayghani

Software engineer and application architect. Interested in free and open source software.