Scratching the Surface on Transactions and Concurrency Control

Amy Li
6 min readMay 21, 2019

--

After being introduced to SQL databases at my data science bootcamp, I became interested in learning more about the underlying concepts that define how databases operate, which led me to reading about transactions and concurrency control. Transactions and concurrency control schemes are database concepts that regularly appear in real-life situations. There is still a lot I have to learn but I wanted to summarize what I understand so far to help me wrap my head around these concepts.

What is concurrency control?

Concurrency control is the management of simultaneously executing transactions.

Why is this important?

Concurrency control keeps each transaction isolated as it is executed which helps data remain consistent even after the transaction ends especially in multi-user systems. Imagine you are reading data from a database at the same time another user is writing into it. The copy of data you obtained will not include these updates and thus is an inaccurate or inconsistent set of data. Analysis completed on this data will yield inaccurate information for decision-making. To prevent this, it is important to have concurrency control. To understand how this works, we first need to talk about transactions.

Database transaction: http://maxdb.sap.com/doc/7_7/81/74b30edc2142658e510080ef6917f1/ppt_img.gif

A transaction is a unit of operations, usually read and write operations, which is executed from start to finish to access and update various data items. An example is transferring money from a savings account to a checking account. A good transaction has ACID properties:

  • Atomicity: “All or none” — All operations of the transaction are completely executed and reflected properly in the database or none at all.
  • Consistency: A transaction is completed in a way that ensures that the database is in a consistent state after the execution.
  • Isolation: Each transaction, even if there may be multiple transactions happening concurrently, must be executed in isolation so that it is unaware of other transactions happening in the system.
  • Durability: The changes a fully executed transaction has on the database should persist regardless of any adverse events.
Transaction States: https://www.tutorialspoint.com/dbms/images/transaction_states.png

There are a couple of states a transaction goes through:

  • Active: the initial state when the operations in the transaction are being executed
  • Partially Committed: the state after the final operation is executed but before the changes are saved to the database
  • Failed: the state after the execution is discovered to be unable to proceed
  • Aborted: the state after the transaction has been rolled back and the database is restored to the state prior to the transaction (terminated)
  • Committed: the state after completion of the transaction and changes are reflected in the database (terminated)

If we consider what is the best method to ensure consistency and isolation of a transaction, transactions should be conducted serially one after another. One transaction should complete its cycle before a new transaction begins on the same item. However, there are disadvantages to this approach such as reduced resource utilization and overall inefficiency.

https://image.slidesharecdn.com/lecture03transactionconcurrencycontrol-121201053857-phpapp01/95/transaction-concurrency-control-24-638.jpg?cb=1354340463

Concurrency has two advantages:

  • Improved resource utilization and efficiency of the system: If transactions happen one after another, the number of transactions completed within a given amount of time would be greatly reduced, and work can pile up. As more transactions are waiting in queue, resources are allocated to keep them idling, which prevent them from being used elsewhere. With concurrency, these two problems would be prevented.
  • Reduced waiting time: Transactions can be short or long. A short transaction may need to wait an indefinite amount of time to be executed as it waits for a long transaction to be completed if they are completed serially. However, if they can happen simultaneously, both transactions can be completed in a shorter period.

So how can we combine the advantages of both serializability and concurrency? Schedules and concurrency control mechanisms help. Schedules arrange each step of the transaction in a serial pattern so the transaction can go to completion without worry of interference from other transactions. Concurrency control mechanisms manage the flow of transactions and improve the efficiency of the system.

https://image.slidesharecdn.com/transactions-and-concurrency-control-patterns-170223193756/95/transactions-and-concurrency-control-patterns-15-638.jpg?cb=1487878836

Types of Concurrency Control Mechanisms

There are many types of concurrency control mechanisms. I will only introduce lock-based protocols here but other mechanisms are multiple granularity mechanisms, timestamp-based protocols such as Thomas’ Write Rule, validation-based protocols also associated with optimistic concurrency control mechanisms, multiversion schemes, and snapshot isolation.

Concurrency Control Algorithms: https://www.researchgate.net/profile/Mehdi_Asadi2/publication/202199234/figure/fig1/AS:305900326801408@1449943729411/Classification-of-Algorithms-in-the-Concurrency-Control.png

Lock-Based Protocols

Lock-based protocols, as its name suggests, utilizes locks to manage the access and changes to an item to ensure there is consistency and that each transaction is completed in isolation while allowing multiple transactions to happen at the same time in some situations. When a transaction holds a lock on a data item, it prevents other transactions from modifying it. Some important notes about this type of mechanism:

  • Shared-mode (read-access only): Multiple transactions with this lock can access the same item at the same time
  • Exclusive-mode (read/write access): Only one transaction can have this lock at a time.
  • The workflow is: transaction request a lock in the appropriate mode depending on the operations it wants to perform on the item → request goes to concurrency control manager → concurrency control manager grants lock to the transaction.

Compatibility function:

  • If a transaction already has a shared mode lock and another transaction requests a lock to access the same item, one of two ways can happen:
  • If the lock being requested for is also a shared mode lock, the concurrency control manager will grant the lock.
  • If the lock being requested for is not a shared mode lock (thus an exclusive mode lock), the transaction will need to wait until those with the shared mode lock releases their lock.
  • Overall this concept shows that multiple transactions can concurrently read a data item but only one can write into it, only if no one else is accessing the item. This prevents those reading the items from reading inconsistent copies.
Lock Compatibility Matrix: https://image.slidesharecdn.com/transactions-and-concurrency-control-patterns-170223193756/95/transactions-and-concurrency-control-patterns-19-638.jpg?cb=1487878836

Two-Phased Lock Protocol

  • A lock-based protocol used a lot in enterprise systems, which splits into two phases: growing phase (initial phase when the data is accessed), and shrinking phase (end phase when the lock is released).
  • Scheduling of concurrent transactions is based on the lock points, the moment when a lock is released.

I am just barely scratching the surface in the world of transactions and concurrency control but already I can tell how important it is to learn about these concepts because more knowledge on these concepts will help with better understanding how best to manage the flow of data when working with databases to avoid conflict situations such as loss of data and inconsistency.

References:

--

--