Transaction Management with Spring and MySQL

Alex
8 min readMar 11, 2020

--

A transaction is a unit of program execution that accesses and possibly updates various data items. It contains multiple steps that must appear to a user as a single, indivisible unit which either executes in its entirety or not at all. We are going to discuss different aspects of transaction management in the MySQL InnoDB environment and how Spring implements those aspects using proxies.

Table of contents

  • ACID
  • Serializability
  • Isolation levels
  • How MySQL implements each level of isolation
  • Examples with Spring
  • References

ACID

Database system maintains the following properties of the transactions:

  1. Atomicity: Either all the operations of the transaction are reflected properly in the database or none are. MySQL ensures atomicity using undo logs that store information about the updates that are made by each transaction.
  2. Consistency: If the database is consistent before the execution of the transaction, the database remains consistent after the execution of the transaction. There are two types of consistency the first is ensured by the database system, for example, the data integrity constraints while the other is the responsibility of the programmer who codes the transaction to ensure the application-dependent consistency constraints.

3. Durability: To understand durability we first must understand the state diagram of a transaction.

In the above diagram, we notice the “Partially Committed” state, the transaction enters this state after the execution of the last statement. In fact, all those updates are made on memory so any system failure (e.g. power outage) can result in a loss of changes corresponding to this transaction. Durability means that after a transaction commits, the updates should be saved despite any possible system failure. and for that reason, information about the updates should be saved on stable storage (MySQL uses redo logs) and after that, the transaction moves from “Partially Committed” state to the “Committed” state so that database system can reconstruct updates whenever they are needed.

4. Isolation: The basic idea behind isolation that each transaction should be unaware of other transactions running at the same time. Let T1, T2 be two transactions running at the same time

SCHEDULE (1)

SCHEDULE (1) This table is called Schedule which represents the chronological order in which instructions are executed in the system. Clearly a schedule for a set of transactions must consist of all instructions of those transactions. The database system can generate different schedules by forcing the transactions to wait at any single point of processing.

You can notice that T2 read a value that is never committed and this will leave the database inconsistent.

The best solution for this problem is to run transactions serially (one after the other) we will talk about this solution in the next section of “Serializability” but this solution has negative performance effects where transactions should wait for each other so that, many solutions have been developed to compromise between performance and consistency and we will talk about those solutions in the “Isolation levels” section.

Serializability

As we see in SCHEDULE (1) above, transactions run at the same time may leave the database inconsistent but there is a type of schedule that is called Serial Schedule that doesn’t affect the database consistency:

Serial Schedule consists of a sequence of instructions from various transactions, where the instructions belonging to one single transaction appear together in that schedule.

Serial Schedule

For sure the database system nowadays can’t generate such a schedule to execute concurrently transactions. Imagine that T1, T2 started at the same time and the database system chose T2 then T1 order, in this case, T1 will wait for T2 to be committed which may remain active for hours!

A more efficient type of schedule that also preserves the consistency and isolation properties is the Serializable Schedule.

To understand serializable schedules we should first discuss the situations where two concurrent transactions T1, T2 conflict:

Please note that the examples used below are part of a schedule so it isn’t a complete one and each example may contains multiple additional instructions and each transaction may rolls back or commits

1- Read / Read situation:

T2 reads record A after T1 but in fact, order here does not matter since the same value of A is read by T1 and T2, regardless of the order. so there is no conflict here and we can swap between the two operations and obtains a new schedule that is (conflict equivalence) to the original one.

2- Read / Write situation:

If T1 reads the value of A before it is written by T2 (the situation above) we will get a different value from the situation where T1 reads the value of A after it is written by T2. so there is a conflict here and we can’t swap between the two operations. The two schedules aren’t (conflict equivalence).

3- Write / Read situation:

Same (Read / Write) situation and there is a conflict here and we can’t swap between the two operations. The two schedules aren’t (conflict equivalence).

4- Write / Write situation (A.K.A Dirty Write):

The value of A will be different if it is written finally by T1 or T2 and if the schedule above contains additional READ(A) operation it will be affected by the order so there is a conflict here and we can’t swap between the two WRITE / WRITE operations. The two schedules aren’t (conflict equivalence).

We say that two transactions T1 and T2 conflict if they contain two operations (one for each transaction) on the same data item, and at least one of these instructions is a write operation.

Thus, only in the case of (read/read) does the relative order of their execution not matter and we can swap between the two instructions. By doing such a swaps database system generates (conflict equivalence) schedules to the original one.

Now we can define the Serializable Schedule: as a schedule that is conflict equivalence to a Serial Schedule. and this is the type of schedule that is generated when you set the isolation level to Serializable.

Isolation levels

As we discussed before serializability has its own performance issues where any case of conflict will cause one of the transactions to wait until the other to be committed. so many solutions have been developed to compromise between consistency and performance and they are called isolation levels:

Note that dirty write in all the following solutions will cause the conflicting transaction to wait until the other to be committed.

1- Serializable: which we have discussed before, where the database system generates a serializable schedule to execute the concurrent transactions.

2- Repeatable Read: with this solution, a transaction eliminates the (read/write) and (write/read) conflicts by ignoring any database update made by another concurrent transaction. So the only conflict that forces a transaction to wait is the dirty write.

3- Read Committed: with this solution, a transaction eliminates the (read/write) and (write/read) conflicts by reading only the updates that are made by a committed concurrent transaction and ignoring updates that are made by uncommitted transactions. So the only conflict that forces a transaction to wait is the dirty write.

4- Read Uncommitted: the lowest isolation level, a transaction eliminates the (read/write) and (write/read) conflicts by reading the updates that are made by a committed/uncommitted concurrent transaction. So the only conflict that forces a transaction to wait is the dirty write.

You can notice how the last three solutions affect the consistency especially the Read Uncommitted solution where the transaction can read uncommitted data and depend on it to do other operations. It is the responsibility of the application developers to ensure consistency in case they choose one of these three solutions.

How MySQL implements each level of isolation

To understand how MySQL implements each level of isolation we should discuss some definitions:

  • Shared Lock: Permits the transaction that holds the lock to read a row. Multiple active transactions may have a shared lock on the same row.
  • Exclusive Lock: Permits the transaction that holds the lock to update or delete a row. A transaction can exclusively lock a row only if it isn’t (shared or exclusive) locked by another transaction.
  • Snapshot: A representation of data at a particular time, which remains the same even as changes are committed by other transactions. Used by certain isolation levels to allow consistent reads.
  • Consistent Nonlocking Reads: A consistent read means that MySQL InnoDB uses multi-versioning to present to a query a snapshot of the database at a point in time (it depends on the database and the information stored in the undo log to build that snapshot). The query sees the changes made by transactions that committed before that point of time, and no changes made by later or uncommitted transactions. The exception to this rule is that the query sees the changes made by earlier statements within the same transaction.
  • Locking Reads: Select records with a shared lock or an exclusive lock.
  • Dirty Reads: An operation that retrieves unreliable data, data that was updated by another transaction but not yet committed.

1- Serializable: MySQL uses Locking Reads with this type of isolation to ensure that the generated schedule is a serializable one. So if transaction T1 reads a record A another one T2 can’t update A until T1 commits.

2- Repeatable Read: MySQL uses Consistent Nonlocking Reads with this type of isolation, where the transaction obtains only one snapshot with the first consistent read operation that didn’t change until the transaction commits. So it ignores any database update made by other transactions.

3- Read Committed: MySQL uses Consistent Nonlocking Reads with this type of isolation, but the difference from Repeatable Read level is that each consistent read within a transaction sets and reads its own fresh snapshot. So it reads the updates made by committed concurrent transactions.

4- Read Uncommitted: MySQL uses Dirty Reads with this type of isolation depending on the database and changes on the memory log buffer.

Examples with Spring

We will depend on the following entity:

1- Serializable Isolation Level:

2- Repeatable Read:

3- Read Committed:

Thanks for reading

References

--

--