Testing MySQL isolation levels

ACID stands for Atomic-Consistency-Isolation-Durability is a set of properties of database transactions intended to guarantee validity even in the event of errors, power failures, etc. (Wikipedia). This is what I learnt when sitting in university. :D

Specifically, definition of Isolation is: Isolation determines how transaction integrity is visible to other users and systems. (Wikipedia)

In database system, people defined 4 isolation levels : Read Uncommitted, Read Committed, Repeatable Read, and Serializable. There are some read phenomena such as: Dirty Read, Non-repeatable Read, Phantom Read, and Write Skew. Here is the matrix between isolation level and read phenomena:

I fascinated when the first time reading this. I asked myself, how those things really happens in real-world situation, so I tried to write some tests to evaluate those “assumptions”.

The method is: I try to stimulate 2 concurrency transactions. Specifically, I open 2 terminal tabs, each tab is opened as one database connection. On the first connection, I will create new transaction with keyword BEGIN without COMMIT or ROLLBACK to stimulate long-running query. In the mean time, on terminal 2, I will run some queries for testing.

This test is running on MariaDB (version 10.3.13). For consistency in testing, I create a SQL script, in fact, is a store procedure that creates table and sample data. Before every test, we should close all database connection and re-run again CALL generate_schema();

You can download the script at: https://gist.github.com/hqt/75f8ae64fb5b79d09aeea108e6123464

  1. Test UNCOMMITTED flag

This test will show that, first transaction can see some “dirty data” on second transaction, although it hasn’t finished yet.

First terminal: we temporary set isolation level to READ UNCOMMITTED before beginning transaction. Then we query to get sample data.

SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
BEGIN;
SELECT year FROM foo WHERE id = 1; -- return 1993

Second terminal: We create new transaction, then we update same record without commit.

BEGIN;
UPDATE foo SET year = 2019 WHERE id = 1;

First terminal: In same transaction, we try to get again same record. But at this time, we get new data from second transaction.

SELECT year FROM foo WHERE id = 1; -- return 2019
COMMIT;

In real world, this behavior is dangerous because second transaction can be failed and rollbacked, or we only see middle state of transaction. This behavior easily leads to data inconsistency.

If we retry again all above steps, but change isolation level from uncommitted to read committed, returned data in last query still 1993.

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;

2. Test READ COMMITTED flag

This test will show that, at read committed isolation level, we have a problem: in same transaction, a row is retrieved twice and the values differ between reads.

On first terminal: As test 1, we set isolation level and select some data.

SET SESSION TRANSACTION ISOLATION LEVEL READ COMMITTED;
BEGIN;
SELECT COUNT(*) FROM foo WHERE year <= 1995; -- total 3 rows
*************************** 1. row ***************************
count(*): 3
1 row in set (0.000 sec)

On second terminal: We try to update one record to new value. This update will affect result of previous query. (I always try to wrap in BEGIN/COMMIT because autocommit flag might be turned off on testing database).

BEGIN;
UPDATE foo SET year = 2019 WHERE id = 1;
COMMIT;

Again, on first terminal, without close transaction, we run again same query and see that in same transaction, same query, data might be different.

SELECT COUNT(*) FROM foo WHERE year <= 1995; -- return 2
COMMIT;
*************************** 1. row ***************************
count(*): 3
1 row in set (0.000 sec)

This problem will be fixed if we move up isolation from committed read to repeatable read.

3. Test REPEATABLE READ flag

On this test, I want to prove that if we use isolation level to REPEATABLE READ, we will have “phantom read” issue: on same transaction, there are 2 different results for same query, when we create (not update) new record on other transaction. But it seems “strange”. Let’s see.

On first terminal, as last 2 tests:

SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
BEGIN;
SELECT COUNT(*) FROM foo WHERE year <= 1995; -- return 3

On second terminal, I try to create new record:

BEGIN;
INSERT INTO `foo` (name, year) VALUES (“foo_foo”, 1995);
COMMIT;

On first terminal, I try to run again same query and expect result will change from 3 to 4, as repeatable read cannot detect phantom read:

SELECT COUNT(*) FROM foo WHERE year <= 1995; -- return 3
*************************** 1. row ***************************
count(*): 3
1 row in set (0.000 sec)

Amazing!!! This is not we expected. So, I try one more step, try to update inserted record and hopefully it will fail:

UPDATE foo SET year = 2019 WHERE name = ‘foo_foo’;
COMMIT;
Query OK, 1 row affected (0.000 sec)
Rows matched: 1 Changed: 1 Warnings: 0

Looks like it doesn’t fail :( So I test again by final query for confirmation:

SELECT * FROM foo WHERE name = ‘foo_foo’;
*************************** 1. row ***************************
id: 7
name: foo_foo
year: 2019
1 row in set (0.000 sec)

So based on 2 tests, I have some conclusions for MariaDB/MySQL at REPEATABLE isolation level:

  • When using just select statement, phantom read doesn’t not happen as SQL standard mentioned.
  • When the transaction modifies data (write/delete/update), we can write successfully to “unseen data” .the behavior is a mix of Repeatable Read (rows not modified are not visible) and Read Committed (modified rows are visible).

I have searched and some people have same result with me. I recommend read this link (with comment), because it is more detail and really interesting. https://blog.pythian.com/understanding-mysql-isolation-levels-repeatable-read/

4. Test SERIALIZABLE flag

Finally, I try to check last isolation level: Serializable flag. I really don’t know what should I expect at this phrase ? Maybe this isolation level will fix above problem. Hopefully : )

As usual on first terminal:

SET SESSION TRANSACTION ISOLATION LEVEL SERIALIZABLE;
BEGIN;
SELECT COUNT(*) FROM foo WHERE year <= 1995; -- return 3 as usual

On second terminal: I try to create new record.

BEGIN;
INSERT INTO `foo` (name, year) VALUES (“foo_foo”, 1996); — waiting
COMMIT;

OMG. At this step, miracle thing happened. We see that second transaction is waiting. Or, in other words, second transaction is locked and waiting for first transaction finish.

So, on first terminal, we COMMIT; and boom, second transaction on second terminal will automatically continue its work and done.

This is huge step from repeatable read to serializable. : )

Conclusion

This will be the end of my blog. Experience in running is always better than only read and remember on book. : ) Moreover, important lesson I learnt from this testing is: there are huge differences between theory on paper and real implementations, as on “repeatable read” isolation case. So to make sure on everything before applying on production environment, we should always at least take a try and running some samples / tests on real data.

There are more things that I want to do, and I will add into this blog topic in future. Some of them are:

  • test those isolation levels on different database vendor such as Postgres, “real” MySQL,… different storage engine such as MyIASM (ahhh, I forgot, MyIASM has no transaction concept :P )
  • I also want to test how MariaDB/MySQL relax on different range lock conditions. Are they really good ?
  • Write an automated python script to illustrate all those steps rather than do it manually.

References:

Here is some useful references.

MySQL document, teaching all about isolation levels.

MySQL document, teaching all about types of locking. This document mentions about gap locking — the locking mechanism that handles in case phantom read.

Chapter 7: Transaction, Design Data Intensive Application: Discussing detail all types of isolation level with many illustrations. This is really a must-read document.

https://www.amazon.com/Designing-Data-Intensive-Applications-Reliable-Maintainable/dp/1449373321

Another nice reference.

A nice article explained how write skew happened in Serializable Snapshot Isolation.