This is a well-known fact for database experts, but I’ve noticed people are using the word Repeatable Read for describing exactly what it supposed to mean: Allow reading the same data multiple times. For example, if you are using snapshots for updating databases, write operations never modify the previous snapshots, so you can safely read the snapshots multiple times to obtain the same data set.
Unfortunately, the definition of Repeatable Read in ANSI SQL-92 standard doesn’t mean that repeatability. ANSI SQL-92 defines four transaction isolation levels: Read Uncommitted, Read Committed, Repeatable Read, and Serializable, and each of them is defined by using three types of possible anomaly states, called phenomena in the specification: P1 (Dirty Read), P2 (Non-repeatable or Fuzzy Read), and P3 (Phantom). The following table shows the correspondence between ANSI Isolation Levels and these anomaly states:
According to this definition, Repeatable Read might have P3 (Phantom) problem. OK. Let’s look at the definition of P3:
3) P3 ("Phantom"): SQL-transaction T1 reads the set of rows that satisfy some <search condition>. SQL-transaction T2 then executes SQL-statements that generate one or more rows that satisfy the <search condition> used by SQL-transaction T1. If SQL-transaction T1 then repeats the initial read with the same <search condition>, it obtains a different collection of rows.
What? Repeatable Read will see a different collection of rows? ANSI Repeatable Read allows P3 (Phantom), but the definition of P3 explicitly states transaction T1 might not read the same data set in the second search. It’s clearly not repeatable!
This is actually an unfortunate choice of the word in SQL-92. I guess the initial assumption at that time (1990s) was that DBMSs keep holding read locks during a transaction so reading the same data should be possible for almost all cases, unless the other transaction inserts or deletes the records that match the condition used in the search. To protect the search result, the DBMS must take a predicate lock for the search condition and prohibit any insertions and deletions of records that will match the predicate. In general, managing predicate locks is expensive because we need to check the conflicts of all existing predicates. If we define this as repeatable read, the transaction protocol becomes completely serializable, so there is no practical reason to have Repeatable Read as a separate isolation level.
The ANSI SQL-92 specification written in plain English has its own problems. A paper, A Critique of ANSI SQL Isolation Levels (SIGMOD 1995), redefined these anomalies in SQL-92 more precisely, and added missing anomaly states in SQL-92, such as Dirty Write, Read Skew, Write Skew, etc. The following table shows the redefined isolation levels:
A Critique of ANSI SQL Isolation Levels
A Critique of ANSI SQL Isolation Levels - Berenson et al. 1995 udpate: 2 minor corrections in the section on A5A per…
A confusing part is Repeatable Read and Snapshot Isolation are incomparable because each of them has its own anomaly; Repeatable Read cannot protect Phantom, but Snapshot Isolation cannot protect Write Skew and Phantom in some cases.
Write Skew can happen if two transactions T1 and T2 read different data, say X, Y, and write to Y, X respectively. If T1 and T2 read X and Y at the same time (by using snapshots) and write Y and X, we cannot define any serial order between T1 and T2. It also can be written like this:
- A5B (Write Skew): r1[X]…r2[Y]…w1[Y]…w2[X]…(c1 and c2 occur)
In this case, T1 writes Y (creates Y1), but to have a serialization order T1 -> T2, r2[Y] should have read Y1. So write skew causes non-serializable transactions.
A good news is Snapshot Isolation can be implemented in a serializable manner (TODS 2005), and its follow-up work, Serializable Snapshot Isolation (SSI) protocol took the best paper award in SIGMOD 2008. One of its implementations is already available in PostgreSQL (VLDB2012). So even if people say snapshot isolation, sometimes it means non-serializable one with phantom and write skew problems, but sometimes it’s truly serializable.
- Never use the word “repeatable read” because ANSI SQL-92’s definition contains self-contradiction. If you are using snapshot isolation, it’s better to say that you can read previous snapshots, instead of saying that the read is repeatable.