How to never ever deal with read-after-write inconsistency in replicated databases again

Published in

Cloud Pak for Data

10 min readSep 1, 2022

Fully transactionally consistent, current query results on a replicated database as if the query was executed on the source with IBM Db2 for z/OS Data Gate

By Eirini Kalogeiton, Felix Beier, Sowmya Kameswaran & Vassil Dimov

Introduction — Asynchronous replication

Running a database at scale as the main system of record is not an easy task. The more the workload increases, the more we need to think of ways to reduce the utilization of the database or at least to make spikes more predictable. One approach that many big enterprises are using is the data replication pattern, i.e., creating read-only replicas of their main transactional database which are used to execute a large fraction of the read-only queries and, thereby, relieving the source database from this workload. By that, all updates to the data are initially persisted in the main transactional database and subsequently propagated to the read-only replicas.

The replicated databases should be maintained in a way that does not significantly impact the source database. The best way to do that is to utilize an asynchronous replication based on a Change Data Capture (CDC) technology. The asynchronous replication comes with the great benefit that the additional utilization of the source database caused by the data replication process is minimal. Even more important, transaction execution time on the source database is not impacted in any way because replication is not part of the transaction execution process. On the other hand, the asynchronous data replication implies an inevitable latency between the source database and the database replica (or target database) that, depending on the workload and system configuration, might range from a few milliseconds to a few seconds or even minutes. A consequence of this latency is that queries running on the source and the target database may deliver different results. This does not satisfy the requirements of applications that require access to the most recent source data. The following section will present such a use case.

Example use case — A mobile banking application

The banking industry is highly dependent on its databases as they represent its main transactional systems. The transactional database should be as fast and performant as possible while the related cost should be kept low. In order to satisfy these two concurrent goals, predictable workloads are needed. Therefore, a typical customer will concentrate on the core transactional workload on the main transactional database and tries to shift as many read requests as possible (both analytical and transactional queries) to another, secondary platform. The asynchronous replication pattern described above suits this scenario best.

Let us take a look at how such a scenario will be designed for a mobile banking app of a bank:

First, the app provides the opportunity to make transfers and some payments — this is part of the core banking business and should involve transaction execution on the source database.
Second, a bank customer can query the newest transactions, the current account balance or some monthly and yearly balance reports which involve analytics — those are read-only requests that can be executed on the database replica.

What happens if a bank customer transfers money to a friend and immediately refreshes the transaction feed of the mobile banking app to check the current balance? Then we will have a timeline outlined in the following figure:

The bank customer transfers money through the mobile banking app. The transaction execution starts on the source database.
Upon commit, the source database transaction is persisted on the source database and the bank customer receives a transaction confirmation in the mobile banking app. However, due to the asynchronous change replication process, it will take some time (the replication latency) until this transaction is replicated to the target database.
The bank customer opens his transactions overview page and refreshes the current balance. The mobile banking application queries the target database replica, which may still not have the transfer that was just performed.
Mobile banking customer is confused

What we see here is a typical read-after-write inconsistency issue of environments containing database replicas. In some cases, such inconsistencies might not be a big issue but for a mobile banking application, this is a no-go.

The mobile banking app may introduce additional delays for avoiding this situation. But this logic needs to be implemented inside the application layer upon each target database access — for all applications. Further, how should the application know which delays are required for avoiding the problem? Too small delays increase the likelihood that inconsistent results are retrieved while too large delays negatively impact the application’s performance. Both situations lead to a negative user experience which should be avoided. Last but not least, replication latencies vary over time. There may be phases with heavy changes on the source database where latency increases or peak utilizations of the underlying infrastructure are caused by concurrent applications that, e.g., share the same network connection.

Let us examine in the following, how this problem can be avoided by using IBM Db2 for z/OS Data Gate.

Guaranteeing fully transactionally consistent, current query results as if the query was executed on the so with IBM Db2 for z/OS Data Gate

IBM Db2 for z/OS Data Gate is a service that implements an asynchronous replication pattern between Db2 for z/OS — a database that is running many large-scale banking, insurance, and retail workloads — and Db2 Warehouse in the cloud.

On top of the asynchronous replication, Data Gate also offers a transactionally consistent query execution protocol which guarantees that queries executed on a read-only database replica always deliver the same results as if they were executed on the source database at the same starting time. The protocol ensures that the most recent changes from a source database are already replicated to the read-only database replica before a query is executed. After submitting a query, the protocol automatically delays query execution until it can be guaranteed that the latest data from the source database at query submission time has been replicated to the target database. In most cases, where the data replication process can keep up with the data change rate, the delay will not be noticeable. For peak workloads with higher replication latency, however, the protocol avoids read-after-write inconsistencies with the minimum waiting time.

The following figure illustrates the transactionally consistent query execution protocol:

The bank customer transfers money through the mobile banking app. The transaction execution starts on the source database.
Upon commit, the source database transaction is persisted on the source database and the bank customer receives a transaction confirmation in the mobile banking app. However, due to the asynchronous change replication process, it will take some time (the replication latency) until this transaction is replicated to the target database.
The bank customer opens his transactions overview page and refreshes the current balance. The mobile banking application queries the target database replica, which may still not have the transfer that was just performed.
The extended query execution process delays the execution until the newest transfer is replicated to the target database.
After the latest data changes arrive at the target database, the query execution triggers and retrieves the same result as if it was executed on the source database.
The mobile banking customer retrieves the expected account balance which contains the newest transfer.

Demo

The following animation shows how the transactionally consistent query execution can be used. Data Gate extends the Db2 Warehouse on IBM Cloud Pak for Data target database by an additional statement that allows the user to specify that a query needs to wait for the most recent source data. This is done by the WAITFORDATA special register as follows:

SET CURRENT QUERY WAITFORDATA 10;

With this statement, a user specifies that queries in this transaction may be delayed up to 10 seconds for the most current source data. If the data arrives within the pre-defined time, the query execution will be triggered. Otherwise, it will abort with a timeout error. The following animation compares a normal query execution with the enhanced query execution protocol of Data Gate:

We can see that a usual query execution without waiting for the most recent source data takes only 0.039 seconds. This transaction does not provide any guarantees to operate on the most recent source data.

In the second set of statements, when the WAITFORDATA special register is set, the query execution takes 0.396 seconds instead. The additional query execution time is comprising the collection of the most recent source state and a small delay for waiting until it was replicated to the target database. For more details regarding the special register check the WAITFORDATA documentation.

Deep dive — query execution flow and handling of committed and uncommitted source changes

How does it work? Let us examine the query execution flow in more detail and how committed and uncommitted changes on the source database are considered for the query execution on the replicated target database.

The overall process

The following image illustrates how retrieving transactionally consistent results on a database replica works in conjunction with the asynchronous data replication protocol used by Data Gate (Integrated Synchronization). Data from the source database is continuously replicated to the target database. The extended query processing flow provides fine-grained control to applications that read data from the target database to decide whether and when to wait for the most recently committed source data.

The process starts with the submission of a query to the target database. If the application submitting the query to the target database has not set the CURRENT QUERY WAITFORDATA special register or specified a zero value (which is also the default), the query is executed immediately without any delay. Otherwise, the query is delayed up to the defined number of seconds, until the most recent data has been replicated from the source.
The first step in WAITFORDATA query processing is to receive the most recent committed state from the source database’s transaction log, which is used by Data Gate’s data replication component to retrieve change data. The committed state is either determined for the set of replicated tables that are referenced by the query or for the whole database in case the WAITFORDATA query does not reference any tables at all. Waiting for the most recent state over the whole source database might take a longer time for replication.
After the most recent committed state of the source database has been determined, the data replication component of Data Gate compares it with the current state of the target database by examining the log position of the most recently applied log records.
If the desired target state that has been determined in Step 2 has not been reached, yet, the execution of the query is delayed.
As soon as the desired source database state has been replicated, the query will start executing on the target database. By implementing this protocol, WAITFORDATA queries are guaranteed to always reference the most recent source data.

Handling committed and uncommitted source data

The following example explains how WAITFORDATA queries handle the committed and uncommitted state on the source database. Thread 1 and Thread 2 represent separate applications making changes to table AAA on the source database. Data Gate continuously replicates these changes from the source database to the target database replica by reading them from the transaction recovery log, transferring them to the target database, and applying the change records there. Data Gate also orchestrates the execution of transactionally consistent queries on the target database.

Transactions (1) and (2) which are updating table AAA are already committed on the source database, whereas Transaction (3) is still in progress at the moment when a query (4) is submitted on the target database to read data from a table AAA on the database replica. Transaction (5) is started and committed shortly after the query submission. Transaction (6) starts and commits after the query (4) has been executed.

If the WAITFORDATA special register is specified before the query’s SELECT statement, the query (4) will be executed on the target database only after the newest changes from the committed transactions (1) and (2) are replicated because they were already committed at query start time. Transactions (3), (5), and (6) are still in progress at query start and, thus, the query execution does not need to wait for their commits. However, their changes may be included in the query results. Transaction (5), for example, is started and committed shortly after the query submission to the target database. Similarly to transaction (3), the query will not have to wait for it but the corresponding changes might still be returned in the query results, depending on how fast those changes are replicated to the target database.

In summary, the consistency guarantees of the WAITFORDATA query processing flow is to deliver results on a database state that is not older than the source database state at query submission time.