Strong Read on Master-Slave MySQL Setup — Part 2

An approach to return the latest data despite replication delay

Thomas Tham
5 min read · May 3, 2022
Simplified diagram on Master-Slave architecture

In this part of the series, we’ll look at how we can solve this issue for range queries, where there is no unique key (UK) in the WHERE clause.

If you haven’t, read Part 1 of the series before continuing to understand the motivation for this approach, the definition of strong read, and the approach for resolving simple queries.

Strong read for range queries

SELECT * FROM TABLE WHERE COL1 = X AND COL2 < Y; -- where both COL1 and COL2 are not UKs.

The main idea is to narrow down the query results to a list of UKs so that you can apply the technique from Part 1 to obtain the latest data.

Consider the following (assuming no change to the write strategy):

Initial Data

+-------+-----+--------+
| UK    | Age | Weight |
+-------+-----+--------+
| Aaron | 25  | 70     |
| Lily  | 22  | 65     |
| Adam  | 23  | 60     |
+-------+-----+--------+
  1. There is an update to change Aaron’s WEIGHT from 70 to 60.
  2. Client performs the following query: “SELECT * FROM TABLE WHERE WEIGHT <= 60;” on slave DB.
  3. Assuming there’s a replication delay and the update is not yet propagated to the slave, the query only returns “Adam”. Even performing the strong read by UK from Part 1 on the returned rows cannot help, because Aaron is missing from the result set entirely.

What happened?

Since the previous data of row “Aaron” (weight 70) does not fit the query pattern, the row never appears in the list of results. As such, we were not able to utilize the earlier mentioned technique to get the latest data.

How to deal with this?

Both the write and the read paths must be modified. The new approach is as follows:

Write

  1. [New] Insert the row’s full data into cache (which we’ll call the query cache). It’s important that we only insert, and not update any previous data for the same row in the cache. We will cover more on this later.
  2. [Unchanged] Invalidate row in data cache (if applicable).
  3. [Unchanged] Commit row.
  4. [Unchanged] Insert row into data cache.
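The write steps above can be sketched with in-memory stand-ins. Everything here is illustrative: the store names, the `write_row` helper, and the tuple layout of the query cache are assumptions, not from the article — in practice the stores would be something like Redis (caches) and MySQL (master), with the query cache entries expired by TTL.

```python
import time

# In-memory stand-ins for the real stores (illustrative only).
query_cache = []   # append-only list of (uk, row, inserted_at); expired by TTL in practice
data_cache = {}    # uk -> row
master_db = {}     # uk -> row

def write_row(uk, row):
    """Write path sketch: the query-cache insert happens BEFORE the DB commit."""
    # 1. [New] Append the full row to the query cache; never update a previous entry in place.
    query_cache.append((uk, dict(row), time.time()))
    # 2. [Unchanged] Invalidate the row in the data cache (if present).
    data_cache.pop(uk, None)
    # 3. [Unchanged] Commit the row to the master DB.
    master_db[uk] = dict(row)
    # 4. [Unchanged] Re-insert the committed row into the data cache.
    data_cache[uk] = dict(row)

write_row("Aaron", {"age": 25, "weight": 70})
write_row("Aaron", {"age": 25, "weight": 60})
# The query cache now holds BOTH writes for Aaron; the data cache holds only the latest.
```

Note the ordering: because the query-cache append happens before the commit, a cache entry may describe a write that ultimately failed — which is exactly why readers sometimes have to fall back to master.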

Read

a. Perform range query on slave DB.

b. Concurrently with step (a), get all data from the query cache. Split the data into two sets: [A] — entries that fit the WHERE clause, and [B] — entries that do not.

c. For each row, either trust the slave data or read from master, according to the following rules:

Rules to decide if the slave data is trustworthy.

Notes:

  1. For row 2 of the rules, the latest cache entry changes the row from matching the query to not matching it. However, since that DB commit might have failed, we have to query master to be sure.

Query Cache

The data in the query cache indicates the potential changes made to a row recently (up to X duration in the past, depending on the TTL).

As long as there was one potential update that could have altered the query result, we have to perform a read from master.

Similar to the revision cache, we must insert the data into the query cache before committing to the DB to ensure strong read: the latest data is returned as long as the read query comes after the DB commit, even if the write operation has not yet finished completely.

Pseudocode

With further simplification

query = "SELECT * FROM TABLE WHERE WEIGHT < 60;"
slaveResults = performQueryOnSlave(query)
queryCacheResults = getAllDataFromQueryCache()

for each data in slaveResults:
    if data is not present in queryCacheResults:
        // no recent update to data, can trust slave
        results = append(results, data)
        remove data from slaveResults

for each data in slaveResults:
    cacheList = list of queryCache results with data's UK
    if latest(cacheList) == data:
        // data in slave is the latest
        results = append(results, data)
    else:
        // two scenarios
        // A: data in query cache fits the query -> need to get latest data from master
        // B: data in query cache doesn't fit the query -> but it might not be committed to DB
        // Either way, we need to query master.
        results = append(results, getDataFromMasterOrCache(data's UK, query))
    remove cacheList from queryCacheResults

for each data in queryCacheResults:
    cacheList = list of queryCache results with data's UK
    if at least one entry in cacheList fits the query:
        // Slave might be lagging.
        // Doesn't matter if it's the latest entry in the query cache or not
        // (the later entry might not be committed to the DB).
        results = append(results, getDataFromMasterOrCache(data's UK, query))
    remove cacheList from queryCacheResults
    // If nothing in the query cache fits the query and the data is not present in slave,
    // the result wouldn't change even if we queried master.

return results

func getDataFromMasterOrCache(UK, query):
    if UK in dataCache: // see Part 1 of the series
        return data from dataCache
    else:
        return queryMaster(query + UK)
        // Append UK to the WHERE clause, e.g.
        // SELECT * FROM TABLE WHERE WEIGHT < 60 AND UK = "Aaron";
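The pseudocode above can be turned into a runnable sketch. Everything here is a toy model of my own making: dictionaries stand in for the master, slave, and both caches, rows are reduced to a single weight column, and the function names (`strong_range_read`, `get_from_master_or_cache`) are illustrative.

```python
# Toy model: dicts stand in for the stores, rows reduced to a single WEIGHT column.
master_db  = {"Aaron": 60, "Lily": 65, "Adam": 60}  # Aaron already updated 70 -> 60
slave_db   = {"Aaron": 70, "Lily": 65, "Adam": 60}  # replication lag: Aaron still 70
data_cache = {}                                     # uk -> weight (empty in this scenario)
query_cache = [("Aaron", 60)]                       # append-only (uk, weight) entries

def fits(weight):
    # The example query: SELECT * FROM TABLE WHERE WEIGHT <= 60
    return weight <= 60

def get_from_master_or_cache(uk):
    # Per Part 1: prefer the data cache, fall back to master.
    weight = data_cache.get(uk, master_db.get(uk))
    return weight if weight is not None and fits(weight) else None

def strong_range_read():
    slave_results = {uk: w for uk, w in slave_db.items() if fits(w)}
    cache_by_uk = {}
    for uk, w in query_cache:
        cache_by_uk.setdefault(uk, []).append(w)
    results = {}
    # Pass 1: slave rows with no recent cache entry can be trusted.
    for uk in list(slave_results):
        if uk not in cache_by_uk:
            results[uk] = slave_results.pop(uk)
    # Pass 2: slave rows that also have recent cache entries.
    for uk in list(slave_results):
        entries = cache_by_uk.pop(uk)
        if entries[-1] == slave_results[uk]:
            results[uk] = slave_results[uk]        # slave already holds the latest write
        else:
            latest = get_from_master_or_cache(uk)  # cache entry may be uncommitted
            if latest is not None:
                results[uk] = latest
        del slave_results[uk]
    # Pass 3: cache entries whose rows never showed up in the slave results.
    for uk, entries in cache_by_uk.items():
        if uk not in results and any(fits(w) for w in entries):
            latest = get_from_master_or_cache(uk)  # slave might be lagging
            if latest is not None:
                results[uk] = latest
    return results

print(strong_range_read())  # includes Aaron's latest weight despite the slave lag
```

Running this reproduces the example from the start of the article: the lagging slave alone would return only Adam, but pass 3 notices Aaron’s recent cache entry fits the query and pulls his committed weight (60) from master.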

So why must we not overwrite data in query cache?

Let me illustrate what could happen with the following example:

  1. Aaron’s weight is updated from 70 to 60. Query cache data’s weight for Aaron is 60. Master DB’s data is updated to 60.
  2. Aaron’s weight is updated from 60 to 80. Query cache data’s weight for Aaron is 80. However, the DB commit failed and master DB’s data is still at 60.
  3. Due to slave replication delay, Aaron’s weight in the slave DB is still 70.
  4. Client performs the following query: SELECT * FROM TABLE WHERE WEIGHT <= 60.
  5. Neither the slave (weight 70) nor the overwritten query cache entry (weight 80) fits the query, so Aaron is not returned and no master read is triggered.
  6. Hence, Aaron’s latest committed weight (60) is missing from the result, and strong read wasn’t achieved.
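A small sketch makes the difference concrete. The cache shapes and helper names here are illustrative, not from the article; the point is only that with insert-only semantics the weight-60 entry survives and forces the master read, while overwriting erases it.

```python
def cached_weights(cache, uk):
    # All recent query-cache entries for one UK, oldest first.
    return [w for k, w in cache if k == uk]

def fits(weight):
    return weight <= 60  # the query: SELECT * FROM TABLE WHERE WEIGHT <= 60

def must_check_master(cache, uk):
    # A master read is triggered iff at least one recent cache entry fits the query.
    return any(fits(w) for w in cached_weights(cache, uk))

# Insert-only cache: both writes for Aaron survive, even though the second commit failed.
insert_only = [("Aaron", 60), ("Aaron", 80)]
# Overwritten cache: only the last (uncommitted) write survives.
overwritten = [("Aaron", 80)]

print(must_check_master(insert_only, "Aaron"))  # True:  the 60 entry forces a master read
print(must_check_master(overwritten, "Aaron"))  # False: Aaron is silently dropped
```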

Improvement

  1. The number of entries to retrieve in step (b) would be approximately total write QPS * TTL. If possible, you can shard your table and keep a separate keyspace in the query cache for each table, reducing the number of entries retrieved to (total write QPS / number of tables) * TTL.
  2. We can batch the getDataFromMasterOrCache calls to reduce the total round-trip latency to the DB.
  3. Traffic to master only occurs when there is a recent update to the table, and the data has not yet been inserted into the data cache. As such, the increase in traffic should be manageable for most application use cases.
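Improvement 2 can be sketched as follows. The function name, the dict-based stores, and the `query_master_batch` callback are my own illustrative stand-ins — in practice the batched fallback would be a single SQL query with a UK IN (...) clause.

```python
def batch_get_from_master_or_cache(uks, data_cache, query_master_batch):
    """Resolve many UKs with one data-cache pass plus a single batched master query."""
    hits = {uk: data_cache[uk] for uk in uks if uk in data_cache}
    misses = [uk for uk in uks if uk not in hits]
    if misses:
        # One round trip instead of one per UK, e.g.
        # SELECT * FROM TABLE WHERE WEIGHT < 60 AND UK IN ('Aaron', ...);
        hits.update(query_master_batch(misses))
    return hits

# Toy master lookup standing in for the real batched SQL query.
master = {"Aaron": 60, "Adam": 60}
got = batch_get_from_master_or_cache(
    ["Aaron", "Adam"],
    {"Adam": 60},                                # data cache already holds Adam
    lambda uks: {uk: master[uk] for uk in uks},  # only Aaron reaches master
)
```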

How does this fit with Strong read for queries by UK?

The approach described here works for queries by UK as well: simply skip the read on the slave and set slaveResults to nil.

In conclusion, while the approaches presented in this series may not be the “ideal” solution to solve the issue of stale data (for example, using Google’s Cloud Spanner eliminates this issue, but it is costly), they are worth considering if using MySQL in a Master-Slave setup is a must, and you’re trying to eliminate/minimize stale data in your queries.
