Closing the (Data) Gap

Mark Michalek
Data Weekly by Jumpmind
5 min readMay 6, 2019

An advanced SymmetricDS concept that is beneficial to understand is “data gaps.” In this article, I want to explain the concept of the “data gap” so you have a firm understanding of the concept if you have to deal with it.

What are data gaps?

In order to understand data gaps, you need to understand a little about how SymmetricDS captures change data in order to replicate that data.

Say, for example, I have 3 tables called item, order, and line. During the course of using my application, I end up inserting an item and then updating that item. A little later I create an order which has 2 line items.

SymmetricDS will capture one row into sym_data for each DML Statement (i.e. INSERT, UPDATE, or DELETE) that I executed against the database through my application. So the example above would create sym_data rows such as the following:

Note the data_id column in sym_data. This is an identity column or a sequence depending on the database platform. The data_id has to be ascending, and this is what SymmetricDS uses to keep the data in strict playback order.

Once the data is captured to sym_data, SymmetricDS will “Route” the data. Routing in SymmetricDS terminology is where the captured data is put into “Batches” to be delivered to remote nodes. Technically, when a sym_data row has a matching row in sym_data_event, it means it has been routed.

Now, on a large and busy system, it is not usually practical to join sym_data and sym_data_event all the time to determine what data needs to be routed next.

And that is where data gaps come in.

Data gaps are ranges of data_id’s where data is expected to show up and would need to get routed. Here are the data gaps in the example above:

Note, there are two gaps in this example. There is a gap of data id’s between 5002–6433 and then a “last” gap from 6437 and beyond. There will be a row in sym_data_gap for each gap in the data id’s that the router will search for data in.

In this example, the “last” gap (starting at 6437) is the natural place for the router to look for new data to replicate. But why is there that other gap, starting at 5002?

Why do gaps form?

Gaps such as the first one in the above example form for 2 basic reasons:

1) A database transaction was executed and then rolled back.

In this case, we could imagine a user issuing a price update that affected 1432 rows which accounts for data id’s 5002 through 6433. Then the user rolls back the transaction. At this point, the data rows will not be committed to sym_data, but those data_id’s are burned and effectively lost forever.

2) A database transaction was executed but not committed yet (and there are other users committing data in the meantime — concurrency.)

Imagine a similar scenario as #1: the user issues a price update on 1432 rows but hasn’t committed the work yet. Then, another user concurrently commits the order which accounts for data id’s 6434 through 6436.

In this case, the order rows are committed and available to get routed. But SymmetricDS will recognize the “hole” in the data_id sequences and create a sym_data_gap row to continue to watch for those data rows to get committed and be available for replication.

In a perfect world, there would only be one data gap, and that would be the last data gap which just basically keeps track of the next data_id we expect to see show up. But in some systems, if there is high concurrency or high rollback activity, the number of rows in sym_data can grow quickly, well up into the 1000’s or tens of 1000’s. There’s nothing inherently wrong with lots of sym_data_gap rows; that is normal for a busy system. At the same time, lots of data gaps, especially from rollbacks, can start to degrade the performance of the routing process. This is because even as the data_id’s march forward, the router has to go back and keep looking in these older data_gaps until they expire. That leads us to the first problem with data_gaps.

Problems with Data Gaps

Gap Expiration

Say in our example, the user did roll back his or her 1432 price rows. We end up with a data gap that will never get filled in.

Now the question is: how long do we keep looking for data in the range 5002–6433? If we knew for sure the transaction had been rolled back, we could just expire the data then and move in.

So the answer to the question “how long do we keep checking this gap?” is — a certain amount of configurable time. The routing.stale.gap.busy.expire.time.ms parameter controls just how long the system will wait for a gap to expire and defaults to 2 hours. Reducing that time is generally not recommended because once a gap is expired, SymmetricDS will not go back and look for that data anymore and that data will not be replicated.

Gap expiration is normal, and is harmless when the gap represents rollback transactions or a gap that would otherwise never be populated with data. Gap expiration's look like this in the log:

2019–04–12 14:12:55,442 INFO [corp-000] [DataGapFastDetector] [corp-000-job-9] Expired 1 data gap(s) between data_id 99 and 128 and between create_time 2019–04–11 14:11:55.425 and 2019–04–11 14:11:55.42

Data_id out of range

Sometimes the next data_id jumps ahead and jumps out of range of the last gap. Here is the idea:

A dramatic jump in the sequence like this could be caused by a database crash, for example. Under the right circumstances, this could cause the Symmetric routing process to appear to stall as it searches for data in the 6437–50006437 range that was skipped by the database’s sequence.

Hopefully, this article helped you better understand the purpose and some of the challenges with the data gap system. Let us know what questions you have about this concept.

--

--

Mark Michalek
Data Weekly by Jumpmind

I started working on ‘big’ data when I used my first computer to catalog my hockey card collection.