Use database replication to subset data to remote databases

Josh Hicks
Data Weekly by Jumpmind
3 min readOct 4, 2019

Traditional data storage involved a single or centralized database that was available to all systems and users. However with the popularity in cloud computing and mobile technologies there is an increased demand to have smaller databases available locally. This improves performance as well as redundancy during networking outages or maintenance of the central system.

There are some common pitfalls though in trying to replicate a central or primary database to all remote instances. The simple solution is to mirror or replicate the entire primary database to each location. Although this might sound like the simple and straight forward solution, it comes with some overhead. This can increase traffic on an already restricted network or might even be too large to store locally. Ultimately there may just be a lot of unnecessary data to be moved to each location which would never be utilized.

Subsetting the data from the primary database to the remote databases can be used to alleviate sizing, networking, and unused data issues. However the process of subsetting the data requires a proper understanding of the data and the tables and relationships they are stored in. Some tables might still be required fully at each location. While other tables might contain the subsetting criteria directly within the table (possibly a location_id column for example). Finally there might be more complex queries required to determine how to subset a given table based on other tables.

The next step is to find a replication engine that will support all your subsetting use cases. Basic replication solutions might only handle sending all data everywhere or allow some basic subsetting. It comes down to how many use cases are required for your master data set.

To review if your system is eligible for subsetting consider the following steps.

  1. Determine the base or lookup tables that would be required at all locations. These might be parent tables in foreign key relationships or types of master data that is common across your domain.
  2. Determine tables that could be subset but also contain the data to subset directly within the table. A common example of this is a column or set of columns that identify this data for each location. Usually this would be a column like a location_id for instance. In this case the data is self contained with the subsetting rule in each row. This type of subsetting is usually limited to one location per row.
  3. Determine tables that could be subset but require a more complex approach to determine the subset location of the data. These data sets might require a single additional reference table or multiple tables to determine how the data should be subset. These data sets could then be sent to a single location or multiple locations.
  4. Find a replication tool that will support each of the three use cases you need above.

Finally if you are looking for a tool or would like to read more about how each of these use cases are supported through SymmetricDS you can read the following blog for a deeper analysis.

--

--