Contrasting SymmetricDS Filter Patterns

Austin Brougher
Data Weekly by Jumpmind
3 min readJan 31, 2018

Users often incorporate some type of filtering in their SymmetricDS configuration. Knowing when and where to setup filters can save time and resources. The following matrix compares the four most common patterns to filter data. Each filter’s strength is highlighted in green.

When filtering individual columns, use Transforms. When filtering rows or records, we should default to using Routers. Routers are a good first choice but they have their own deficiencies.

  1. If the filter logic depends on data at the target node. Router logic runs on the originating node only. If the filter logic requires access to data on the target node, a Load Filter is the better choice. Load Filters run at load time on the target node. The downside of using a Load Filter is network overhead. Since the filtering is done at the target, all of the data must first be sent to the target before being evaluated. Depending on the quantity of data and the percent of records being filtered out, this may not be an issue.
  2. If the filter logic requires real time access to the database. Routing logic runs on a periodic interval. Sometimes we need to filter based on other data in the originating database. If that ‘other data’ is modified or deleted before the router logic has a chance to run, the filter will not work correctly. In this case, we should use trigger sync conditions. Trigger sync conditions are run as part of the trigger text and run at the time of the CRUD operation. This guarantees access to the DB source data at the time of the operation. This pattern has a few drawbacks and it should be avoided if possible. 1) Since the filter text is run as part of the the CRUD operation, the filter query must be efficient or risk a performance hit. 2) Trigger Sync Conditions only filter when triggers are fired, they will not filter data on initial load. The initial_load_select on the trigger_router table will also need to be updated to match the trigger conditions.

Examples:

  1. Only sync table records when the column ‘STATUS’ is set to ‘READY TO SEND’.
    Since we are filtering on the record, we should use a Router. Specifically, we should use a column match router.
  2. Filter out the password column in a given table.
    Since we are filtering at the column level we should default to using a Transform. A load filter would also work but would require scripting and would not be filtered out until after the password was sent to the target node.
  3. Only sync records until the target table has 100 records.
    We are syncing records, so we first consider using a router. The router will not work in this situation because routing takes place on the source node. The count only exists on the target node. We need to use a Load Filter. The follow BSH script can be used to limit the target table to 100 records.

4. You may have a global value in the database indicating when data should be captured. It is important to check the global value at the time the trigger runs, since routing time could take place minutes later and the global value may have changed. In this case its important to use Trigger Sync Conditions. When using Trigger Sync Conditions, we need to remember, initial loading data does not use the ‘sync on condition’ text. If you want this same logic to apply during an initial load, you will need to include this same logic on the initial_load_select column in the trigger_router table.

With four different filter patterns, you should have the tools you need to satisfy the trickiest filtering requirements. Let us know in the comments if you have any questions about what pattern to choose in your situation.

--

--