Channel-ing Your SymmetricDS Configuration

Klementina Chirico
Data Weekly by Jumpmind

SymmetricDS is an open source data replication tool that uses change data capture to replicate data between databases. One of the key components of data synchronization in SymmetricDS is configuration. This involves identifying and specifying your data sources (databases) as nodes, the direction of data flow, the logical grouping of nodes, the method of movement between nodes, and more. This article focuses on channels in SymmetricDS and what to keep in mind when configuring them.

In general, SymmetricDS replicates data in the order in which database changes are captured. However, in some scenarios you may need to prioritize certain data or allow different data to move at the same time. Channels serve this purpose: they logically group data and define how each group moves. Channels also give you control over your synchronization and how frequently data moves, making them a powerful tool in your configuration. Here are some things to keep in mind:

Grouping related data into channels allows for more efficient synchronization. For example, in a retail scenario, you may have the tables ‘item’ and ‘item_selling_price’. It makes sense to configure these tables on the same channel because they hold related data. If the tables also have foreign key dependencies on each other, it is especially important that they move through the same pathway. When replication stops on a channel for any reason, say a network timeout, every table with triggers on that channel stops replicating as well. This keeps data in dependent tables synced in the correct order and prevents foreign key issues such as child rows arriving before their parent rows. Another benefit of grouping related data into channels is that when one channel stops, the other channels keep data flowing for their own tables. One stopped channel therefore does not halt all data synchronization.
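To make the failure isolation described above concrete, here is a small Python sketch. It is a toy model, not SymmetricDS code: the ‘sale’ channel, the ‘sale_transaction’ table, and the `sync` function are hypothetical names made up for illustration, and the real engine works with batches rather than single rows.

```python
def sync(changes, failing):
    """Toy model of per-channel delivery in capture order.

    changes: list of (channel, table, row_id) tuples in capture order.
    failing: set of (table, row_id) pairs that fail to load at the target.
    Returns (delivered, blocked). Once a channel hits a failure, every
    later change on that channel is held back; other channels continue.
    """
    stopped = set()
    delivered, blocked = [], []
    for channel, table, row_id in changes:
        if channel in stopped or (table, row_id) in failing:
            stopped.add(channel)
            blocked.append((channel, table, row_id))
        else:
            delivered.append((channel, table, row_id))
    return delivered, blocked


# 'item' and 'item_selling_price' share the 'item' channel;
# 'sale_transaction' sits on a separate, hypothetical 'sale' channel.
changes = [
    ("item", "item", 1),
    ("item", "item_selling_price", 1),
    ("sale", "sale_transaction", 100),
    ("item", "item", 2),
    ("item", "item_selling_price", 2),
    ("sale", "sale_transaction", 101),
]

# Pretend the parent row item 2 fails to load.
delivered, blocked = sync(changes, failing={("item", 2)})
print(delivered)
print(blocked)
```

In this model the child row for item 2 is held back together with its failed parent, so it can never arrive first, while both ‘sale_transaction’ rows are still delivered.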

The other thing to keep in mind is the effect of channels on performance. Too many channels can hurt your performance and routing. In SymmetricDS, data is routed channel by channel, and this process involves queries to the database. A large number of channels therefore adds a large number of queries, which in turn slows the routing job. Too few channels, however, can also hurt your performance because, as mentioned previously, a stopped channel stops replication for every table on that channel.

Performance can be improved by giving channels different queues. A queue is just a name assigned to a channel, and any two or more channels that share the same queue value are processed sequentially (one at a time). By default, all channels have a ‘default’ queue value, so they are all processed one after another. However, if you assign a channel to a different queue, say ‘item_queue’ using the retail example from earlier, then the channels under the ‘item_queue’ queue and the channels under the ‘default’ queue are processed in parallel. Loading data in parallel allows more than one batch to be processed at a time, which speeds up replication.
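The queue behavior can be sketched in a few lines of Python. Again, this is only an illustration under stated assumptions, not how SymmetricDS is implemented: the function names and the ‘config’ and ‘sale’ channels are hypothetical, and a thread per queue simply stands in for "queues run in parallel".

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def group_by_queue(channels):
    """Map each queue name to its channel ids, keeping the given order."""
    queues = defaultdict(list)
    for channel_id, queue in channels:
        queues[queue].append(channel_id)
    return dict(queues)


def process_queue(queue, channel_ids):
    """Channels that share a queue are handled one at a time, in order."""
    return [f"{queue}:{channel_id}" for channel_id in channel_ids]


def process_all(channels):
    """Run every queue on its own worker thread, so queues proceed in parallel."""
    queues = group_by_queue(channels)
    with ThreadPoolExecutor(max_workers=max(1, len(queues))) as pool:
        futures = {
            queue: pool.submit(process_queue, queue, channel_ids)
            for queue, channel_ids in queues.items()
        }
        return {queue: future.result() for queue, future in futures.items()}


# (channel_id, queue) pairs; only 'item' has been moved off the default queue.
channels = [
    ("config", "default"),
    ("item", "item_queue"),
    ("sale", "default"),
]

result = process_all(channels)
print(result)
```

Here ‘config’ and ‘sale’ share the ‘default’ queue and are handled in order on one worker, while ‘item’ runs on its own worker under ‘item_queue’.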

Channels allow you to group, prioritize, and control your data flow. Knowing how your tables interact is essential to making full use of channels in your SymmetricDS configuration. When configuring your channels, remember that dependent tables should go on the same channel to avoid foreign key dependency issues, and that queues can be used to let multiple channels sync in parallel and improve performance.
