Database Synchronization with SymmetricDS: Performance Tips

Published in

Data Weekly by Jumpmind

5 min readFeb 24, 2018

SymmetricDS is a cross-database replication application that relies on database triggers and data capture tables. Like any database centric application, it is constrained by the capabilities of the particular databases that are being replicated, both from a pure horsepower and available database features perspective.

The following article will talk about tips for maximizing the performance of SymmetricDS.

Proximity

First and foremost, the SymmetricDS application instance needs to be in close proximity to the database itself. SymmetricDS is a Java application that uses a JDBC driver to communicate with the database to extract, route and load data. JDBC drivers can be chatty and are designed for the client application to be on the same network as as the database.

One of SymmetricDS’s specialties is to sync across wide area networks. In order to do so, the application should be installed near each database in the synchronization scenario and then the SymmetricDS instances communicate with each other over a network efficient HTTP/S protocol.

Horsepower

The biggest limiting factor for the performance of data synchronization is the database itself. SymmetricDS installs database triggers that capture data changes in a data capture table.

The more powerful the database, the better the database triggers will perform
The more powerful the database, the easier it is to query bigger database tables
The more powerful the database, the less contention on database tables which will allow query to perform better

Databases are constrained by CPU, memory and disk access. In order to optimize synchronization performance, the first place to start is making sure you have allocated enough horsepower to your database.

Maintenance

The main data capture table is sym_data. Database triggers capture changes and insert them in the sym_data table. These changes are then queried to be grouped into batches in the sym_data_event and sym_outgoing_batch tables. Batches will be queried again during extraction. If these critical database tables are not performing then SymmetricDS won’t perform.

Periodic database table maintenance should be performed on the four major runtime tables:

sym_data
sym_data_event
sym_outgoing_batch
sym_incoming_batch

Table maintenance will be different from database vendor to database vendor.

SymmetricDS has a purge routine that is scheduled to run at midnight by default. The act of purging the runtime tables can introduce contention on the tables, so that process should be scheduled at a time of lower database activity or it should be run often enough that it completes quickly.

The purge.retention.minutes parameter can be used to control how long successfully synchronized data is retained after purge runs. The default is to keep one day worth of data.

Sometimes when a lot of data has changed or the database is bogged down database queries against these tables can take longer to complete. The query timeout can be increased by setting db.sql.query.timeout.seconds in your properties file.

Another way to reduce contention on sym_data is to partition it. The best column to use for partitioning is channel_id as routing queries use channel_id.

Some databases (Sql Server) escalate locks from row to page to table level locks. This is bad for tables that have triggers on them that insert into sym_data. You can reduce deadlocks by disabling lock escalation on key tables.

Load Performance

SymmetricDS loads replicated data into target databases using prepared sql statements. If an insert, update or delete happens on the source system, then an insert, update or delete happens on the target system.

Like any prepared statement its performance is susceptible to the performance of the indexes on the primary key. If there are no primary keys on the target system, then updates or deletes can be slow. Sometimes database collation can affect performance as well (search sendStringParametersAsUnicode for Sql Server and Sybase).

For an initial load of data, be sure to use the initial load feature of SymmetricDS. You would not want to load data into your source system to let it sync down naturally. It is much more efficient to kick off an initial load that extracts data from the source tables to send to the client.

SymmetricDS also has bulk loaders. You can tie a bulk loader to a channel to increase the performance of initial loads. Because initial loads are done (by default) on the reload channel you can set the reload channel’s data_loader_type to ‘bulk’.

LOB data

Another common bottleneck is the synchronization of LOB data. By default LOBs are captured in the SymmetricDS data capture table. This can lead to a very big sym_data table. Many users turn off the capturing of lobs by setting use_stream_lobs to true in the trigger configuration.

It is also a good idea to segregate tables with LOB data to their own channel and set the queue setting so that LOB data is sent asynchronously. That way, LOB table changes won’t hold up other data synchronization.

Some databases have limitations that a SymmetricDS user should be aware of when it comes to capturing, extracting and updating lob data.

Be sure to read the database notes section in the User Guide.

Additional Tips & Tricks

Hopefully this article has illuminated key areas of the system to watch to improve your overall database synchronization throughput and performance.

Here are a few other tips and tricks.

SymmetricDS configuration can greatly affect performance as well. Be certain to thoroughly test for performance. Be smart in assigning channels. All related tables should be on the same channel. Channels can be sent asynchronously by assigning a queue in the channel configuration.
If a large load of data needs to occur at the source database consider bypassing the triggers and sending reload events instead. It is much more efficient to execute sql against the source database to extract the changed data after it has been loaded rather than let SymmetricDS triggers capture millions of rows in sym_data. This technique will help the data load process at the source complete much faster as well.
Use the statistics captured in the sym_outgoing_batch and sym_incoming_batch tables to help you pinpoint bottlenecks. There are statistics for how long it took to route, extract, transform, send and load a batch of data.

Happy syncing!