I currently manage two small servers and need to keep some data in sync between them. Until now, I did this with the rsync tool and a cron task. This approach is simple and functional, but it has a problem: to keep two identical replicas and avoid integrity issues, the sync must be bidirectional, and rsync does not support this.
Introduction to Unison
Unison is a file-sync tool for OSX, Unix, and Windows. It allows two replicas of a collection of files to be stored on different hosts (or different disks on the same host), modified separately, and then brought up to date by propagating the changes in each replica to the other.
This tool works like rsync, but it keeps state between runs, which is what makes bidirectional sync possible. It can also run over SSH, so the data is encrypted in transit.
To install Unison we can download the precompiled binaries from here. In order to use the binary from a terminal, we must place it in a directory included in the system path.
Alternatively, on a Debian-like distribution, we can install the tool using apt:
sudo apt-get -y install unison
On OSX, we can use Homebrew to perform the installation:
brew install unison
Once installed, we can sync two directories running the following command:
unison -batch dir1 dir2
We must use the -batch CLI option to launch the tool in batch mode, which skips user intervention.
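To see what bidirectional propagation means in practice, the following sketch creates two throwaway directories, makes a different change in each, and runs the same command. The directory and file names are made up for illustration, and the sync step only runs if unison is found on the PATH.

```shell
#!/bin/sh
# Throwaway replicas; the names are placeholders for this illustration.
work="$(mktemp -d)"
mkdir "$work/dir1" "$work/dir2"

# Make a different change on each side.
echo "written on side 1" > "$work/dir1/a.txt"
echo "written on side 2" > "$work/dir2/b.txt"

if command -v unison > /dev/null 2>&1; then
    # Propagate both changes without prompting.
    unison -batch "$work/dir1" "$work/dir2"
    # List both replicas to compare their contents after the sync.
    ls "$work/dir1" "$work/dir2"
else
    echo "unison is not installed" >&2
fi
```

A one-way copy would have carried only one of the two files across, which is the limitation that motivated the switch from rsync.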
Synching remote hosts through SSH
Once installed we can sync two servers connected over the internet. To keep things safe we will use SSH tunnels to transfer the data.
In this scenario, one of the servers acts as the master and the other as the slave. Unison must be installed on both. The configuration lives on the master, which is responsible for launching the sync process.
Unison provides an extensive set of CLI options that we can use to establish the sync configuration. Also, we can set those options using a configuration file.
The configuration file must be named .unison/default.prf and placed under the user’s home directory. The following gist shows the configuration of my master server:
The most important option to set up is the root of the paths to sync. There must be exactly two roots: the local server path and the remote server path. Because we are using the SSH protocol in the remote root, we need to exchange SSH keys between slave and master.
Other relevant options are:
- The path option designates the paths inside the root that must be synched.
- With the ignore options we can define path exclusions using globs or exact matches.
- The batch option must be set to avoid user intervention when the sync process is performed.
- The purpose of the fastcheck option is to define which method is used to check whether a file should be updated. When set to true, the size and the last modification time of the file are used to check whether the file has changed. When set to false, a full-content comparison is used.
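Putting these options together, a profile could look roughly like the sketch below. The host name, user, paths, and ignore patterns are placeholders chosen for illustration and are not taken from a real setup. It also assumes that key-based SSH login from master to slave already works.

```text
# ~/.unison/default.prf (sketch; host, user and paths are placeholders)

# Exactly two roots: the local path and the remote path over SSH.
# The double slash after the host marks an absolute remote path.
root = /srv/data
root = ssh://syncuser@slave.example.com//srv/data

# Sync only these paths, relative to the roots.
path = documents
path = photos

# Exclusions: a glob on the file name, or an exact path.
ignore = Name *.tmp
ignore = Path documents/cache

# Never prompt; conflicting changes are skipped instead.
batch = true

# Detect changes by size and modification time, not by content.
fastcheck = true
```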
The sample file is self-descriptive but we can query the full documentation running
Scheduling the sync
Once configured, we can run the unison command on the master server to perform the sync.
A simple approach to avoid manual synching is to use a cron task to schedule it every minute. Since we want to avoid multiple unison processes running at the same time, we must create a simple script to launch unison only if it is not currently running:
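A minimal sketch of such a script follows. Instead of inspecting the process list, it uses a lock directory, because mkdir either creates the directory or fails in a single atomic step. LOCK_DIR, run_exclusive, and the use of the default profile are assumptions made for this example.

```shell
#!/bin/sh
# Sketch of /usr/local/bin/sync.sh.
# LOCK_DIR and run_exclusive are hypothetical names, not part of Unison.
LOCK_DIR="${LOCK_DIR:-/tmp/unison-sync.lock}"

# Run the given command unless a previous run still holds the lock.
run_exclusive() {
    # mkdir is atomic: it succeeds for exactly one caller at a time.
    if ! mkdir "$LOCK_DIR" 2>/dev/null; then
        echo "previous sync still running, skipping" >&2
        return 0
    fi
    "$@"
    status=$?
    # Release the lock and report the command's exit status.
    rmdir "$LOCK_DIR"
    return "$status"
}

if command -v unison > /dev/null 2>&1; then
    # Sync using the default profile, without prompting.
    run_exclusive unison default -batch
else
    echo "unison not found in PATH" >&2
fi
```

One limitation of this design: if the script is killed while it holds the lock, the directory stays behind and must be removed by hand before the syncs resume.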
The last step is adding our script to the crontab:
*/1 * * * * /usr/local/bin/sync.sh > /dev/null
Unison and cron simplify the process of periodically synching a pair of local or remote folders. Unison is similar to rsync but adds support for bidirectional synching.