Automated Syncing of Elasticsearch Clusters

Here at Yodle, we use Elasticsearch to power many different features in our products. Recently, we’ve been working on a new version of Centermark, a BI product for franchise businesses. We serve up lightning-fast visualizations of data built on top of Elasticsearch. As the project has progressed, we’ve realized that developers need a cluster to run automated tests against before their code reaches the production environment. Our use case is pretty simple: we want to run two separate clusters, one that serves the visualizations our customers use every day to make decisions, and one that developers can use to make sure their code isn’t going to cause an outage or unnecessary slowness.

Luckily, Elasticsearch has a built-in snapshot and restore process, which makes this much smoother. First, on the production cluster, we set up a “snapshot” job with cron which backs up that cluster’s data to S3. Then, on the test cluster, we set up a “restore” job which clones the data from S3 into the test cluster. There are still a few issues to deal with using this approach. Because of how our clusters are set up, we don’t want the aliases or index names from the production cluster to carry over to the test cluster unchanged. We also want to control the number of replicas used in the testing environment to avoid wasting space on the test cluster. This blog post demonstrates how we solved these problems, and our approach can be generalized to your use case. Let’s get into setting up the clusters by using Elasticsearch’s snapshot and restore API!

The Snapshot Process

The official AWS Cloud Plugin for Elasticsearch supports using S3 for storing snapshots. We followed the documentation to set up a repository in the prod cluster. Creating a daily snapshot is as easy as calling curator via cron. We put a curator command similar to this into cron:
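As a rough illustration of what such a daily job does, here is a minimal Python sketch that talks to the cluster through the elasticsearch-py client rather than through the curator command line. The `snapshot-` prefix, the date format, and the helper names are assumptions made for this example, not our production values; `client` is assumed to be an `elasticsearch.Elasticsearch` instance.

```python
from datetime import date


def snapshot_name(day, prefix="snapshot-"):
    """Build a zero-padded, date-based name so that names sort chronologically."""
    # The prefix and date layout are illustrative assumptions.
    return f"{prefix}{day:%Y.%m.%d}"


def create_daily_snapshot(client, repository, day=None):
    """Ask the cluster to snapshot its indices into the given repository."""
    name = snapshot_name(day or date.today())
    # wait_for_completion=False returns right away; the snapshot itself
    # keeps running in the background on the cluster.
    client.snapshot.create(
        repository=repository,
        snapshot=name,
        wait_for_completion=False,
    )
    return name
```

A cron entry would then simply run a small script that calls `create_daily_snapshot` once a day.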

We also have a process in place to delete snapshots older than 30 days:
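The retention step can be sketched in Python along the following lines. This is only an approximation under stated assumptions: snapshot names are assumed to look like `snapshot-YYYY.MM.DD`, the helper names are hypothetical, and `client` is assumed to be an elasticsearch-py `Elasticsearch` instance.

```python
from datetime import date, datetime, timedelta


def expired_snapshots(names, today, max_age_days=30, prefix="snapshot-"):
    """Return the snapshot names whose embedded date is older than the cutoff."""
    cutoff = today - timedelta(days=max_age_days)
    expired = []
    for name in names:
        if not name.startswith(prefix):
            continue  # not created by this job, leave it alone
        try:
            taken = datetime.strptime(name[len(prefix):], "%Y.%m.%d").date()
        except ValueError:
            continue  # the name does not end in a parseable date
        if taken < cutoff:
            expired.append(name)
    return expired


def delete_expired(client, repository, names, today=None):
    """Delete every expired snapshot and return the names that were removed."""
    doomed = expired_snapshots(names, today or date.today())
    for name in doomed:
        client.snapshot.delete(repository=repository, snapshot=name)
    return doomed
```

With this cutoff, a snapshot that is exactly 30 days old is kept and anything older is removed.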

The Restore Process

We then created another repository in the test cluster, pointing to the same S3 bucket, but we used an Amazon IAM account that only has read access to that bucket. We also used the “readonly: true” flag when creating the repository with the Elasticsearch Cloud AWS plugin. After setting up the snapshot process and repositories, the restore process is straightforward. The four basic steps are:

  1. Retrieve the name of the snapshot to use for the restore process
  2. Close indices to be restored (if they exist)
  3. Run restore process
  4. Clean up old indices

If you just want the source code for restoring you can click here and be on your way.

We wrote a Python script using the Curator library, a wrapper over the official Elasticsearch Python client (elasticsearch-py), to complete the restore process. Both of these libraries are available through pip. Curator requires a connection to your Elasticsearch cluster, which is called a client. Using the client, curator can call all of Elasticsearch’s useful APIs. A client setup looks something like this:
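A minimal sketch of a client setup, assuming the elasticsearch-py package is installed and the cluster is reachable over plain HTTP. The host name and the helper names are placeholders for illustration only.

```python
def cluster_url(host, port=9200, scheme="http"):
    """Build the URL the client should connect to."""
    return f"{scheme}://{host}:{port}"


def make_client(host, port=9200):
    """Create an Elasticsearch client for the given host."""
    # Imported lazily so cluster_url stays usable without the library installed.
    from elasticsearch import Elasticsearch

    return Elasticsearch(hosts=[cluster_url(host, port)])


# Example (placeholder host name):
# client = make_client("test-es.example.internal")
```

The resulting client object is what gets passed to curator, or used directly to call the snapshot and index APIs.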

Retrieving Snapshot Names

Curator provides some nice functionality to retrieve snapshot names. Since the two clusters share a snapshot repository, a single request on the test cluster allows us to grab all of the snapshots:
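Here is a minimal sketch of that lookup, written against the elasticsearch-py client directly instead of curator’s helper (whose exact signature depends on the curator version). The function names are illustrative, and filtering on the `SUCCESS` state is an extra safety check assumed for this example.

```python
def snapshot_names(client, repository):
    """List the names of all successfully completed snapshots in a repository."""
    response = client.snapshot.get(repository=repository, snapshot="_all")
    return [
        snap["snapshot"]
        for snap in response["snapshots"]
        if snap.get("state") == "SUCCESS"  # skip partial or failed snapshots
    ]


def latest_snapshot(names):
    """Pick the most recent snapshot, relying on date-based names sorting in order."""
    if not names:
        raise ValueError("no snapshots available to restore from")
    return max(names)
```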

Our snapshots are named using the date on which they were created (this is also managed by curator!) so we can guarantee that we will get the most recent snapshot by sorting lexicographically.

Closing Appropriate Indices

Elasticsearch won’t restore on top of an open index, so after getting the snapshot we close all of the indices that are about to be restored. We prefix all of our indices with “prod_” when they’re copied over, so any index in the test cluster which starts with “prod_” needs to be closed before running the restore command. Again, curator provides our solution:
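The same idea can be sketched with the elasticsearch-py client alone. This is an approximation of the step rather than our exact curator calls; the helper names are hypothetical, and `client` is assumed to be an `Elasticsearch` instance.

```python
def indices_with_prefix(index_names, prefix="prod_"):
    """Keep only the index names that start with the given prefix."""
    return sorted(name for name in index_names if name.startswith(prefix))


def close_prefixed_indices(client, prefix="prod_"):
    """Close every index in the cluster whose name starts with the prefix."""
    # indices.get returns a mapping keyed by index name.
    existing = client.indices.get(index="_all")
    to_close = indices_with_prefix(existing.keys(), prefix)
    if to_close:
        # The close API accepts a comma-separated list of index names.
        client.indices.close(index=",".join(to_close))
    return to_close
```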

Curator gets all of the indices in our test cluster with the get_indices method, and then applies the prefix filter to limit the results to those indices which start with “prod_”. Now that we have all “prod_” indices, we can call another curator function with those indices to close them.

Running The Restore

Finally, assuming everything went well, we can run the restore process! Curator has a built-in restore method, so all you have to do is figure out which options matter most for your use case. Here’s how we call the restore process:
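A minimal sketch of such a call, using the restore options that Elasticsearch’s snapshot API exposes (`rename_pattern`, `rename_replacement`, `include_aliases`, `index_settings`). It goes through the elasticsearch-py client rather than curator, the helper names are illustrative, and passing the options as `body` is an assumption that fits older client versions; newer clients also accept them as keyword arguments.

```python
def restore_body(prefix="prod_", replicas=1):
    """Build the options sent along with the restore request."""
    return {
        # Capture the whole original index name...
        "rename_pattern": "(.+)",
        # ...and restore it under the same name with the prefix in front.
        "rename_replacement": prefix + "$1",
        # Do not bring the production aliases along.
        "include_aliases": False,
        # Override the replica count stored in the snapshot.
        "index_settings": {"index.number_of_replicas": replicas},
    }


def run_restore(client, repository, snapshot, prefix="prod_", replicas=1):
    """Restore every index in the snapshot, renamed and with fewer replicas."""
    return client.snapshot.restore(
        repository=repository,
        snapshot=snapshot,
        body=restore_body(prefix, replicas),
        wait_for_completion=True,
    )
```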

This setup will allow us to rename all of the indices in the snapshot to start with “prod_”, drop their aliases, and give them only one replica.

Cleaning Up

We may remove an old index in production, which also removes it from later snapshots, but it will still exist in the test cluster. This is because the snapshot and restore process we’ve explained so far doesn’t delete anything; the scope of the process is confined to the indices in the snapshot. Therefore, we may end up with a bunch of leftover indices unless we clean up!

Our general strategy for clean up is to delete all indices which start with “prod_”, but are not in the prod snapshot. You can get all the indices for a given snapshot using curator like so:
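As a sketch of that lookup without curator, the snapshot metadata returned by Elasticsearch already carries the list of indices it contains. The function name below is illustrative, and `client` is assumed to be an elasticsearch-py `Elasticsearch` instance.

```python
def snapshot_indices(client, repository, snapshot):
    """Return the names of the indices stored in a single snapshot."""
    response = client.snapshot.get(repository=repository, snapshot=snapshot)
    snapshots = response["snapshots"]
    if not snapshots:
        raise ValueError(f"snapshot {snapshot!r} not found in {repository!r}")
    # Each snapshot entry lists its indices under the "indices" key.
    return list(snapshots[0]["indices"])
```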

Given these indices, we can filter the indices in the test cluster using a set difference. The first set contains all the indices in the test cluster which start with “prod_”. The second set contains all the indices in the given snapshot, acquired from the above call to get_snapshot_indices. Here’s our code for taking the difference between these two sets:
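A minimal sketch of this set difference, followed by the delete. The helper names are illustrative rather than part of curator, and the delete goes through the elasticsearch-py client, which is an assumption of this example.

```python
def stale_indices(test_cluster_indices, snapshot_index_names, prefix="prod_"):
    """Find prefixed indices in the test cluster that the snapshot no longer has."""
    # Add the prefix to the snapshot's names so both sets use the same naming.
    restored = {prefix + name for name in snapshot_index_names}
    present = {name for name in test_cluster_indices if name.startswith(prefix)}
    return sorted(present - restored)


def delete_stale(client, stale):
    """Delete the leftover indices, if there are any."""
    if stale:
        client.indices.delete(index=",".join(stale))
    return stale
```

For example, with `prod_apples` and `prod_pears` in the test cluster and only `apples` in the snapshot, `stale_indices` returns `["prod_pears"]`, and indices without the prefix are never touched.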

In the above snippet, we add a “prod_” prefix to the indices in the snapshot, so we’re comparing prod_apples to prod_apples. Curator has a built-in delete function, so we can call that on the leftover indices, and everything gets wrapped up nicely. You don’t need to use curator to do any of this, but it provides an excellent wrapper around the required requests, so it’s a pretty good deal. If you have any questions about the snapshot and restore process, why don’t you leave a comment below?

Does work like this sound interesting? We do stuff like this at Yodle every day, and we’re also hiring! How convenient! Check out our careers page for more details.
