ElasticSearch — Backup and Restore indices from S3
We use a combination of Redis, Logstash and ElasticSearch to index and search all application service logs in one place. Multiple heterogeneous services have loggers that push to Redis; Logstash picks the entries up, massages the data and stores the result in ElasticSearch. The indices are rolled over daily, and each index is named in the form logstash-YYYY.MM.DD. The stack itself warrants another blog post.
Since each day’s index can occupy considerable disk space depending on the volume of logs flowing in, we keep the indices for the past 90 days on disk and archive the rest to Amazon S3 for later access. Restoring an old index and making it work with ElasticSearch again takes very little time. This post lists the steps to set up the required configuration and backup scripts. I will be using Ubuntu as the target OS for illustration; please use the equivalent commands for your system. The entire procedure should be run on a node that is running ElasticSearch.
1. Install Curator
The best way to manage ElasticSearch indices is to use Curator. It’s a handy set of Python scripts to list, delete and restore indices based on various search criteria. Use the following command (for Ubuntu) or its equivalent to install Curator first.
$ sudo pip install elasticsearch-curator
2. S3 as snapshot repository
ElasticSearch has a concept called a snapshot, which lets you copy individual indices or a complete cluster to a remote repository. The repository can be a shared filesystem, HDFS, S3, etc. For this discussion, we will use Amazon S3 storage. We will first install the AWS plugin for ElasticSearch.
$ cd /usr/share/elasticsearch
$ sudo bin/plugin install elasticsearch/elasticsearch-cloud-aws/2.6.1
$ sudo service elasticsearch restart
It’s important that you use the plugin version that matches the version of ElasticSearch installed. Refer to the plugin's compatibility table to find the right one.
Once the plugin is installed, create a bucket in S3 that will be used to offload the indices. Keeping the S3 credentials handy, use the following command to configure S3 as a remote repository for ElasticSearch:
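As a minimal sketch of such a command, the snippet below registers the repository through the ElasticSearch snapshot REST API with curl. The repository and bucket names are the ones used in this post; the ElasticSearch URL, the region and the credential placeholders are assumptions that you need to replace with your own values.

```shell
# Assumed defaults: ElasticSearch listening locally; override via the environment.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="S3-backup"

# Prints the JSON body that describes the S3 repository.
# Region and keys are placeholders, not real values.
repo_settings() {
  cat <<EOF
{
  "type": "s3",
  "settings": {
    "bucket": "logstash-backup",
    "region": "${AWS_REGION:-us-east-1}",
    "access_key": "${AWS_ACCESS_KEY:-YOUR_ACCESS_KEY}",
    "secret_key": "${AWS_SECRET_KEY:-YOUR_SECRET_KEY}"
  }
}
EOF
}

# Registers (or updates) the repository in ElasticSearch.
register_repo() {
  curl -fsS -XPUT "$ES_URL/_snapshot/$REPO" \
       -H 'Content-Type: application/json' \
       -d "$(repo_settings)"
}
```

Calling `register_repo` on the ElasticSearch node sends the request; `repo_settings` on its own prints the body so you can check it first.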
Using the above command, we have created a repository called S3-backup and pointed it at the bucket we created earlier: logstash-backup. Make sure to specify the correct region and credentials for your setup.
3. Backup commands
You can come up with your own strategy to back up indices. For this example, let’s retain the indices that were created in the past 90 days and move the rest to S3. To do that, use the following commands:
We have a cron job that runs the above script daily to prune our indices. Since the script runs every day, each snapshot backs up and removes exactly one index, and this one-to-one mapping between a snapshot and an index helps when we want to restore a specific index at a later point in time. Please note that there is no need to restart ElasticSearch, as it picks up the removed indices in real time.
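The sketch below shows one way such a script could look. Note that it calls the ElasticSearch snapshot REST API directly with curl rather than going through Curator, and that the snapshot naming scheme (`snapshot-<index name>`), the URL and the reliance on GNU `date` (as shipped with Ubuntu) are assumptions made for illustration.

```shell
# Assumed defaults; only the repository name and the 90 days come from the post.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="S3-backup"
RETAIN_DAYS=90

# Name of the daily index that has just left the retention window.
# $1 is an optional reference date (defaults to now, in UTC). Requires GNU date.
expired_index() {
  date -u -d "${1:-now} ${RETAIN_DAYS} days ago" +logstash-%Y.%m.%d
}

# Snapshots the expired index to S3 and deletes it locally,
# but only if ElasticSearch reports the snapshot as successful.
backup_and_prune() {
  local index snapshot
  index="$(expired_index "$1")"
  snapshot="snapshot-${index}"   # hypothetical naming scheme
  curl -fsS -XPUT "$ES_URL/_snapshot/$REPO/$snapshot?wait_for_completion=true" \
       -H 'Content-Type: application/json' \
       -d "{\"indices\": \"$index\"}" \
    | grep -q '"state":"SUCCESS"' \
    && curl -fsS -XDELETE "$ES_URL/$index"
}
```

Saved as a script that ends with a call to `backup_and_prune`, this could be scheduled with a crontab entry along the lines of `0 2 * * * /path/to/backup.sh`, where the path and the time are placeholders. Checking for the `SUCCESS` state before deleting is a deliberate choice: the index is only removed from disk once its copy in S3 is known to be complete.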
4. Restore commands
We can restore an index directly by calling ElasticSearch with the relevant parameters. To do that, you need to know the name of the snapshot that was used to back up that index. There are two ways to find the snapshot name. The first is to note it down when the snapshot is created, since it is printed at that time.
The second way to find the snapshot name is to look at the index folder inside the S3 bucket.
Let’s run the command to restore the above snapshot from S3 to ElasticSearch:
Please read through the Curator docs to learn more about the other patterns you can use to filter the indices you need.
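If neither of those is convenient, the snapshot API can also list every snapshot in the repository along with the indices each one contains. This is a small sketch under the assumption that ElasticSearch is reachable at the URL below; the repository name is the one used in this post.

```shell
# Assumed default; override via the environment.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="S3-backup"

# URL of the "list all snapshots" endpoint for the repository.
snapshots_url() {
  printf '%s/_snapshot/%s/_all' "$ES_URL" "$REPO"
}

# Prints every snapshot in the repository as pretty-printed JSON.
list_snapshots() {
  curl -fsS -XGET "$(snapshots_url)?pretty"
}
```

Each entry in the response carries the snapshot name and its list of indices, so you can search the output of `list_snapshots` for the index you want to restore.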
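A restore request could be sketched as follows, using the snapshot `_restore` endpoint with curl. The ElasticSearch URL is an assumption, and the snapshot and index names are passed in as arguments because they depend on your own backups.

```shell
# Assumed default; override via the environment.
ES_URL="${ES_URL:-http://localhost:9200}"
REPO="S3-backup"

# URL of the restore endpoint for a given snapshot name ($1).
restore_url() {
  printf '%s/_snapshot/%s/%s/_restore' "$ES_URL" "$REPO" "$1"
}

# Restores a single index ($2) from the snapshot ($1) and waits until it is done.
restore_index() {
  curl -fsS -XPOST "$(restore_url "$1")?wait_for_completion=true" \
       -H 'Content-Type: application/json' \
       -d "{\"indices\": \"$2\"}"
}
```

For example, `restore_index <snapshot-name> logstash-YYYY.MM.DD` brings back one day's index. Keep in mind that ElasticSearch will refuse to restore over an open index with the same name, so the index must either not exist or be closed first.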