How to Simply Snapshot and Restore an Elasticsearch Cluster
The Elasticsearch snapshot and restore APIs let you create snapshots of individual indices or of an entire cluster in a remote repository. Snapshots can be saved to many repository types, such as a file system, shared UNC paths, Amazon S3 (and other cloud providers), HDFS, and more.
In this post I will briefly explain how to take a snapshot of a cluster running on one machine and restore it on another. I will focus on taking a snapshot on an Amazon EC2 instance, using Amazon S3 as the repository, and restoring it on another Amazon EC2 instance.
I deliberately keep it as simple as possible. If you want more advanced options, you can always refer to the Elasticsearch documentation.
Prerequisites
- I assume that you have at least two running Amazon EC2 instances
- On each instance you already have Elasticsearch installed
- On each instance you need to install the cloud-aws plugin. Here is how to install it:
#Under your Elasticsearch installation (usually /usr/share/elasticsearch) run:
sudo bin/plugin install cloud-aws
#You'll need to restart Elasticsearch (make sure you know how to do it without harming your cluster):
service elasticsearch restart
- For more detailed information on setting up the prerequisites, you may refer to the following article
REST APIs for cluster snapshot to S3 and restore
Define the snapshot repository configuration in Elasticsearch (here the repository is named my_snapshot)
#Set the repository definition. Refer to the Elasticsearch documentation for advanced options:
PUT /_snapshot/my_snapshot
{
"type": "s3",
"settings": {
"bucket": "your_predifined_s3_bucket",
"region": "us-west-1",
"base_path": "path_under_s3_bucket",
"access_key": "your_amazon_s3_accesskey",
"secret_key": "your_amazon_s3_secretkey"
}
}
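If you prefer to script this step instead of using a REST console, here is a minimal Python sketch that builds the same request with the standard library only. The host URL and the function names build_repository_request and send are my own assumptions for illustration and are not part of Elasticsearch; the request body mirrors the definition above.

```python
import json
import urllib.request

# Assumption: a node reachable locally on the default HTTP port.
ES_HOST = "http://localhost:9200"


def build_repository_request(repo, bucket, region, base_path,
                             access_key, secret_key, host=ES_HOST):
    """Build (but do not send) the PUT request that defines an S3 repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            "base_path": base_path,
            "access_key": access_key,
            "secret_key": secret_key,
        },
    }
    return urllib.request.Request(
        url=f"{host}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send(build_repository_request("my_snapshot", ...)) with your own bucket, region, path, and keys would then issue the request. Keeping the builder separate from the sender lets you inspect the URL and body before anything touches the cluster.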
Take a snapshot and give it a name (e.g. snapshot_1)
#Run the snapshot process:
PUT /_snapshot/my_snapshot/snapshot_1?wait_for_completion=true
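The same call can be sketched in Python. This version optionally limits the snapshot to a list of indices through the "indices" field of the request body and lets you turn wait_for_completion off. The function name, the host, and the example index names are assumptions of mine for illustration.

```python
import json
import urllib.parse
import urllib.request


def build_snapshot_request(repo, snapshot, indices=None, wait=True,
                           host="http://localhost:9200"):
    """Build the PUT request that starts a snapshot.

    indices: optional list of index names; when omitted, no body is sent
             and the request matches the plain call shown above.
    wait:    maps to the wait_for_completion query parameter.
    """
    query = urllib.parse.urlencode(
        {"wait_for_completion": "true" if wait else "false"}
    )
    url = f"{host}/_snapshot/{repo}/{snapshot}?{query}"
    data = None
    headers = {}
    if indices:
        # The API expects a comma-separated string of index names.
        data = json.dumps({"indices": ",".join(indices)}).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url=url, data=data, headers=headers, method="PUT")
```

For example, build_snapshot_request("my_snapshot", "snapshot_1", indices=["index_1", "index_2"], wait=False) prepares a request that returns right away and snapshots only those two (hypothetical) indices.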
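Get the snapshot status. The operation may take time to complete, and because wait_for_completion=true blocks the request above until the snapshot finishes, run this from a separate session
#Get snapshot status: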
GET /_snapshot/my_snapshot/_status
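If you start the snapshot without waiting for completion, you will probably want to poll until it is done. Below is a rough sketch of such a loop. Note that it assumes you fetch the snapshot info (GET /_snapshot/my_snapshot/snapshot_1) rather than _status, and that the response has the shape {"snapshots": [{"snapshot": ..., "state": ...}]} with the states listed in FINAL_STATES. The fetch_info callable is hypothetical: plug in whatever HTTP client you use.

```python
import time

# Assumption: these are the states after which a snapshot no longer changes.
FINAL_STATES = {"SUCCESS", "PARTIAL", "FAILED", "INCOMPATIBLE"}


def snapshot_state(info, snapshot):
    """Return the state of the named snapshot, or None if it is not listed."""
    for entry in info.get("snapshots", []):
        if entry.get("snapshot") == snapshot:
            return entry.get("state")
    return None


def wait_for_snapshot(fetch_info, snapshot, interval=5.0, max_polls=120,
                      sleep=time.sleep):
    """Poll fetch_info() until the snapshot reaches a final state.

    fetch_info: hypothetical callable returning the parsed JSON response.
    Raises TimeoutError if no final state is seen within max_polls attempts.
    """
    for _ in range(max_polls):
        state = snapshot_state(fetch_info(), snapshot)
        if state in FINAL_STATES:
            return state
        sleep(interval)
    raise TimeoutError(f"snapshot {snapshot!r} did not finish in time")
```

Passing sleep as a parameter is only there so the loop can be exercised without real delays; in normal use you can leave the default.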
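Verify the repository (this checks that the nodes of the cluster can access the repository; it does not check the contents of a particular snapshot)
#Verify repository: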
POST /_snapshot/my_snapshot/_verify
Restore the snapshot on another machine
Define the same snapshot repository configuration as above:
#In order to restore the cluster on another machine:
PUT _snapshot/my_snapshot
{
"type": "s3",
"settings": {
"bucket": "your_predifined_s3_bucket",
"region": "us-west-1",
"base_path": "path_under_s3_bucket",
"access_key": "your_amazon_s3_accesskey",
"secret_key": "your_amazon_s3_secretkey"
}
}
Run the restore process on the second cluster
#Run restore:
POST /_snapshot/my_snapshot/snapshot_1/_restore
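The restore call also accepts an optional body. Here is a small Python sketch that builds a restore request limited to selected indices and, optionally, renames them on the way in so they do not clash with indices that already exist on the target cluster. The function name, host, and index names are assumptions for illustration; the body fields (indices, include_global_state, rename_pattern, rename_replacement) are the restore options as I know them from the Elasticsearch documentation.

```python
import json
import urllib.request


def build_restore_request(repo, snapshot, indices=None, rename_pattern=None,
                          rename_replacement=None, include_global_state=False,
                          host="http://localhost:9200"):
    """Build the POST request that restores a snapshot.

    indices:            optional list of index names to restore.
    rename_pattern /    optional regex and replacement applied to index
    rename_replacement: names during restore (both must be given).
    """
    body = {"include_global_state": include_global_state}
    if indices:
        body["indices"] = ",".join(indices)
    if rename_pattern and rename_replacement:
        body["rename_pattern"] = rename_pattern
        body["rename_replacement"] = rename_replacement
    return urllib.request.Request(
        url=f"{host}/_snapshot/{repo}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For instance, build_restore_request("my_snapshot", "snapshot_1", indices=["index_1"], rename_pattern="index_(.+)", rename_replacement="restored_index_$1") would prepare a restore of the (hypothetical) index_1 under the name restored_index_1.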
Run validations as above on the second cluster
Please note that you can control many parameters, such as which indices are snapshotted or restored, index metadata, the snapshot rate, bulk sizes, and more. All of them are explained well in the Elasticsearch documentation.
Hope that this post helps :)
Originally published at github.com.