How to Simply Snapshot and Restore Elasticsearch Cluster

Jul 11, 2016

Elasticsearch's snapshot and restore APIs let you create snapshots of individual indices or of an entire cluster in a remote repository. A snapshot can be saved to many repository types, such as a file system, shared UNC paths, Amazon S3 (and other cloud providers), HDFS, and others.

In this post I will briefly explain how to take a snapshot of a cluster running on one machine and restore it on another. I will focus on taking a snapshot on an Amazon EC2 instance, using Amazon S3 as the repository, and restoring it on another Amazon EC2 instance.

I have deliberately kept this as simple as possible. If you need more advanced options, you can always refer to the Elasticsearch documentation.

Prerequisites

To follow along, I assume that you have at least two running Amazon EC2 instances
On each instance you already have Elasticsearch installed
On each instance you need to install the cloud-aws plugin. Here is how to install it:

#From your Elasticsearch installation directory (usually /usr/share/elasticsearch) run:
sudo bin/plugin install cloud-aws
#You'll need to restart Elasticsearch (make sure you know how to do it without harming your cluster):
service elasticsearch restart

For more detailed information on setting up the prerequisites, you may refer to the following article

REST APIs for cluster snapshot to S3 and restore

Define the snapshot repository configuration in Elasticsearch

#Set the snapshot repository definition. Refer to the Elasticsearch documentation for advanced options:
PUT _snapshot/my_snapshot
{ 
  "type": "s3", 
  "settings": { 
    "bucket": "your_predifined_s3_bucket", 
    "region": "us-west-1", 
    "base_path": "path_under_s3_bucket", 
    "access_key": "your_amazon_s3_accesskey", 
    "secret_key": "your_amazon_s3_secretkey" 
  }
}
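The same repository definition can also be sent from a script. Below is a minimal Python sketch of that idea, not part of the original setup. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication, and the names `ES_URL`, `build_repository_request` and `send` are made up for illustration.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_repository_request(repo, bucket, region, base_path, access_key, secret_key):
    """Build (without sending) the PUT request that registers an S3 repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            "base_path": base_path,
            "access_key": access_key,
            "secret_key": secret_key,
        },
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repo}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def send(request):
    """Send a prepared request and return the decoded JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `send(build_repository_request("my_snapshot", ...))` with your own bucket, region, path and keys would issue the same request as the console command above. Keeping the request building separate from the sending makes it easy to inspect the payload before anything reaches the cluster.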

Take a snapshot and give it a name (e.g. snapshot_1)

#Run the snapshot process:
PUT /_snapshot/my_snapshot/snapshot_1?wait_for_completion=true

Get snapshot process status (it may take time to complete the operation)

#Get snapshot status:
GET /_snapshot/my_snapshot/_status
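If the snapshot is started without wait_for_completion=true, the status call has to be repeated until the snapshot finishes. The following Python sketch shows one way to poll. It relies on the assumption that the repository-level _status response lists only snapshots that are currently running under a "snapshots" key, so an empty list means nothing is in progress. `fetch_status` is a hypothetical callable that returns the parsed JSON of the status request as a dict.

```python
import time


def wait_until_idle(fetch_status, poll_seconds=5.0, max_polls=120, sleep=time.sleep):
    """Poll the repository status until no snapshot is reported as running.

    Returns the number of polls that still saw a running snapshot.
    Raises TimeoutError if a snapshot is still running after max_polls polls.
    """
    for attempt in range(max_polls):
        # Assumption: "snapshots" holds only the snapshots currently in progress.
        running = fetch_status().get("snapshots", [])
        if not running:
            return attempt
        sleep(poll_seconds)
    raise TimeoutError(f"snapshot still running after {max_polls} polls")
```

The `sleep` parameter exists only so the delay can be swapped out when trying the function without a real cluster.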

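Verify the repository (this checks that the nodes can access the repository; it does not inspect the contents of the snapshot itself)

#Verify the repository:
POST /_snapshot/my_snapshot/_verify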

Reload the snapshot on another machine

Define the same snapshot configuration as above:

#Register the same repository on the other machine:
PUT _snapshot/my_snapshot 
{ 
  "type": "s3", 
  "settings": { 
    "bucket": "your_predifined_s3_bucket", 
    "region": "us-west-1", 
    "base_path": "path_under_s3_bucket", 
    "access_key": "your_amazon_s3_accesskey", 
    "secret_key": "your_amazon_s3_secretkey" 
  }
}

Run the restore process on the second cluster

#Run restore:
POST /_snapshot/my_snapshot/snapshot_1/_restore
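The restore call above restores everything in the snapshot. The restore API also accepts a JSON body with options such as `indices`, `rename_pattern`, `rename_replacement` and `include_global_state`. Here is a hedged Python sketch that builds such a request. The helper name `build_restore_request`, the `ES_URL` value and the index names in the usage note are assumptions for illustration only. Keep in mind that an index that already exists on the target cluster has to be closed before it can be restored over.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_restore_request(repo, snapshot, indices=None, rename_pattern=None,
                          rename_replacement=None, include_global_state=None):
    """Build (without sending) a POST request for the _restore endpoint."""
    body = {}
    if indices:
        # Elasticsearch accepts a comma-separated list of index names.
        body["indices"] = ",".join(indices)
    if rename_pattern and rename_replacement:
        body["rename_pattern"] = rename_pattern
        body["rename_replacement"] = rename_replacement
    if include_global_state is not None:
        body["include_global_state"] = include_global_state
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repo}/{snapshot}/_restore",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

For example, `build_restore_request("my_snapshot", "snapshot_1", indices=["index_1"], rename_pattern="index_(.+)", rename_replacement="restored_index_$1")` would restore only the hypothetical index_1 under a new name, leaving any existing index_1 untouched.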

Run validations as above on the second cluster

Please note that you can control many parameters, such as which indices are snapshotted or restored, index metadata, the snapshot rate, bulk sizes, and many more. All are explained well in the Elasticsearch documentation
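As one illustration of these parameters, the snapshot request can carry a body that limits the snapshot to selected indices. The Python sketch below builds such a request using the `indices`, `ignore_unavailable` and `include_global_state` options of the create snapshot API. The helper name `build_snapshot_request`, the `ES_URL` value and the default choices are assumptions, not recommendations from the Elasticsearch documentation.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no authentication


def build_snapshot_request(repo, snapshot, indices,
                           ignore_unavailable=True, include_global_state=False):
    """Build (without sending) a PUT request that snapshots only some indices."""
    body = {
        # Comma-separated list of the indices to include in the snapshot.
        "indices": ",".join(indices),
        # Skip indices that do not exist instead of failing the request.
        "ignore_unavailable": ignore_unavailable,
        # Whether the cluster-wide state is stored together with the indices.
        "include_global_state": include_global_state,
    }
    return urllib.request.Request(
        url=f"{ES_URL}/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
```

A call such as `build_snapshot_request("my_snapshot", "snapshot_2", ["index_1", "index_2"])` uses hypothetical snapshot and index names. The returned request can be passed to `urllib.request.urlopen` once the payload looks right.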

Hope that this post helps :)

Originally published at github.com.

Written by Eyal Dahari