AWS Elasticsearch Manual Snapshot and Restore on AWS S3

Arun Gowda
MediBuddy Product and Technology Blog
5 min read · Jan 29, 2019

You can use the Elasticsearch API actions in Amazon Elasticsearch Service to take manual snapshots of your domain. This makes it easy to back up your entire domain. You can also snapshot and restore a single index or multiple indexes. This blog post walks you through backing up and restoring a single index by using an Amazon S3 bucket.

Prerequisites

  1. Elasticsearch domain
  2. Amazon S3 Bucket
  3. AWS Identity and Access Management (IAM)

Amazon S3 bucket

Stores manual snapshots for your Amazon ES domain. Make a note of the bucket’s name. You need it in two places:

  • Resource statement of the IAM policy that is attached to your IAM role
  • Python client that is used to register a snapshot repository

Create a bucket in Amazon S3.

My bucket name: elasticsearch-backup-indices

Once the bucket is created, note its ARN:

arn:aws:s3:::elasticsearch-backup-indices
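
The IAM policy later in this post needs two forms of this ARN: the bucket itself (for listing) and every object inside it (the same ARN followed by /*). As a minimal sketch, both can be derived from the bucket name. The helper names below are illustrative and not part of any AWS SDK.

```python
def bucket_arn(bucket_name: str) -> str:
    """ARN of the bucket itself (the resource for s3:ListBucket)."""
    return f"arn:aws:s3:::{bucket_name}"


def bucket_objects_arn(bucket_name: str) -> str:
    """ARN pattern matching every object in the bucket (Get/Put/DeleteObject)."""
    return f"{bucket_arn(bucket_name)}/*"


print(bucket_arn("elasticsearch-backup-indices"))
print(bucket_objects_arn("elasticsearch-backup-indices"))
```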

AWS Identity and Access Management (IAM)

An IAM role delegates permissions to Amazon Elasticsearch Service. The examples in this post use a role named es-s3-backup.

The trust relationship for the role must specify Amazon Elasticsearch Service in the Principal statement.

To create the IAM policy, open the IAM console, switch to the Policies tab, and choose Create Policy. Select Create Your Own Policy, and give your policy a name, for example elasticsearchbackup-policy.

  • Attach the following permissions. Make sure to change the account ID, bucket ARNs, and domain name to match your environment.
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "VisualEditor0",
      "Effect": "Allow",
      "Action": [
        "iam:PassRole",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:iam::YOUR-ACCOUNT-ID:role/es-s3-backup",
        "arn:aws:s3:::elasticsearch-backup-indices"
      ]
    },
    {
      "Sid": "VisualEditor1",
      "Effect": "Allow",
      "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject"
      ],
      "Resource": "arn:aws:s3:::elasticsearch-backup-indices/*"
    },
    {
      "Sid": "VisualEditor2",
      "Effect": "Allow",
      "Action": "es:ESHttpPost",
      "Resource": "arn:aws:es:region:YOUR-ACCOUNT-ID:domain/YOUR-ELASTICSEARCH-DOMAIN-NAME"
    }
  ]
}
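
If you prefer to generate the policy rather than edit the ARNs by hand, the following is a minimal sketch that builds the same document from a few parameters. The function name, the account ID 123456789012, and the domain name my-domain are placeholders chosen for illustration; the optional Sid fields are omitted.

```python
import json


def build_snapshot_policy(bucket, account_id, role_name, region, domain):
    """Return the IAM policy document as a dict (hypothetical helper)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["iam:PassRole", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:iam::{account_id}:role/{role_name}",
                    f"arn:aws:s3:::{bucket}",
                ],
            },
            {
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                "Effect": "Allow",
                "Action": "es:ESHttpPost",
                "Resource": f"arn:aws:es:{region}:{account_id}:domain/{domain}",
            },
        ],
    }


policy = build_snapshot_policy(
    bucket="elasticsearch-backup-indices",
    account_id="123456789012",  # placeholder account ID
    role_name="es-s3-backup",
    region="us-west-2",
    domain="my-domain",  # placeholder domain name
)
print(json.dumps(policy, indent=2))
```

The printed JSON can be pasted into the policy editor in the IAM console.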

This policy document grants list, get, put, and delete object permissions to whomever assumes the role to which it is attached. When you’re done, choose Create Policy.

  • Attach the following trust relationship to the role
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "",
    "Effect": "Allow",
    "Principal": {
      "Service": "es.amazonaws.com"
    },
    "Action": "sts:AssumeRole"
  }]
}
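
A missing or mistyped service principal is an easy mistake to make here. The sketch below checks a trust policy document, loaded as a Python dict, for a statement that allows es.amazonaws.com to call sts:AssumeRole. The function name is hypothetical, and the check covers only the simple statement shapes used in this post.

```python
def trusts_service(trust_policy: dict, service: str = "es.amazonaws.com") -> bool:
    """Return True if some Allow statement lets `service` call sts:AssumeRole."""
    for stmt in trust_policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal", {})
        if not isinstance(principal, dict):
            continue  # e.g. "Principal": "*" is not a service principal
        # Both Action and Service may be a single string or a list of strings.
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        services = principal.get("Service", [])
        if isinstance(services, str):
            services = [services]
        if "sts:AssumeRole" in actions and service in services:
            return True
    return False


trust = {
    "Version": "2012-10-17",
    "Statement": [{
        "Sid": "",
        "Effect": "Allow",
        "Principal": {"Service": "es.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}
print(trusts_service(trust))
```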

Registering a Manual Snapshot Repository

You must register a snapshot repository with Amazon Elasticsearch Service before you can take manual index snapshots. This one-time operation requires that you sign your AWS request with credentials that are allowed to pass the IAM role created above (the iam:PassRole permission in the policy).

You use Elasticsearch’s _snapshot API action to register a repository with Amazon ES.

# Install the prerequisite packages
yum -y install python-pip
pip install boto3 requests requests-aws4auth

NOTE: Change the host, region, and role ARN in the code below to suit your environment.

# Create the Python file that registers the repository
cat >/tmp/register-repo.py <<"EOF"
import boto3
import requests
from requests_aws4auth import AWS4Auth

host = 'https://search-mb-production-app.us-west-2.es.amazonaws.com/'  # domain endpoint, including the trailing slash
region = 'us-west-2'  # the AWS region of your domain
service = 'es'
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

# Register repository
path = '_snapshot/my-snapshot-repo' # the Elasticsearch API endpoint
url = host + path

payload = {
    "type": "s3",
    "settings": {
        "bucket": "elasticsearch-backup-indices",
        "region": "us-west-2",
        "role_arn": "arn:aws:iam::YOUR-ACCOUNT-ID:role/es-s3-backup"
    }
}

headers = {"Content-Type": "application/json"}

r = requests.put(url, auth=awsauth, json=payload, headers=headers)

print(r.status_code)
print(r.text)
EOF
# Execute the file to register repo
chmod 700 /tmp/register-repo.py
python /tmp/register-repo.py

The output should be:

200
{"acknowledged":true}
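
If the registration script runs as part of automation, it helps to check the response rather than read it by eye. As a minimal sketch, the hypothetical helper below takes the status code and body printed above and reports whether the repository was acknowledged.

```python
import json


def repo_registered(status_code: int, body: str) -> bool:
    """Return True only for an HTTP 200 whose JSON body has "acknowledged": true."""
    if status_code != 200:
        return False
    try:
        return json.loads(body).get("acknowledged") is True
    except (ValueError, AttributeError):
        # Body was not valid JSON, or was JSON but not an object.
        return False


print(repo_registered(200, '{"acknowledged":true}'))
```

In the registration script, the call would be repo_registered(r.status_code, r.text).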

Taking Manual Snapshots

Once your repository is set up, use the _snapshot API action to create a snapshot. By default, Amazon ES snapshots all of the cluster's indexes.

You specify two pieces of information when you create a snapshot:

  • Name of your snapshot repository, for example my-snapshot-repo
  • Name for the snapshot, for example 2019-01-19

# General form:
# curl -XPUT 'elasticsearch-domain-endpoint/_snapshot/repository/snapshot-name'

curl -XPUT 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_snapshot/my-snapshot-repo/2019-01-19'

The command should return the following message:

{"accepted":true}

Note: Snapshots are not instantaneous; they take some time to complete.
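
Because a snapshot runs in the background, a script that depends on it should wait until the snapshot has left the IN_PROGRESS state. The sketch below polls for that. It assumes `fetch` is a zero-argument callable that you supply, returning the parsed JSON of GET <endpoint>/_snapshot/my-snapshot-repo/2019-01-19; the function names and defaults are illustrative. The demo at the end uses canned responses in place of real HTTP calls.

```python
import time


def snapshot_state(response: dict, snapshot_name: str):
    """Extract the state of one snapshot from a GET _snapshot response, or None."""
    for snap in response.get("snapshots", []):
        if snap.get("snapshot") == snapshot_name:
            return snap.get("state")
    return None


def wait_for_snapshot(fetch, snapshot_name, interval=30.0, max_checks=120,
                      sleep=time.sleep):
    """Poll fetch() until the snapshot leaves IN_PROGRESS; return its final state."""
    for _ in range(max_checks):
        state = snapshot_state(fetch(), snapshot_name)
        if state is not None and state != "IN_PROGRESS":
            return state  # e.g. SUCCESS, PARTIAL or FAILED
        sleep(interval)
    raise TimeoutError(f"snapshot {snapshot_name!r} did not finish in time")


# Demo with canned responses and a no-op sleep.
responses = iter([
    {"snapshots": [{"snapshot": "2019-01-19", "state": "IN_PROGRESS"}]},
    {"snapshots": [{"snapshot": "2019-01-19", "state": "SUCCESS"}]},
])
print(wait_for_snapshot(lambda: next(responses), "2019-01-19", sleep=lambda s: None))
```

Passing the sleep function as a parameter keeps the loop easy to test without real delays.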

By snapshotting a single index instead of the whole cluster, you give yourself flexibility in where and when you restore the data. Use the indices setting to change the default behavior:

curl -XPUT 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_snapshot/my-snapshot-repo/2019-01-19' -H 'Content-Type: application/json' -d '{
  "indices": "alb-accesslog-2018-08-28",
  "ignore_unavailable": true,
  "include_global_state": false
}'

You can specify one or more indexes for backup in this snapshot. To ignore errors when any of the indexes is not available, set ignore_unavailable to true. You can also store the cluster's global state, but for index-level backups this usually doesn't make sense, so set include_global_state to false.
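
When several indexes go into one snapshot, the indices setting takes a comma-separated string. A minimal sketch of building the request body follows; the helper name and the second index name (alb-accesslog-2018-08-29) are made up for illustration.

```python
import json


def snapshot_body(indices, ignore_unavailable=True, include_global_state=False):
    """Build the JSON body for PUT _snapshot/<repo>/<snapshot>."""
    if isinstance(indices, str):
        indices = [indices]
    return {
        "indices": ",".join(indices),  # Elasticsearch expects a comma-separated list
        "ignore_unavailable": ignore_unavailable,
        "include_global_state": include_global_state,
    }


body = snapshot_body(["alb-accesslog-2018-08-28", "alb-accesslog-2018-08-29"])
print(json.dumps(body, indent=2))
```

The printed JSON is what you would pass to curl with -d.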

Use the following command to verify the state of snapshots of your domain:

curl -XGET 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_snapshot/my-snapshot-repo/_all?pretty'
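
The response lists every snapshot in the repository along with its state and the indexes it contains. As a rough sketch, the hypothetical helper below picks out the successful snapshots that include a given index, which is useful before a restore. The sample response is trimmed to the fields the helper reads and is not real output.

```python
def snapshots_containing(response: dict, index_name: str):
    """Names of SUCCESS snapshots whose index list includes index_name."""
    return [
        snap["snapshot"]
        for snap in response.get("snapshots", [])
        if snap.get("state") == "SUCCESS" and index_name in snap.get("indices", [])
    ]


# Trimmed, illustrative response from GET _snapshot/my-snapshot-repo/_all
sample = {
    "snapshots": [
        {"snapshot": "2019-01-18", "state": "FAILED",
         "indices": ["alb-accesslog-2018-08-28"]},
        {"snapshot": "2019-01-19", "state": "SUCCESS",
         "indices": ["alb-accesslog-2018-08-28"]},
    ]
}
print(snapshots_containing(sample, "alb-accesslog-2018-08-28"))
```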

Restoring Snapshots

To restore an index, the cluster must not contain an open index with the same name. A name clash is the normal case when you reload old data into the same cluster. If the index exists and is open, delete the existing index first, then restore your backup.

When you restore, you specify the snapshot, and also any indexes that are in the snapshot. You do this in much the same way as when you took the snapshot in the first place.

The following example restores just one index, alb-accesslog-2018-08-28, from the snapshot 2019-01-19 in the my-snapshot-repo snapshot repository:

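curl -XPOST 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_snapshot/my-snapshot-repo/2019-01-19/_restore' -H 'Content-Type: application/json' -d '{
  "indices": "alb-accesslog-2018-08-28",
  "ignore_unavailable": false,
  "include_global_state": false
}'

# To restore the entire snapshot, first delete all existing indices (this is destructive),
# then restore without an indices filter
curl -XDELETE 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_all'
curl -XPOST 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_snapshot/my-snapshot-repo/2019-01-19/_restore'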
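
If deleting the live index is not an option, the Elasticsearch restore API also accepts rename_pattern and rename_replacement, which restore the data under a different index name. This is an alternative to the delete-then-restore flow above, not something the steps in this post rely on. A minimal sketch of building such a request body follows; the helper name and the restored- prefix are assumptions for illustration.

```python
import json


def restore_body(indices, rename_prefix=None):
    """Build the JSON body for POST _snapshot/<repo>/<snapshot>/_restore."""
    if isinstance(indices, str):
        indices = [indices]
    body = {
        "indices": ",".join(indices),
        "ignore_unavailable": False,
        "include_global_state": False,
    }
    if rename_prefix:
        # Capture the whole index name and prepend the prefix on restore.
        body["rename_pattern"] = "(.+)"
        body["rename_replacement"] = f"{rename_prefix}$1"
    return body


print(json.dumps(restore_body("alb-accesslog-2018-08-28", "restored-"), indent=2))
```

With this body, the index would come back as restored-alb-accesslog-2018-08-28 and leave the existing index untouched.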

Deleting Indices

If you don’t plan to restore all indices, though, you might want to delete only one:

curl -XDELETE 'https://search-mb-production-app.us-west-2.es.amazonaws.com/index-name'

The following example shows how to delete all existing indices for a domain:

curl -XDELETE 'https://search-mb-production-app.us-west-2.es.amazonaws.com/_all'

