Elasticsearch 7.x Backup — “Snapshot & Restore” a cluster on AWS S3
Set up snapshot and restore functionalities on Elasticsearch.

In 2016 I wrote an article about Elasticsearch backup, and it still draws quite a lot of interest. So I decided to start a new series of articles with backup as the main topic.
The old article covered the Snapshot & Restore functionality in Elasticsearch 2.4.x and the then-upcoming 5.0. Since that was four years ago, I chose to refresh the tutorial and make it the first of a series.
I am also preparing a short article on how to use the snapshot & restore functionality with different cloud providers. This article is based on Elasticsearch 7.x; that doesn’t mean it won’t work on older versions, but I focused on the latest one.
Elasticsearch Snapshot & Restore
Elasticsearch has a smart solution to back up single indices or entire clusters to a remote shared filesystem, S3, or HDFS. The snapshots Elasticsearch creates are not very resource consuming and are relatively small.
The idea behind these snapshots is that they are not “archives” in a strict sense; a snapshot can only be read by a version of Elasticsearch that is capable of reading the index version stored inside it.
So you can follow this quick scheme if you want to restore ES snapshots:
- A snapshot of an index created in 6.x can be restored to 7.x.
- A snapshot of an index created in 5.x can be restored to 6.x.
- A snapshot of an index created in 2.x can be restored to 5.x.
- A snapshot of an index created in 1.x can be restored to 2.x.
Snapshots of indices created with ES 1.x cannot be restored to 5.x or 6.x, snapshots of indices created in 2.x cannot be restored to 6.x or 7.x, and snapshots of indices created in 5.x cannot be restored to 7.x or 8.x.
So pay close attention when you deal with a snapshot created in 1.x: you cannot restore it directly into a 5.x, 6.x, or 7.x cluster. Following the table above, you first restore it into a 2.x cluster, and from there you can use reindex-from-remote, available since 5.x, to move the data into a newer cluster.
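To make that upgrade path concrete, here is a minimal sketch of reindex-from-remote (the host and index names are placeholders; the old cluster must also be whitelisted via the reindex.remote.whitelist setting in elasticsearch.yml):
curl -X POST "localhost:9200/_reindex?pretty" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://my-old-2x-cluster:9200" },
    "index": "my_old_index"
  },
  "dest": { "index": "my_old_index" }
}
'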
To start backing up indices you must know the vocabulary behind it:
- repository: a logical container inside which the actual backup data (snapshots) is stored. A repository can contain multiple snapshots.
- snapshot: the backup of the data itself.
- restore: the ability to recover a backup from its repository.
Elasticsearch Snapshots
The data backed up is stored in a structure called a SNAPSHOT. Snapshots are incremental, which means each snapshot only stores the data that is not already part of an earlier snapshot in the repository. The incremental nature of snapshots allows making them frequently without creating a lot of overhead.

Elasticsearch Repository
Every backup inside Elasticsearch is stored in a so-called “snapshot repository”, a container that defines the filesystem or virtual filesystem in which the snapshots will be stored. When you create a repository you have many options available to define it. You can define a repository with a:
- Shared filesystem
- AWS S3
- Hadoop HDFS
- Microsoft Azure
- Google Cloud Platform
In this tutorial, we will discover AWS S3 as a repository to store our snapshots.

Elasticsearch S3 plugin
Since my last article, the cloud-aws plugin has been split into two different plugins:
- S3 snapshot plugin (useful for the sake of this tutorial)
- EC2 plugin: provides a list of seed addresses to the discovery process by querying the AWS API for a list of EC2 instances matching certain criteria determined by the plugin settings.
Over time this plugin has changed a lot and has been improved. We will try to analyze it in detail and dig into every possible configuration.
How to install the plugin
Since my last article, the plugin has been improved and has changed a lot. Elastic moved from the cloud-aws plugin to a new set of repository plugins which abstract the filesystem. There is a filesystem plugin, which I suggest using only for test purposes, and then many other solutions; here we will look at the S3 plugin.
The plugin installation is quite easy, just a quick command executed in a terminal window:
sudo bin/elasticsearch-plugin install repository-s3

The elasticsearch-plugin application has 3 commands:
- install
- remove
- list
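For example, you can double-check which plugins a node has loaded, either from the terminal or (on a running node) through the cat API:
sudo bin/elasticsearch-plugin list
curl "localhost:9200/_cat/plugins?v"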
Remember to stop Elasticsearch before installing or removing plugins. Now that the plugin is installed, before you set up the repository and start snapshotting your indices, you have to look at two different strategies to access Amazon S3.
Configure S3 Plugin
The repository-s3 plugin provides a repository type named s3, which may be used when creating a repository. The repository defaults to using ECS IAM role or EC2 IAM role credentials for authentication, or you can use user credentials.
The only mandatory setting is the bucket name:
curl -X PUT "localhost:9200/_snapshot/my_s3_repository?pretty" -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
"bucket": "my_bucket"
}
}
'
Before creating the repository, make sure the client settings are configured correctly; they are what allows the repository to store the data on the right filesystem, in our case AWS S3.
Client Settings
Again, since my last article things have improved a lot, and a new concept was introduced in ES 5.5: the client configuration. A client exposes the settings used to connect to an external system for backup purposes, in our case S3.
By default there is a client called default, and you can access its properties in the form s3.client.CLIENT_NAME.SETTING_NAME; using the default one, the properties look like this:
s3.client.default.max_retries
s3.client.default.protocol
s3.client.default.endpoint
....
The client settings should be specified in elasticsearch.yml, all of them except the secure settings access_key and secret_key; for those you have to use the Elasticsearch keystore.
For example, our sample file looks like this (at the end of the article there’s a link to a GitHub repository with a full sample):
cluster.name: "docker-cluster"
network.host: 0.0.0.0
s3.client.default.endpoint: s3-eu-west-1.amazonaws.com
The above is just a simple configuration for example purposes, but the minimum configuration required for the client is the endpoint, access_key, and secret_key settings.
If you want to define a custom client with a different name, or you need to define more than one, you specify it when creating the repository:
curl -X PUT "localhost:9200/_snapshot/my_s3_repository?pretty" -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
"bucket": "my_bucket",
"client": "my_alternate_client"
}
}
'
and then set custom client settings this way:
s3.client.my_alternate_client.max_retries
s3.client.my_alternate_client.protocol
s3.client.my_alternate_client.endpoint
If instead you want to use an EC2 instance role or an ECS container role to access S3, you should leave the access_key and secret_key settings unset, which is their default.

AWS S3 repository Authentication
This article is about creating snapshots in a repository which, in our specific situation, is AWS S3. So how do we allow Elasticsearch to access AWS S3?
There are many possibilities and it depends on your needs:
- You can use AWS Instance Role
- You can use AWS Container Role
- You can use Elasticsearch key store
In all cases, you have to implement a custom policy or use a default one already defined in your AWS account.
By default, AWS exposes a policy called AmazonS3FullAccess, which is an easy way to attach S3 permissions to a role or a user, but the downside is that you are attaching a full-access policy when you just need a small subset. My suggestion is to keep permissions as strict as you can.
IAM Roles
The best way to grant access to S3 for saving Elasticsearch snapshots is by implementing an IAM role. Through IAM roles, you can define a policy or a set of policies attached to the role and then grant permission to entities you trust.
A very good example would be to create an IAM role and attach to it the S3 policies you need (in our case a custom policy with the minimum actions needed to allow Elasticsearch to manage snapshots).
There’s a good explanation of this on this official blog post of AWS. From that blog post:
IAM roles enable your applications running on Amazon EC2 to use temporary security credentials. IAM roles for EC2 make it easier for your applications to make API requests securely from an instance because they do not require you to manage AWS security credentials that the applications use.
Elasticsearch keystore
In the last couple of Elasticsearch major versions, a CLI tool called elasticsearch-keystore has been shipped (it is in the /bin directory). The keystore is node-specific, so you have to apply the configuration on every node of the cluster.
This tool is really powerful and allows you to manage, in a really safe way, all the secrets you need to work with while you are setting up the service.
Let’s take an example: for the sake of this tutorial we need to use an AWS IAM access key and secret key.
bin/elasticsearch-keystore add s3.client.default.access_key
bin/elasticsearch-keystore add s3.client.default.secret_key
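You can then list the keystore entries to confirm both settings were stored (only the setting names are printed, never the values):
bin/elasticsearch-keystore list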
The tool has a lot of options you can use; in this case, instead of applying the AWS secrets on every node by hand, I preferred baking these values into the Dockerfile.
#Dockerfile
...
...
RUN echo ".." | bin/elasticsearch-keystore add --stdin --force s3.client.default.access_keyRUN echo "..." | bin/elasticsearch-keystore add --stdin --force s3.client.default.secret_key
A custom AWS IAM Policy
I won’t explain in this tutorial what an AWS IAM policy is and how it works, as it is out of scope. If you want to understand it better you can follow the AWS Policies and Permissions documentation.
I will show you how to define a custom policy with only the actions needed for the snapshots to work.
In the AWS Console go to the AWS Security credentials and access the “Policies” menu item.

You will find a lot of system policies defined by default which you can play with; again, my suggestion is to define a more granular policy dedicated to the snapshot scenario we are dealing with right now.
Create a new Policy, you can use the visual editor or enter directly the JSON in the editor.

What follows is the JSON I use to grant access to the AWS S3 bucket I previously created.
This is the gist link https://gist.github.com/p365labs/1542d6382e21ad5b4cdf1b82ef12d0fc
In the Resource field you can specify, for a more granular definition, the bucket name you are allowing ES to access for snapshot purposes.
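In case the gist is unreachable, here is a sketch of such a policy; it follows the minimum set of actions recommended by the official repository-s3 documentation, scoped here to the es2s3 bucket we will create later in this tutorial:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Resource": [ "arn:aws:s3:::es2s3" ]
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Resource": [ "arn:aws:s3:::es2s3/*" ]
    }
  ]
}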
Let’s take our snapshot
Before you start playing with Elasticsearch, make sure you have created an AWS S3 bucket with public access turned off. The bucket name for this example is es2s3.

Again, in the bucket settings, make sure to block all public permissions: simply click on the bucket name, open the Permissions tab, and check that everything under “block public access” is ON.

Build the containers the first time.
docker-compose build
then run the containers
docker-compose up
now if you check the Elasticsearch version you will see something like this:
curl localhost:9200
and this will be the response:
{
  "name" : "es01",
  "cluster_name" : "es-docker-cluster",
  "cluster_uuid" : "A9AICCuxTi2lITqr2OJS2w",
  "version" : {
    "number" : "7.6.2",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "ef48eb35cf30adf4db14086e8aabd07ef6fb113f",
    "build_date" : "2020-03-26T06:34:37.794943Z",
    "build_snapshot" : false,
    "lucene_version" : "8.4.0",
    "minimum_wire_compatibility_version" : "6.8.0",
    "minimum_index_compatibility_version" : "6.0.0-beta1"
  },
  "tagline" : "You Know, for Search"
}
If you don’t see errors in the logs, the configured client is able to access the AWS S3 bucket. Now you can set up a repository for your snapshots.
curl -X PUT "localhost:9200/_snapshot/mycustom_s3_repo?pretty" -H 'Content-Type: application/json' -d'
{
"type": "s3",
"settings": {
"bucket": "es2s3"
}
}
'
If everything goes well this will be the response:
{
  "acknowledged" : true
}
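If you want an explicit confirmation that every node can actually read and write the bucket, you can call the verify API on the repository (it responds with the list of nodes that passed the check):
curl -X POST "localhost:9200/_snapshot/mycustom_s3_repo/_verify?pretty"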
Now you are ready to start snapshotting your Elasticsearch indices; check out the repository configuration:
curl "localhost:9200/_snapshot?pretty"
and this should be the response:
{
  "mycustom_s3_repo" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "es2s3"
    }
  }
}
The next step is to create a snapshot with the right curl:
curl -X PUT "localhost:9200/_snapshot/mycustom_s3_repo/snapshot_1?wait_for_completion=true&pretty"
With the wait_for_completion parameter set to true, the request waits until the snapshot completes; if you set it to false, the response returns immediately.
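If you go the asynchronous way with wait_for_completion=false, you can poll the snapshot afterwards; for example:
curl "localhost:9200/_snapshot/mycustom_s3_repo/snapshot_1/_status?pretty"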
This is the response after creating the snapshot:
{
  "snapshot" : {
    "snapshot" : "snapshot_1",
    "uuid" : "WiNVFShuRzmNBmnkxoC20A",
    "version_id" : 7060299,
    "version" : "7.6.2",
    "indices" : [ ],
    "include_global_state" : true,
    "state" : "SUCCESS",
    "start_time" : "2020-05-30T11:44:43.972Z",
    "start_time_in_millis" : 1590839083972,
    "end_time" : "2020-05-30T11:44:44.173Z",
    "end_time_in_millis" : 1590839084173,
    "duration_in_millis" : 201,
    "failures" : [ ],
    "shards" : {
      "total" : 0,
      "failed" : 0,
      "successful" : 0
    }
  }
}
The message tells us the snapshot was completed successfully; you can also check on S3 what ended up inside the bucket. There are some files… have a look:
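Or, if you prefer the terminal to the AWS console, the AWS CLI shows the same thing (assuming the CLI is configured with credentials that can read the bucket):
aws s3 ls s3://es2s3 --recursive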

OK, but… we have no indices… so of course that was easy :) Now let’s add some data to a simple index and then discover how to restore it:
curl -X POST "localhost:9200/person/_bulk?pretty" -H 'Content-Type: application/json' -d'
{ "index":{} }
{ "name":"john doe","age":25 }
{ "index":{} }
{ "name":"mary smith","age":32 }
{ "index":{} }
{ "name":"robin green","age":15 }
{ "index":{} }
{ "name":"fred white","age":68 }
'
This will create a simple index, person, with just 4 documents.
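If you want to double-check that the documents are there before snapshotting, refresh the index and count (the refresh makes freshly indexed documents visible):
curl -X POST "localhost:9200/person/_refresh"
curl "localhost:9200/person/_count?pretty"
Now let’s take a new snapshot and see what happens: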
curl -X PUT "localhost:9200/_snapshot/mycustom_s3_repo/snapshot_2?wait_for_completion=true&pretty"
Now to validate the idea of having a backup system in place let’s delete the index and restore it.
curl -X DELETE localhost:9200/person
then execute a
curl localhost:9200/_cat/indices
The cluster right now will be empty!
curl -X POST "localhost:9200/_snapshot/mycustom_s3_repo/snapshot_2/_restore?pretty"
If you now list the indices and run a query, you should get a meaningful result:
curl localhost:9200/_cat/indices

curl "localhost:9200/person/_search?pretty&q=john"with this result{
"took" : 31,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 1,
"relation" : "eq"
},
"max_score" : 1.2039728,
"hits" : [
{
"_index" : "person",
"_type" : "_doc",
"_id" : "baNzZXIBB04z4g6GU0tF",
"_score" : 1.2039728,
"_source" : {
"name" : "john doe",
"age" : 25
}
}
]
}
}
WOW! It works. The complex part is not the snapshotting and restoring itself; it is understanding what repositories, snapshots, and clients are, and how to configure them. In the next chapter you will find a link to a GitHub repository with the configuration files to start a cluster on your own and run some tests.
I’m still preparing a new set of articles on snapshot and restore management and on snapshot orchestration… so stay tuned.
Example Repository
I have prepared a repository on GitHub you can use as an example to start implementing your own Elasticsearch snapshots.
There you will find 3 simple files which, with the help of docker-compose, create an Elasticsearch cluster and set up the repository and the client for you.
- Dockerfile
FROM docker.elastic.co/elasticsearch/elasticsearch:7.6.2
RUN /usr/share/elasticsearch/bin/elasticsearch-plugin install --batch repository-s3
COPY --chown=elasticsearch:elasticsearch elasticsearch.yml /usr/share/elasticsearch/config/
RUN echo "YOUR_ACCESS_KEY" | bin/elasticsearch-keystore add --stdin --force s3.client.default.access_key
RUN echo "YOUR_SECRET_KEY" | bin/elasticsearch-keystore add --stdin --force s3.client.default.secret_key
It defines which Elasticsearch version you will use, installs the repository-s3 plugin, copies the elasticsearch.yml configuration file into the container, and, before the cluster starts, adds the AWS access_key and secret_key using the elasticsearch-keystore CLI so that Elasticsearch is allowed to back up the indices to an AWS S3 bucket.
- docker-compose.yml
version: '2.2'
services:
  es01:
    build: .
    container_name: es01
    environment:
      - node.name=es01
      - cluster.name=es-docker-cluster
      - discovery.seed_hosts=es02,es03
      - cluster.initial_master_nodes=es01,es02,es03
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ulimits:
      memlock:
....
....
....
This is the docker-compose file which creates the Elasticsearch cluster. If you want more information about it and how to tweak the Elasticsearch docker-compose file, please have a look at the official Elasticsearch Docker documentation.
On top of this, Elastic provides a full list of available Docker images.
- elasticsearch.yml
cluster.name: "docker-cluster"
network.host: 0.0.0.0
s3.client.default.endpoint: s3-eu-west-1.amazonaws.com
This is the Elasticsearch configuration file. In our example we add only one piece of information to it: the endpoint for the AWS region we are working with.
Resources
- https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-lifecycle-management.html
- https://www.elastic.co/guide/en/cloud/current/ec-snapshot-restore.html
- https://support.cloudbees.com/hc/en-us/articles/115000592472-Managing-snapshots-of-your-Elasticsearch-indices-
- https://www.elastic.co/blog/found-dive-into-elasticsearch-storage#storing-snapshots
- https://bitsofinfo.wordpress.com/2015/05/29/aggregate-backup-elasticsearch-fs-snapshots-across-a-widely-distributed-cluster/
- https://www.elastic.co/blog/found-elasticsearch-snapshot-and-restore
- curl http://localhost:9200/_nodes?filter_path=nodes.*.plugins (curl for listing installed plugins)
- https://opendistro.github.io/for-elasticsearch-docs/docs/elasticsearch/snapshot-restore/
- https://www.elastic.co/guide/en/elasticsearch/reference/current/secure-settings.html
- https://www.elastic.co/guide/en/elasticsearch/reference/current/elasticsearch-keystore.html
- https://www.docker.elastic.co/
- https://www.elastic.co/guide/en/elasticsearch/reference/7.8/docker.html
- https://aws.amazon.com/blogs/security/easily-replace-or-attach-an-iam-role-to-an-existing-ec2-instance-by-using-the-ec2-console/