Migrating an Elasticsearch cluster with 0 downtime

Flavien Berwick
Feb 8, 2023
Bird migration. Photo by Alfredinix.

As an SRE/DevOps engineer, I sometimes get to migrate large Elasticsearch clusters. Today, downtime is no longer acceptable. My clients must be able to work on their infrastructure in parallel with administrative actions.

I recently migrated a large Elasticsearch logs cluster to new hardware. The challenge was to avoid any service disruption, so my client would not lose any logs, which were still being sent continuously by its data providers. Different migration strategies exist, and this post presents one that optimizes migration time.

A general recommendation to avoid downtime: your data providers (Filebeat, Logstash, or any app) should be configured to point to a single domain name (e.g., elasticsearch.mycompany.com).

With a domain name in place, you only have to update the DNS record from the current cluster’s IP to the migrated cluster’s IP. This leaves the providers’ configurations untouched.
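
For illustration (the domain name and record below are hypothetical), the switch then boils down to updating a single DNS record:

# Hypothetical illustration: only the DNS record changes; providers keep the same hostname
dig +short elasticsearch.mycompany.com   # before the switch: resolves to a Cluster A node
# ...update the A/CNAME record at your DNS provider and wait for the TTL to expire...
dig +short elasticsearch.mycompany.com   # after the switch: resolves to a Cluster B node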

If you use an IP instead of a domain name, you’ll still need to update your provider configuration and restart it. There’s no magic.

This post is complemented by a companion GitHub repository which is a convenient way to appropriate the tutorial: github.com/flavienbwk/elasticsearch-migration-tutorial.

The plan

We’ll use the “extended cluster with rolling node termination” approach, which is helpful in an environment where providers continue to write to your cluster.

Environment: providers ingest data to Cluster A (3 nodes). Cluster A will migrate to Cluster B (3 nodes) through shard relocation.

A recommendation before starting: this method puts a certain amount of pressure on your cluster, as shards are relocated while incoming data keeps flowing from Cluster A to Cluster B. Prefer doing it outside business hours. If you can’t afford this, use the Logstash approach.

Pre-requisites: Node B1 must be able to reach A1. No need to expose all nodes between themselves. B1 and A1 must share the same authentication credentials.

  1. Benchmark our target cluster (Cluster B) with Rally
  2. Start Cluster B and progressively make nodes join Cluster A (through A1)
  3. Increase the index refresh interval and define shard relocation limits (concurrency and rate)
  4. Progressively migrate by excluding Cluster A nodes, except A1
  5. Change providers’ URIs to point to Cluster B (or domain name endpoint)
  6. Exclude A1
  7. Monitor relocating shards until relocation is complete, then shut down Cluster A
  8. Reset parameters of step 3

If you follow this tutorial along with the GitHub repository, set up your Cluster A with some data inside (follow the README’s “1. Start and configure the cluster” chapter).

Benchmarking your cluster

To avoid putting too much pressure on your target cluster (Cluster B), we’re first going to benchmark the amount of data it can handle. Go through step 2 of “Setting up Cluster B”.

We do that with Rally, the benchmarking framework built by Elastic for Elasticsearch.

docker run \
  --name benchmark \
  --rm \
  --network elasticsearch_migration_network \
  elastic/rally:latest \
  race --track=metricbeat --pipeline=benchmark-only \
  --include-tasks="index-append" \
  --challenge=append-no-conflicts \
  --client-options=timeout:30,use_ssl:true,verify_certs:false,basic_auth_user:'elastic',basic_auth_password:'changeme' \
  --target-hosts=https://esB1:9200

With the calculation explained in the repo, I measure a capacity of 17.57 MB/s on my machine.
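
A rough sketch of that kind of calculation (the figures below are purely illustrative, not the repo’s actual values): Rally’s summary report gives a median indexing throughput in docs/s, which you multiply by the track’s average document size to get a byte rate.

# Illustrative only: median indexing throughput (docs/s) x average document size (bytes)
# Plug in the values from your own Rally report and track
echo "scale=2; 16000 * 1100 / 1000000" | bc   # ~17.60 MB/s with these made-up figures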

Migrating data

With this “expanding then contracting cluster” technique, Cluster B must be blank. If you’ve already started your Cluster B and can’t remove it, prefer using the Logstash technique.

/1/: Let’s first configure variables for Clusters A and B, as we’re querying them through Elasticsearch’s API:

export CURL_PRMS_CLUSTERA='--insecure -u elastic:changeme https://172.17.0.1:9200' # links to esA1
export CURL_PRMS_CLUSTERB='--insecure -u elastic:changeme https://172.17.0.1:9201' # links to esB1
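
A quick sanity check that the Cluster A endpoint answers (Cluster B will only answer once its nodes are up):

curl -sX GET $CURL_PRMS_CLUSTERA'/_cluster/health?pretty'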

/2/: We’re now stopping shard rebalancing in Cluster A, so our cluster doesn’t move shards across nodes while we’re migrating (step 1.2 of repo)

curl -X PUT $CURL_PRMS_CLUSTERA/_cluster/settings -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.rebalance.enable": "none"
  }
}'
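
You can verify the setting has been applied:

curl -sX GET $CURL_PRMS_CLUSTERA'/_cluster/settings?filter_path=transient&pretty'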

As recommended by the official Elasticsearch documentation, increase the refresh interval for the initial load:

curl -X PUT $CURL_PRMS_CLUSTERB/_settings -H 'Content-Type: application/json' -d'
{
  "refresh_interval": "30s"
}'

/3/: Update Cluster B nodes’ configuration to pair with Cluster A (step 3.3 of repo)

Cluster B nodes are paired to Cluster A through B1. It is sufficient for B1 to be the only node able to reach Cluster A (in our case, “esA1”). Once registered, every node is aware of all the others and can communicate with them.

# In Cluster B nodes' elasticsearch.yml configuration
cluster.name: mycluster
discovery.seed_hosts: ["esB1", "esB2", "esB3", "esA1"]
cluster.initial_master_nodes: ["esA1"]

/4/: Start Cluster B and make nodes join Cluster A (step 3.4 of repo)

We’re now making Cluster B nodes progressively join Cluster A. At this point, if you’re following the repo, you should see all 6 nodes in the same cluster.

Cluster B nodes have joined Cluster A.
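
You can confirm it from the command line as well:

# All 6 nodes (esA1-esA3 and esB1-esB3) should appear in the same cluster
curl -sX GET $CURL_PRMS_CLUSTERA'/_cat/nodes?v&h=name,ip,master'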

/5/: Setting up migration limits (step 3.5 of repo)

This is where we take advantage of our benchmark. We will edit the default values of concurrent shard recoveries and limit inbound/outbound node recovery traffic.

curl -X PUT $CURL_PRMS_CLUSTERA/_cluster/settings -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.node_concurrent_recoveries": "20",
    "indices.recovery.max_bytes_per_sec": "17mb"
  }
}'

/6/: Progressively migrate and shutdown nodes of Cluster A (step 3.6 of repo)

A1 is the gateway to all nodes in Cluster A, so it must be shut down last.

For each node, exclude its IP address from the cluster. Let’s take esA2 as the example for this step. You can retrieve the IP of a node with this command:

curl -sX GET $CURL_PRMS_CLUSTERB/_nodes/esA2 | jq '.nodes | to_entries | .[0].value.ip'

We are now ready to perform the migration. The following command will trigger shard relocation:

curl -X PUT $CURL_PRMS_CLUSTERB/_cluster/settings -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "172.17.0.4"
  }
}'

Check the number of shards still allocated to this node (wait until it reaches 0):

curl -sX GET $CURL_PRMS_CLUSTERB/_cat/allocation/esA2 | cut -d' ' -f1

Shut down the node. Repeat this step for each node to be excluded.
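
A minimal sketch of those two actions, assuming (as in the companion repository) each node runs as a Docker container named after the node:

# Assumption: the node runs as a Docker container named like the node (as in the companion repo)
docker stop esA2

# Before excluding the next node, check that no shards are still relocating cluster-wide
curl -sX GET $CURL_PRMS_CLUSTERB'/_cluster/health?filter_path=relocating_shards&pretty'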

/7/: Now make sure the number of nodes in the cluster is 3

curl -sX GET $CURL_PRMS_CLUSTERB'/_cat/nodes?format=json&filter_path=ip,name' | jq length
All Cluster A nodes are migrated and shut down.

/8/: Reset configuration for Cluster B use only (step 3.7 of repo)

Remove any mention of “esA1” in the cluster.initial_master_nodes and discovery.seed_hosts parameters in your elasticsearch.yml config files.

Reset cluster routing allocation and shard rebalancing parameters:

curl -X PUT $CURL_PRMS_CLUSTERB/_cluster/settings -H 'Content-Type: application/json' -d'
{
  "transient": {
    "cluster.routing.*": null,
    "indices.recovery.max_bytes_per_sec": null
  }
}'

Reset the refresh interval to its default:

curl -X PUT $CURL_PRMS_CLUSTERB/_settings -H 'Content-Type: application/json' -d'
{
  "refresh_interval": "1s"
}'
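
A final check that the remaining 3-node cluster is healthy:

# Expect "status": "green" and "number_of_nodes": 3
curl -sX GET $CURL_PRMS_CLUSTERB'/_cluster/health?pretty'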

Use cases

Let’s compare the pros and cons of the “expand then contract cluster” approach with the traditional Logstash (full copy/paste) approach.

The problem lies in the window between the moment you point providers’ IPs to Cluster B and the moment the data is entirely migrated. With the Logstash approach, providers keep writing data to Cluster A while you switch the configuration over to Cluster B. And if you switch the configuration before all data is migrated, users are presented with data that is missing its history.

Pros:

  • True 0-downtime approach: users can continue working with the data (e.g., watching dashboards with live data, consuming historical data in apps) without any temporarily missing data.
  • No leftover data: with the Logstash approach, if providers keep writing data to Cluster A indices after Logstash has already copied them, those indices require a second pass, including deduplication, which increases migration time and cost.
  • More granular migration: under infrastructure cost constraints (when you can’t afford to double Cluster A’s infrastructure to create Cluster B), this approach lets you progressively drain Cluster A machines, shut them down, and re-provision some of them as Cluster B nodes.
  • Transfer is quick (although asserting this thoroughly would require deeper comparative benchmarks, and the Logstash approach can disable replicas for the initial load)

Cons:

  • Requires more preparation work
  • Increases latency during migration (due to shards getting migrated)

Prefer the Logstash approach if you don’t truly need to use (read or write) live data during the migration. You will avoid a lot of manual work (and thus prevent many human errors).

The “expand then contract” approach may be most useful for a production search engine used by developers and clients, and less so for a cluster used for logs and monitoring data.

Always do it outside business hours to avoid pressure on the system.

That’s it! Thank you for reading this tutorial.
