[Kubernetize] Upgrade to Elasticsearch+Kibana 6

Ever since Elasticsearch 5.0 was released at the end of 2016, we had been excited about its new features. The tempting new Kibana UI made us think, “Yeah! We are going to upgrade very soon!”

And then we went back to our day-to-day operations and forgot about Elasticsearch and Kibana.

Until Elasticsearch 6.0 was released.

And that’s when we realized, “Oh, it’s too late.” Typically, Elasticsearch supports upgrading from only one major version back. Now the upgrade is harder than ever: we need to use the so-called reindex upgrade.

It’s even more difficult if your stack is hosted on a Kubernetes cluster. The upgrade guide on the Elastic website mainly covers upgrading on a server, not in Docker containers, let alone Kubernetes.

In this article, I’ll guide you through upgrading your Kubernetes-hosted Elasticsearch from 2.x to 6.x, following the steps I performed to upgrade my real production Kubernetes cluster on Google Cloud Platform.


The Existing Cluster

I use the Elasticsearch cluster mainly to collect server logs. However, I believe this process should work on any Kubernetes-hosted Elasticsearch cluster.

The existing cluster was set up with a minimal set of basic Kubernetes objects; it looks like this:

Elasticsearch 2 cluster on Kubernetes

The Kibana that connects to this Elasticsearch cluster is also minimal:

Kibana 4 on the same cluster

It’s a pretty minimalist cluster, but I’m fairly sure this upgrade approach will work for a larger cluster as well.


The Plan

To keep the upgrade simple and fast, I plan to take the reindex approach. With Kubernetes and Docker, it’s easier to deploy a new cluster running the newer version than to upgrade the existing one in place.

So let’s examine the critical points about reindexing in Elasticsearch:

  • Reindex, in Elasticsearch, is simply an action that copies data from index A to index B.
  • Upgrading a cluster with the reindex approach means deploying a new cluster and moving the data from the old cluster to the new one.
  • For that to work, the new cluster must be prepared with compatible indices.
  • That means we must inspect the mapping to see whether any changes in the new version break it.

In short, to make the reindex work, you must check whether your index mapping is compatible with the new version of Elasticsearch. The mapping might need to be transformed before the new cluster can accept it.

After that, we only need to deploy the new Elasticsearch cluster and reindex the data into it.

Here are the steps we should take for this upgrade:

  1. Back up the existing cluster.
  2. Obtain a new version of the index mapping for your cluster.
  3. Deploy the new version of the Elasticsearch cluster.
  4. Deploy Kibana.
  5. Create all indices with the mapping obtained from step 2.
  6. Run Reindex in the new cluster.
  7. Take down the old cluster.

With these simple steps, your new cluster will be ready in no time!


Back up the existing cluster

This guide uses the snapshot approach to perform a full cluster backup. It is only one way to back up an Elasticsearch cluster; yours might need a different approach or different details depending on where it is deployed.

My cluster is on the Google Cloud Platform, so I will set up a GCE disk, make a volume out of it, and have Elasticsearch write the snapshot to that volume.

Creating GCE Disk

$ gcloud compute disks create elastic-logger-backup --size 20 --type pd-ssd --zone {your-cluster-zone}

Then create the persistent volume and the volume claim:

---
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: elastic-logger-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 20Gi
  storageClassName: fast
  selector:
    matchLabels:
      db: "elastic-logger-backup"
---
apiVersion: "v1"
kind: "PersistentVolume"
metadata:
  name: elastic-logger-backup
  labels:
    db: "elastic-logger-backup"
spec:
  capacity:
    storage: "20Gi"
  accessModes:
    - "ReadWriteOnce"
  storageClassName: "fast"
  gcePersistentDisk:
    fsType: "ext4"
    pdName: "elastic-logger-backup"

Update the existing Elasticsearch deployment to mount this backup volume and to mount a config that points to the backup path.

upgrade your cluster to have snapshot backup

With this, the Elasticsearch pods mount the volume logging-elastic-backup at /mount/backups. The ConfigMap makes the backup path available (through the path.repo setting) in elasticsearch.yml, which is the main config file for Elasticsearch.

Because my cluster is set up as a StatefulSet, the pods must be deleted for the change to roll out.

$ kubectl delete pods -l app=logging-elastic

Note: change app=logging-elastic to match your StatefulSet’s label.

Then, inside the Elasticsearch pod, run mkdir to ensure the backup folder exists.

$ kubectl exec logging-elastic-0 -- sh -c "mkdir -p /mount/backups/backup_1 && chmod 777 /mount/backups/backup_1"

If you encounter an error in this step, you can open a shell in the pod with kubectl exec and run mkdir and chmod manually.

I use kubectl port-forward to expose the pod’s HTTP port on my local machine.

$ kubectl port-forward logging-elastic-0 9201:9200

We can then create a snapshot repository:

$ curl -X PUT \
    http://localhost:9201/_snapshot/backup_1/ \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/json' \
    -d '{
      "type": "fs",
      "settings": {
        "location": "/mount/backups/backup_1",
        "compress": true
      }
    }'

After this, we start the snapshot backup:

$ curl -X PUT 'http://localhost:9201/_snapshot/backup_1/snapshot_1?wait_for_completion=true'

It may take a while depending on the amount of data. When it’s done, we can check whether the snapshot completed successfully:

$ curl -XGET 'http://localhost:9201/_snapshot/backup_1/_all'
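If you would rather check the result programmatically than read the JSON by eye, here is a minimal Python sketch. It assumes the port-forward on localhost:9201 and the backup_1 repository from above; the function names are my own and not part of any Elasticsearch client. It treats a snapshot as complete only when its state is SUCCESS and no shard failed.

```python
import json
import urllib.request


def incomplete_snapshots(response):
    """Return the names of snapshots that did not finish cleanly.

    `response` is the parsed JSON body of GET /_snapshot/<repo>/_all.
    """
    bad = []
    for snap in response.get("snapshots", []):
        failed_shards = snap.get("shards", {}).get("failed", 0)
        if snap.get("state") != "SUCCESS" or failed_shards > 0:
            bad.append(snap.get("snapshot"))
    return bad


def fetch_snapshots(base_url="http://localhost:9201", repo="backup_1"):
    """Hypothetical helper: read the snapshot list through the port-forward."""
    with urllib.request.urlopen(f"{base_url}/_snapshot/{repo}/_all") as resp:
        return json.load(resp)
```

Calling incomplete_snapshots(fetch_snapshots()) should return an empty list when every snapshot in the repository finished without failed shards.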

Now our data is safe in the backup.

Obtain a new version of index mapping of your cluster

Getting a new version of index mapping is the most crucial part of the upgrading process because the mapping is the identity of each index. To successfully Reindex data to a new cluster, the mapping must be complete and compatible.

In the abstract, all we need to do is take the existing mapping and transform it into one that is compatible with the target version.

In practice, this is a cluster-specific problem, so you should check the breaking changes against your own cluster. This page might help you with the process.

My approach is to export the mapping from the existing cluster, start a local Docker container with the new Elasticsearch version (using the official repo), and then try to create an index with the current mapping. For any syntax that is no longer supported, the index API throws an error. I googled those errors and adjusted my mapping accordingly.
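The trial-and-error loop described above can be sketched in Python. This is only an illustration under assumptions: the function names are hypothetical, the base URL is wherever your local test container listens, and the body is whatever settings and mappings you exported. It sends the create-index request and, if Elasticsearch rejects it, returns the reasons from the error response so you know what to fix next.

```python
import json
import urllib.error
import urllib.request


def error_reasons(error_body):
    """Pull the human-readable reasons out of an Elasticsearch error response."""
    error = json.loads(error_body).get("error", {})
    if isinstance(error, str):  # some responses carry a plain string
        return [error]
    causes = error.get("root_cause") or [error]
    return [cause.get("reason", "") for cause in causes]


def try_create_index(base_url, index, body):
    """PUT an index; return [] on success or the rejection reasons on failure."""
    request = urllib.request.Request(
        f"{base_url}/{index}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )
    try:
        urllib.request.urlopen(request).close()
        return []
    except urllib.error.HTTPError as exc:
        return error_reasons(exc.read())
```

For example, try_create_index("http://localhost:9200", "mapping-test", exported_body) would be run against the local container (port 9200 and the index name are assumptions), repeated until it returns an empty list.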

One breaking change you are likely to face is this one:

 email: {
-  type: 'string',
-  index: 'not_analyzed'
+  type: 'keyword',
+  index: true
 }

The string type with index: not_analyzed has been deprecated and replaced by type: keyword.

At the end of this step, you should have a complete, working index mapping in hand. This mapping will be used with the new cluster later.
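If many fields need this change, it can be applied mechanically. The following Python sketch is one possible way, not the method used for this upgrade: it walks a mapping dictionary and rewrites not_analyzed strings to keyword and analyzed (or default) strings to text. The function name is my own, and any other index value, such as "no", is deliberately left untouched for manual review.

```python
import copy


def convert_string_fields(mapping):
    """Return a copy of `mapping` with 2.x `string` fields rewritten for 6.x."""

    def walk(node):
        if isinstance(node, dict):
            if node.get("type") == "string":
                index = node.get("index", "analyzed")
                if index == "not_analyzed":
                    node["type"] = "keyword"
                    node["index"] = True
                elif index == "analyzed":
                    node["type"] = "text"
                    node.pop("index", None)
                # Any other value (e.g. "no") is left as-is for manual review.
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    converted = copy.deepcopy(mapping)  # never mutate the exported mapping
    walk(converted)
    return converted
```

This covers only the string change shown above; other breaking changes in your mapping still have to be found and fixed separately.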

Deploy the new version of Elasticsearch Cluster

Before performing any upgrade deployment, if the cluster is in a production environment, make sure you stop the traffic to the cluster to avoid data inconsistency. Upgrading with zero downtime requires further planning.

For the new Elasticsearch 6 cluster, I use Bayu Aldi Yansyah’s guide, with some adjustments for my Kubernetes cluster:

  • I do not want to change how other components connect to Elasticsearch; the upgrade should go unnoticed from the outside. So I replace the old cluster’s service with another Kubernetes service that points to the new cluster.
  • I then create a new service that points to the old cluster. It exposes the old cluster’s port to the new cluster so that the reindex can run.
  • I use Dynamic Volume Provisioning to provision a volume for each Elasticsearch node. It is easy to set up on GCE, but you can set up your own provisioning configuration in other environments.

First, we deploy a service connecting to the old cluster:

Add this service for connection to the old cluster

My new cluster looks like this:

The new cluster would look like this

  • Notice the service name: it uses logging-elastic-service, the same as the previous cluster. This directs connections to the new Elasticsearch cluster without having to change other parts of your Kubernetes cluster.
  • The initial setup is configured for only 2 nodes. If you need more nodes in the cluster, adjust the discovery.zen.ping.unicast.hosts setting.
  • The reindex.remote.whitelist setting whitelists the old cluster so that the new cluster can reindex from it.
  • The Kubernetes configuration, such as resources.requests, resources.limits, and volume storage, can be changed to match your cluster’s needs.

The new cluster should be ready by now. You can check its health with the cluster health API.
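As a rough illustration of that health check, here is a small Python sketch. It assumes you have port-forwarded a pod of the new cluster to localhost:9202 (the port is an assumption at this point in the guide) and that you expect the 2 nodes from the initial setup. The function names are hypothetical.

```python
import json
import urllib.request


def cluster_ready(health, expected_nodes=2):
    """Return True when the parsed /_cluster/health body is green with enough nodes."""
    return (
        health.get("status") == "green"
        and health.get("number_of_nodes", 0) >= expected_nodes
    )


def fetch_health(base_url="http://localhost:9202"):
    """Hypothetical helper: read /_cluster/health through a local port-forward."""
    with urllib.request.urlopen(f"{base_url}/_cluster/health") as resp:
        return json.load(resp)
```

cluster_ready(fetch_health()) should be True once both nodes have joined and the cluster reports green.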

Deploy Kibana

Kibana 6 is available to run alongside Elasticsearch 6.

First, you can remove your old Kibana deployment from your Kubernetes cluster.

In my case, I only run:

$ kubectl delete -f kibana-4.yaml

And it’s gone for good. Don’t worry; your Kibana saved searches and visualizations are safe in Elasticsearch.

Deploy Kibana 6 with the config that connects it to the new cluster.

The Kibana version 6

The only config needed is the ELASTICSEARCH_URL environment variable, which should point to your new Elasticsearch cluster service.

Now your Kibana should be up.

New Kibana UI

Set up some initial settings for Kibana; maybe start by saving the default index pattern.

Sending this request to the new cluster:

GET /_cat/indices

should return something like this:

green open .kibana Tn5PeoxAFdCEmfYW9Dd6Q 1 1 65  3 528.9kb 264.4kb

And it’s done; your new cluster is ready.

Create all indices with the mapping obtained from step 2

The transformed mapping obtained from step 2 should now be inserted into your new cluster. This can be done manually by creating each index with the mapping.

For my upgrade, I keep the exact same name for each index. For your cluster, feel free to rename them as you wish; however, changing the index names will affect how you perform the reindexing.

In my case, I have several indices with the same mapping, so I wrote a simple shell script that does the work automatically. If your cluster is similar, feel free to copy my code and use it.

First, I port-forward both the new cluster and the old one to my local machine so that my script has access to both clusters.

$ kubectl port-forward logging-elastic-cluster-0 9202:9200

The new cluster goes to port 9202.

$ kubectl port-forward logging-elastic-0 9201:9200

And my old cluster goes to port 9201.

Use this script to generate multiple indices at once

The script gets every relevant index from the old cluster using a regex. In my case, my indices are prefixed with log. Pay attention to one point: this step should skip the Kibana index. By default, the Kibana index is named .kibana; make sure you skip it.

Now check that all indices are ready on the new cluster.
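For readers who prefer Python over shell, here is a minimal sketch of the same idea. It is not the original script: the function names are my own, the ^log pattern and the two port-forwards (9201 for the old cluster, 9202 for the new one) are taken from this guide, and mapping_body stands for the transformed settings and mappings you prepared earlier.

```python
import json
import re
import urllib.request


def select_indices(cat_output, pattern=r"^log"):
    """Pick index names matching `pattern` from `_cat/indices?h=index` output.

    The Kibana index (.kibana) is always skipped.
    """
    names = [line.strip() for line in cat_output.splitlines() if line.strip()]
    return sorted(n for n in names if n != ".kibana" and re.search(pattern, n))


def create_indices(old_url, new_url, mapping_body, pattern=r"^log"):
    """List matching indices on the old cluster and create each on the new one."""
    with urllib.request.urlopen(f"{old_url}/_cat/indices?h=index") as resp:
        names = select_indices(resp.read().decode("utf-8"), pattern)
    payload = json.dumps(mapping_body).encode("utf-8")
    for name in names:
        request = urllib.request.Request(
            f"{new_url}/{name}",
            data=payload,
            headers={"Content-Type": "application/json"},
            method="PUT",
        )
        urllib.request.urlopen(request).close()  # raises HTTPError if rejected
    return names
```

With both port-forwards running, create_indices("http://localhost:9201", "http://localhost:9202", mapping_body) would create one index on the new cluster per matching index on the old one.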

$ curl localhost:9202/_cat/indices

Run Reindex in the new cluster

Now all indices are ready on both clusters. We need to run a remote reindex for each index. During reindexing, Elasticsearch reads data from the source index and writes it to the dest index.

The reindex API call should look like this:

POST _reindex
{
  "source": {
    "remote": {
      "host": "http://<<URL_TO_YOUR_OLD_CLUSTER_SERVICE>>",
      "username": "user<<IF_ANY>>",
      "password": "pass<<IF_ANY>>",
      "socket_timeout": "10m",
      "connect_timeout": "1m"
    },
    "index": "SOURCE_INDEX_NAME"
  },
  "dest": {
    "index": "dest"
  }
}

I set socket_timeout and connect_timeout to avoid network errors. Reindexing each index might take up to 5 minutes, depending on how large your data is.

URL_TO_YOUR_OLD_CLUSTER_SERVICE should be the URL of the service that points to your old cluster. For the setup above, that is logging-elastic-service-old:9200.

Again, in my case, I ran the script to fire this API for each index.

Use this script to run reindex on all indices
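A minimal Python sketch of such a loop follows. It is an illustration rather than the original script: the function names are hypothetical, it keeps the same index name on both sides as I did, and it reuses the timeouts from the request above. The remote host must be an address the new cluster can reach and that is listed in reindex.remote.whitelist.

```python
import json
import urllib.request


def reindex_body(index, remote_host, socket_timeout="10m", connect_timeout="1m"):
    """Build the remote-reindex request body for one index (same name on both sides)."""
    return {
        "source": {
            "remote": {
                "host": remote_host,
                "socket_timeout": socket_timeout,
                "connect_timeout": connect_timeout,
            },
            "index": index,
        },
        "dest": {"index": index},
    }


def reindex_all(new_url, index_names, remote_host):
    """POST /_reindex on the new cluster for each index; return failures per index."""
    results = {}
    for name in index_names:
        request = urllib.request.Request(
            f"{new_url}/_reindex",
            data=json.dumps(reindex_body(name, remote_host)).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as resp:
            results[name] = json.load(resp).get("failures", [])
    return results
```

For example, reindex_all("http://localhost:9202", names, "http://logging-elastic-service-old:9200") runs the reindex one index at a time and reports any failures Elasticsearch returned for each.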

Voilà! Your data should now be moved to the new cluster.

Now, for those who rely on Kibana, you might need to move the Kibana index to the new cluster.

While deploying Kibana, you might have played a bit with the Kibana dashboard; this makes Kibana create its internal index in Elasticsearch. By default, it shows up as the .kibana index.

We will reindex it as well, with a script that adjusts the documents:

curl -X POST \
    http://localhost:9202/_reindex \
    -H 'cache-control: no-cache' \
    -H 'content-type: application/json' \
    -d '{
      "source": {
        "remote": {
          "host": "http://logging-elastic-service-old:9200"
        },
        "index": ".kibana"
      },
      "dest": {
        "index": ".kibana"
      },
      "script": {
        "inline": "ctx._source = [ ctx._type : ctx._source ]; ctx._source.type = ctx._type; ctx._id = ctx._type + \":\" + ctx._id; ctx._type = \"doc\"; ",
        "lang": "painless"
      }
    }'

I adapted this method from the official migration guide. If your configuration differs, feel free to follow the guide and adjust it accordingly.
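To make the Painless script above easier to follow, here is the same per-document transformation written as a plain Python function. It is only an illustration of what the script does to each document; the function name is my own and nothing here talks to Elasticsearch.

```python
def migrate_kibana_doc(doc_type, doc_id, source):
    """Mirror the Painless script for one .kibana document.

    The old source is nested under its old type name, a `type` field records
    that name, the id is prefixed with it, and the document type becomes `doc`.
    """
    new_source = {doc_type: source, "type": doc_type}
    new_id = f"{doc_type}:{doc_id}"
    return "doc", new_id, new_source
```

So a dashboard stored with id my-dash ends up as a document of type doc with id dashboard:my-dash, and its original fields live under the dashboard key.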

Take down the old cluster

After checking everything is OK, we can take down the old Elasticsearch cluster along with the service we created.
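One rough way to do that check is to compare document counts per index between the two clusters. The Python sketch below is an assumption of mine rather than part of the original procedure: it parses the text returned by _cat/indices?h=index,docs.count from each cluster and lists the indices whose counts differ. The .kibana index is ignored by default because it is rewritten during migration.

```python
def parse_doc_counts(cat_output):
    """Parse `_cat/indices?h=index,docs.count` output into {index: count}."""
    counts = {}
    for line in cat_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counts[parts[0]] = int(parts[1])
    return counts


def count_mismatches(old_counts, new_counts, ignore=(".kibana",)):
    """Return {index: (old_count, new_count)} for indices whose counts differ."""
    return {
        name: (count, new_counts.get(name))
        for name, count in old_counts.items()
        if name not in ignore and new_counts.get(name) != count
    }
```

An empty result suggests that every index arrived with the same number of documents; it is a sanity check, not proof that the contents are identical.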

Enjoy your new cluster!

Conclusion

Alright! At this point, you should have a lovely, freshly upgraded Elasticsearch running on your Kubernetes cluster. I hope this article is a useful guide for anyone who wants to upgrade Elasticsearch.

If you have any suggestions regarding this article, feel free to comment below.

Thanks!