When everything else fails

Remco Verhoef
4 min read · Jun 27, 2017


We are using Elasticsearch on a Google Compute Engine Kubernetes cluster. The cluster consists of three data nodes, three master nodes and two client nodes. Last night, for some reason, two of the three data nodes rebooted, causing one of the indices to turn red.

$ kubectl --namespace escluster get pods
NAME                         READY     STATUS    RESTARTS   AGE
es-client-1098028550-0lbj4   1/1       Running   1          24d
es-client-1098028550-s02np   1/1       Running   0          4h
es-data-0                    1/1       Running   163        24d
es-data-1                    1/1       Running   0          4h
es-data-2                    1/1       Running   0          4h
es-master-1414048425-7wpwz   1/1       Running   0          4h
es-master-1414048425-q8x1x   1/1       Running   1          24d
es-master-1414048425-xbffw   1/1       Running   0          4h
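
If you want to see why a pod restarted, kubectl describe shows its last state and the termination reason; something like this (not part of the run above) would do:

$ kubectl --namespace escluster describe pod es-data-0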

Using port forwarding, we can communicate with the cluster as if it were running locally.

$ kubectl --namespace escluster port-forward es-client-1098028550-0lbj4 9200:9200
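
Note that port-forward blocks in the foreground, so it has to keep running while we talk to the cluster; either leave it open in a separate terminal or send it to the background:

$ kubectl --namespace escluster port-forward es-client-1098028550-0lbj4 9200:9200 &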

Before doing anything else, we need to know which version of Elasticsearch the cluster is running.

$ curl 127.0.0.1:9200
{
  "name" : "es-client-1098028550-s02np",
  "cluster_name" : "myesdb",
  "cluster_uuid" : "ngKmW6mXQ06g85qW_i0scg",
  "version" : {
    "number" : "5.2.2",
    "build_hash" : "f9d9b74",
    "build_date" : "2017-02-24T17:26:45.835Z",
    "build_snapshot" : false,
    "lucene_version" : "6.4.1"
  },
  "tagline" : "You Know, for Search"
}

Ok, so we are running version 5.2.2. Let's get the status of all indices.

$ curl 127.0.0.1:9200/_cat/indices?v
health status index     uuid                   pri rep docs.count docs.deleted store.size pri.store.size
red    open   honeytrap Yj_2sW2YToOKvDKEPlb9Rg   5   1     564111            0    355.2mb        177.6mb

The honeytrap index has turned red. Most solutions you find on the internet boil down to removing the index, but that's too easy. Let's see if and how we can recover it.
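
As an extra sanity check (not part of the original run), the cluster health API for just this index reports the same problem as a count of unassigned shards:

$ curl '127.0.0.1:9200/_cluster/health/honeytrap?pretty'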

First, we need to find out which shard or shards are failing.

$ curl 127.0.0.1:9200/_cat/shards?v
index     shard prirep state        docs  store ip         node
honeytrap 4     r      STARTED    140941 44.5mb 10.0.7.108 es-data-1
honeytrap 4     p      STARTED    140941 44.5mb 10.0.3.27  es-data-0
honeytrap 3     p      UNASSIGNED
honeytrap 3     r      UNASSIGNED
honeytrap 2     r      STARTED    140704 43.9mb 10.0.7.108 es-data-1
honeytrap 2     p      STARTED    140704 43.9mb 10.0.3.27  es-data-0
honeytrap 1     r      STARTED    141144 44.5mb 10.0.7.108 es-data-1
honeytrap 1     p      STARTED    141144 44.5mb 10.0.3.27  es-data-0
honeytrap 0     r      STARTED    141322 44.5mb 10.0.7.108 es-data-1
honeytrap 0     p      STARTED    141322 44.6mb 10.0.3.27  es-data-0
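
On a cluster with many indices and shards it can be handy to filter the listing down to the problematic shards directly (a small convenience, not part of the original run):

$ curl -s '127.0.0.1:9200/_cat/shards' | grep UNASSIGNED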

Clearly we have an issue with shard 3 of the honeytrap index. Now we need to find out why the shard is unassigned.

$ curl http://127.0.0.1:9200/_cluster/allocation/explain?pretty
{
  "index" : "honeytrap",
  "shard" : 3,
  "primary" : true,
  "current_state" : "unassigned",
  "unassigned_info" : {
    "reason" : "DANGLING_INDEX_IMPORTED",
    "at" : "2017-06-27T05:58:42.344Z",
    "last_allocation_status" : "no_valid_shard_copy"
  },
  "can_allocate" : "no_valid_shard_copy",
  "allocate_explanation" : "cannot allocate because all found copies of the shard are either stale or corrupt",
  "node_allocation_decisions" : [
    {
      "node_id" : "-mh0nYN0SWCdFMsyPzyxrw",
      "node_name" : "es-data-1",
      "transport_address" : "10.0.7.108:9300",
      "node_decision" : "no",
      "store" : {
        "found" : false
      }
    },
    {
      "node_id" : "2LDpJzwBSiOEScwMGkr9iA",
      "node_name" : "es-data-0",
      "transport_address" : "10.0.3.27:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "JFWiDodaSiu0O9PCKQ39xQ"
      }
    },
    {
      "node_id" : "jLOmaTJHS0ubFx-CGvf9lg",
      "node_name" : "es-data-2",
      "transport_address" : "10.0.7.109:9300",
      "node_decision" : "no",
      "store" : {
        "in_sync" : false,
        "allocation_id" : "oyEuh0sPSm-Zutg-MzOjUw"
      }
    }
  ]
}

This output tells us that all found copies of the shard are either stale or corrupt. We know that, in the end, every Elasticsearch index is a Lucene index, so let's find out whether the shard is really corrupt.

$ kubectl --namespace escluster exec -it es-data-2 -- /bin/bash
bash-4.3# cd /elasticsearch/lib/
bash-4.3# java -cp lucene-core*.jar -ea:org.apache.lucene... org.apache.lucene.index.CheckIndex /data/data/nodes/0/indices/Yj_2sW2YToOKvDKEPlb9Rg/3/index/
Opening index @ /data/data/nodes/0/indices/Yj_2sW2YToOKvDKEPlb9Rg/3/index/

Segments file=segments_wj numSegments=7 version=6.4.1 id=69ious8xtz17ft6ytqh82vh3w userData={sync_id=AVzmF0VrGu60_HMPPWkT, translog_generation=19, translog_uuid=ct2Rw1oARt-RXVudNC1-Hw}
...
No problems were detected with this index.

Took 16.629 sec total.

bash-4.3# exit

So a manual check with Lucene's CheckIndex reports no problems with the index. This is good news; now let's see if we can assign the shard to a specific node.

$ curl -XPOST '127.0.0.1:9200/_cluster/reroute?pretty' -d '{
  "commands" : [
    {
      "allocate_stale_primary" : {
        "index" : "honeytrap",
        "shard" : 3,
        "node" : "es-data-2",
        "accept_data_loss" : true
      }
    }
  ]
}'

It is important to note that we explicitly accept data loss here. In our case the data is not that important; getting the index back up quickly matters more.
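
Once the reroute command has been accepted, the recovery of the shard can be followed with the cat recovery API (an extra check, not shown in the run above):

$ curl '127.0.0.1:9200/_cat/recovery/honeytrap?v'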

Let's see if we still have dangling shards.

$ curl http://127.0.0.1:9200/_shard_stores?pretty
{
  "indices" : { }
}

Great, all shards have been assigned. Now let's check the index status:

$ curl http://127.0.0.1:9200/_cat/indices
green open honeytrap Yj_2sW2YToOKvDKEPlb9Rg 5 1 708412 0 451.1mb 225.6mb

Awesome, everything is up and running again.

If allocate_stale_primary had failed, there is also the option of using allocate_empty_primary. This replaces the shard with an empty primary, effectively discarding all data in the shard.
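
For completeness, such a command would look almost identical to the reroute above; a sketch only, as we did not need to run this:

$ curl -XPOST '127.0.0.1:9200/_cluster/reroute?pretty' -d '{
  "commands" : [
    {
      "allocate_empty_primary" : {
        "index" : "honeytrap",
        "shard" : 3,
        "node" : "es-data-2",
        "accept_data_loss" : true
      }
    }
  ]
}'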

More information about reroute can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/current/cluster-reroute.html.


Remco Verhoef

Founder @ DutchSec // Linthub.io // Transfer.sh // SlackArchive // Dutchcoders // OSC(P|E) // C|EH // GIAC // Security // DevOps // Pythonista // Gopher.