A detailed guide to deploying Elasticsearch on Elastic Cloud on Kubernetes (ECK)

Gregorius Marco · Published in 99.co · Aug 23, 2021 · 12 min read

Elastic Cloud on Kubernetes

Background

99.co Singapore portal’s listings search feature is powered by Elasticsearch (ES), a distributed search engine that can perform complex queries and aggregations quickly. For background, our backend follows a microservices architecture running on Google Kubernetes Engine (GKE), and the search service is one of those microservices.

The search service is the core service behind our listings search feature. While the search service itself runs on GKE, ES did not: it was still running in Docker containers on GCP VM instances. The entire search service system can be visualized as below.

Previous search service architecture

We run 2 types of processes in our search service:

  1. Search Service Web: A web server serving requests from other services by returning search objects that match certain parameters.
  2. Search Service Indexer: A process that indexes denormalized objects in ES by continuously fetching data from their origin services, e.g. the listing service, user service, etc.

While it is perfectly fine to run ES on self-hosted machines, problems arise when we need to do maintenance on the machines themselves, e.g. scaling up resources such as CPU/memory, upgrading the ES version, adding new nodes, etc. We would have to do a rolling upgrade by spinning down each ES node and VM instance, performing the necessary maintenance, and starting each machine and container back up one by one. It sounds simple, but things can go south easily, especially when our tired minds perform this procedure in the wee hours of the night.

In fact, earlier this July we had a six-hour outage on one of our 3 shards while performing an ES version upgrade. A network connection misconfiguration in istio-proxy stopped the rebalancing process and ultimately degraded our service performance below the 99.9% SLA. Although we managed to restore it before our main business hours, the midnight stress and tiredness are not something we ever want to go through again. Hence, mankind’s search for automation began :D

Motivation

Luckily, Elastic Cloud on Kubernetes (ECK) comes to the rescue. Since its general availability release on 16th January 2020, we have had the option to run Elasticsearch fully in Kubernetes. ECK uses the Kubernetes operator pattern and Custom Resource Definitions (CRDs) to provide a high-level abstraction for deploying, packaging and managing the application. Once ECK is set up, automated deployment helps reduce the risk of human error during upgrade or maintenance cycles. On top of that, ECK also manages critical tasks on our behalf, such as:

  • Managing and monitoring multiple clusters
  • Upgrading to new stack versions with ease
  • Scaling cluster capacity up and down
  • Changing cluster configuration
  • Dynamically scaling local storage (includes Elastic Local Volume, a local storage driver)
  • Scheduling backups

How we set up ECK in 99

Before we start, here’s a diagram of the entire ECK system that we are going to achieve.

Search service architecture with ECK

Don’t be daunted by the seemingly complex components in the Elasticsearch System box. ECK’s CRDs and operator make the whole setup simple, and we only need to add configuration to the necessary YAML files. Follow along for a step-by-step explanation.

  • To start off, we need to install the CRDs and the elastic operator into our Kubernetes cluster.
kubectl apply -f https://download.elastic.co/downloads/eck/1.6.0/all-in-one.yaml
  • The all-in-one.yaml file is a bundle of several YAML manifests. First, it creates a namespace called elastic-system where the elastic-operator will run, along with other necessary objects.
# Source: eck-operator/templates/operator-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: elastic-system
  labels:
    name: elastic-system
---
# Source: eck-operator/templates/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
---
# Source: eck-operator/templates/webhook.yaml
apiVersion: v1
kind: Secret
metadata:
  name: elastic-webhook-server-cert
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
---
# Source: eck-operator/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
data:
  eck.yaml: |-
    log-verbosity: 0
    metrics-port: 0
    container-registry: docker.elastic.co
    max-concurrent-reconciles: 3
    ca-cert-validity: 8760h
    ca-cert-rotate-before: 24h
    cert-validity: 8760h
    cert-rotate-before: 24h
    set-default-security-context: true
    kube-client-timeout: 60s
    elasticsearch-client-timeout: 180s
    disable-telemetry: false
    validate-storage-class: true
    enable-webhook: true
    webhook-name: elastic-webhook.k8s.elastic.co
  • Then the most important part, the elastic-operator itself, is defined towards the end of the file as a StatefulSet.
# Source: eck-operator/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
spec:
  selector:
    matchLabels:
      control-plane: elastic-operator
  serviceName: elastic-operator
  replicas: 1
  template:
    metadata:
      annotations:
        # Rename the fields "error" to "error.message" and "source" to "event.source"
        # This is to avoid a conflict with the ECS "error" and "source" documents.
        "co.elastic.logs/raw": "[{\"type\":\"container\",\"json.keys_under_root\":true,\"paths\":[\"/var/log/containers/*${data.kubernetes.container.id}.log\"],\"processors\":[{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"error\",\"to\":\"_error\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"_error\",\"to\":\"error.message\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"source\",\"to\":\"_source\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"_source\",\"to\":\"event.source\"}]}}]}]"
        "checksum/config": 3c2010a9355a35f49003014b553c3315c92569d20875c18788dd85b73a97c6c7
      labels:
        control-plane: elastic-operator
    spec:
      terminationGracePeriodSeconds: 10
      serviceAccountName: elastic-operator
      securityContext:
        runAsNonRoot: true
      containers:
      - image: "docker.elastic.co/eck/eck-operator:1.6.0"
        imagePullPolicy: IfNotPresent
        name: manager
        args:
        - "manager"
        - "--config=/conf/eck.yaml"
        - "--distribution-channel=all-in-one"
        env:
        - name: OPERATOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: WEBHOOK_SECRET
          value: elastic-webhook-server-cert
        resources:
          limits:
            cpu: 1
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 150Mi
        ports:
        - containerPort: 9443
          name: https-webhook
          protocol: TCP
        volumeMounts:
        - mountPath: "/conf"
          name: conf
          readOnly: true
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      volumes:
      - name: conf
        configMap:
          name: elastic-operator
      - name: cert
        secret:
          defaultMode: 420
          secretName: elastic-webhook-server-cert
  • Next, the file defines various CustomResourceDefinitions (CRDs), like the Elasticsearch CRD below (not everything is pasted here due to the massive size of the file). This CRD defines the specification we can provide for the Elasticsearch application later on.
# Source: eck-operator/charts/eck-operator-crds/templates/all-crds.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.5.0
  creationTimestamp: null
  labels:
    app.kubernetes.io/instance: 'elastic-operator'
    app.kubernetes.io/name: 'eck-operator-crds'
    app.kubernetes.io/version: '1.6.0'
  name: elasticsearches.elasticsearch.k8s.elastic.co
spec:
  additionalPrinterColumns:
  - JSONPath: .status.health
    name: health
    type: string
  - JSONPath: .status.availableNodes
    description: Available nodes
    name: nodes
    type: integer
  - JSONPath: .status.version
    description: Elasticsearch version
    name: version
    type: string
  - JSONPath: .status.phase
    name: phase
    type: string
  - JSONPath: .metadata.creationTimestamp
    name: age
    type: date
  group: elasticsearch.k8s.elastic.co
  names:
    categories:
    - elastic
    kind: Elasticsearch
    listKind: ElasticsearchList
    plural: elasticsearches
    shortNames:
    - es
    singular: elasticsearch
  scope: Namespaced
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      description: Elasticsearch represents an Elasticsearch resource in a Kubernetes
        cluster.
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation
            of an object. Servers should convert recognized schemas to the latest
            internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this
            object represents. Servers may infer this from the endpoint the client
            submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        metadata:
          type: object
        spec:
          description: ElasticsearchSpec holds the specification of an Elasticsearch
            cluster.
          properties:
            auth:
              description: Auth contains user authentication and authorization security
                settings for Elasticsearch.
              properties:
                fileRealm:
                  description: FileRealm to propagate to the Elasticsearch cluster.
                  items:
                    description: FileRealmSource references users to create in the
                      Elasticsearch cluster.
                    properties:
                      secretName:
                        description: SecretName is the name of the secret.
                        type: string
                    type: object
                  type: array
...
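  • Once the file is applied, we can quickly verify that the CRDs are registered. The output below is truncated for brevity; ECK also ships CRDs for APM Server, Beats and other stack components:
$ kubectl get crd | grep k8s.elastic.co
elasticsearches.elasticsearch.k8s.elastic.co
kibanas.kibana.k8s.elastic.co
...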
  • Now we should be able to see the elastic-operator-0 pod running in the elastic-system namespace. We can tail its logs with the command below:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
  • The next exciting step is to deploy the Elasticsearch cluster itself!
  • Before that, however, we created a separate NodePool in our k8s cluster to host the ES nodes. This isolates the ES nodes from our main NodePool, where most of our microservices run. If our microservices suddenly ran hot on CPU/memory, the resources shared with ES could get eaten up, affecting the performance of the entire cluster. Another benefit is that we can fine-tune the NodePool resources accurately for the ES nodes, especially since they tend to be heavier on memory usage. We can visualize this with the image below.
Applying Taint/Tolerations and NodeAffinity to isolate scheduling pods to a NodePool
  • To isolate the elastic-cloud NodePool, we use Kubernetes’ scheduling feature called taints, with the NoSchedule effect. This taint must be added during NodePool creation (see the sketch below). Our Elasticsearch pods must also be configured to tolerate this taint via the tolerations field. Together, these effectively prevent any pod without a toleration for the elastic-only=true NoSchedule taint from being scheduled onto this NodePool. As a result, only the ES pods can be scheduled in the elastic-cloud NodePool.
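  • As a rough sketch, creating such a tainted NodePool on GKE looks like this (the cluster name, machine type and node count here are illustrative, not our exact production values):
gcloud container node-pools create elastic-cloud \
  --cluster=our-gke-cluster \
  --machine-type=n1-highmem-8 \
  --num-nodes=3 \
  --node-taints=elastic-only=true:NoSchedule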
  • Finally, we can start deploying the ES cluster with its nodes. We create a file called elasticsearch.yaml as follows:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: search-prod
  namespace: sg-prod
spec:
  version: 7.11.1
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      # allow anonymous access as superuser; we rely on network isolation
      # (private network + GCP firewall) instead of ES authentication
      xpack.security.authc:
        anonymous:
          username: anonymous
          roles: superuser
          authz_exception: false
    podTemplate:
      metadata:
        namespace: sg-prod
        labels:
          # additional labels for pods
          env: prod
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: cloud.google.com/gke-nodepool
                  operator: In
                  values:
                  - elastic-cloud
        tolerations:
        - key: elastic-only
          operator: Equal
          value: "true"
          effect: NoSchedule
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 20G
              cpu: 8
            limits:
              memory: 30G
    # request 80Gi of persistent data storage for each pod in this nodeSet
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        storageClassName: faster-southeast1-a
  http:
    service:
      spec:
        # expose the HTTP service as a LoadBalancer (defaults to ClusterIP)
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
  • In nodeSets, we specify the sets of nodes that we want to deploy. In this case, we use one nodeSet with 3 nodes, each running as a master, data and ingest node. Internally, the operator creates StatefulSets, as each pod (ES node) needs to be stateful with stable persistent storage. Other nodeSet topologies are possible too, e.g. 3 data nodes and 2 lightweight master nodes, as sketched below.
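  • For illustration only (not our production config), such a split topology could be sketched roughly like this:
nodeSets:
- name: master
  count: 2        # lightweight, master-only nodes (an odd number of master-eligible nodes is generally recommended for quorum)
  config:
    node.master: true
    node.data: false
    node.ingest: false
- name: data
  count: 3        # data + ingest nodes doing the heavy lifting
  config:
    node.master: false
    node.data: true
    node.ingest: true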
  • Furthermore, we add an initContainer running in privileged mode to increase the kernel setting vm.max_map_count to 262144. This raises the limit on memory-mapped areas, which Elasticsearch needs for its mmapped index files; without it, we may encounter out-of-memory exceptions. It is recommended by the documentation for production usage.
  • The requiredDuringSchedulingIgnoredDuringExecution nodeAffinity requires the pods to be scheduled only on the elastic-cloud NodePool. Combined with the tolerations discussed earlier, this ensures the ES nodes run only in the elastic-cloud NodePool. Without the nodeAffinity, the ES nodes could be scheduled anywhere in the k8s cluster; without the tolerations, they could be scheduled on any NodePool except elastic-cloud. Thus we need both settings together.
  • Next, we define the CPU/memory requests and limits, just as we normally define resources in Deployments. As for why we need to set the requests and limits, here is an excellent post on the topic.
  • For storage, we ask the operator to create a PersistentVolumeClaim of 80Gi for each pod with a certain StorageClass. This allows PersistentVolumes to be dynamically provisioned without manually creating a PV every time we need one. In this case, the faster-southeast1-a StorageClass uses the SSD persistent disk type provided by GCE (Google Compute Engine); a sketch of such a StorageClass follows below. Another benefit of using a StorageClass is that we can expand the volumes if we run out of space; in such a scenario, ECK updates the existing PVCs accordingly and recreates the StatefulSets automatically.
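  • For reference, a GCE SSD StorageClass like faster-southeast1-a can be sketched roughly as below (the exact parameters of ours may differ):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: faster-southeast1-a
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd                  # SSD persistent disk
allowVolumeExpansion: true      # lets us grow the PVCs later without recreating them
volumeBindingMode: WaitForFirstConsumer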
  • In http, we expose the Service as a LoadBalancer. By default, if we don’t set this, we will get a ClusterIP service.
  • TLS on the HTTP layer is also enabled by default; behind the scenes, the elastic-operator creates self-signed certificates for it. We disable it for simplicity, as all our intra-service communication is private and walled behind GCP firewall policies.
  • Now, we run kubectl apply -f elasticsearch.yaml, and voila!
  • We can see the cluster object (note that es is a short name for the Elasticsearch custom resource defined by the all-in-one.yaml we applied in the first step):
$ kubectl get es
NAME          HEALTH   NODES   VERSION   PHASE   AGE
search-prod   green    3       7.11.1    Ready   48d
  • We can also see the ES nodes running as pods
$ kubectl get pods | grep search-prod-es 
search-prod-es-default-0 1/1 Running 0 5h25m
search-prod-es-default-1 1/1 Running 0 5h26m
search-prod-es-default-2 1/1 Running 0 5h28m
  • We can see the StatefulSet and Services created as well. Since we use a LoadBalancer service, we can reach the cluster via the external IP directly, in this case 10.148.0.117:9200 (see the curl example after the output below). If we were only using ClusterIP, we could use kubectl port-forward service/search-prod-es-http 9200 and hit localhost:9200.
$ kubectl get statefulset | grep search-prod-es 
search-prod-es-default 3/3 48d
$ kubectl get service | grep search-prod-es
search-prod-es-default ClusterIP None <none> 9200/TCP 48d
search-prod-es-http LoadBalancer 10.31.243.139 10.148.0.117 9200/TCP 48d
search-prod-es-transport ClusterIP None <none> 9300/TCP 48d
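  • Since TLS is disabled and anonymous access is mapped to superuser in our config, we can sanity-check the cluster straight away over plain HTTP (replace the IP with your own LoadBalancer IP; output truncated):
$ curl http://10.148.0.117:9200/_cluster/health?pretty
{
  "cluster_name" : "search-prod",
  "status" : "green",
  "number_of_nodes" : 3,
  ...
}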
  • The elastic-operator also creates a bunch of Secrets for each cluster that we deploy.
$ kubectl get secret | grep search-prod-es 
search-prod-es-default-es-config Opaque 1 48d
search-prod-es-default-es-transport-certs Opaque 7 48d
search-prod-es-elastic-user Opaque 1 48d
search-prod-es-http-ca-internal Opaque 2 48d
search-prod-es-http-certs-internal Opaque 3 48d
search-prod-es-http-certs-public Opaque 2 48d
search-prod-es-internal-users Opaque 2 48d
search-prod-es-remote-ca Opaque 1 48d
search-prod-es-transport-ca-internal Opaque 2 48d
search-prod-es-transport-certs-public Opaque 1 48d
search-prod-es-xpack-file-realm Opaque 3 48d
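  • For example, the password of the built-in elastic superuser can be read from the search-prod-es-elastic-user Secret; this is handy for logging in to Kibana later:
$ kubectl get secret search-prod-es-elastic-user \
    -o go-template='{{.data.elastic | base64decode}}'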
  • The next step is to deploy Kibana, which is simpler than Elasticsearch itself. We create a kibana.yaml:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: search-prod
  namespace: sg-prod
spec:
  version: 7.11.1
  count: 1
  elasticsearchRef:
    name: search-prod
  podTemplate:
    metadata:
      namespace: sg-prod
      labels:
        env: prod
    spec:
      tolerations:
      - key: elastic-only
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 500m
          limits:
            memory: 15Gi
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  • The most important part here is elasticsearchRef, which must match the metadata.name in elasticsearch.yaml. Under the hood, the operator creates a Deployment for Kibana instead of a StatefulSet, since Kibana doesn’t need any persistent storage.
  • Deploy with kubectl apply -f kibana.yaml, and we can see Kibana running:
$ kubectl get po | grep search-prod-kb 
search-prod-kb-df8555ff7-vhp5h 2/2 Running 0 6d13h
$ kubectl get deployment | grep search-prod-kb
search-prod-kb   1/1   1   1   45d
$ kubectl get service | grep search-prod-kb
search-prod-kb-http   ClusterIP   10.31.243.93   <none>   5601/TCP   45d
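  • To reach the Kibana UI, we can port-forward the ClusterIP service and open http://localhost:5601 in a browser (plain HTTP, since we disabled the self-signed certificate above), then log in with the elastic user and the password retrieved earlier:
$ kubectl port-forward service/search-prod-kb-http 5601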

Rolling Upgrade with ECK

At this point, our Elasticsearch cluster is up and running. Let’s step back to our initial problem: how does ECK perform rolling upgrades of the nodes automatically?
The answer is its update strategy. Since we did not provide updateStrategy in our elasticsearch.yaml, it defaults to:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: -1
      maxUnavailable: 1
  • maxUnavailable defaults to 1, which ensures the cluster has no more than 1 unavailable Pod at any point in time.
  • maxSurge defaults to -1, which means unbounded: there is no limit on how many new Pods can be created ahead of the old ones being removed.

With these settings, when we make a change to the Elasticsearch YAML that requires Pods to restart, we are guaranteed to have only 1 Pod restarting at a time. The cluster’s health will be marked as yellow, since some replica shards (from the terminated node) will be temporarily unassigned, but it can still serve requests.
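If we ever wanted different trade-offs, the changeBudget can also be set explicitly in elasticsearch.yaml. The values below are purely illustrative, not our production settings:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: 1        # create at most 1 extra Pod above the target count during changes
      maxUnavailable: 1  # never have more than 1 Pod missing from the cluster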

Here’s a deeper look into the elastic-operator log after we update the elasticsearch.yaml file:

{"log.level":"info","@timestamp":"2021-07-15T02:59:38.071Z","log.logger":"driver","message":"Disabling shards allocation","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.115Z","log.logger":"driver","message":"Requesting a synced flush","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.491Z","log.logger":"driver","message":"synced flush failed with 409 CONFLICT. Ignoring.","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","namespace":"sg-prod","es_name":"search-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.491Z","log.logger":"driver","message":"Deleting pod for rolling upgrade","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod","pod_name":"search-prod-es-default-0","pod_uid":"ed2bce02-dd3d-4cd9-97ed-af190d69935e"}

We can see that the operator disables shard allocation and requests a synced flush, exactly what the documentation recommends when performing a manual rolling upgrade. The same operations are then repeated for the rest of the pods (not shown here). No more human intervention needed for rolling upgrades 🎉
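For comparison, this is a rough sketch of the manual steps from the Elasticsearch rolling-upgrade documentation that we would otherwise run against the cluster by hand around each node restart:

# 1. Disable shard allocation so only primaries are reallocated while the node is down
PUT _cluster/settings
{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }

# 2. Perform a synced flush (deprecated in newer versions in favour of a plain flush)
POST _flush/synced

# 3. Stop, upgrade and restart the node, then re-enable shard allocation
PUT _cluster/settings
{ "persistent": { "cluster.routing.allocation.enable": null } }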

Summary

Running Elasticsearch on Kubernetes lets developers and admins leverage Kubernetes’ container orchestration and apply the Elastic operator’s best practices for managing Elasticsearch clusters. While running everything on k8s adds another layer of complexity, it greatly reduces the manual operations required from humans and offers peace of mind to developers.
