A detailed guide to deploying Elasticsearch on Elastic Cloud on Kubernetes (ECK)

Gregorius Marco · Published in 99.co · Aug 23, 2021 · 12 min read

Elastic Cloud on Kubernetes

Background

99.co Singapore portal’s listings search feature is powered by Elasticsearch (ES), a distributed search engine that can perform complex queries and aggregations quickly. For background, our backend follows a microservices architecture running on Google Kubernetes Engine (GKE), and the search service is one of those microservices.

The search service is the core service behind our listings search feature. While the search service itself runs on GKE, ES did not: it was still running in Docker containers on GCP VM instances. The entire search service system can be visualized as below.

Previous search service architecture

We run 2 types of processes in our search service:

  1. Search Service Web: A web server serving requests from other services by returning search objects that match certain parameters.
  2. Search Service Indexer: A process that indexes denormalized objects in ES by continuously fetching data from their origin services, e.g. the listing service, user service, etc.

While it is perfectly fine to run ES on self-hosted machines, problems arise when we need to do maintenance on the machines themselves, e.g. scaling up resources such as CPU/memory, upgrading the ES version, adding new nodes, etc. We would have to do a rolling upgrade by spinning down each ES node and VM instance, performing the necessary maintenance, and starting each machine and container back up one by one. It sounds simple, but things can go south easily, especially when our tired minds perform this procedure in the wee hours of the night.

In fact, earlier this July we had a six-hour outage on one of our 3 shards while performing an ES version upgrade. A network connection misconfiguration in istio-proxy stopped the rebalancing process and ultimately degraded our service performance below the 99.9% SLA. Although we managed to restore it before our main business hours, the midnight stress and tiredness are not something we ever want to go through again. Hence, mankind’s search for automation began :D

Motivation

Luckily, Elastic Cloud on Kubernetes (ECK) comes to the rescue. Since its general availability release on 16th January 2020, we have had the option to run Elasticsearch fully in Kubernetes. ECK uses the Kubernetes operator pattern and Custom Resource Definitions (CRDs) to provide a high-level abstraction for deploying, packaging and managing the application. Once ECK is set up, automated deployment helps reduce the risk of human error during upgrade or maintenance cycles. On top of that, ECK also manages critical tasks on our behalf, such as:

  • Managing and monitoring multiple clusters
  • Upgrading to new stack versions with ease
  • Scaling cluster capacity up and down
  • Changing cluster configuration
  • Dynamically scaling local storage (includes Elastic Local Volume, a local storage driver)
  • Scheduling backups

How we set up ECK in 99

Before we start, here’s a diagram of the entire ECK system that we are going to achieve.

Search service architecture with ECK

Don’t be daunted by the seemingly complex components in the Elasticsearch System box. ECK’s CRDs and operator make the whole setup simple, and we only need to add configuration to the necessary YAML files. Follow along for a step-by-step explanation.

  • To start off, we need to install the CRDs and the elastic operator into our Kubernetes cluster.
kubectl apply -f https://download.elastic.co/downloads/eck/1.6.0/all-in-one.yaml
  • The all-in-one.yaml file is a bundle of several YAML manifests. First, it creates a namespace called elastic-system where the elastic-operator will run, along with other necessary objects.
# Source: eck-operator/templates/operator-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: elastic-system
  labels:
    name: elastic-system
---
# Source: eck-operator/templates/service-account.yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
---
# Source: eck-operator/templates/webhook.yaml
apiVersion: v1
kind: Secret
metadata:
  name: elastic-webhook-server-cert
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
---
# Source: eck-operator/templates/configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
data:
  eck.yaml: |-
    log-verbosity: 0
    metrics-port: 0
    container-registry: docker.elastic.co
    max-concurrent-reconciles: 3
    ca-cert-validity: 8760h
    ca-cert-rotate-before: 24h
    cert-validity: 8760h
    cert-rotate-before: 24h
    set-default-security-context: true
    kube-client-timeout: 60s
    elasticsearch-client-timeout: 180s
    disable-telemetry: false
    validate-storage-class: true
    enable-webhook: true
    webhook-name: elastic-webhook.k8s.elastic.co
  • Then the most important part, the elastic-operator itself, is defined towards the end of the file as a StatefulSet.
# Source: eck-operator/templates/statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: elastic-operator
  namespace: elastic-system
  labels:
    control-plane: elastic-operator
    app.kubernetes.io/version: "1.6.0"
spec:
  selector:
    matchLabels:
      control-plane: elastic-operator
  serviceName: elastic-operator
  replicas: 1
  template:
    metadata:
      annotations:
        # Rename the fields "error" to "error.message" and "source" to "event.source"
        # This is to avoid a conflict with the ECS "error" and "source" documents.
        "co.elastic.logs/raw": "[{\"type\":\"container\",\"json.keys_under_root\":true,\"paths\":[\"/var/log/containers/*${data.kubernetes.container.id}.log\"],\"processors\":[{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"error\",\"to\":\"_error\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"_error\",\"to\":\"error.message\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"source\",\"to\":\"_source\"}]}},{\"convert\":{\"mode\":\"rename\",\"ignore_missing\":true,\"fields\":[{\"from\":\"_source\",\"to\":\"event.source\"}]}}]}]"
        "checksum/config": 3c2010a9355a35f49003014b553c3315c92569d20875c18788dd85b73a97c6c7
      labels:
        control-plane: elastic-operator
    spec:
      terminationGracePeriodSeconds: 10
      serviceAccountName: elastic-operator
      securityContext:
        runAsNonRoot: true
      containers:
      - image: "docker.elastic.co/eck/eck-operator:1.6.0"
        imagePullPolicy: IfNotPresent
        name: manager
        args:
        - "manager"
        - "--config=/conf/eck.yaml"
        - "--distribution-channel=all-in-one"
        env:
        - name: OPERATOR_NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: POD_IP
          valueFrom:
            fieldRef:
              fieldPath: status.podIP
        - name: WEBHOOK_SECRET
          value: elastic-webhook-server-cert
        resources:
          limits:
            cpu: 1
            memory: 512Mi
          requests:
            cpu: 100m
            memory: 150Mi
        ports:
        - containerPort: 9443
          name: https-webhook
          protocol: TCP
        volumeMounts:
        - mountPath: "/conf"
          name: conf
          readOnly: true
        - mountPath: /tmp/k8s-webhook-server/serving-certs
          name: cert
          readOnly: true
      volumes:
      - name: conf
        configMap:
          name: elastic-operator
      - name: cert
        secret:
          defaultMode: 420
          secretName: elastic-webhook-server-cert
  • Next, the file defines various CustomResourceDefinitions (CRDs), like the Elasticsearch CRD below (not everything is pasted here due to the massive size of the file). This CRD defines the specification we can provide for the Elasticsearch application later on.
# Source: eck-operator/charts/eck-operator-crds/templates/all-crds.yaml
apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  annotations:
    controller-gen.kubebuilder.io/version: v0.5.0
  creationTimestamp: null
  labels:
    app.kubernetes.io/instance: 'elastic-operator'
    app.kubernetes.io/name: 'eck-operator-crds'
    app.kubernetes.io/version: '1.6.0'
  name: elasticsearches.elasticsearch.k8s.elastic.co
spec:
  additionalPrinterColumns:
  - JSONPath: .status.health
    name: health
    type: string
  - JSONPath: .status.availableNodes
    description: Available nodes
    name: nodes
    type: integer
  - JSONPath: .status.version
    description: Elasticsearch version
    name: version
    type: string
  - JSONPath: .status.phase
    name: phase
    type: string
  - JSONPath: .metadata.creationTimestamp
    name: age
    type: date
  group: elasticsearch.k8s.elastic.co
  names:
    categories:
    - elastic
    kind: Elasticsearch
    listKind: ElasticsearchList
    plural: elasticsearches
    shortNames:
    - es
    singular: elasticsearch
  scope: Namespaced
  subresources:
    status: {}
  validation:
    openAPIV3Schema:
      description: Elasticsearch represents an Elasticsearch resource in a Kubernetes
        cluster.
      properties:
        apiVersion:
          description: 'APIVersion defines the versioned schema of this representation
            of an object. Servers should convert recognized schemas to the latest
            internal value, and may reject unrecognized values. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#resources'
          type: string
        kind:
          description: 'Kind is a string value representing the REST resource this
            object represents. Servers may infer this from the endpoint the client
            submits requests to. Cannot be updated. In CamelCase. More info: https://git.k8s.io/community/contributors/devel/sig-architecture/api-conventions.md#types-kinds'
          type: string
        metadata:
          type: object
        spec:
          description: ElasticsearchSpec holds the specification of an Elasticsearch
            cluster.
          properties:
            auth:
              description: Auth contains user authentication and authorization security
                settings for Elasticsearch.
              properties:
                fileRealm:
                  description: FileRealm to propagate to the Elasticsearch cluster.
                  items:
                    description: FileRealmSource references users to create in the
                      Elasticsearch cluster.
                    properties:
                      secretName:
                        description: SecretName is the name of the secret.
                        type: string
                    type: object
                  type: array
...
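  • Once the file is applied, we can quickly verify that the CRDs are registered. The output below is truncated for brevity; ECK also ships CRDs for APM Server, Beats and other stack components:
$ kubectl get crd | grep k8s.elastic.co
elasticsearches.elasticsearch.k8s.elastic.co
kibanas.kibana.k8s.elastic.co
...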
  • Now we should be able to see the elastic-operator-0 pod running in the elastic-system namespace. We can tail its logs with the command below:
kubectl -n elastic-system logs -f statefulset.apps/elastic-operator
  • The next exciting step is to deploy the Elasticsearch cluster itself!
  • Before that, however, we created a separate NodePool in our k8s cluster to host the ES nodes. This isolates the ES nodes from our main NodePool, where most of our microservices run. If our microservices suddenly ran hot on CPU/memory, the resources shared with ES could get eaten up, affecting the performance of the entire cluster. Another benefit is that we can fine-tune the NodePool resources accurately for the ES nodes, especially since they tend to be heavier on memory usage. We can visualize this with the image below.
Applying Taint/Tolerations and NodeAffinity to isolate scheduling pods to a NodePool
  • To isolate the elastic-cloud NodePool, we use Kubernetes’ scheduling feature called taints, with the NoSchedule effect. This taint must be added during NodePool creation (see the sketch below). Our Elasticsearch pods must also be configured to tolerate this taint via the tolerations field. Together, these effectively prevent any pod without a toleration for the elastic-only=true NoSchedule taint from being scheduled onto this NodePool. As a result, only the ES pods can be scheduled in the elastic-cloud NodePool.
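  • As a rough sketch, creating such a tainted NodePool on GKE looks like this (the cluster name, machine type and node count here are illustrative, not our exact production values):
gcloud container node-pools create elastic-cloud \
  --cluster=our-gke-cluster \
  --machine-type=n1-highmem-8 \
  --num-nodes=3 \
  --node-taints=elastic-only=true:NoSchedule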
  • Finally, we can start deploying the ES cluster with its nodes. We create a file called elasticsearch.yaml as follows:
apiVersion: elasticsearch.k8s.elastic.co/v1
kind: Elasticsearch
metadata:
  name: search-prod
  namespace: sg-prod
spec:
  version: 7.11.1
  nodeSets:
  - name: default
    count: 3
    config:
      node.master: true
      node.data: true
      node.ingest: true
      # allow anonymous access as superuser; we rely on network isolation
      # (private network + GCP firewall) instead of ES authentication
      xpack.security.authc:
        anonymous:
          username: anonymous
          roles: superuser
          authz_exception: false
    podTemplate:
      metadata:
        namespace: sg-prod
        labels:
          # additional labels for pods
          env: prod
      spec:
        initContainers:
        - name: sysctl
          securityContext:
            privileged: true
          command: ['sh', '-c', 'sysctl -w vm.max_map_count=262144']
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
              - matchExpressions:
                - key: cloud.google.com/gke-nodepool
                  operator: In
                  values:
                  - elastic-cloud
        tolerations:
        - key: elastic-only
          operator: Equal
          value: "true"
          effect: NoSchedule
        containers:
        - name: elasticsearch
          resources:
            requests:
              memory: 20G
              cpu: 8
            limits:
              memory: 30G
    # request 80Gi of persistent data storage for each pod in this nodeSet
    volumeClaimTemplates:
    - metadata:
        name: elasticsearch-data
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 80Gi
        storageClassName: faster-southeast1-a
  http:
    service:
      spec:
        # expose the HTTP service as a LoadBalancer (defaults to ClusterIP)
        type: LoadBalancer
    tls:
      selfSignedCertificate:
        disabled: true
  • In nodeSets, we specify the sets of nodes that we want to deploy. In this case, we use one nodeSet with 3 nodes, each running as a master, data and ingest node. Internally, the operator creates StatefulSets, as each pod (ES node) needs to be stateful with stable persistent storage. Other nodeSet topologies are possible too, e.g. 3 data nodes and 2 lightweight master nodes, as sketched below.
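  • For illustration only (not our production config), such a split topology could be sketched roughly like this:
nodeSets:
- name: master
  count: 2        # lightweight, master-only nodes (an odd number of master-eligible nodes is generally recommended for quorum)
  config:
    node.master: true
    node.data: false
    node.ingest: false
- name: data
  count: 3        # data + ingest nodes doing the heavy lifting
  config:
    node.master: false
    node.data: true
    node.ingest: true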
  • Furthermore, we add an initContainer running in privileged mode to increase the kernel setting vm.max_map_count to 262144. This raises the limit on memory-mapped areas, which Elasticsearch needs for its mmapped index files; without it, we may encounter out-of-memory exceptions. It is recommended by the documentation for production usage.
  • The requiredDuringSchedulingIgnoredDuringExecution nodeAffinity requires the pods to be scheduled only on the elastic-cloud NodePool. Combined with the tolerations discussed earlier, this ensures the ES nodes run only in the elastic-cloud NodePool. Without the nodeAffinity, the ES nodes could be scheduled anywhere in the k8s cluster; without the tolerations, they could be scheduled on any NodePool except elastic-cloud. Thus we need both settings together.
  • Next, we define the CPU/memory requests and limits, just as we normally define resources in Deployments. As for why we need to set the requests and limits, here is an excellent post on the topic.
  • For storage, we ask the operator to create a PersistentVolumeClaim of 80Gi for each pod with a certain StorageClass. This allows PersistentVolumes to be dynamically provisioned without manually creating a PV every time we need one. In this case, the faster-southeast1-a StorageClass uses the SSD persistent disk type provided by GCE (Google Compute Engine); a sketch of such a StorageClass follows below. Another benefit of using a StorageClass is that we can expand the volumes if we run out of space; in such a scenario, ECK updates the existing PVCs accordingly and recreates the StatefulSets automatically.
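  • For reference, a GCE SSD StorageClass like faster-southeast1-a can be sketched roughly as below (the exact parameters of ours may differ):
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: faster-southeast1-a
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd                  # SSD persistent disk
allowVolumeExpansion: true      # lets us grow the PVCs later without recreating them
volumeBindingMode: WaitForFirstConsumer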
  • In http, we expose the Service as a LoadBalancer. By default, if we don’t set this, we will get a ClusterIP service.
  • TLS on the HTTP layer is also enabled by default; behind the scenes, the elastic-operator creates self-signed certificates for it. We disable it for simplicity, as all our intra-service communication is private and walled behind GCP firewall policies.
  • Now, we run kubectl apply -f elasticsearch.yaml, and voila!
  • We can see the cluster object (note that es is a short name for the Elasticsearch custom resource defined by the all-in-one.yaml we applied in the first step):
$ kubectl get es
NAME          HEALTH   NODES   VERSION   PHASE   AGE
search-prod   green    3       7.11.1    Ready   48d
  • We can also see the ES nodes running as pods
$ kubectl get pods | grep search-prod-es 
search-prod-es-default-0 1/1 Running 0 5h25m
search-prod-es-default-1 1/1 Running 0 5h26m
search-prod-es-default-2 1/1 Running 0 5h28m
  • We can see the StatefulSet and Services created as well. Since we use a LoadBalancer service, we can reach the cluster via the external IP directly, in this case 10.148.0.117:9200 (see the curl example after the output below). If we were only using ClusterIP, we could use kubectl port-forward service/search-prod-es-http 9200 and hit localhost:9200.
$ kubectl get statefulset | grep search-prod-es 
search-prod-es-default 3/3 48d
$ kubectl get service | grep search-prod-es
search-prod-es-default ClusterIP None <none> 9200/TCP 48d
search-prod-es-http LoadBalancer 10.31.243.139 10.148.0.117 9200/TCP 48d
search-prod-es-transport ClusterIP None <none> 9300/TCP 48d
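  • Since TLS is disabled and anonymous access is mapped to superuser in our config, we can sanity-check the cluster straight away over plain HTTP (replace the IP with your own LoadBalancer IP; output truncated):
$ curl http://10.148.0.117:9200/_cluster/health?pretty
{
  "cluster_name" : "search-prod",
  "status" : "green",
  "number_of_nodes" : 3,
  ...
}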
  • The elastic-operator also creates a bunch of Secrets for each cluster that we deploy.
$ kubectl get secret | grep search-prod-es 
search-prod-es-default-es-config Opaque 1 48d
search-prod-es-default-es-transport-certs Opaque 7 48d
search-prod-es-elastic-user Opaque 1 48d
search-prod-es-http-ca-internal Opaque 2 48d
search-prod-es-http-certs-internal Opaque 3 48d
search-prod-es-http-certs-public Opaque 2 48d
search-prod-es-internal-users Opaque 2 48d
search-prod-es-remote-ca Opaque 1 48d
search-prod-es-transport-ca-internal Opaque 2 48d
search-prod-es-transport-certs-public Opaque 1 48d
search-prod-es-xpack-file-realm Opaque 3 48d
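  • For example, the password of the built-in elastic superuser can be read from the search-prod-es-elastic-user Secret; this is handy for logging in to Kibana later:
$ kubectl get secret search-prod-es-elastic-user \
    -o go-template='{{.data.elastic | base64decode}}'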
  • The next step is to deploy Kibana, which is simpler than Elasticsearch itself. We create a kibana.yaml:
apiVersion: kibana.k8s.elastic.co/v1
kind: Kibana
metadata:
  name: search-prod
  namespace: sg-prod
spec:
  version: 7.11.1
  count: 1
  elasticsearchRef:
    name: search-prod
  podTemplate:
    metadata:
      namespace: sg-prod
      labels:
        env: prod
    spec:
      tolerations:
      - key: elastic-only
        operator: Equal
        value: "true"
        effect: NoSchedule
      containers:
      - name: kibana
        resources:
          requests:
            memory: 1Gi
            cpu: 500m
          limits:
            memory: 15Gi
  http:
    tls:
      selfSignedCertificate:
        disabled: true
  • The most important part here is elasticsearchRef, which must match the metadata.name in elasticsearch.yaml. Under the hood, the operator creates a Deployment for Kibana instead of a StatefulSet, since Kibana doesn’t need any persistent storage.
  • Deploy with kubectl apply -f kibana.yaml, and we can see Kibana running:
$ kubectl get po | grep search-prod-kb 
search-prod-kb-df8555ff7-vhp5h 2/2 Running 0 6d13h
$ kubectl get deployment | grep search-prod-kb
search-prod-kb   1/1   1   1   45d
$ kubectl get service | grep search-prod-kb
search-prod-kb-http   ClusterIP   10.31.243.93   <none>   5601/TCP   45d
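  • To reach the Kibana UI, we can port-forward the ClusterIP service and open http://localhost:5601 in a browser (plain HTTP, since we disabled the self-signed certificate above), then log in with the elastic user and the password retrieved earlier:
$ kubectl port-forward service/search-prod-kb-http 5601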

Rolling Upgrade with ECK

At this point, our Elasticsearch cluster is up and running. Let’s step back to our initial problem: how does ECK perform rolling upgrades of the nodes automatically?
The answer is its update strategy. Since we did not provide updateStrategy in our elasticsearch.yaml, it defaults to:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: -1
      maxUnavailable: 1
  • maxUnavailable defaults to 1, which ensures the cluster has no more than 1 unavailable Pod at any point in time.
  • maxSurge defaults to -1, which means unbounded: there is no limit on how many new Pods can be created ahead of the old ones being removed.

With these settings, when we make a change to the Elasticsearch YAML that requires Pods to restart, we are guaranteed to have only 1 Pod restarting at a time. The cluster’s health will be marked as yellow, since some replica shards (from the terminated node) will be temporarily unassigned, but it can still serve requests.
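If we ever wanted different trade-offs, the changeBudget can also be set explicitly in elasticsearch.yaml. The values below are purely illustrative, not our production settings:

spec:
  updateStrategy:
    changeBudget:
      maxSurge: 1        # create at most 1 extra Pod above the target count during changes
      maxUnavailable: 1  # never have more than 1 Pod missing from the cluster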

Here’s a deeper look into the elastic-operator log after we update the elasticsearch.yaml file:

{"log.level":"info","@timestamp":"2021-07-15T02:59:38.071Z","log.logger":"driver","message":"Disabling shards allocation","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.115Z","log.logger":"driver","message":"Requesting a synced flush","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.491Z","log.logger":"driver","message":"synced flush failed with 409 CONFLICT. Ignoring.","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","namespace":"sg-prod","es_name":"search-prod"}
{"log.level":"info","@timestamp":"2021-07-15T02:59:38.491Z","log.logger":"driver","message":"Deleting pod for rolling upgrade","service.version":"1.4.0+4aff0b98","service.type":"eck","ecs.version":"1.4.0","es_name":"search-prod","namespace":"sg-prod","pod_name":"search-prod-es-default-0","pod_uid":"ed2bce02-dd3d-4cd9-97ed-af190d69935e"}

We can see that the operator disables shard allocation and requests a synced flush, exactly what the documentation recommends when performing a manual rolling upgrade. The same operations are then repeated for the rest of the pods (not shown here). No more human intervention needed for rolling upgrades 🎉
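For comparison, this is a rough sketch of the manual steps from the Elasticsearch rolling-upgrade documentation that we would otherwise run against the cluster by hand around each node restart:

# 1. Disable shard allocation so only primaries are reallocated while the node is down
PUT _cluster/settings
{ "persistent": { "cluster.routing.allocation.enable": "primaries" } }

# 2. Perform a synced flush (deprecated in newer versions in favour of a plain flush)
POST _flush/synced

# 3. Stop, upgrade and restart the node, then re-enable shard allocation
PUT _cluster/settings
{ "persistent": { "cluster.routing.allocation.enable": null } }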

Summary

Running Elasticsearch on Kubernetes lets developers and admins leverage Kubernetes’ container orchestration and apply the Elastic operator’s best practices for managing Elasticsearch clusters. While running everything on k8s adds another layer of complexity, it greatly reduces the manual operations required from humans and offers peace of mind to developers.
