Installing Elasticsearch on GKE with the Bitnami Helm Chart, Part 1

Dynabolt Infrastructure Solutions
10 min read · May 30, 2023


Elasticsearch is a highly scalable and distributed search and analytics engine built on top of the Apache Lucene library. It is designed to handle large volumes of data and provide real-time search, analysis, and data exploration capabilities. Elasticsearch is widely used for various use cases, including application search, log analysis, full-text search, and business intelligence.

Google Kubernetes Engine (GKE) is a managed container orchestration platform provided by Google Cloud. It allows users to deploy, manage, and scale containerized applications using Kubernetes, an open-source container orchestration system.

In this tutorial, we will guide you through the process of installing Elasticsearch on GKE using the Bitnami Helm chart, while leveraging a custom values.yaml file to tailor the deployment to the GKE environment. There are a few gotchas that I ran into while doing this myself, so we will cover those along the way. Part 1 covers the cluster layout and the initial configuration.

Prerequisites: Before you begin, ensure you have the following:

  1. The gcloud CLI installed and authorized on your local machine
  2. Helm 3 installed on your local machine
  3. A GKE cluster with Workload Identity enabled
  4. This tutorial assumes the free Elasticsearch license tier, so some features, such as FIPS mode for security, will not be used
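
Before editing anything, you will need a local copy of the chart's default values.yaml and a kubectl context pointed at your cluster. Here is a minimal sketch of that prep work; the cluster name and location are placeholders, and the elasticsearch namespace is simply the one this tutorial assumes later on.

# Point kubectl at your GKE cluster (substitute your own cluster name and location)
gcloud container clusters get-credentials my-gke-cluster --region us-central1

# Add the Bitnami repo and dump the chart's default values to a local file
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
helm show values bitnami/elasticsearch > values.yaml

# Namespace we will deploy into later in the tutorial
kubectl create namespace elasticsearch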

Cluster Layout

The GKE cluster acts as the underlying infrastructure where Elasticsearch is deployed. I will have a more detailed write-up in the future on the best and most secure ways to deploy a GKE cluster. For now, we will focus on Elasticsearch. The Elasticsearch cluster consists of three types of nodes: the Master Node(s), the Coordinating Node(s), and the Data Nodes.

The Master Node(s) are responsible for managing the cluster state, coordinating operations, and managing indices and shards; they ensure the cluster’s stability and availability. It’s important to note that while having multiple master-eligible nodes increases fault tolerance, only one node can be the elected master at any given time. The other master-eligible nodes remain on standby, ready to take over if the current master fails or becomes unreachable.

The Data Nodes store and manage the actual data within the Elasticsearch cluster. They handle indexing, searching, and querying operations. Multiple Data Nodes can be deployed to achieve horizontal scalability and distribute the data across the cluster. Proper configuration and sizing of Data Nodes are crucial for optimizing performance, storage capacity, and fault tolerance in an Elasticsearch cluster. Factors like the number of nodes, shard allocation settings, and data distribution strategy need to be considered based on the specific requirements of your application and data volume. I personally ran into issues with this which we will cover later on in the article.

The Coordinating Nodes (also known as coordinating-only nodes) play a crucial role in the operation of an Elasticsearch cluster. They act as a gateway or entry point for client requests and help distribute the load among the different nodes in the cluster. Coordinating nodes do not hold any data or participate in the data storage process, but they coordinate search and indexing operations on behalf of clients.
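
In the Bitnami chart, each of these roles (plus an optional ingest role) gets its own section of values.yaml with its own replica count, resources, and persistence settings. As a rough sketch of a small layout, with example counts rather than the chart defaults (section names per the chart at the time of writing):

master:
  replicaCount: 3        # an odd number of master-eligible nodes helps elections
data:
  replicaCount: 2        # scale out for storage and indexing/search throughput
coordinating:
  replicaCount: 2        # stateless entry points for client requests
ingest:
  replicaCount: 1        # optional ingest pipeline processing

Sizing these properly is part of the performance discussion later in the series.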

Modifying the values.yaml File for GKE

We will start from the beginning, using best practices from a security standpoint. This is going to take some time, so grab some coffee and strap in. Pull up the values.yaml file in your favorite editor and go to line 26. We are going to deploy Kibana alongside Elasticsearch, so we will need to enable it:

kibanaEnabled: false -> true

On line 74, change your cluster name to match your intended environment (e.g., dev, stage, or prod). For this example, we will be using dev-elastic.

clusterName: dev-elastic

A few maintenance items across the cluster: for my dev clusters, I use the “latest” tag for each of the images. This lets me test newer cluster versions on the fly before moving them to stage and prod. If something breaks, I can quickly roll back and restore from snapshots. Don’t worry, we will cover this later on as well.

image:
  registry: docker.io
  repository: bitnami/elasticsearch
  tag: 8.8.0-debian-11-r0 -> latest
  digest: ""

Now, Elasticsearch is a beast; my recommendation from experience is to give it a dedicated node pool to run on. But we need a way to assign the Elasticsearch pods to that dedicated node pool and make sure no other pods get scheduled onto it. We will use node taints to achieve this. First, on your new node pool:

  1. On the cluster details page, click on the “Node Pools” tab.
  2. Locate the specific node pool and click on its name to open the node pool details.
  3. In the node pool details, click on the “Edit” button to modify the node pool configuration.
  4. Scroll down to the “Node Taints” section and click on the “Add Taint” button.
  5. Specify the key-value pair for the taint. For example, you can set the key as “key” and the value as “value” to create a taint of the form “key=value” (you can make these whatever you want). You can also choose the taint effect, such as NoSchedule, PreferNoSchedule, or NoExecute. We will choose NoSchedule.
  6. Click on the “Save” button to apply the changes to the node pool.

Once the taint is set, it will be applied to all the nodes in the node pool.
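
If you prefer to create the node pool from the command line instead of the console, the taint can be set at creation time. A sketch with placeholder names (elastic-pool, dev-elastic-gke, and us-central1 are hypothetical; substitute your own pool name, cluster, and location):

gcloud container node-pools create elastic-pool \
  --cluster dev-elastic-gke \
  --region us-central1 \
  --num-nodes 1 \
  --node-taints=key=value:NoSchedule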

Next, in our values.yaml file we will need to add what are called tolerations so that the Elasticsearch pods are allowed to schedule onto our newly tainted node pool. You will need to do this for all of the pod deployments: lines 589, 896, 1203, 1480, and 1912.

tolerations:
  - key: "key"
    operator: "Equal"
    value: "value"
    effect: "NoSchedule"

Yes, this is a little unintuitive, but this is how you allow the pods to run on your custom node pool.

Finally, to tell the pods which node pool they need to deploy to, add the following nodeSelector on lines 585, 892, 1199, 1476, and 1908:

nodeSelector:
  cloud.google.com/gke-nodepool: "your-node-pool-name-here"
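
After the chart is deployed, you can confirm the taint and selector did their job: the same GKE node pool label used in the nodeSelector works as a filter for kubectl, and the NODE column of the pod listing should only show nodes from your dedicated pool (the elasticsearch namespace is assumed here):

kubectl get nodes -l cloud.google.com/gke-nodepool=your-node-pool-name-here
kubectl get pods -n elasticsearch -o wide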

The next item we need to set globally is to enable security on the cluster. This can be tricky, but security by default is key in the times we live in. Starting on line 211, we need to change a few things:

security:
  ## @param security.enabled Enable X-Pack Security settings
  ##
  enabled: false -> true
  ## @param security.elasticPassword Password for 'elastic' user
  ## Ref: https://github.com/bitnami/containers/tree/main/bitnami/elasticsearch#security
  ##
  elasticPassword: "keepitsecretkeepitsafe" -> (create a password here)
  ## @param security.existingSecret Name of the existing secret containing the Elasticsearch password
  ##
  existingSecret: ""
  ## FIPS mode
  ## @param security.fipsMode Configure elasticsearch with FIPS 140 compliant mode
  ## Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/fips-140-compliance.html
  ##
  fipsMode: false
  ## TLS configuration
  ##
  tls:
    ## @param security.tls.restEncryption Enable SSL/TLS encryption for Elasticsearch REST API.
    ##
    restEncryption: true
    ## @param security.tls.autoGenerated Create self-signed TLS certificates.
    ## NOTE: If autoGenerated certs are enabled and a new node type is enabled using helm upgrade, make sure you remove previously existing Elasticsearch TLS secrets.
    ## Otherwise, the new node certs won't match the existing certs.
    ##
    autoGenerated: false -> true
    ## @param security.tls.verificationMode Verification mode for SSL communications.
    ## Supported values: full, certificate, none.
    ## Ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/security-settings.html
    ##
    verificationMode: "full"
    ## @param security.tls.master.existingSecret Existing secret containing the certificates for the master nodes
    ## @param security.tls.data.existingSecret Existing secret containing the certificates for the data nodes
    ## @param security.tls.ingest.existingSecret Existing secret containing the certificates for the ingest nodes
    ## @param security.tls.coordinating.existingSecret Existing secret containing the certificates for the coordinating nodes
    ##
    master:
      existingSecret: ""
    data:
      existingSecret: ""
    ingest:
      existingSecret: ""
    coordinating:
      existingSecret: ""
    ## @param security.tls.keystoreFilename Name of the keystore file
    ##
    keystoreFilename: elasticsearch.keystore.jks
    ## @param security.tls.truststoreFilename Name of the truststore
    ##
    truststoreFilename: elasticsearch.truststore.jks
    ## @param security.tls.usePemCerts Use this variable if your secrets contain PEM certificates instead of JKS/PKCS12
    ## Ignored when using autoGenerated certs.
    ##
    usePemCerts: false -> true
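
A quick note on secrets: if you would rather not keep the elastic password in values.yaml at all, the security.existingSecret parameter lets you point the chart at a pre-created Kubernetes secret instead. A rough sketch, assuming the chart reads the password from a key named elasticsearch-password (confirm this against the templates of the chart version you are using) and using a hypothetical secret name:

kubectl create secret generic dev-elastic-credentials \
  --namespace elasticsearch \
  --from-literal=elasticsearch-password='keepitsecretkeepitsafe'

You would then set existingSecret: "dev-elastic-credentials" and leave elasticPassword empty.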

Part 2 of the security portion is to enable it in the Kibana section of the chart. Starting on line 2244, we will need to add the following:

kibana:
  elasticsearch:
    security:
      auth:
        enabled: true
        elasticsearchPasswordSecret: dev-elasticsearch
        # default in the elasticsearch chart is elastic
        createSystemUser: true
        kibanaUsername: "<USERNAME>"
        kibanaPassword: "<PASSWORD>"
      tls:
        # Instruct kibana to connect to elastic over https
        enabled: true
        # Bit of a catch 22, as you will need to know the name upfront of your release (which in our case we do)
        existingSecret: dev-elasticsearch-coordinating-crt
        # As the certs are auto-generated, they are pemCerts so set to true
        usePemCerts: true

Speaking of security, we need to enable anonymous access in order to get our health check to work (more on this in a moment). This just allows us to check the cluster health without having to authenticate. On line 101, add the following:

## ref: https://www.elastic.co/guide/en/elasticsearch/reference/current/settings.html
##
extraConfig:
  xpack.security.authc:
    anonymous:
      username: anonymous_user
      roles: viewer, remote_monitoring_agent, monitoring_user
      authz_exception: true
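
Once the cluster is up, you can sanity-check that anonymous access behaves the way the load balancer health check will expect: an unauthenticated request to the health endpoint should come back with an HTTP 200. A quick way to test this from your workstation, assuming the release is named dev-elasticsearch so the Elasticsearch service is called dev-elasticsearch and listens on 9200 (adjust to your release):

kubectl port-forward -n elasticsearch svc/dev-elasticsearch 9200:9200

# in a second terminal; -k is needed because the auto-generated certs are self-signed
curl -sk -o /dev/null -w "%{http_code}\n" https://localhost:9200/_cluster/health

If everything is wired up correctly, this prints 200 without any credentials being supplied.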

Next, if you have any plugins that you need to install, you can add them on line 84:

plugins: ""
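
For example, if you wanted the ICU analysis plugin installed at initialization (a hypothetical choice, just to illustrate the format), it would look like this; check the chart's comments above this parameter for the exact list format it accepts:

plugins: "analysis-icu"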

So, the next one was a fun one to figure out. For this deployment, both Elasticsearch and Kibana will sit behind an internal ingress controller that connects users and apps to the cluster and the Kibana dashboard. This allows only trusted apps and users on dedicated machines to connect. GKE deploys a load balancer with the ingress controller to route and manage the internal traffic, and that load balancer requires a health check to function. One problem, though: when auth is enabled, Elasticsearch returns a 401 instead of the required 200, and a plain TCP health check is not an option either. Bummer, so what to do? After much toil, I have a solution. Kibana is the easy one: just set the backend health check path to /login, and that’s it. To do this, we will need to add a BackendConfig to our deployment.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: kibana-backendconfig
spec:
  healthCheck:
    type: HTTP
    requestPath: /login
    port: 5601

Once you have saved this BackendConfig to a file (here assumed to be named kibana-backendconfig.yaml), apply it with:

kubectl apply -f kibana-backendconfig.yaml -n elasticsearch

This assumes you have already created the elasticsearch namespace and that your terminal is open in the directory where the file is stored.
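
Since BackendConfig is a namespaced GKE custom resource, you can confirm it was created where you expect it:

kubectl get backendconfig -n elasticsearch
kubectl describe backendconfig kibana-backendconfig -n elasticsearch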

Next, we need to reference the BackendConfig in the Helm chart. This entire section needs to be added to the Kibana portion of the chart; the relevant piece is under “annotations”. Don't worry, I will include the complete chart for reference later.

service:
  ## @param service.ports.http Kubernetes Service port
  ##
  ports:
    http: 5601
  ## @param service.type Kubernetes Service type
  ##
  type: ClusterIP
  ## @param service.nodePorts.http Specify the nodePort value for the LoadBalancer and NodePort service types
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#type-nodeport
  ##
  nodePorts:
    http: ""
  ## @param service.clusterIP %%MAIN_CONTAINER_NAME%% service Cluster IP
  ## e.g.:
  ## clusterIP: None
  ##
  clusterIP: ""
  ## @param service.loadBalancerIP loadBalancerIP if Kibana service type is `LoadBalancer`
  ## ref: https://kubernetes.io/docs/user-guide/services/#type-loadbalancer
  ##
  loadBalancerIP: ""
  ## @param service.loadBalancerSourceRanges %%MAIN_CONTAINER_NAME%% service Load Balancer sources
  ## ref: https://kubernetes.io/docs/tasks/access-application-cluster/configure-cloud-provider-firewall/#restrict-access-for-loadbalancer-service
  ## e.g:
  ## loadBalancerSourceRanges:
  ##   - 10.10.10.0/24
  ##
  loadBalancerSourceRanges: []
  ## @param service.externalTrafficPolicy Enable client source IP preservation
  ## ref https://kubernetes.io/docs/tasks/access-application-cluster/create-external-load-balancer/#preserving-the-client-source-ip
  ##
  externalTrafficPolicy: Cluster
  ## @param service.annotations Annotations for Kibana service (evaluated as a template)
  ## This can be used to set the LoadBalancer service type to internal only.
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#internal-load-balancer
  ##
  annotations:
    cloud.google.com/backend-config: '{"default": "kibana-backendconfig"}'
  ## @param service.labels Extra labels for Kibana service
  ##
  labels: {}
  ## @param service.extraPorts Extra ports to expose in the service (normally used with the `sidecar` value)
  ##
  extraPorts: []
  ## @param service.sessionAffinity Session Affinity for Kubernetes service, can be "None" or "ClientIP"
  ## If "ClientIP", consecutive client requests will be directed to the same Pod
  ## ref: https://kubernetes.io/docs/concepts/services-networking/service/#virtual-ips-and-service-proxies
  ##
  sessionAffinity: None
  ## @param service.sessionAffinityConfig Additional settings for the sessionAffinity
  ## sessionAffinityConfig:
  ##   clientIP:
  ##     timeoutSeconds: 300
  ##
  sessionAffinityConfig: {}

Now that that is added, we need a BackendConfig for the Elasticsearch ingress controller. This one is a bit different: we will call the cluster’s health path for a solid health check. This is where the anonymous user comes in. Since we don’t have to authenticate for this information, it returns our 200. Boom, problem solved.

apiVersion: cloud.google.com/v1
kind: BackendConfig
metadata:
  name: elastic-backendconfig
spec:
  healthCheck:
    type: HTTPS
    requestPath: /_cluster/health
    port: 9200
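
As with the Kibana one, save this manifest to a file (a name like elastic-backendconfig.yaml is assumed here) and apply it to the same namespace:

kubectl apply -f elastic-backendconfig.yaml -n elasticsearch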

With the BackendConfig applied, reference it in your values.yaml, making sure the name matches (elastic-backendconfig). Then, on line 380 add the following:

  ## @param ingress.annotations Additional annotations for the Ingress resource. To enable certificate autogeneration, place here your cert-manager annotations.
  ## Use this parameter to set the required annotations for cert-manager, see
  ## ref: https://cert-manager.io/docs/usage/ingress/#supported-annotations
  ## e.g:
  ## annotations:
  ##   kubernetes.io/ingress.class: nginx
  ##   cert-manager.io/cluster-issuer: cluster-issuer-name
  ##
  annotations:
    cloud.google.com/backend-config: '{"default": "elastic-backendconfig"}'

Now we need to create a Kubernetes service account and associate it with a GCP service account so that we can use Workload Identity for things like VPC Service Controls and other access. I am not going to reinvent the wheel here, because GCP already has good documentation on this. Once you have created your service account and annotated it with the GCP service account, you can add it on lines 757, 1064, 1332, and 1609 (a sketch of the binding commands follows the block below):

## Pods Service Account
## ref: https://kubernetes.io/docs/tasks/configure-pod-container/configure-service-account/
## @param master.serviceAccount.create Specifies whether a ServiceAccount should be created
## @param master.serviceAccount.name Name of the service account to use. If not set and create is true, a name is generated using the fullname template.
## @param master.serviceAccount.automountServiceAccountToken Automount service account token for the server service account
## @param master.serviceAccount.annotations Annotations for service account. Evaluated as a template. Only used if `create` is `true`.
##
serviceAccount:
  create: false
  name: "your service account name"
  automountServiceAccountToken: true
  annotations: {}
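
For reference, here is a minimal sketch of the Workload Identity wiring itself, roughly following the GCP documentation. All of the names are placeholders (elastic-ksa, elastic-gsa, and my-project are hypothetical); substitute your own Kubernetes service account, GCP service account, and project:

# Allow the Kubernetes service account to impersonate the GCP service account
gcloud iam service-accounts add-iam-policy-binding \
  elastic-gsa@my-project.iam.gserviceaccount.com \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:my-project.svc.id.goog[elasticsearch/elastic-ksa]"

# Annotate the Kubernetes service account so GKE knows which GCP identity it maps to
kubectl annotate serviceaccount elastic-ksa \
  --namespace elasticsearch \
  iam.gke.io/gcp-service-account=elastic-gsa@my-project.iam.gserviceaccount.com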

That concludes part 1 of the tutorial. Now that the global settings are configured and the groundwork is laid for the deployment, next time we will get into the detailed configuration for the networking side of things and then move on to performance tips. Stay tuned for part 2 next week.

