Becoming a well rounded K8s Cloud Engineer in 10 steps

Oren Spiegel · Published in CodeX · Aug 24, 2021 · 7 min read

If you’re here, it is likely that you already understand the benefits of using Kubernetes to manage your multi-service architecture. This article will not discuss why to use Kubernetes, but rather focus on how to approach it once you’ve already determined it is the solution for you.

This article explains HOW to approach Kubernetes, NOT why or when to use it

There are many Kubernetes learning resources online, which makes it difficult to know where and how to start tackling this vast technology. This article provides a high level curriculum covering the different aspects of running your application’s micro-services on a K8s cluster.

I will introduce the concepts, and provide external resources for the actual learning. Let this be your one stop guide to becoming a well rounded K8s engineer.

STEP 1: Install Minikube locally and get familiar with the Control Plane components

Minikube will simulate a multi-service cluster environment for us. The installation is quick and easy and can be found here. Minikube will get drunk on your machine’s resources (it is very RAM intensive), so bear this in mind.

Make sure your machine can handle Minikube
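
To get going, something like the following is usually enough. This is a minimal sketch; the CPU and memory values are just example numbers, tune them to what your machine can spare:

# start a local cluster with explicit resource caps
minikube start --cpus=2 --memory=4096

# confirm the cluster came up
minikube status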

It is important to understand the role of the k8s cluster components. Before proceeding to step 2 please visit this short tutorial. Kube-apiserver, kube-controller-manager, and kube-scheduler reside on the Control Plane (Master Node), while kubelet and kube-proxy run on every node; focus on understanding what each one does. A quick way to see these components running in Minikube follows the list below.

  1. Kube-apiserver: exposes the Kubernetes API and validates and configures data for k8s api objects (such as pods)
  2. Kubelet: the agent running on every node; registers the node with the api-server and keeps the node’s pods running
  3. Kube-controller-manager: a daemon that embeds k8s core control loops
  4. Kube-scheduler: determines which Nodes are valid placements for each pod
  5. Kube-proxy: runs on every node and directs traffic that is destined for services to the correct backend pods
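
Minikube bundles its own kubectl, so even before step 2 you can peek at these components, which run as pods. A quick sketch; note that the kubelet runs as a host process, not a pod, so it will not appear in this output:

# list the system pods that make up the cluster’s machinery
minikube kubectl -- get pods -n kube-system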

STEP 2: Install and get familiar with Kubectl

The Kubernetes command line interface, or Kubectl, is used to issue commands from the master Node (discussed later) or your local machine to the cluster’s api-server, which in turn schedules work onto the worker nodes in your K8s environment. To install the command line tool click here. You can learn basic commands here, but I strongly recommend you go through step 3 and learn about the different types of k8s yaml config objects before diving deeper into Kubectl.
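
To give you a feel for it, here are a few everyday commands. This is a minimal sketch; the pod name and file name are placeholders:

kubectl get nodes                    # list the nodes in the cluster
kubectl get pods --all-namespaces    # list every pod, across all namespaces
kubectl describe pod <pod-name>      # inspect a pod's status and recent events
kubectl apply -f deployment.yaml     # create or update resources from a yaml file
kubectl logs -f <pod-name>           # stream a pod's STDOUT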

proceed to step 3 before deep diving into Kubectl and its purpose

STEP 3: Learn about the different yaml k8s api objects

I think the most time efficient way to approach this is to learn the purpose of the core K8s config yaml api objects at a high level.

Focus on understanding the purpose of each k8s yaml object

Do not memorize the file structures or the exact syntax, because Helm (detailed in the next step) will auto create the yaml config structures for you. Focus on understanding the role of each of the objects below; a minimal deployment and service sketch follows the list:

  1. deployment
  2. service
  3. service-account
  4. ingress (not to be confused with the nginx ingress controller discussed later)
  5. hpa (horizontal pod autoscaler)
  6. ConfigMap and Secrets
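
To make the roles concrete, here is a minimal sketch of a deployment and the service that exposes it. The names, image, labels, and ports are hypothetical placeholders, not anything from a real chart:

# deployment: keeps two replicas of the container running at all times
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:1.0   # hypothetical image
          ports:
            - containerPort: 8080
---
# service: gives the pods above a single stable in-cluster address
apiVersion: v1
kind: Service
metadata:
  name: my-app
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080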

STEP 4: Use Helm to write yaml definition files for your various micro services

Helm auto creates default yaml configuration files for each “chart” you create. Each “chart” represents an individual micro-service in your architecture. A “chart” directory contains all of the config files required to deploy this micro-service on the cluster. Since the micro-service was deployed using definition files, it can be altered and re-deployed with flexibility and ease. A best practice would be to push all of your helm directories into a (private) git repository, so you could later reinstall all services on a different cluster with a single command. This reusability is not offered by “on the fly” kubectl commands.

In most cases the only file that will need to be edited in each helm chart will be the values.yaml file. This file creates a “single source of truth” for each micro-service. For a full guide on how to create a Helm chart click here.
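
In practice the workflow looks roughly like this. A sketch: my-service and the image values are placeholders, and the value names assume the defaults that helm create scaffolds:

# scaffold a chart with default templates (deployment, service, ingress, hpa, ...)
helm create my-service

# install it on the cluster, overriding values.yaml entries on the command line
helm install my-service ./my-service \
  --set image.repository=my-registry/my-service \
  --set image.tag=1.0

# after editing values.yaml, roll the change out to the cluster
helm upgrade my-service ./my-service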

STEP 5: Build a high availability cloud setup with worker node autoscaling using KOPS

Once you have seen your Helm chart services communicate with each other successfully in Minikube, you are officially ready to set up a cloud environment. A high availability cloud setup is defined as having at least two worker nodes in different zones, each hosting your application ingress controller and services. In the event that one data-center catches fire due to a storm, the other Node resides in an entirely different “computer farm” and will continue to serve incoming requests without interruption. Your end clients will be undisturbed. Have no worries, K8s will auto create the worker Node that was lost in the fire in a matter of minutes.

As a first step, buy a cheap, low CPU/RAM machine from your cloud provider. On this cheap machine install Kubectl (again), KOPS, and Helm. This machine will be called the Master Node, and it will be responsible for interfacing with, setting up, and managing the cluster and the pods that reside within it. For precise instructions please click here.

Make sure you record all of the KOPS commands in a shell script file; this way your infrastructure build process is recorded as code and can be replicated easily in case a mistake requires a cluster setup do-over (a sketch of such a script follows below). The concept of “infrastructure as code” is very well served by the use of KOPS and Helm.

helm and kops allow for faster “do overs” when they’re necessary
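
A sketch of what such a script might look like. The state bucket, cluster name, zones, and instance sizes are placeholders for your own values:

#!/bin/sh
# infrastructure as code: the whole cluster build, replayable after a do-over
export KOPS_STATE_STORE=s3://my-kops-state-bucket   # hypothetical S3 bucket
export NAME=my-cluster.example.com                  # hypothetical cluster name

# two worker nodes in two different zones = the high availability setup described above
kops create cluster \
  --name=$NAME \
  --zones=us-east-1a,us-east-1b \
  --node-count=2 \
  --node-size=t3.medium \
  --yes

# later changes go through the same state store
kops edit cluster --name=$NAME
kops update cluster --name=$NAME --yes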

STEP 6: Set up your Nginx ingress controller and micro-service ingress yaml to accommodate your needs

The Nginx ingress controller will manage traffic into the cluster. It may be configured to give services externally-reachable URLs, load balance traffic, terminate SSL/TLS, and offer name-based virtual hosting. This guide will take you step by step through deploying an Nginx ingress controller with an AWS LoadBalancer in front of it.

The ingress yaml definition file that Helm auto creates is different from the ingress controller: it provides ingress configuration that is specific to one service. The annotations field is where you define https forwarding rules, request size limitations and timeouts, and other important configurations related to incoming request handling. These rules will often differ from service to service within your cluster, which is why each micro-service has an ingress of its own.
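
As an illustration, a per-service ingress might look like this. A sketch using the nginx ingress controller’s annotations; the host name, size limit, and timeout are example values, and my-app is the hypothetical service from step 3’s sketch:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-app
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "10m"     # cap incoming request size
    nginx.ingress.kubernetes.io/proxy-read-timeout: "60"   # backend timeout, in seconds
spec:
  ingressClassName: nginx
  rules:
    - host: my-app.example.com          # hypothetical host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app            # routes to the my-app service
                port:
                  number: 80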

STEP 7: Understand the relationship between horizontal pod autoscaling and cluster auto scaling

Some of your services may call for an hpa (horizontal pod autoscaler) yaml file in order to allow for autoscaling. K8s will auto spawn more pods according to the CPU/RAM thresholds you define, measured against the resource requests declared in the deployment yaml file. Once a Node’s resources fill up with pods, the cluster autoscaler will create an additional Node and resume scheduling pods on it. Similarly, if the workload on the micro-service drops, k8s will magically “release” (terminate) the pods it spawned, as well as the extra Node it created, once they become unnecessary.
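
A minimal hpa sketch, assuming the hypothetical my-app deployment from earlier declares CPU requests; the replica counts and the 70% target are example values:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # add pods when average CPU passes 70% of the requested amount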

STEP 8: Use taints, tolerations, affinities, and anti-affinities to auto schedule your pods on the proper Node

“Tainting” a Node involves placing a mark on it that deployments may “tolerate” (in other words, allow scheduling on the tainted Node) or not (disallow scheduling). Tolerations are specified in the pod spec section of a deployment yaml file, while taints are applied to the Node using a Kubectl command. The official documentation can be found here.
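
A small sketch of the two halves; the node name and the dedicated=gpu key/value are hypothetical:

# first, taint the Node with a kubectl command (shown here as a comment):
#   kubectl taint nodes gpu-node-1 dedicated=gpu:NoSchedule
# then, in the deployment's pod spec, tolerate that taint so pods may land there:
spec:
  template:
    spec:
      tolerations:
        - key: "dedicated"
          operator: "Equal"
          value: "gpu"
          effect: "NoSchedule"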

A similar yaml config scheduling feature is the node-selector pod specification. It gives a pod an affinity (a liking) for a particular Node label, or an anti-affinity (a disliking) for it. Node affinity is powerful when trying to schedule particular pods on Nodes with special features (high CPU, GPU, high memory). It is also commonly used to prevent application pods from being scheduled on the master Node (reserved for control plane pods).

Sometimes the desire is to make sure two pods don’t end up on the same machine. For this we have inter-pod anti-affinity rules. For example, let’s say we have two pods (of the same deployment/replica-set) and each requires 70% of the node’s CPU. In this case we can only run one pod per node; two pods would cause a CPU overload. An easy solution that does not require resource tracking is to place a pod anti-affinity to itself on the deployment. This achieves a one-pod-per-worker-node relationship; a sketch of the rule follows.
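
A sketch of that pod spec fragment, assuming the deployment’s pods carry the hypothetical label app: my-app:

# the deployment's pods repel other pods carrying the same label,
# so the scheduler places at most one replica per worker node
spec:
  template:
    metadata:
      labels:
        app: my-app
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: my-app
              topologyKey: "kubernetes.io/hostname"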


STEP 9: Monitoring with Prometheus and Grafana

The most basic performance monitoring tool is the metrics server. It is a basic prerequisite for using a horizontal pod autoscaler, and with it you can execute a “Kubectl top” command to retrieve the CPU usage of your pods or nodes. This is important because in K8s, each deployment that is set to horizontally autoscale must have CPU/RAM requests (and limits) defined in its yaml config. Please refer to this video for a step by step Helm installation guide for the metrics server.
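
For reference, the relevant fragment of a deployment’s container spec might look like this; the numbers are example values, not recommendations:

# requests feed the hpa's utilization math; limits are the ceiling past which
# the container risks being OOM killed
resources:
  requests:
    cpu: "250m"
    memory: "256Mi"
  limits:
    cpu: "500m"
    memory: "512Mi"
# with the metrics server installed, live usage can be checked with:
#   kubectl top nodes
#   kubectl top pods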

Be warned, you will likely find that the metrics server does not provide enough insight on its own. If your services are RAM/CPU intensive, you will require a visual tool that gives a precise measurement of resource usage per pod. Better monitoring means fewer surprises and fewer pods crashing due to a limit overcommit, or in technical terms, an OOM kill.

Avoid a K.O. by staying on top of your pod and node resource consumption

Here is a detailed guide on how to install Grafana and Prometheus using Helm charts. Prometheus will measure resource usage, while Grafana provides a visual interface for viewing the different resource metrics over a time axis.
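
If you prefer the command line route, one common way looks like this. A sketch assuming the community kube-prometheus-stack chart and a release named monitoring; the chart and service names your guide uses may differ:

# add the community chart repo and install Prometheus + Grafana as one release
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
helm install monitoring prometheus-community/kube-prometheus-stack \
  --namespace monitoring --create-namespace

# Grafana ships with the stack; port-forward to reach its dashboards locally
kubectl port-forward svc/monitoring-grafana 3000:80 -n monitoring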

STEP 10: Centralize Logging with Elasticsearch and Kibana

Each pod’s live STDOUT can be viewed with the command:

kubectl logs -f <insert-pod-name>

This in itself is not enough.

The log viewer displays a portion of the latest prints the pod wrote to STDOUT. A common reason to review logs is that an error caused a pod to crash. Once a crashed pod is replaced or deleted, its logs are wiped with it, leaving no way to recover them or investigate the source of the error.

Elasticsearch to the rescue!

Elasticsearch collecting logs across the cluster

It is easily installed with Helm charts, and will centralize and record the logs of all pods in the cluster. Kibana offers an interface from which we can search through the logs, or narrow in on logs belonging to a particular pod or time segment.
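
A sketch of the Helm route, assuming Elastic’s official charts plus Filebeat as the log shipper that actually forwards each node’s container logs into Elasticsearch; the release names are placeholders:

# add Elastic's chart repository
helm repo add elastic https://helm.elastic.co
helm repo update

# the storage/search backend and its UI
helm install elasticsearch elastic/elasticsearch
helm install kibana elastic/kibana

# a log shipper to collect every pod's STDOUT into Elasticsearch
helm install filebeat elastic/filebeat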

Farewells and such

As a closing remark I will offer a link to a paid Udemy course (that I do not sponsor, nor receive money from). It covers everything I discussed above and much more. For those who like “monkey see — monkey do” style teaching, this is the most well rounded course I came across.
