Kubernetes — Things you should know

Internal working mechanism of Kubernetes

Bikash Sundaray
TechMintMedia
7 min read · Apr 2, 2022


I have been working with Kubernetes and deploying cloud-native apps for over six years, at both ends of the cloud-native lifecycle: designing and developing apps, and deploying them. I have seen the transition from the most difficult way, on-prem K8s deployment, to managed services like EKS and AKS. Here I am sharing my experience and the things you should know about Kubernetes.

This is Part 1 of the series. I will be releasing around four more articles on this topic.

This content targets both cloud and on-prem Kubernetes deployments. Some of it may not be required for managed services like AKS and EKS, where the internals are handled by Azure and AWS.

Fun fact: Kubernetes is abbreviated as K8s, not k8s :)

What we are going to cover in this article

  • Fun with K8s
  • K8s playgrounds you can use to experiment
  • Vendors who can help kick-start Kubernetes for your new product/application
  • The etcd database of K8s and why it's effective
  • The Kubernetes API life cycle
  • CRI, and K8s dropping Docker Engine support
  • Node affinity and anti-affinity
  • Forums

#1 Fun with K8s

Can I visualize my K8s cluster in 3D? :) Yes you can. Watch the demo here (please don't use it in production):

https://afritzler.github.io/kube-universe/web/demo/

#2 Where can I play with Kubernetes?

Online playgrounds

  1. https://labs.play-with-k8s.com/
  2. https://www.katacoda.com/

Offline single-node K8s (you need to install these)

  1. https://kind.sigs.k8s.io/
  2. https://www.docker.com/products/docker-desktop
  3. https://minikube.sigs.k8s.io/docs/start/

#3 Can I deploy my apps on Kubernetes without knowing it?

There are platforms like OpenShift and Cloud Foundry that are built on top of Kubernetes. These platforms give engineers an abstraction layer so they can focus on engineering problems and app deployment rather than on what Kubernetes is and how to configure it. But under the hood, these platforms are built heavily on K8s.

If you don't have Kubernetes engineers on your team, you can get help from the companies listed as Kubernetes Certified Service Providers on the CNCF site (of course, they will charge for the service): https://www.cncf.io/certification/kcsp/

#4 Kubernetes Database

Yes, Kubernetes as an orchestration tool internally uses a database to store information. When you create an object like a Deployment, Pod, Service, or Secret, all of that information is stored in the etcd database. When you run a kubectl command, kubectl talks to the Kubernetes API server (kube-apiserver), and the API server is the component that reads from and writes to etcd on your behalf.
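You can watch this flow from the client side: at a high verbosity level, kubectl logs the HTTP calls it makes to the API server (this assumes a working kubeconfig and a reachable cluster):

```shell
# -v=8 prints each HTTP request/response kubectl exchanges with kube-apiserver
kubectl get pods -v=8
```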

etcd is a strongly consistent, distributed key-value store that provides a reliable way to store data that needs to be accessed by a distributed system or cluster of machines. It gracefully handles leader elections during network partitions and can tolerate machine failure, even in the leader node.

etcd has been benchmarked at thousands of writes per second per instance.

Do you want to see how etcd works in practice? Use this playground to run some write, read, and delete operations: http://play.etcd.io/play
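If you run your own control plane and can reach an etcd member, you can see how Kubernetes lays out its keys under the /registry prefix. A sketch, assuming the kubeadm default certificate paths (yours may differ):

```shell
# List the first few keys Kubernetes has stored in etcd
ETCDCTL_API=3 etcdctl \
  --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  get /registry --prefix --keys-only | head
```

Managed services like EKS and AKS do not expose etcd at all, which is one less thing for you to operate.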

#5 Kubernetes API life cycle

The core of Kubernetes’ control plane is the API server. The API server exposes an HTTP API that lets end users, different parts of your cluster, and external components communicate with one another.

The Kubernetes API lets you query and manipulate the state of API objects in Kubernetes (for example: Pods, Namespaces, ConfigMaps, and Events).

Have you seen this? For a few objects you write alpha1, for a few you write v1, and for others beta1:

apiVersion: kubelet.config.k8s.io/v1alpha1

apiVersion: v1

When Kubernetes adds a new feature, it almost never releases it as GA (generally available) right away. Instead, the API goes through a life cycle: alpha (v1alpha1, v1alpha2, …), then beta (v1beta1, v1beta2, …), and finally GA (v1, v2, etc.).

Each time you upgrade Kubernetes, you need to look at your YAML files in case an alpha API was promoted to beta, or a beta API to GA. Based on that, you need to update your YAML files (raw K8s object YAML files or Helm charts) to point to the correct API version, because K8s may deprecate and eventually remove the old API version itself.
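The version suffix alone tells you the lifecycle stage, and the naming convention is mechanical enough to check in a few lines of shell. This is an illustrative helper (not part of kubectl; the function name is mine):

```shell
# classify_stability: print the lifecycle stage of an apiVersion string,
# e.g. "apps/v1" -> stable, "kubelet.config.k8s.io/v1alpha1" -> alpha
classify_stability() {
  local version="${1##*/}"            # drop the API group, keep e.g. "v1alpha1"
  case "$version" in
    v[0-9]*alpha[0-9]*) echo "alpha" ;;
    v[0-9]*beta[0-9]*)  echo "beta" ;;
    v[0-9]*)            echo "stable" ;;
    *)                  echo "unknown" ;;
  esac
}

classify_stability "kubelet.config.k8s.io/v1alpha1"   # alpha
classify_stability "networking.k8s.io/v1beta1"        # beta
classify_stability "apps/v1"                          # stable
```

To see which API versions your cluster actually serves, kubectl api-versions lists them all.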

#6 CRI: Container Runtime Interface

Containers are not all about Docker. There are a lot of container solutions available that can be used as the container engine/runtime for Kubernetes.

Out of the box, Kubernetes has an API layer called CRI (Container Runtime Interface): any container engine/runtime that can talk to K8s via CRI (by means of protobuf and gRPC) is supported by Kubernetes. The kubelet communicates with the container runtime (or a CRI shim for the runtime) over Unix sockets using the gRPC framework, where the kubelet acts as the client and the CRI shim as the server.

There are container runtimes/engines like Docker, rkt, CRI-O, and containerd. Still, Docker remains the default runtime for K8s up to v1.23.

Starting from K8s 1.24, the dockershim is removed, which means you can't use Docker Engine as the container runtime from v1.24 onwards. So what's next?

For your local docker build and docker run, you can keep using Docker Engine, and the same goes for your DevOps pipelines (images built with Docker are standard OCI images and still run fine). But inside the Kubernetes cluster itself, you can't use Docker as the runtime starting from K8s v1.24.

There is a company called Mirantis that has committed to maintaining the dockershim for Kubernetes, with support from Docker. This open-source project, cri-dockerd, keeps the dockershim alive: https://github.com/Mirantis/cri-dockerd

If you are confused about what the dockershim is: it is a module inside Kubernetes (the kubelet, specifically) that lets Kubernetes drive Docker Engine as its container runtime. Kubernetes removes that module in v1.24, so you need to either install cri-dockerd yourself or switch to another CRI runtime like containerd or CRI-O.
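To check which runtime your nodes are actually running before an upgrade (this assumes a working kubeconfig; the runtime string typically looks like containerd://1.6.x or docker://20.10.x):

```shell
# The CONTAINER-RUNTIME column shows each node's runtime and version
kubectl get nodes -o wide

# Or extract just the runtime string from each node's status
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```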

#7 Node affinity and anti-affinity

Remember, life is easy while we have just 2-3 worker nodes. Once you scale to 10 or 20 worker nodes and are dealing with distributed app deployment, node affinity becomes vital.

Do we really need to control which nodes a pod gets scheduled on? Well, the answer is YES.

  • Special hardware requirements for specific pods. Imagine we have powerful worker nodes with GPUs and SSDs. You want your machine-learning-intensive pods, or pods doing heavy CPU/GPU computation like a game engine, to be scheduled on those high-power nodes, and you want non-ML pods kept off them. In that case you need node affinity and anti-affinity.
  • You want to deploy your Kafka brokers across all available Availability Zones on AWS for HA and fault tolerance. Imagine you are running 5 nodes in AZ-1 and 5 nodes in AZ-2, and you want a replication factor of 5 for your Kafka broker pods. Without constraints, all 5 pods could be scheduled onto the AZ-1 nodes and none onto AZ-2. So even though you have a multi-AZ setup, with the wrong strategy your HA will fail if AWS has an AZ-1 outage.
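The Kafka case above is exactly what pod anti-affinity is for. A sketch of the relevant pod-template fragment, assuming the brokers carry a hypothetical app: kafka label and that your cloud provider sets the standard topology.kubernetes.io/zone label on nodes:

```yaml
# Prefer not to schedule two kafka pods into the same availability zone
affinity:
  podAntiAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      podAffinityTerm:
        labelSelector:
          matchLabels:
            app: kafka
        topologyKey: topology.kubernetes.io/zone
```

Using the preferred (soft) form means the pods still schedule somewhere if one zone is full; the required form would leave them Pending instead.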

How many ways can we do this?

  1. Node selector (easy, but the old way: define a key-value label on the node, and the pod spec must request that key-value)
  2. Node affinity and pod affinity (the pod's ability to choose nodes)
  3. Taints and tolerations (the node's ability to reject pods)

Node Selector way:

kubectl label nodes node-abcd-name workerType=ssdgpu

apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    workerType: ssdgpu

So first we assigned a custom key/value label to the worker node; here we set workerType to ssdgpu. Then in our pod definition we said it should select a worker that has the label workerType: ssdgpu.

Node affinity (new way)

While defining the pod spec in YAML, you include nodeAffinity. In short, there are two kinds of affinity rules. A hard rule means that if nothing matches, the pod will not be scheduled onto any worker node. A soft rule can be ignored by the K8s scheduler if it doesn't match.

  1. requiredDuringSchedulingIgnoredDuringExecution (hard rule)
  2. preferredDuringSchedulingIgnoredDuringExecution (soft rule)
  3. requiredDuringSchedulingRequiredDuringExecution (hard rule, planned but not yet implemented)
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: az-name
            operator: In
            values:
            - az1
            - az2
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        preference:
          matchExpressions:
          - key: worker-hardware-type
            operator: In
            values:
            - ssd-gpu

In the above example you can see I am asking for the pod to be scheduled in availability zone az1 or az2; it should not be scheduled into any other availability zone like az3 or az4. This is the hard rule.

In the soft part I am asking to prefer worker nodes of type ssd-gpu, while the hard rule of az1/az2 still applies.

Taints and Toleration:

This is a slightly more advanced way of controlling node selection, and it is not as widely used.

$ kubectl taint nodes node1 key1=value1:NoSchedule

tolerations:
- key: "key1"
  operator: "Equal"
  value: "value1"
  effect: "NoSchedule"

First you configure the worker node with a taint, which is a key-value pair; this can be done with the kubectl command above. Then in the pod spec you define a toleration. A pod whose tolerations match the node's taint can be scheduled there (note that a toleration only permits scheduling; unlike affinity, it doesn't attract the pod to that node). One worker node can have multiple taints, and any pod that wants to be scheduled on that node has to match all of them in its tolerations spec.
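To inspect the taints a node currently carries (assumes a working kubeconfig; node1 is the example node name from above):

```shell
# Human-readable view
kubectl describe node node1 | grep -A 3 Taints

# Or the raw list from the node spec
kubectl get node node1 -o jsonpath='{.spec.taints}'
```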

Where to ask K8s questions:

https://discuss.kubernetes.io/ and, of course, Stack Overflow.

Thanks for reading :)
