Kubernetes 101: A beginner’s guide to container orchestration

A high-level introduction to Kubernetes and its components

Harshita Sharma
Accredian
9 min read · Sep 15, 2023


Introduction

If you’re new to the world of container management and have ever wondered how leading tech companies effortlessly deploy and scale their applications, Kubernetes might just be the thing you’re looking for.

Think of Kubernetes as the traffic controller for your containerized applications. It ensures they run harmoniously, scaling up when needed and self-healing when things go awry.

Kubernetes (or K8s) was developed by Google and is now maintained by the Cloud Native Computing Foundation (CNCF). It has become the de facto standard for container orchestration thanks to its robust features and active community support.

It’s an open-source container orchestration platform that simplifies the deployment, scaling, and management of containerized applications. In other words, Kubernetes helps you manage applications made up of hundreds or even thousands of containers, and it helps you manage them across different environments: physical machines, virtual machines, cloud environments, or even hybrid deployments.

What problems does Kubernetes solve?

The rise of microservices in almost every domain has driven the adoption of container technology, which offers the perfect host for small, independent applications.

Containers are a critical component in microservices architectures. They provide isolation, portability, and resource efficiency, allowing each microservice to run in its own self-contained environment with consistent behavior across different stages of development and deployment.

A Containerization illustration using Docker

Their integration with orchestration platforms like Kubernetes further enhances the management of microservices by automating deployment and scaling processes.

Features offered by orchestration tools, in our case Kubernetes:

  • High availability: there is no downtime in your application; it is always available to the user
  • Scalability: high performance that adjusts to the load, i.e. the number of people accessing your application
  • Disaster recovery: the platform helps back up and restore data in case of emergencies, so a container can resume from the latest state after recovery without losing information

The architecture of Kubernetes

To understand the architecture clearly, let’s take a look at its different components and how they are interlinked.

Nodes:

The fundamental unit of Kubernetes is a node. A node is basically a physical or virtual machine. In other words, a node lets you think of a machine simply as a set of CPU and RAM resources that can be utilized.

Cluster:

Kubernetes operates in clusters. As the name suggests, a cluster is a group of nodes, and since the node is the most basic unit, a cluster is made up of at least one node.

A cluster pools all its resources together and works kind of like a hive mind. If something goes wrong on one node, the cluster takes care of it and doesn’t let it affect the application.

As already established, a cluster is made up of at least one master node (or control plane), and connected to it are the worker nodes (or simply nodes). Each worker node runs a kubelet, the primary node agent, which communicates with the control plane and runs the application processes assigned to that node.

The worker node is where all the applications are deployed and the actual work happens. The number of containers deployed and running on each worker node varies depending on how the workload is distributed.

So what does the master node do?

The master node runs several Kubernetes processes that are absolutely necessary to run and manage the cluster properly. These processes are:

  • API Server: The API server is the central component that exposes the Kubernetes API. It handles incoming RESTful API requests, performs authentication, authorization, validation, and triggers actions accordingly. Clients interact with the API server to manage resources in the cluster.
  • etcd: It’s a highly available distributed key-value store used as Kubernetes’ backing store for all cluster data, including configuration settings, object metadata, and current state. It is a crucial component for maintaining cluster consistency and durability.
  • Controller Manager: This is responsible for running controller processes that regulate the state of the system. Controllers include the Replication Controller (for managing replicas of pods), the Node Controller (for monitoring node health), and others.
  • Scheduler: The Scheduler watches for newly created pods with no assigned node and decides which node they should run on. It takes factors like resource requirements, node affinity, and other policies into account to make placement decisions.

Worker nodes carry the heavier workload and require a larger amount of resources, but control plane nodes (master nodes) are much more important.

This is the reason that in real-life production environments we generally have at least two master nodes, so that if one fails, the entire system won’t come crumbling down.
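
Every one of these control plane processes is reached through the API server, and any Kubernetes client ultimately talks to it. As a minimal illustrative sketch (not part of the original article), here is how you could inspect the nodes of an existing cluster with the official Kubernetes Python client, assuming the `kubernetes` package is installed and a kubeconfig already points at a cluster:

```python
# Minimal sketch: list the nodes in a cluster and the CPU/memory each one
# offers. Assumes `pip install kubernetes` and a working kubeconfig
# (e.g. ~/.kube/config) for an existing cluster.
from kubernetes import client, config

config.load_kube_config()        # load credentials from the kubeconfig file
v1 = client.CoreV1Api()          # client for the core API (talks to the API server)

for node in v1.list_node().items:
    capacity = node.status.capacity  # resources the node contributes to the cluster
    print(node.metadata.name, capacity["cpu"], capacity["memory"])
```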

Pods:

The smallest deployable unit in Kubernetes is the pod. Kubernetes doesn’t run containers directly; it wraps one or more containers into a higher-level structure called a pod.

Any containers in the same pod will share the same resources and local network. Containers can easily communicate with other containers in the same pod as though they were on the same machine while maintaining a degree of isolation from others.

Kubernetes adds this abstraction layer so that containers can be replaced when needed and so that you don’t have to work directly with a specific containerization technology like Docker; you only interact with the Kubernetes layer.

A pod runs application containers inside it. Although it can hold multiple containers, scalability and resource usage should be kept in mind.

Under high load, pod replicas are used to achieve high availability, load balancing, and scalability for an application or service running in a Kubernetes cluster. When you create multiple replicas of a pod, you’re essentially creating multiple instances of that pod, each running the same set of containers and the same configuration.

Pods communicating on the same node

The pods communicate with each other through a virtual network, where each pod gets its own IP address.
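
As a small illustration of this, the sketch below lists the pods in a namespace together with the IP address each one received and the node it landed on. It assumes the same Python client and kubeconfig setup as the earlier node example; the `default` namespace is only an example.

```python
# Minimal sketch: show each pod's name, its cluster-internal IP address,
# and the node it is running on.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

for pod in v1.list_namespaced_pod(namespace="default").items:
    print(pod.metadata.name, pod.status.pod_ip, pod.spec.node_name)
```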

Service & Ingress

As we just established, each pod gets its own IP address, which becomes really inconvenient because we would have to adjust it every time a pod dies and is replaced with a new one.

A Service solves this problem. It acts like a permanent IP address in front of a set of pods. Because the lifecycles of the Service and the pods are not connected, even if a pod dies and is replaced with a replica, the Service (and its address) stays the same.

The end goal of an application is to be accessible through a browser, and for that we use an external Service. But the URL of an external Service is not very practical: what you get is an HTTP protocol, the IP address of the node (not of the Service), and the port number of the Service. That is fine for checking whether your application works, but not for the end user.

What we want is a secure protocol and a domain name, and for that there is another Kubernetes component called Ingress. Instead of hitting the Service directly, the request goes first to the Ingress, which then forwards it to the Service.
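
As a rough sketch of what a Service looks like, the snippet below creates one that gives a stable address to all pods labelled `app=my-app` and forwards port 80 to container port 8080. The names, labels, and ports are assumptions made for illustration, not from the article.

```python
# Minimal sketch: a Service that load-balances traffic across all pods
# carrying the label app=my-app. Names and ports are illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="my-app-service"),
    spec=client.V1ServiceSpec(
        selector={"app": "my-app"},                              # which pods receive traffic
        ports=[client.V1ServicePort(port=80, target_port=8080)], # Service port -> container port
    ),
)
v1.create_namespaced_service(namespace="default", body=service)
```

An Ingress object would then sit in front of this Service, mapping a domain name (and typically TLS) to it, so end users never have to deal with node IPs and port numbers.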

ConfigMap and Secret

A ConfigMap is a Kubernetes resource used to store configuration data in the form of key-value pairs.

This data can be used by pods as environment variables or mounted as files in a volume. ConfigMaps are commonly used for non-sensitive configuration information that your application needs to run but doesn’t require encryption or special protection.

You can have different ConfigMaps for different environments.


A Secret is similar to a ConfigMap but is designed to store sensitive information, such as passwords, API tokens, or TLS certificates. Secrets are base64-encoded and can be mounted as files or used as environment variables in pods; Kubernetes encodes and decodes them as needed. Keep in mind that base64 encoding is not encryption, so access to Secrets should still be restricted.
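
Here is a minimal sketch of both objects, using illustrative names and values (the keys, host names, and password below are assumptions, not from the article):

```python
# Minimal sketch: a ConfigMap for plain configuration and a Secret for
# sensitive values. All names and key/value pairs are illustrative.
from kubernetes import client, config

config.load_kube_config()
v1 = client.CoreV1Api()

config_map = client.V1ConfigMap(
    metadata=client.V1ObjectMeta(name="my-app-config"),
    data={"DATABASE_HOST": "my-db-service", "LOG_LEVEL": "info"},
)
v1.create_namespaced_config_map(namespace="default", body=config_map)

secret = client.V1Secret(
    metadata=client.V1ObjectMeta(name="my-app-secret"),
    string_data={"DATABASE_PASSWORD": "change-me"},  # stored base64-encoded by Kubernetes
)
v1.create_namespaced_secret(namespace="default", body=secret)
```

Pods can then consume these values as environment variables or mounted files, so configuration and credentials stay out of the container image.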

Volume

A very important aspect when it comes to Kubernetes is the Data Storage.

Suppose you have an application pod that is connected to a database pod. If, for some reason, the database pod is restarted, its data would be gone. That is a huge problem, because we want our data to be persistent.

This is where volumes come in. A volume basically attaches physical storage to your pod. That storage can be local, meaning on the same server node where the pod is running, or remote, meaning outside the Kubernetes cluster: cloud storage, or your own on-premises storage that is not part of the cluster.
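
As a sketch, here is a pod spec that mounts persistent storage into a database container via a PersistentVolumeClaim. The claim name `db-data`, the image, and the mount path are assumptions for illustration, and the claim itself would have to exist already.

```python
# Minimal sketch: a pod that mounts a PersistentVolumeClaim so the database
# data survives pod restarts. Names, image, and paths are illustrative.
from kubernetes import client

pod = client.V1Pod(
    metadata=client.V1ObjectMeta(name="my-db"),
    spec=client.V1PodSpec(
        containers=[
            client.V1Container(
                name="postgres",
                image="postgres:16",
                volume_mounts=[client.V1VolumeMount(
                    name="data",
                    mount_path="/var/lib/postgresql/data",  # where the container sees the storage
                )],
            )
        ],
        volumes=[client.V1Volume(
            name="data",
            persistent_volume_claim=client.V1PersistentVolumeClaimVolumeSource(
                claim_name="db-data",  # existing claim backed by local or remote storage
            ),
        )],
    ),
)
```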

Deployment & StatefulSet

We established that pods can have multiple replicas to handle failures and load, but how many replicas should be running at a time?

This is where the Deployment comes in. Just as pods are an abstraction layer over containers, we can consider a Deployment an abstraction layer over pods.


In simpler terms, a Deployment allows you to describe the desired state of an application and how many replicas of it should be running at any given time.

Let’s take the earlier example: suppose we have a node with an application pod and a database pod. If the application pod dies, downtime increases, which is the worst thing that can happen in production. To keep the application running, instead of relying on one pod, we replicate everything across multiple nodes. Each replica runs the same containers and is connected to the same Service.

To create replicas of a pod, we don’t just create new pods by hand; we define a blueprint that states how many replicas we want. This blueprint is called a Deployment. In practice, we don’t work with pods directly; we create Deployments, and they specify the number of replicas that are needed.

Now if one of the replicas of the application pod dies, the Service forwards requests to another one, so our application stays accessible to the user.
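
A minimal sketch of such a Deployment, with an assumed name, label, image, and replica count:

```python
# Minimal sketch: a Deployment that keeps three replicas of an application
# pod running at all times. All names, labels, and the image are illustrative.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

deployment = client.V1Deployment(
    metadata=client.V1ObjectMeta(name="my-app"),
    spec=client.V1DeploymentSpec(
        replicas=3,                                                    # desired number of pods
        selector=client.V1LabelSelector(match_labels={"app": "my-app"}),
        template=client.V1PodTemplateSpec(                             # blueprint for each pod
            metadata=client.V1ObjectMeta(labels={"app": "my-app"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="my-app", image="nginx:1.25")],
            ),
        ),
    ),
)
apps.create_namespaced_deployment(namespace="default", body=deployment)
```

If any of the three replicas dies, the Deployment controller notices the gap between desired and actual state and starts a replacement pod automatically.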

So what about the database pod? Our application will not be usable if the database pod dies either, but the problem is that we can’t reliably replicate a database using a Deployment.

A database has state, which is its data. If we have clones or replicas of the database, they all need to access the same shared data storage, and we need some mechanism that manages which pods are currently writing to that storage and which pods are reading from it, in order to avoid data inconsistencies.

This mechanism, in addition to the replication feature, is offered by another Kubernetes component called a StatefulSet, which is meant specifically for stateful applications like databases (MySQL, MongoDB, Elasticsearch, etc.).

Therefore, databases or any other stateful applications should be created using StatefulSets and not Deployments.
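
For completeness, here is a rough sketch of a StatefulSet for a database. Every name and image here is an assumption; a real setup would also define `volume_claim_templates` so each replica gets its own persistent volume, plus a headless Service matching `service_name`.

```python
# Minimal sketch: a StatefulSet that runs two database replicas with stable
# identities (my-db-0, my-db-1). Names and image are illustrative; persistent
# volume claim templates are omitted for brevity.
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

stateful_set = client.V1StatefulSet(
    metadata=client.V1ObjectMeta(name="my-db"),
    spec=client.V1StatefulSetSpec(
        service_name="my-db",      # headless Service that provides stable network identities
        replicas=2,
        selector=client.V1LabelSelector(match_labels={"app": "my-db"}),
        template=client.V1PodTemplateSpec(
            metadata=client.V1ObjectMeta(labels={"app": "my-db"}),
            spec=client.V1PodSpec(
                containers=[client.V1Container(name="mongo", image="mongo:7")],
            ),
        ),
    ),
)
apps.create_namespaced_stateful_set(namespace="default", body=stateful_set)
```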

Although StatefulSets are the right tool for this, setting them up is relatively tedious and difficult. That’s why it’s also a common practice in production to host databases outside of the Kubernetes cluster and have the applications inside the cluster communicate with that external database.

Conclusion

I hope this article has helped you scratch the surface of Kubernetes; my main objective was to make you familiar with what Kubernetes actually is and does.

There’s no denying that this container orchestration platform is incredibly versatile and can be customized to meet the unique needs of different applications and organizations, so it’s a valuable skill to have. Now that we’ve covered the WHAT, we’ll focus on the HOW in the upcoming articles, so stay tuned!
