Demystifying Kubernetes Objects
Understanding the what, the why, and the how
Kubernetes is now the de facto standard for container orchestration. There are many reasons for its popularity, one of them being the sheer number of features it brings, which in turn means there is a considerable number of Kubernetes objects to learn.
In Kubernetes, there are multiple ways to achieve a result, which also causes a lot of confusion among DevOps professionals. This article seeks to address the what, the why, and the how of some often-used Kubernetes objects.
This is an advanced level topic intended for Kubernetes practitioners. If you’re looking to start with Kubernetes, check out this Medium article:
Understanding how Kubernetes works
Kubernetes uses a simple concept to manage containers. There are master nodes (control plane) which control and orchestrate the container workloads, and the worker nodes where the containers run.
- Talks to the underlying container runtime. Kubernetes is not a container platform but a container orchestration platform. Its kubelet component, which runs as a service on every node, is responsible for communicating with the underlying container runtime to manage the containers. For example, when you create a pod using kubectl and your underlying container runtime is Docker, kubelet issues a docker run command to the Docker runtime on the chosen worker node.
- Stores the state of the expected configuration. When you apply a Kubernetes configuration using the kubectl create/apply commands, Kubernetes stores it in its etcd datastore as the expected configuration.
- Tries to maintain state based on the expected configuration. Kubernetes keeps trying to maintain the expected state of the cluster by comparing the expected configuration in the etcd datastore with the current state of the cluster and reconciling any differences.
- Provides an abstract software-based network orchestration layer. The pod network provided by Kubernetes ensures that containers can talk to each other on an overlay or bridge network managed within the container runtime (using docker bridge networks for example) and through an internal or external network. When a pod talks to another pod, Kubernetes modifies routing tables to ensure that the connectivity is in place.
- Provides inbuilt service discovery. Kubernetes provides service discovery of containers out of the box. You do not need an external application to manage it. A Kubernetes Service exposes your Pod on a DNS name which maps the service name to any available Pod IP, providing service discovery and load balancing between multiple pod replicas. Because pods are ephemeral, this service discovery is essential. Services can also expose your pods to internal and external clients by creating listeners within your nodes, or by requesting cloud providers to provision a Load Balancer pointing to your pods.
- Health-checks the workloads. Kubernetes ensures that the container workloads running within the cluster are healthy, and if they are not, it destroys and recreates the containers.
- Talks to the cloud provider for objects. If you’re running Kubernetes within a cloud provider like GCP or Azure, it can use the Cloud APIs to provision resources like Load Balancers and Storage. This way, you have a single control plane for managing everything you would need to run your applications within containers.
Kubernetes configuration consists of Kubernetes objects such as Pods, Replication Controllers, Replica Sets, Daemon Sets, Deployments, StatefulSets, Services, Persistent Volumes, Persistent Volume Claims, etc. I’m going to demystify these, one at a time.
A pod is a fundamental building block in Kubernetes. It is a collection of one or more containers. Every time you need to run an application, you would need to create a Pod. Usually, a pod runs only one container, but there are instances when you may need to run more than one container in the Pod. Typical use cases include, but are not limited to, the following:
- Init Containers. Sometimes, it’s necessary that certain pre-requisites are met before starting a container, such as checking that a volume exists and, if it does, making sure it has the correct ownership and permissions.
- Helper/Internal containers. If there is a helper container that runs alongside the main container and works towards a common goal, it makes sense to run them in a single pod. Running helper containers within the same Pod ensures they talk over localhost, which is faster than keeping them in separate pods that might be scheduled on two different worker nodes. Kubernetes runs all containers of a pod on the same worker node.
Below is an example of a pod manifest with a single container:
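A minimal sketch of such a manifest (the name, labels, and image tag here are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.21   # illustrative tag
    ports:
    - containerPort: 80
```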
Below is an example of a pod manifest with an init container:
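A sketch of what this could look like; the claim name, image tags, and mount path are illustrative:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  volumes:
  - name: nginx-config
    persistentVolumeClaim:
      claimName: nginx-config-claim   # assumes an existing PVC
  initContainers:
  - name: set-permissions
    image: busybox
    # change ownership of the volume contents before nginx starts
    command: ['sh', '-c', 'chown -R 101:101 /etc/nginx']
    volumeMounts:
    - name: nginx-config
      mountPath: /etc/nginx
  containers:
  - name: nginx
    image: nginx:1.21
    ports:
    - containerPort: 80
    volumeMounts:
    - name: nginx-config
      mountPath: /etc/nginx
```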
The configuration above fires up an nginx container with a busybox init container. When you apply this manifest, Kubernetes ensures that before it starts the nginx container, all files and directories within the /etc/nginx persistent volume have uid 101 and gid 101 as the owner (101 is the uid and gid of the nginx user within the nginx container image).
If you deploy a pod on its own, replicating and scaling it is not possible. It exists on its own, and once you delete it using a kubectl command, it’s gone. To ensure that a minimum number of pods is always running, we need to use an object like a Replication Controller. Replication Controllers are a legacy way of replicating pods in Kubernetes and have now been replaced by Replica Sets. However, I will still show an example. To achieve this, we just need to wrap a replication controller spec around the pod spec:
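A sketch of a Replication Controller wrapping the nginx pod spec (names and image tag are illustrative):

```yaml
apiVersion: v1
kind: ReplicationController
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    app: nginx          # must exactly match the pod labels below
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
```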
The replication controller scales the Pod by matching on pod labels: the selector labels must exactly match the pod labels. The replicas field determines the number of instances of the Pod that run at any given time.
Replica Sets are the newer counterpart of Replication Controllers. However, one thing to keep in mind is that they should not be used on their own, but rather as a backend for Deployment objects. They are similar to Replication Controllers, with the difference that you can use set-based selector notation to group pods rather than only a named key-value pair match, giving you more dynamic selector options. For example, you can write the above specification like this:
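A sketch with an equality-based matchLabels selector (names illustrative):

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
```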
Or like this:
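A sketch using the set-based matchExpressions notation:

```yaml
apiVersion: apps/v1
kind: ReplicaSet
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchExpressions:
    - key: app
      operator: In      # set-based match: the app label must be one of the values
      values:
      - nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
```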
Deployments are one of the most widely used and recommended methods of deploying your Kubernetes Pods. They replace Replication Controllers, and you should use them for most applications. It’s possible to roll out and roll back deployments, which makes them one of the most powerful Kubernetes objects. Deployments manage pods through a ReplicaSet.
When to use deployments
- If you are running a stateless application (one that doesn’t need to persist data or state to disk; applications that connect to a backend, such as a database, and do not persist state to disk themselves also fall under this category).
- When you require scaling and self-healing of your pods.
- Pretty much everything that is not stateful.
This is an example of a deployment manifest:
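A sketch of such a manifest (names and image tag illustrative):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
```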
If you look at the manifest, it’s the same as a ReplicaSet, with the only difference being the type of object.
There are two ways to scale a deployment:
- Modify the replicas section of the manifest file and run kubectl apply -f <manifest file>.
- Run kubectl scale deployment nginx-deployment --replicas=10.
I recommend modifying the manifest file in production scenarios and storing the manifest file in a source code repository to ensure consistency of the environment with code.
How good would it be if Kubernetes could autoscale the deployment if the utilisation of the Pod exceeds a limit? A Kubernetes object called Horizontal Pod Autoscaler is the answer to this. The object checks the pod metrics and if it breaches a defined threshold, it spins up another pod to absorb the load. You need to specify a minimum and a maximum number of pods, metrics to check, and you’re good to go.
Horizontal Pod Autoscaler
A Horizontal Pod Autoscaler ensures that your Kubernetes deployment can scale horizontally based on pre-defined metrics, with Kubernetes managing the creation and destruction of pods accordingly. There are two ways you can achieve this:
- Using the kubectl command-line functions.
- Using a HorizontalPodAutoscaler manifest.
Before using any method, we need to modify our deployment manifest to apply a resource limit to the pods so that the horizontal pod autoscaler is aware when it needs to scale the deployment:
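A sketch of the relevant part of the deployment, assuming a deployment named nginx; the resource values are illustrative, and note that the autoscaler measures utilisation against the requested CPU:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        resources:
          requests:
            cpu: 500m     # the HPA percentage is calculated against this request
          limits:
            cpu: 500m     # hard cap of 500 millicores for the container
```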
The above configuration limits the container to a maximum of 500 millicores of CPU on the worker node it is running on.
Using the kubectl command-line functions
kubectl autoscale deployment nginx --cpu-percent=50 --min=1 --max=10
The above command ensures that the nginx deployment runs a minimum of one pod and a maximum of ten pods, based on the CPU percentage metric not exceeding 50% of the cores allocated in the pod manifest.
As soon as the nginx container starts using more than 0.25 cores on the worker node, the Horizontal Pod Autoscaler spins up a duplicate pod, provided we don’t exceed the maximum limit of ten pods. Conversely, if the load reduces, bringing the utilisation below 0.25 cores per pod, pods are deleted until the usage is back within the target or a minimum of one pod is running.
Using the HorizontalPodAutoscaler manifest on autoscaling/v2beta2 apiVersion
The kubectl command-line can only help you autoscale on metrics such as CPU and memory utilisation. If you need to scale on more advanced and custom metrics, such as network utilisation, you need to create a HorizontalPodAutoscaler manifest. Here’s an example:
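A sketch of such a manifest; the pod and object metric names (packets-per-second, requests-per-second) assume a custom metrics adapter is installed, and the target object names are illustrative:

```yaml
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Resource          # resource metrics: CPU and memory
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  - type: Pods              # pod metrics: averaged across all pods
    pods:
      metric:
        name: packets-per-second
      target:
        type: AverageValue
        averageValue: 1k
  - type: Object            # object metrics: measured on another object, e.g. an Ingress
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
```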
The above specification describes three kinds of metrics:
- Resource Metrics. Metrics such as CPU and memory
- Pod Metrics. Metrics such as packets-per-second and other forms of network traffic within the Pod
- Object Metrics. Metrics such as requests-per-second and different types of network traffic in other Kubernetes objects such as Ingresses.
You have the option to use a value or an average value with all three metric types. The average value averages the metric across all pods running within the deployment, while the plain value looks at each pod individually.
Managing Stateful Applications
Up to now, we have been discussing stateless applications, but a good percentage of the applications running are stateful.
Stateful applications are those that need to be aware of their existing state to survive a reboot. If the state information is lost, the application does not function properly. An application may require application data or metadata to persist on a disk. A typical example includes databases.
Though there’s a debate within the industry over whether to run stateful applications in containers or not, if you’re running a Kubernetes cluster as a standard, it would make sense to use a container-based approach wherever possible to avoid a heterogeneous environment. You can take advantage of the High Availability, network segmentation and other useful features that Kubernetes provides if you go that path.
Kubernetes, in its current form, offers several ways to manage stateful applications, using objects such as StatefulSets, Persistent Volumes, and Persistent Volume Claims.
A StatefulSet is similar to a Deployment, with the notable difference that it ensures pods and volumes are unique and ordered. If you spin up two replicas of a StatefulSet using dynamic volume provisioning, Kubernetes will ensure that the storage that was mounted as a volume to a Pod gets attached to that same Pod again if the Pod is destroyed and recreated. Two pods in a StatefulSet are not interchangeable.
A StatefulSet manifest includes a Kubernetes Service, as the StatefulSet needs to know what service it is running behind to maintain uniqueness and a stable network identity. That is the reason why serviceName features in the StatefulSet spec as well.
A StatefulSet also requires a PersistentVolume, provisioned either manually or automatically through a storage class. That is the reason why volumeClaimTemplates features within a StatefulSet manifest. We will discuss PVCs and PVs later.
Here’s an example of a StatefulSet manifest:
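A sketch of a StatefulSet together with its headless Service (names, image tag, and storage size are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  clusterIP: None        # headless service that gives pods a stable network identity
  selector:
    app: nginx
  ports:
  - port: 80
    name: web
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx     # ties the StatefulSet to the service above
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 1Gi   # bound to a pre-provisioned PV or the default storage class
```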
Persistent Volumes are storage resources, such as disks, which are either provisioned manually by an admin or dynamically using Storage Classes. We can mount these disks to containers running on Pods to ensure that the stateful application can persist data.
Static provisioning requires an administrator to create volumes either in a cloud provider or on-premise manually, and supply the information within a persistent volume manifest for Kubernetes to declare the volume as a usable Kubernetes resource.
Below is an example of a PersistentVolume manifest with static provisioning which uses an existing Azure Disk. We then reference the persistent volume in the volume claim template, as below:
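A sketch, assuming an existing Azure managed disk; the disk name, URI, and size are placeholders:

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: azure-disk-pv
spec:
  capacity:
    storage: 5Gi
  accessModes:
  - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  azureDisk:
    kind: Managed
    diskName: myStatefulDisk
    diskURI: /subscriptions/<subscription-id>/resourceGroups/<resource-group>/providers/Microsoft.Compute/disks/myStatefulDisk
```

And a volumeClaimTemplates fragment that binds to it by name (suitable for a single replica):

```yaml
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: ""          # disable dynamic provisioning
      volumeName: azure-disk-pv     # bind to the statically provisioned volume
      resources:
        requests:
          storage: 5Gi
```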
Static provisioning might seem a good option, but the issue here is that the object is tied to a particular cloud provider and is not portable. If we need to shift our workloads to another cloud provider, part of the effort is going to be changing all our manifests to reflect the newly created disks in the new provider. Modifying a few manifests might seem an easy task, but for large organisations running thousands of containers, this amounts to vendor lock-in.
To ensure a more dynamic and platform-independent approach, we create a storage class object which defines the kind of storage that needs provisioning (disks, SSD, NFS, block storage, etc.) and the cloud provider which is going to provision it. The storage class name needs to be platform-independent, such as standard, fast, or block. Persistent Volume Claims then use the storage class to ask the cloud provider to either supply existing storage or provision new storage based on the storage class. Kubernetes then mounts the PVC as a volume to the Pod.
A storage class object defines the kind of object which needs provisioning (Disks, SSD, NFS, Block Storage, etc.), the cloud provider (Azure, GCP, etc.) which is going to provision it, the volume reclaim policy, and so on. A storage class name needs to be platform-independent such as standard, fast, disk, or SSD.
Here’s an example of a fast storage class which provisions an Azure Premium locally redundant managed disk:
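A sketch of such a storage class:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast
provisioner: kubernetes.io/azure-disk   # the in-tree Azure Disk provisioner
parameters:
  storageaccounttype: Premium_LRS        # premium, locally redundant storage
  kind: Managed
```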
Persistent Volume Claim
A persistent volume claim uses the storage class to dynamically provision disks as requested. The following uses the fast storage class to provision an Azure disk of 5 GB. The pod template can then use the PersistentVolumeClaim to mount the disk to the container:
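A sketch of the claim and a pod that mounts it (names and mount path are illustrative):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: fast-disk-claim
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: fast     # dynamically provisions a disk through the fast storage class
  resources:
    requests:
      storage: 5Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: nginx
    image: nginx:1.21
    volumeMounts:
    - name: data
      mountPath: /usr/share/nginx/html
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: fast-disk-claim
```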
Putting it all Together with Dynamic Provisioning on a StatefulSet
Here’s an example of using dynamic provisioning to define the stateful set using the fast storage class:
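A sketch, reusing the fast storage class from above (names, image tag, and size are illustrative):

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
spec:
  serviceName: nginx
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.21
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
  volumeClaimTemplates:
  - metadata:
      name: www
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: fast   # the only storage reference; no cloud provider mentioned
      resources:
        requests:
          storage: 5Gi
```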
If you look carefully you’ll see that nowhere within the specification does it mention any cloud provider. This way, the provisioning is portable and is one of the most recommended methods of provisioning volumes.
Static provisioning exists for organisations that want to keep a line between Dev and Ops; for more DevOps-friendly organisations, dynamic provisioning is the way to go.
Daemons are background processes that run on servers for specific admin-related activities, such as collecting logs, housekeeping, and monitoring. In a Kubernetes cluster, there can be similar use cases, such as monitoring node health, shipping logs, or running a cluster storage daemon. In situations like these, a DaemonSet is the best Kubernetes object to fulfil the requirement.
A DaemonSet is a Kubernetes object which ensures that a defined pod runs on every node of the Kubernetes cluster. The DaemonSet spins up a new pod when we add a new node to the cluster, and when we remove a node, it removes the Pod from that node.
The following is an example manifest file for a DaemonSet which runs a fluentd logging app on all nodes of the Kubernetes cluster:
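A sketch of such a DaemonSet; the namespace, image tag, and resource values are illustrative:

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluentd
  namespace: kube-system
  labels:
    app: fluentd
spec:
  selector:
    matchLabels:
      app: fluentd
  template:
    metadata:
      labels:
        app: fluentd
    spec:
      tolerations:
      # allow the pod to also run on control-plane (master) nodes
      - key: node-role.kubernetes.io/master
        effect: NoSchedule
      containers:
      - name: fluentd
        image: fluent/fluentd:v1.14-1
        resources:
          requests:
            cpu: 100m
            memory: 200Mi
          limits:
            memory: 200Mi
```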
Exposing Kubernetes Applications
Up to now, we’ve discussed the ways of deploying and managing containers. I would now like to discuss the multiple ways of exposing the Kubernetes objects to internal or external clients through Kubernetes Services and Ingress resources.
Pods are ephemeral objects. Every Pod has its own IP address, and when a pod is unhealthy or dies, Kubernetes replaces it with a fresh pod that has a different IP address. This leads to a problem, as we can no longer rely on the old IP address to reach the Pod.
Kubernetes provides the Service object to solve this problem and abstract it away through its in-built service discovery mechanism. When you’re running your containers using Kubernetes, you don’t need to rely on external service discovery applications. Kubernetes has its own CoreDNS which keeps a register of the IP addresses of the running Pods and provides internal load balancing and DNS resolution based on service names. Based on the type of Service, you can expose your applications in three ways:
Cluster IP Services
A Cluster IP Service is the default service type. We use it for exposing pods as a backend to other pods within the cluster, such as frontend pods.
A typical example includes databases running as containers within Kubernetes. We need not expose these pods outside the Kubernetes cluster, and a Cluster IP service is discoverable only within the Kubernetes cluster. Any pod running within the cluster can call another service by using a serviceName:port combination, which ensures that, irrespective of what IPs your pods are running on, the request is routed to the correct pod through dynamic service discovery.
Here’s an example of a Cluster IP Service:
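A sketch (the service name and selector are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: ClusterIP      # the default type, shown explicitly for clarity
  selector:
    app: nginx         # routes to any pod carrying this label
  ports:
  - port: 80           # port the service listens on
    targetPort: 80     # port the container listens on
```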
A node port service exposes your pods on your nodes on a static port called the node port, and it is reachable from any of your Kubernetes nodes. If your nodes are reachable from the Internet, external clients can access your Pod on any <NodeIP>:<NodePort>. Node ports range from 30000–32767, which is a non-standard range and is typically not suitable for production deployments.
The following is an example of a NodePort service exposing the nginx deployment on NodePort 31000. If you do not specify the node port, Kubernetes picks any free port in the range:
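A sketch of such a service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: NodePort
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
    nodePort: 31000    # omit this field to let Kubernetes pick a free port in the range
```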
A LoadBalancer service, in essence, creates a NodePort and requests the cloud provider to dynamically provision a load balancer in front of all your nodes.
To explain this, say you have one master and three worker nodes named node01, node02, and node03, and you have an NGINX pod running on port 80 which you want to expose externally. A LoadBalancer service would first create a NodePort on a random port from the NodePort range (say, for instance, 31000), and then provision a load balancer listening on port 80 as the front end, with node01:31000, node02:31000, and node03:31000 as the backend.
A load balancer can be internal or external and can have multiple use cases. A LoadBalancer service is one of the methods to expose your frontend pods to the Internet, or within your internal infrastructure, in a production environment.
The following example exposes the NGINX pods on a LoadBalancer service:
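A sketch of such a service:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
spec:
  type: LoadBalancer   # asks the cloud provider to provision an external load balancer
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
```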
There are other services, such as ExternalIP, but they’re rarely used and not in the scope of this discussion; if you are interested, check out the online Kubernetes documentation.
Load balancers are expensive resources, and it would be wasteful to provision a new load balancer for every application exposed externally. When multiple applications are running within a cluster, it makes sense not to spin up too many external load balancers. Instead, rely on a reverse proxy to map traffic coming externally from one load balancer to multiple resources based on the URI path, the fully qualified domain name, or an HTTP header. Ingress resources are objects meant for this purpose. Ingress resources are typically used only for managing HTTP and HTTPS traffic; for anything else, LoadBalancers are more suitable.
A typical ingress setup requires an ingress controller. An ingress controller is a container that is configured by Ingress manifests to route traffic coming to it to the correct ClusterIP Service. A typical simple chain running an ingress looks like this: Load Balancer → Ingress Controller → Ingress rules → ClusterIP Service → Pods.
To set up the ingress controller in your Kubernetes cluster, read this guide and refer to other online documentation if required.
Here’s an example ingress manifest which uses URI-based routing. Anything which hits the ingress controller on any FQDN but has the /testpath URI would route to the test service on port 80:
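A sketch of such an ingress; the exact apiVersion depends on your cluster version, and the rewrite annotation assumes the NGINX ingress controller:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: test-ingress
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
spec:
  rules:
  - http:                      # no host specified, so any FQDN matches
      paths:
      - path: /testpath
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
```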
An example of an ingress using host-based routing is below. All traffic using the fully qualified domain name test.example.com, with any URI, would be routed to the test service on port 80:
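A sketch:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: host-based-ingress
spec:
  rules:
  - host: test.example.com     # only requests for this FQDN match
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: test
            port:
              number: 80
```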
The following shows a fan-out using host-based and path-based routing together. Any request coming to foo.bar.com with the URI /foo would route to service1 on port 4200, and with the URI /bar to service2 on port 8080:
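A sketch of such a fan-out:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: fanout-ingress
spec:
  rules:
  - host: foo.bar.com
    http:
      paths:
      - path: /foo
        pathType: Prefix
        backend:
          service:
            name: service1
            port:
              number: 4200
      - path: /bar
        pathType: Prefix
        backend:
          service:
            name: service2
            port:
              number: 8080
```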
Various permutations and combinations can result from this, and you can create powerful and dynamic routing based on Ingress rules. An Ingress resource is an absolute must if you are managing HTTP and HTTPS-based applications.
Thank you for reading. I hope you enjoyed the story. If you’re interested in learning more, check out the following articles as they might be of interest to you: