Kubernetes Fundamentals

Shivanshu Goyal
Aug 17, 2020 · 10 min read

We learned in the Docker article how Docker helps us deploy an application easily and efficiently. Docker containers are scalable, but scaling them requires manual effort. There are some problems we encounter if we don't use a container orchestrator:

  1. Containers cannot easily communicate with each other.
  2. Traffic distribution becomes a big problem.
  3. Managing the cluster manually adds significant container-management overhead.
  4. Auto-scaling is not possible.

In a production environment, we really need to address these problems to have a robust, highly available, and economical application. This is where container orchestrators come to the rescue. There are many orchestrators available today, of which Kubernetes, originally from Google, is the most famous and widely used. The Kubernetes project is one of the top-ranked projects on GitHub.


What is Kubernetes?

Kubernetes is an open-source platform, very often called a container orchestrator. It eliminates many of the manual processes involved in deploying and scaling containerized applications. Each Kubernetes cluster has multiple components:

  1. Master
  2. Nodes
  3. Kubernetes objects (Namespace, Pod, Container, Volume, Deployment, Service, etc.)
Kubernetes cluster with its components

What is a namespace in the Kubernetes cluster?

It is like a cluster inside a Kubernetes cluster. It provides logical segregation of applications from different teams or for different purposes. We can create a namespace for each team and restrict their access to the assigned namespace only, so that no team can access another team's namespace or interfere with its containers and resources.

Each Kubernetes cluster comes with 4 default namespaces:

  1. kube-system: We don't create or modify anything in this namespace. It contains control-plane and system processes, such as the API server, scheduler, and controller manager.
  2. kube-public: It holds publicly accessible data, such as a ConfigMap with cluster information. Try this command to see it: kubectl cluster-info
  3. kube-node-lease: It holds lease objects that record the heartbeats of the nodes, which determine the availability of the nodes in the cluster.
  4. default: It is the namespace we use to create resources when no other namespace is specified. We can create our own namespaces if multiple teams are utilizing the same cluster.
kubectl get namespaces                     # get the list of all available namespaces
kubectl create namespace my-namespace-1    # create a namespace (names must be lowercase)

We can also create a namespace using a configuration file, which is the recommended way.
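As a sketch, such a configuration file might look like this (the file name and namespace name are illustrative, not from the article):

```yaml
# namespace.yaml (hypothetical file name)
apiVersion: v1
kind: Namespace
metadata:
  name: team-a        # namespace names must be lowercase DNS-compatible strings
```

We would then apply it with kubectl apply -f namespace.yaml.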

What is kubectl?

The kubectl command-line tool lets you control Kubernetes clusters.

kubectl [command] [TYPE] [NAME] [flags]

Examples:

kubectl get namespaces
kubectl get pods
kubectl describe pods <pod-name>
kubectl delete pods <pod-name>
  • command: Specifies the operation that you want to perform on one or more resources, for example create, get, describe, delete.
  • TYPE: Specifies the resource type. Resource types are case-insensitive and you can specify the singular, plural, or abbreviated forms.
  • NAME: Specifies the name of the resource. Names are case-sensitive. If the name is omitted, details for all resources are displayed, for example kubectl get pods.
  • flags: Specifies optional flags. For example, you can use the -s or --server flags to specify the address and port of the Kubernetes API server.

Kubernetes components:

  1. kube-apiserver: The front end of the control plane. The kubectl command-line interface talks to kube-apiserver to interact with the Kubernetes cluster.
  2. etcd: It is a distributed, reliable key-value store. The Kubernetes cluster stores master/worker-node-related information in this store to manage the cluster. It is also known as the brain of the cluster.
  3. kube-scheduler: It distributes work to the worker nodes across the cluster based on node capacity, container requirements, affinity, etc.
  4. kube-controller-manager: It runs the controllers that watch for issues in nodes and containers. If it notices a failure, it brings up a replacement to restore the desired state.
  5. cloud-controller-manager: It is used when we deploy our Kubernetes cluster on a cloud provider.
  6. kubelet: It is an agent process that runs on every node to make sure all containers are up and running. It is also known as the captain of the worker ship, to whom the master talks.
  7. kube-proxy: It maintains network rules on nodes. These network rules allow network communication to your Pods from network sessions inside or outside of your cluster.
  8. Container runtime (Docker agent): It is used to run the containers. In our case, it is the Docker container runtime running Docker containers.
The master node is the cluster orchestrator that manages the cluster. It scales and schedules app containers and rolls out updates. Whenever we deploy a container on the Kubernetes cluster, we tell the master the Docker image URL, and it looks for a worker node to deploy the container on, based on the available resources.
A worker node is the actual server in the Kubernetes cluster where pods are running.

What is a POD?

It’s an abstraction on top of one or more containers. Each pod has its own unique IP address, and each IP address is reachable from all other pods in the K8s cluster. Pods are the smallest deployable units of computing that you can create and manage in Kubernetes. Kubernetes runs containerized apps, but we cannot run a container in a cluster directly; containers have to run inside a pod.

Pods are mortal. They are created, they live, and they die; Kubernetes never restores them. Instead, Kubernetes creates a new one in its place.

Why do we need this abstraction?

  1. It solves the container port-mapping problem. In a big cluster, it becomes difficult to keep track of free ports, because containers listening on the same port would conflict with each other on the same host. With pods, each pod gets its own IP address, so containers in different pods can use the same port without conflict.
  2. It makes it easy to replace the container runtime. Suppose today we use Docker and tomorrow we switch to another runtime such as containerd; with pods, the K8s configuration stays the same while the container runtime is replaced.

Pod as scaling unit

We add or remove pods to scale an application up or down. We do not add containers to an existing pod to scale up, because the pod's resources would then be divided among multiple containers, which does not actually scale the application.


Now that we understand that running multiple containers in a single pod is not the correct approach to scale up our containers, why does Kubernetes allow multiple containers in a pod?

The primary purpose of a multi-container Pod is to support co-located, co-managed helper processes for a primary application. Having multiple containers in a single Pod makes it relatively straightforward for them to communicate with each other. They can do this using several different methods.

  1. In Kubernetes, you can use a shared Kubernetes Volume as a simple and efficient way to share data between containers in a Pod. In most cases, it is sufficient to use a directory on the host that is shared with all containers within a Pod.
  2. Containers in a Pod share the same IPC namespace, which means they can also communicate with each other using standard inter-process communications such as SystemV semaphores or POSIX shared memory.
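As a sketch of the shared-volume approach, a two-container pod could look like this (all names and images here are illustrative, not from the article): a helper container writes a file into a shared emptyDir volume, and the main nginx container serves it.

```yaml
# shared-volume-pod.yaml (hypothetical example)
apiVersion: v1
kind: Pod
metadata:
  name: app-with-helper
spec:
  volumes:
    - name: shared-data
      emptyDir: {}              # scratch directory shared by both containers
  containers:
    - name: main-app
      image: nginx
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html   # nginx serves files written by the helper
    - name: content-writer
      image: busybox
      command: ["sh", "-c", "echo hello > /data/index.html && sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data      # same volume, mounted at a different path
```

Both containers see the same directory contents even though they mount the volume at different paths.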

How to create a pod using a configuration file?

shivanshu$ vi create_pod.yaml
shivanshu$ kubectl apply -f create_pod.yaml
shivanshu$ kubectl get pods
shivanshu$ kubectl describe pod new-pod
apiVersion: v1
kind: Pod
metadata:
  name: new-pod           # resource names must be lowercase, no underscores
  labels:
    app: my-web-application
    env: dev
spec:
  containers:
    - name: my-web-container
      image: web-container-image

Let’s understand the YAML configuration file in Kubernetes in depth.

Each configuration file has 3 parts:

  1. MetaData: To identify the component with its name and labels.
  2. Specification: Attributes to spec are specific to the Kind of the component. kind is the second field in each configuration to identify the kind of component. It can have values like Pod, ReplicaSet, Deployment, Service, etc.
  3. Status: It is automatically generated and added by Kubernetes. How does it work? There are 2 states: actual and desired. Desired is what we define in spec, like replicas: 2 in the below example; if the actual number of replicas is 1, Kubernetes will launch a new container to reconcile the actual state with the desired state.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-web-deployment
  labels: ...
spec:
  replicas: 2
  selector: ...

Where does the K8s cluster get the status?

Remember etcd (the brain of the cluster)? It keeps the status of each K8s component.

Let’s understand the use of labels and selectors.

As our Kubernetes cluster grows, a need arises to label each object present in the cluster to keep things organized (to help us and Kubernetes identify the objects to act upon).

Labels are key-value pairs that we can attach to objects like pods. They are used to describe meaningful and relevant information about an object.

Selectors are the way to express how to select objects based on their labels. We can specify whether a label equals a given value or whether it falls inside a set of values.

  1. Equality-based
  2. Set-based
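As an illustration, both selector styles can appear in an object's spec.selector (the label keys and values here are made up):

```yaml
selector:
  # Equality-based: match pods whose "app" label equals "my-app"
  matchLabels:
    app: my-app
  # Set-based: match pods whose "env" label is one of the listed values
  matchExpressions:
    - key: env
      operator: In
      values: [dev, staging]
```

On the command line, kubectl accepts the same styles, e.g. kubectl get pods -l app=my-app (equality-based) or kubectl get pods -l 'env in (dev,staging)' (set-based).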

What is deployment?

Deployment is responsible for creating and updating instances of your application. It monitors the containers and provides a self-healing mechanism in case of machine failure.


We can deploy our containers using Pod configuration files directly. However, using deployment is a recommended way to deploy pods. It brings a lot of advantages:

  • It provides a self-healing mechanism in case of machine failure. If the Node hosting an instance goes down or is deleted, the Deployment controller replaces the instance with an instance on another Node in the cluster.
  • We don't need to worry about managing pods. If any pod goes down, the deployment controller will create a new pod in its place immediately in order to keep the desired number of pods running.
  • Here, we have only one file, where the Pod specification and the desired number of running Pods are defined. The Pod specification is under the spec.template key, whereas the number of running Pods is under spec.replicas.

Sample deployment file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
  • A Deployment named nginx-deployment is created, indicated by the .metadata.name field.
  • The Deployment creates three replicated Pods, indicated by the .spec.replicas field.
  • The .spec.selector field defines how the Deployment finds which Pods to manage. In this case, you simply select a label that is defined in the Pod template (app: nginx).
  • The .spec.template.spec is the blueprint of the pod.

A ReplicaSet ensures that a specified number of pod replicas are running at any given time. However, a Deployment is a higher-level concept that manages ReplicaSets and provides declarative updates to Pods along with a lot of other useful features. Therefore, we recommend using Deployments instead of directly using ReplicaSets, unless you require custom update orchestration or don’t require updates at all.

What is a service?

We learned how to deploy our containers, how to manage replicas of the containers, and how Kubernetes scales containers up and down based on load. Now our containers are up and running in the Kubernetes cluster. But the question is: can we access these containers from outside the cluster? How do we expose our containers to the outside world?

Service is another Kubernetes object, which helps us expose our containers to be accessed from outside. Its job is to listen on a port and forward requests to the target pod.

Service is required to expose the container to the outside world

Let’s look at the service in detail:

Create a service using service-definition file
shivanshu$ vi service-definition.yml
shivanshu$ kubectl create -f service-definition.yml
shivanshu$ kubectl get services
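The service-definition.yml referenced above could look roughly like this (a sketch; the service name and port numbers are assumptions, chosen to match the pod label myapp and node port 30010 mentioned below):

```yaml
# service-definition.yml (sketch)
apiVersion: v1
kind: Service
metadata:
  name: myapp-service
spec:
  type: NodePort
  selector:
    app: myapp          # forwards traffic to pods carrying this label
  ports:
    - port: 80          # the service's own (cluster-internal) port
      targetPort: 80    # the container port on the pod
      nodePort: 30010   # exposed on every node; must be in the 30000-32767 range
```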

So far, we have seen a service mapped to a single pod. But that is not how it works in a production environment. Suppose we have multiple pods for the same application (all pods have the same label, in our case myapp) in a node to achieve high availability.

Do we need to modify our service to distribute the load among all pods?

The answer is no. There will be no change in the service definition when we add or remove pods for an application. Even if a pod goes down, there will be no impact on the end user, as the service is still available to serve requests on the same IP:PORT.

The service definition remains the same even with 3 pods, as it looks for the pod label in the spec.selector configuration

  • The service uses a random algorithm to choose a pod to serve each request.
  • The range for nodePort is 30000 to 32767.
  • The IP inside the service box is the clusterIP.

Even when we deploy pods on different nodes, Kubernetes automatically makes the service span all the nodes in the cluster, and we will be able to access the container using any node's IP on port 30010.

To summarize, single pod on a single node, multiple pods on a single node, multiple pods on multiple nodes, adding/removing pods from the node/s — the same service will work without any modification.

Types of services

We set it using spec.type in the service-definition file.

  • ClusterIP: the default one. It allocates a cluster-internal IP address and makes our Pods reachable only from inside the cluster.
  • NodePort: built on top of ClusterIP. The service is now reachable not only from inside the cluster through the service’s internal cluster IP but also from the outside: curl <node-ip>:<node-port>. Each Node opens the same port (node-port) and redirects the traffic received on that port to the specified Service. This is the one we used above.
  • LoadBalancer: built on top of NodePort. The service is accessible outside the cluster: curl <service-EXTERNAL-IP>. Traffic now comes in via the LoadBalancer, which then redirects it to the Nodes on a specific port (node-port).

There is one more important thing to know which is Ingress.

Ingress exposes HTTP and HTTPS routes from outside the cluster to services within the cluster. Traffic routing is controlled by rules defined on the Ingress resource.

An Ingress may be configured to give Services externally-reachable URLs, load balance traffic, terminate SSL / TLS, and offer name-based virtual hosting. An Ingress controller is responsible for fulfilling the Ingress, usually with a load balancer, though it may also configure your edge router or additional frontends to help handle the traffic.
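A minimal Ingress rule might look like this (the hostname and service name are illustrative, and an Ingress controller must be installed in the cluster for the rule to take effect):

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: example-ingress
spec:
  rules:
    - host: myapp.example.com     # name-based virtual hosting
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: myapp-service   # route matching traffic to this Service
                port:
                  number: 80
```

HTTP requests for myapp.example.com that reach the Ingress controller are routed to myapp-service on port 80.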

I tried to cover the basic fundamentals of Kubernetes, which can help us get started with it. Kubernetes' official documentation is a great place to read more about it.

Hope it helps! Thanks for reading!
