Pragmatic K8s Understanding
This article presents k8s as a magical tool that simplifies deployment. By the end of the article you should have a good high-level understanding of Kubernetes concepts.
This article is also part of a Machine Learning Ops (MLOps) series.
Before we start: the experience of learning k8s right now, in spite of the resources widely available on the web, is what prompted me to write an article that follows the KISS (Keep It Simple, Stupid) principle.
Why Kubernetes (k8s)?
Remember when, during development, you had only one machine to deploy to and were asked to scale your application without much of a budget? Naturally, you set up a load balancer and created two instances on different ports to handle the extra load. Later on, you were given more computers, and you did the same thing on those as well. This becomes a tedious task as more and more computers are required, and it is still not easy even with tools like Ansible.
What k8s does is automate all of this away. Most of the time, after you set up k8s, all you need to do is register your new computer with labels for the apps that are allowed to run on it, and k8s takes care of the rest. Even on a single computer, it handles load balancing and ports automatically when scaling your applications up, all without bundles of bash/Ansible scripts. This brings us to the next great feature of k8s.
Writing k8s configuration files is pretty much like writing SQL code, whereas tools like Ansible are more like writing Python code: you just describe what you want (declarative rather than imperative), and k8s finds the best way to do it. Since the configuration files are just YAML files, you can keep them in your repository as well. Updating the deployment is as easy as updating the configuration files and applying them to the cluster.
This means you can bring your application to different clusters in different environments and expect more or less the same behaviour, which makes it very attractive for a lot of companies.
However, k8s requires containers to do its magic, so before I proceed to provide a simple guide to k8s, I have to explain containers.
Why containers?
Containers bundle your application together with all of its dependencies, so it runs the same way on every machine. Nuff’ said.
The Actual Guide
K8s terms translator
- Node: A computer (physical or virtual) that is not a container.
- Cluster: A group of nodes.
- Control plane: The node (or set of components) that controls the other nodes.
- Persistent Volume: A description of what storage is available.
- Persistent Volume Claim: A description of what storage you need.
- Pod: A description of how one or more containers are run and how they use persistent volumes.
- Deployment: A description of a group of identical pods and how many replicas of them to keep running.
- Service: A description of how pods are exposed on the network.
Using k8s
- Minikube is, by far, the easiest way to start with k8s on a single computer.
- Otherwise, get a managed cluster from a cloud provider: AKS (Azure), EKS (AWS) or GKE (Google Cloud).
- Finally, you can set up a cluster manually using kubeadm or Kubespray.
K8s essentials for developers
There are really only four basic concepts you need to know before using k8s: containers, persistent storage, deployments, and services.
We’ll use Nginx for this example, and I will only show the YAML files, since to use one you just save it and run kubectl apply -f on the file.
Containers (Docker)
You’ll have to be able to create a Docker image. I wrote an article on how to reduce Docker container size that would be useful as well; it also contains a sample Dockerfile used to build an image.
Do note that containers can be restarted or destroyed often; for example, when your container crashes, k8s will automatically restart it. Anything written to the container’s own filesystem is lost on restart, which is exactly why the next section matters.
Persistent Storage
The easiest way to start is with local data (a hostPath on the node), but there are many other types as well. This kind of storage will not be destroyed when the container is destroyed.
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nginx-pv
  labels:
    type: local
spec:
  storageClassName: manual
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteMany
  hostPath:
    path: "/var/lib/path-in-node"
```
A persistent volume only describes what storage is available; it does not actually allocate the storage or block it off. You request a piece of it through a persistent volume claim, shown after the list below.
You can use access modes to describe how pods may mount this storage. This is irrespective of what the actual file system supports.
There are three modes:
- ReadWriteOnce : the volume can be mounted as read-write by a single node
- ReadOnlyMany : the volume can be mounted read-only by many nodes
- ReadWriteMany : the volume can be mounted as read-write by many nodes
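The deployment below doesn’t reference the volume directly; it goes through a Persistent Volume Claim named nginx-pvc. Here is a minimal sketch of a claim that matches the volume above (the requested size is an assumption; it just has to fit within the volume’s 10Gi capacity):

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nginx-pvc              # the name referenced by claimName in the deployment
spec:
  storageClassName: manual     # must match the PersistentVolume above
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 1Gi             # assumed request; must fit within the PV's 10Gi
```

Once applied, k8s binds the claim to a matching persistent volume, and pods can then mount the claim by name.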
Deployments
A deployment describes an actual “application”. It ties together containers and persistent storage.
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
  labels:
    app: nginx
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      volumes:
        - name: nginx-vol
          persistentVolumeClaim:
            claimName: nginx-pvc
      containers:
        - name: nginx
          image: nginx:latest
          volumeMounts:
            - name: nginx-vol
              mountPath: /var/lib/path-in-nginx-container
          ports:
            - containerPort: 80
```
As noted, you can run multiple replicas of your application: the deployment will create that many copies of your pod using the template.
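For example, to scale up you only change replicas in the file above and apply it again; a sketch of the relevant fragment:

```yaml
spec:
  replicas: 3   # was 1; re-run kubectl apply -f and k8s creates the extra pods
```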
Services
You can expose your app to the network using services. You can choose a load balancer that spreads traffic across multiple nodes (LoadBalancer), or just open a port on the nodes (NodePort).
```yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
spec:
  selector:
    app: nginx
  ports:
    - protocol: TCP
      port: 80
      nodePort: 30080
  type: NodePort
```
The service automatically load balances between the pods created by a deployment.
The service selects pods based on the app label specified in the deployment’s pod template (i.e. the selector).
NodePort services, by default, are only allowed to use ports 30000–32767. You can use a reverse proxy or an ingress to forward standard ports like 80 and 443; common options are Nginx and HAProxy.
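As a sketch of the ingress route (assuming an ingress controller such as ingress-nginx is already installed in the cluster, and using a made-up hostname), the following forwards HTTP traffic for example.com to the service above:

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: nginx-ingress            # hypothetical name
spec:
  rules:
    - host: example.com          # assumed hostname; replace with your own
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: nginx-service   # the service defined above
                port:
                  number: 80          # the service's port, not the nodePort
```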
Conclusion
And that’s basically most of what you need to actually be productive in k8s.