Recently, I started my Kubernetes journey and wanted to understand its internals better. I gave a talk along these lines, and this is the blog version of it.
Before we try to understand Kubernetes, let us spend a little time clarifying what a container is and why containers are so popular. After all, there is no point in talking about a container orchestrator (Kubernetes) without knowing what a container is :)
A “container” is… a container to hold all the stuff you put in. Duh!
Stuff like your application code, the libraries it depends on, and their dependencies, all the way down to (but not including) the kernel, which containers share with the host. The key concept here is isolation: isolate all your stuff from the rest so that you have better control over it. Containers provide three types of isolation:
- Workspace isolation (Process, Network)
- Resource isolation (CPU, Memory)
- File system isolation (Union File System)
Think of containers as VMs on a diet. They are lean, small, and fast to start up. And all of this was not built from the ground up. Instead, containers use constructs already present in the Linux kernel (like cgroups and namespaces) and build a nice abstraction over them.
Now that we know what containers are, it is easy to understand why they are so popular. Instead of shipping only your application binary or code, you can ship the whole environment needed to run your application, and this is practical because containers can be built as very small units. A perfect fix for the “It works on my machine” issue.
When to use Kubernetes?
All is well with containers, and a software developer's life is much better now. Then why do we need another piece of technology, a container orchestrator like Kubernetes?
You need it when you reach the point where there are too many containers to manage:
Q: Where is my front-end container, and how many of them am I running?
A: Hard to tell. Use a container orchestrator.
Q: How do I make my front-end containers talk to newly created back-end containers?
A: Hardcode the IPs. Or use a container orchestrator.
Q: How do I do rolling upgrades?
A: Hand-hold every step manually. Or use a container orchestrator.
Why I prefer Kubernetes
There are multiple orchestrators, like Docker Swarm, Mesos and Kubernetes. My choice is Kubernetes (and hence this article) because Kubernetes is …
… like Lego blocks. It not only has the components needed to run a container orchestrator at scale, but also gives you the flexibility to swap components out for custom ones. Want a custom scheduler? Sure, just plug it in. Need a new resource type? Write a CRD (Custom Resource Definition). Also, the community is very active and is evolving the tool rapidly.
Every Kubernetes cluster has two types of nodes (machines): master and worker. As the names suggest, the master controls and monitors the cluster, whereas the workers run the payload (applications).
A cluster can work with a single master node, but it is better to have three of them for high availability (known as an HA cluster).
Let us take a closer look at the master and what it is composed of:
etcd : A key-value database that stores all the data about Kubernetes objects, their current state, access information and other cluster configuration
API Server : A RESTful API server that exposes endpoints to operate the cluster. Almost all of the components on the master and worker nodes communicate with this server to perform their duties
Scheduler : Responsible for deciding which payload runs on which machine
Controller Manager : Runs control loops that watch the state of the cluster (by making calls to the API server) and take action to bring it to the expected state
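To make the Scheduler's job a bit more concrete, here is a minimal Python sketch of the "filter, then score" idea. It is only an illustration under my own assumptions: the `Node`, `PodRequest` and `pick_node` names are hypothetical, and the real scheduler weighs many more factors than free CPU and memory.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int  # unallocated CPU, in millicores
    free_mem_mib: int     # unallocated memory, in MiB


@dataclass
class PodRequest:
    name: str
    cpu_millis: int
    mem_mib: int


def pick_node(pod: PodRequest, nodes: List[Node]) -> Optional[Node]:
    """Drop nodes that cannot fit the pod, then pick the one with the most headroom."""
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= pod.cpu_millis and n.free_mem_mib >= pod.mem_mib
    ]
    if not feasible:
        return None  # nothing fits, so the pod would stay pending
    # Score by CPU left over after placement, with leftover memory as a tie-breaker.
    return max(
        feasible,
        key=lambda n: (n.free_cpu_millis - pod.cpu_millis, n.free_mem_mib - pod.mem_mib),
    )


nodes = [
    Node("worker-1", free_cpu_millis=500, free_mem_mib=1024),
    Node("worker-2", free_cpu_millis=2000, free_mem_mib=4096),
    Node("worker-3", free_cpu_millis=1500, free_mem_mib=256),
]
chosen = pick_node(PodRequest("front-end", cpu_millis=1000, mem_mib=512), nodes)
print(chosen.name if chosen else "unschedulable")
```

Here only `worker-2` has enough of both CPU and memory, so it is the one that gets picked.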
kubelet : The heart of the worker node. It communicates with the API server on the master node and runs the containers scheduled for its node
kube-proxy : Takes care of the networking needs of pods using iptables / IPVS
Pod : The workhorse of Kubernetes, which runs all your containers. You cannot run a container inside Kubernetes without a pod abstraction over it. A pod adds functionality that is crucial to the Kubernetes way of networking between containers
A pod can have more than one container, and all the servers running inside these containers can reach each other over localhost. This makes it very convenient to split different aspects of your app into separate containers and load them all together as one pod. There are different pod patterns, like sidecar, proxy and ambassador, to address different needs. Check this article to learn more about them.
The pod's network interface provides a mechanism to connect it with other pods on the same node and on other worker nodes.
Also, each pod is assigned its own IP address, which kube-proxy uses to route traffic. This IP address is visible only within the cluster.
A volume mounted inside a pod is visible to all of its containers, and sometimes these volumes can be used to communicate asynchronously between the containers. For example, say your app is a photo uploading app (like Instagram, maybe). One container could save the uploaded files in a volume, and another container in the same pod could watch for new files in this volume and process them to create multiple sizes and upload them to cloud storage.
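The photo example can be sketched in a few lines of Python. This is just a toy under my own assumptions: a temporary directory stands in for the shared volume, the two functions stand in for the two containers, and the names `upload` and `process_new_files` are made up for illustration.

```python
import tempfile
from pathlib import Path
from typing import List, Set


def upload(volume: Path, filename: str, data: bytes) -> None:
    """The 'uploader' container: drop a file into the shared volume."""
    (volume / filename).write_bytes(data)


def process_new_files(volume: Path, seen: Set[str]) -> List[str]:
    """The 'processor' container: one polling pass over the shared volume."""
    new_files = sorted(
        p.name for p in volume.iterdir() if p.is_file() and p.name not in seen
    )
    for name in new_files:
        # A real processor would resize the image and push it to cloud storage here.
        seen.add(name)
    return new_files


with tempfile.TemporaryDirectory() as tmp:
    volume = Path(tmp)  # stands in for the volume mounted into both containers
    seen: Set[str] = set()
    upload(volume, "cat.jpg", b"fake image bytes")
    first_pass = process_new_files(volume, seen)
    upload(volume, "dog.jpg", b"more fake bytes")
    second_pass = process_new_files(volume, seen)
    third_pass = process_new_files(volume, seen)

print(first_pass, second_pass, third_pass)
```

The two sides never call each other. The only thing they share is the directory, which is what makes the communication asynchronous.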
In Kubernetes, there are a lot of controllers, like ReplicaSets, Replication Controllers, Deployments, StatefulSets and Services. These are objects that control pods in one way or another. Let us look at some of the important ones.
The main responsibility of a ReplicaSet is to maintain the requested number of replicas of a given pod. If a pod dies for some reason, the controller is notified and immediately jumps into action to create a new pod.
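The "notice a dead pod, create a new one" behaviour is an instance of the control loop idea: compare the desired state with the observed state and act on the difference. Here is a rough Python sketch of a single pass of such a loop. The `reconcile` function and the pod names are hypothetical, and the real controller talks to the API server instead of editing a list.

```python
import itertools
from typing import List

_pod_ids = itertools.count(1)  # hypothetical source of unique pod names


def reconcile(desired: int, running: List[str]) -> List[str]:
    """One pass of the loop: make the number of running pods match the desired count."""
    running = list(running)  # work on a copy of the observed state
    while len(running) < desired:
        running.append(f"pod-{next(_pod_ids)}")  # "create" a missing pod
    while len(running) > desired:
        running.pop()  # "delete" a surplus pod
    return running


pods = reconcile(3, [])    # initial rollout: three pods are created
crashed = pods.pop(0)      # simulate one pod dying
pods = reconcile(3, pods)  # the next pass replaces it
print(crashed, pods)
```

Notice that the loop never tries to revive the crashed pod. It only cares that the count matches, so it creates a brand new pod instead.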
Deployment is a higher-order object that uses ReplicaSets to manage replicas. It provides rolling upgrades by scaling up a new ReplicaSet and scaling down (and eventually removing) the existing ReplicaSet.
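To see how the scale up / scale down dance plays out, here is a small Python simulation. It is a simplification under my own assumptions: `rolling_update` and its `surge` parameter are hypothetical names, every new pod is assumed to become ready instantly, and real Deployments tune this behaviour with their `maxSurge` and `maxUnavailable` settings.

```python
from typing import List, Tuple


def rolling_update(replicas: int, surge: int = 1) -> List[Tuple[int, int]]:
    """Return the (old, new) replica counts after each step of a rolling update."""
    if surge < 1:
        raise ValueError("surge must be at least 1, otherwise no progress is possible")
    old, new = replicas, 0
    steps = [(old, new)]
    while new < replicas:
        step = min(surge, replicas - new)
        new += step  # scale the new ReplicaSet up first
        steps.append((old, new))
        old -= step  # then scale the old ReplicaSet down by the same amount
        steps.append((old, new))
    return steps


for old, new in rolling_update(3):
    print(f"old ReplicaSet: {old}  new ReplicaSet: {new}")
```

With three replicas and a surge of one, the total number of pods never drops below three and never goes above four during the upgrade.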
Service is a controller object whose prime responsibility is to work as a load balancer, distributing the “packets” to the corresponding pods. It is basically a construct to group similar pods (usually identified by pod labels) across worker nodes.
Say your “front-end” app wants to communicate with your “back-end” app, and there could be many running instances of each. Instead of worrying about hardcoding the IPs of every back-end pod, you send the data packets to the back-end service, which then decides how to load balance and forwards them accordingly.
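Conceptually, a service does two things: pick the pods whose labels match its selector, and spread traffic across them. Here is a minimal Python sketch of both. The `Pod`, `select_pods` and `round_robin` names and the IPs are made up for illustration, and round-robin is just my stand-in; the actual balancing strategy depends on how kube-proxy is configured.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Pod:
    ip: str
    labels: Dict[str, str]


def select_pods(pods: List[Pod], selector: Dict[str, str]) -> List[Pod]:
    """Keep the pods whose labels contain every key/value pair in the selector."""
    return [
        p for p in pods
        if all(p.labels.get(key) == value for key, value in selector.items())
    ]


def round_robin(backends: List[Pod]) -> Iterator[str]:
    """Hand out backend IPs in turn, forever."""
    return itertools.cycle([p.ip for p in backends])


pods = [
    Pod("10.0.0.1", {"app": "back-end"}),
    Pod("10.0.0.2", {"app": "front-end"}),
    Pod("10.0.0.3", {"app": "back-end"}),
]
backend_ips = round_robin(select_pods(pods, {"app": "back-end"}))
targets = [next(backend_ips) for _ in range(4)]
print(targets)
```

The front-end never needs to know the individual IPs. If a back-end pod is replaced, only the list behind the selector changes.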
PS: Note that a service is more like a virtual entity, as all the packet routing is handled by iptables / IPVS / the CNI plugin. Thinking of it as a real entity sitting out there just makes it easier to understand its role in the Kubernetes ecosystem.
Ingress controller is a single point of contact for the outside world to talk to all the services that are running inside the cluster. This makes it easy for us to set up security policies, monitoring and even logging in a single place.
P.S: There are a lot of other controller objects in Kubernetes, like DaemonSets, StatefulSets and Jobs. There are also objects like Secrets and ConfigMaps that are used to store application secrets and configuration. I might cover them in future blog posts. I have also written a similar post on service mesh, which I hope you will find just as useful.