A Brief History of Google’s Kubernetes and Why It’s Fantastic

A beginner's guide to understanding what Kubernetes is and why it's essential for reliable, scalable modern applications

Alan Wang
FST Network
10 min read · Apr 21, 2022


Photo by Venti Views on Unsplash

Key Takeaways

  • Kubernetes is an open-source project from Google for reliable, scalable container operations, and it has been proven at massive scale.
  • Containerization is the answer for deploying and scaling apps (especially microservices) more easily.

If you work in the IT industry, you have probably heard the name "Kubernetes" (which means "pilot" or "helmsman" in Greek) a lot these days. What it does is often referred to as container orchestration, although people sometimes point out that this is not an entirely accurate term either.

So what is it really?


Kubernetes is indeed very popular. According to the CNCF (Cloud Native Computing Foundation) State of Cloud Native Development Report for Q1 2021, there are 6.8 million cloud native developers worldwide, 5.6 million of whom use Kubernetes (a 67% increase from the previous year), with 10.2 million containers running. (Don't worry: we will explain what a container is later.)

CNCF estimated there are 26.8 million developers globally, so about 1 in 5 developers in the world is currently using Kubernetes.

To understand what exactly Kubernetes does and why it's so popular for modern system operations, we'll have to take a quick look at the history.

At the Dawn of Time, There Were Monolithic Applications

Photo by Jack B on Unsplash

I briefly worked as a Java engineer at the end of the 2000s. The apps we built were all monolithic: they used either JavaServer Faces or Struts to render web pages, and Spring and Hibernate to query databases. Everything had to be compiled into a single giant .war file to run on a big Oracle server.

In one of the bigger projects, we had a team of about 30 people divided into system analysts, web devs, and service devs. Everything was well planned in the beginning. Then the client decided to add a new functional branch to the software. We practically had to drill holes in our architecture because we didn't have time to redesign it, and doing so caused even more bugs and chaos.

As the name implies, a monolithic app is like a (gigantic) block of stone with everything tightly coupled in one code base. This is a logical choice if you have a small app. But as the project scales up, everything starts to spiral out of control.

A concept called Service-Oriented Architecture (SOA) tried to solve this with modular programming: the back end of the app is separated from the front end and split into individual web APIs. This at least makes development easier.

The Age of Microservices

Photo by gustavo Campos on Unsplash

The trouble of maintaining a giant monolithic app is only half of the problem: such apps are also difficult to scale, that is, to increase service volume to handle more users and data. Sure, you can deploy more servers and put a load balancer/reverse proxy in front of them all, but deploying any change still takes down the whole service for a while.

In a recent article, Mario Izquierdo explained how Twitch switched from a monolithic Ruby on Rails app to a Go-based microservice architecture in the early 2010s to solve performance bottlenecks. Unlike SOA services, which are still part of the same back end, microservices are independent mini apps in their own right, usually paired with their own databases.

Well, you can still have a single shared database outside and use a technique called sharding to scale it up. But that's not what we are going to discuss today.

One of the most common API styles for microservices, REST (Representational State Transfer), is also a lot easier to implement and more scalable than, ironically, SOA's SOAP (Simple Object Access Protocol).

Now scaling up is easy: you simply increase the number of microservice instances on demand. You can also run multiple instances of a microservice on one server, because they require fewer resources.

World of Virtual Machines

Photo by Kelvin Han on Unsplash

The next problem is that it's difficult to run different apps on the same server, and you may not be able to afford a dedicated server for each type of app.

For example: you have one microservice written with Spring Boot (Java) and another with .NET Core. You have a front-end app built with React (which probably runs in Node.js on a server). You may even have a machine learning service running on Flask (Python). It would be aggravating to try to make them all work together, especially on different types of servers.

Virtual machines (VMs), which provide machine-level virtualization, use a piece of software called a hypervisor to "simulate" multiple machines on the same physical machine. Every VM has its own operating system, memory, and CPU resources. Anything running in one VM is completely isolated from the apps in another. There are numerous cloud platforms providing VM hosting at scale.

Containerization: One Size Fits All

Photo by Bernd Dittrich on Unsplash

VMs are great, but they also require a lot of resources and are slow to boot up. A newer concept, containerization, popularized by Docker in 2013, is a solution to this.

Docker containers are, in effect, lightweight Linux virtual machines. The difference is that containers share the host OS kernel and memory via a container runtime; only the apps and their files are separated (OS-level virtualization). Containers start quicker and need far fewer resources, so you can run more of them on the same machine.

In a test done by IBM in 2021, a virtual environment with four servers (16 x 2.1 GHz cores and 128 GB of memory) could run either 8 VMs or 33 containers. What a huge difference! They also compared the annual operating cost of 32 VMs vs. 33 containers: the latter cost only 25% as much as the former.

Twitch's Mario Izquierdo also mentioned how they gradually moved from Amazon EC2 virtual machines to containers on AWS in a follow-up article. The containerized microservices, called "Tiny Bubbles", are each managed by a different team.

Container engines make application-oriented infrastructure possible: every container is a bubble that contains one app, so managing containers is essentially the same as managing apps. Since containers can be cloned and are portable as images, you can deploy and scale any application at will, practically anywhere.

Container Orchestration, or the Hive Mind?

Photo by Manuel Nägeli on Unsplash

Containers are much easier to manage than VMs, but of course, manually managing anything at a large scale won't be easy. What happens if you have to run 100 instances of a microservice? Do you have to create them 100 times by hand? How do you know if any of them goes down? What about tens of thousands of containers?

Kubernetes, an open-source project, is designed exactly for this purpose, and it came from some 15 years of experience with Google's internal cluster management systems, Borg (2003) and Omega (2013):

Though widespread interest in software containers is a relatively recent phenomenon, at Google we have been managing Linux containers at scale for more than ten years and built three different container-management systems in that time.

…The first unified container-management system developed at Google was the system we internally call Borg. It was built to manage both long-running services and batch jobs…Omega, an offspring of Borg, was driven by a desire to improve the software engineering of the Borg ecosystem.

…The third container-management system developed at Google was Kubernetes. It was conceived of and developed in a world where external developers were becoming interested in Linux containers, and Google had developed a growing business selling public-cloud infrastructure.

Borg, Omega, and Kubernetes (2016)

Borg, named after the fictional alien race in Star Trek, has become so powerful that it operates at a mind-boggling scale inside Google:

Google’s Borg system…runs hundreds of thousands of jobs, from many thousands of different applications, across a number of clusters each with up to tens of thousands of machines.

Large-scale cluster management at Google with Borg (2015)

Kubernetes 1.0 was released in 2015. It has since been widely adopted by numerous cloud providers, including Amazon, Microsoft, Google, IBM, Oracle, and Red Hat, and is maintained by the CNCF and the developer community.

We mentioned at the beginning that what Kubernetes does is described as container orchestration: it automates container operations, similar to a conductor leading a group of musicians. But as the name Borg implies, hive mind may be a closer description: like a Borg Queen managing a fleet of unified drones.

Here is what Kubernetes can do, as listed in the official documentation:

  • Service discovery and load balancing
  • Storage orchestration (mount local or cloud storage systems)
  • Automated rollouts and rollbacks
  • Automatic bin packing (CPU/memory resource allocation)
  • Self-healing (start up new containers when needed)
  • Secret (security-related information) and configuration management

Kubernetes is a declarative tool: you don't have to run containers yourself. Instead, you "declare" a deployment manifest (like a shipping order) to specify which containers, and how many of them, are expected to run in a Kubernetes environment.

Kubernetes monitors the containers, and if any of them crash, new ones are created to match your manifest. CPU and memory resources are allocated automatically. Containers are bound together behind a single API endpoint, and Kubernetes offers its own load balancing, so you can scale up your service by simply changing the manifest.
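As a rough sketch of what such a manifest looks like (the `hello-app` name and the `nginx` image below are placeholders, not from any real project), a Deployment plus a Service might read:

```yaml
# A minimal Deployment: "declare" three replicas of one container.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hello-app          # placeholder name
spec:
  replicas: 3              # how many pods Kubernetes should keep running
  selector:
    matchLabels:
      app: hello-app
  template:
    metadata:
      labels:
        app: hello-app
    spec:
      containers:
        - name: hello-app
          image: nginx:1.25   # placeholder image
          ports:
            - containerPort: 80
---
# A Service exposes the pods behind a single, load-balanced endpoint.
apiVersion: v1
kind: Service
metadata:
  name: hello-app
spec:
  selector:
    app: hello-app
  ports:
    - port: 80
      targetPort: 80
```

Applying this file (for example with `kubectl apply -f hello-app.yaml`) hands the desired state to the cluster; to scale up, you would change `replicas` and apply it again.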

So: Kubernetes enables reliable, scalable, and automated container operations with little human intervention. It can be deployed across multiple physical or virtual machines, whether on cloud providers or on local servers anywhere. It's the key to managing modern information systems at large scale.

Kubernetes also brings great benefits for developers. Apps deployed on Kubernetes (not just microservices but anything, including front-end apps), which are "cloud native" thanks to their containerized nature, are easier to swap or upgrade: you simply update the manifests. Kubernetes can perform a rolling update without stopping the service.
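As a hedged sketch of how that behavior is configured (these are standard fields of the Deployment spec; the numbers are arbitrary examples):

```yaml
# Sketch: rolling-update settings inside a Deployment spec.
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most one pod may be down during the rollout
      maxSurge: 1         # at most one extra pod may be created temporarily
```

When you update the pod template (say, bump the image tag), Kubernetes replaces pods a few at a time within these bounds, so the service keeps serving throughout.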

In the same State of Cloud Native Development Report, 33% of developers report that they can release production code daily, and 31% weekly. Dev teams can stay competitive because they can roll out new features faster than their competitors.

A Simplified Overview of a Kubernetes Cluster

We won't go too technical in this article, but let's take a look at what is inside a Kubernetes cluster, the basic unit of a deployed Kubernetes environment (your apps are deployed inside it). Depending on your needs, you may have more than one cluster working together. A cluster basically has the following parts:

  • Control plane: the brain of a Kubernetes cluster
  • Worker nodes: physical or virtual machines in the data plane of a cluster
  • Pods: each pod consists of one or more containers on a node, sharing the same storage and network (see the sketch below)
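
To make that last point concrete, here is a minimal, hypothetical Pod manifest: two containers in one pod sharing a volume (all names and images below are placeholders):

```yaml
# Sketch: one pod, two containers sharing storage and network.
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod            # placeholder name
spec:
  volumes:
    - name: shared-data
      emptyDir: {}          # scratch space that lives as long as the pod
  containers:
    - name: web
      image: nginx:1.25     # placeholder image
      volumeMounts:
        - name: shared-data
          mountPath: /usr/share/nginx/html
    - name: helper
      image: busybox:1.36   # placeholder image
      command: ["sh", "-c", "echo hello > /data/index.html && sleep 3600"]
      volumeMounts:
        - name: shared-data
          mountPath: /data
```

The two containers also share the pod's network namespace, so they can talk to each other over localhost.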

Originally, Kubernetes used the Docker container engine by default. Now it supports any runtime that implements the Container Runtime Interface (CRI), such as CRI-O and containerd.

For now, you don't need to worry too much about the smaller components in either the control plane or the nodes; they are the lower-level details of how Kubernetes governs nodes and pods.

For example, kubelet is the agent that runs on each node, like a group leader who reports to the control plane (the control plane is in fact a collection of master nodes). kube-proxy is the network proxy on each node. etcd is the persistent database of the control plane itself; the node controller and the pod scheduler also live in the control plane, and so on.

Kubernetes also allows you to add custom resources (CRs) to a cluster, which can be managed much like built-in resources such as pods, but with more flexibility. That will be a topic for another day, but an important one for FST Network's Logic Operation Centre (LOC).

Use Case: the (Un)expected Success of Pokémon GO

Photo by David Grandmougin on Unsplash

Pokémon GO is one of the earliest and largest users of the GKE (Google Kubernetes Engine) platform. It ran a containerized front-end app and various microservices in a single Kubernetes cluster. The game company, Niantic, had only a small team. Their worst-case estimate of user traffic at launch was 5 times the original target.

However, on July 6th, 2016, within 15 minutes of the game launching in Australia and New Zealand, traffic surged to 50 times their target.

Source: Bringing Pokémon GO to life on Google Cloud

Niantic called Google for help. Google rapidly added thousands of nodes to the cluster, which became literally planetary-scale. The game went through its much-anticipated US launch the next day and the Japan launch later that month (3 times the size of the US launch) without incident, with tens or even hundreds of thousands of players online and 5~10 TB of data flowing daily.

Even though Niantic did not expect this level of popularity, thanks to the game's containerized design and the decision to deploy it on the cloud, it could be scaled up in a very short period of time. In a way, you could say they expected the unexpected.

Today, it's not so uncommon for enterprises to run clusters with thousands of worker nodes. It can even go massive: the precision agriculture company Bayer Crop Science runs a cluster of 15,000 nodes (scaled up from a 4,500-node cluster) with 240,000 CPU cores and 1.48 PiB of RAM to evaluate promising seed genotypes, and OpenAI has a 7,500-node cluster for training very large image and text AI models. More nodes aren't necessarily better, but that scale is clearly achievable.

Kubernetes brings an extraordinary level of reliability and scalability to modern systems, and thus has become a synonym for success.

Photo by Scott Graham on Unsplash

Further Reading

FST Network’s Logic Operation Centre (LOC), a cloud-native data governance solution, has chosen Kubernetes as its deployment environment. You can find out more about LOC in the following article:


Alan Wang
FST Network

Technical writer, former translator and IT editor.