Profiling GKE startup time
Sometimes being a Developer Advocate is really awkward. You’re constantly being bombarded with information from a lot of different directions that you really can’t pass along. The crux of the job is figuring out what information to pass along, and how to refine it so it’s properly consumed.
A perfect example of this was the FoodiePics team, whom I met in NYC a few weeks ago. While I was in NYC, a completely separate company named “Bacon, Live!” approached me as well. The difference between the companies was that Bacon, Live! was more like Twitch, but for people cooking meals.
It’s one thing to get approached by two startups, both focused on food. It’s another to hear that they both have cold-boot problems: we were able to help the FoodiePics team profile and improve their cold-start time on Google Compute Engine, but the Bacon, Live! team was using Google Container Engine (GKE), which is a whole separate beast.
If you’ve messed with Kubernetes, then you’ll know that it’s an awesome tool to help you manage, contain, and deploy images to your various instances in a seamless and powerful manner. But you’ll also note that all that power comes with some overhead, and some very thick scaffolding. Which is why helping the Bacon, Live! team find and fix their cold-boot problem was going to be so difficult: it turns out that cold-boot timing is not something that Kubernetes surfaces to you easily.
Much like I did with Bait-and-Stich (GAE) and FoodiePics (GCE), the first step in solving Bacon, Live!’s problem is figuring out how to profile GKE cold-boot time, and then figuring out how fast it is without any specific code running on it.
Baseline Pod cold-start performance
It’s very rare that you get a platform posting performance information, so I was very happy when I saw that the Kubernetes site posted, about a year ago, that their pod startup latency trended towards ~3 seconds for a 1K-node cluster.
Their chart shows the 99th percentile sitting a little above 2.5 seconds for v1.3 nodes, which gives us a good target to shoot for. Let’s see if we can validate those numbers.
After digging around, I found that neither the Cloud Console, Stackdriver, nor Heapster (for Kubernetes) surfaced any direct numbers with respect to pod boot time. There was, however, a single place where some startup-time information was surfaced: kubectl get pod. It turns out that calling “kubectl get pod -o json” will return a JSON document with some timestamped information relating to the pod’s startup phases:
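Something like this does the trick (the pod name here is a placeholder, and jq just trims the output down to the interesting fields; the timestamps shown are representative, not captured):

```
# Pull the timestamped status conditions for a pod.
kubectl get pod bacon-live-pod -o json \
  | jq '.status.conditions[] | {type, lastTransitionTime}'
# Representative output:
#   { "type": "Initialized",  "lastTransitionTime": "2017-06-14T16:24:01Z" }
#   { "type": "Ready",        "lastTransitionTime": "2017-06-14T16:24:02Z" }
#   { "type": "PodScheduled", "lastTransitionTime": "2017-06-14T16:24:01Z" }
```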
Here, you can see that the pod status emits the “PodScheduled”, “Initialized”, and “Ready” conditions, each with a timestamp that tells us (relatively) when it occurred. In the example above, it took about 1 second to go from “Initialized” to “Ready”, which is the point at which the containers of the pod start to initialize themselves. This information basically tells you how long it took to mount volumes and pull down image data (but we’ll get to that in a second).
Our goal now is to profile this data over a bunch of startups, so we can get a clear picture of pod startup time for our very simple use case. Thankfully, kubectl gives us enough control to find, query, and kill pods:
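Here’s a rough sketch of that kill-and-measure loop, assuming the pods are managed by a Deployment (so a deleted pod gets rescheduled automatically); the app=bacon-live label is a placeholder:

```
# Kill the pod, wait for its replacement, and dump the condition timestamps.
for i in $(seq 1 100); do
  pod=$(kubectl get pods -l app=bacon-live -o jsonpath='{.items[0].metadata.name}')
  kubectl delete pod "$pod"   # blocks until the old pod is gone
  sleep 5                     # give the scheduler a moment to create the replacement
  new=$(kubectl get pods -l app=bacon-live -o jsonpath='{.items[0].metadata.name}')
  kubectl wait pod "$new" --for=condition=Ready --timeout=120s
  kubectl get pod "$new" -o jsonpath='{range .status.conditions[*]}{.type}{"\t"}{.lastTransitionTime}{"\n"}{end}'
done
```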
So, by killing pods, letting them spin back up, and reading the state timestamps, we can get a clear sense of how long each pod took to get back up and running. Do that 100 times or so, and we end up with something that looks like this:
We can see that the time it takes the pod to move from “PodScheduled” to “Initialized” is practically non-existent, and since we’re using such a bare-bones container, the time it takes to move to “Ready” is between 0 and 3 ms.
Containers != VMs
One of the fundamental tenets of Kubernetes is the decoupling of application containers from the details of the systems on which they run. Google Cloud Platform provides a homogeneous set of raw resources via virtual machines (VMs) to Kubernetes, and in turn, Kubernetes (k8s) schedules containers to use those resources. Once those resources are provisioned, k8s can download, install, and boot the image.
At this stage, it can feel very similar to how GCE with containers/snapshots works, but there’s a subtle difference that is the backbone of the performance gap: booting a pod does not require booting a VM.
Typically, when booting a GCE instance, the VM itself needs to actually start up and reach a healthy state before your own code can initialize itself and get things rolling. If you boot up a new one, then the whole process has to happen again.
For Kubernetes pods, however, that’s not the case: booting a new pod only requires initializing its containers (which k8s orchestrates), rather than re-booting the kernel itself. This is in great part due to the fact that containers share a kernel.
As such, when we’re talking about pod boot time, what we’re really concerned with is how long it takes the containers to initialize.
Timing the containers
A Kubernetes pod is made up of a series of containers, which provide endpoints and services for the instance itself. The pod is considered “ready” once all of its containers have been initialized and its health checks pass. To facilitate a proper boot-up, the initialization process breaks containers into two main groups:
The first group is the init containers, which execute serially, in the order the pod spec defines them.
The second group is the remaining containers, which are all started together, in no guaranteed order, once every init container has finished (see the sketch below).
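To make that concrete, here’s a minimal, hypothetical pod spec showing both groups (the image names and sleeps are placeholders):

```
# The two initContainers run strictly one after the other; the two regular
# containers only start (together, in parallel) after both inits exit cleanly.
kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: boot-order-demo
spec:
  initContainers:
  - name: init-db                # runs first
    image: busybox
    command: ['sh', '-c', 'sleep 5']
  - name: init-stats             # runs second, only after init-db succeeds
    image: busybox
    command: ['sh', '-c', 'sleep 5']
  containers:                    # started in parallel once both inits finish
  - name: app
    image: nginx
  - name: sidecar
    image: busybox
    command: ['sh', '-c', 'sleep 3600']
EOF
```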
This setup has some interesting quirks, and some chances to be problematic. For instance, if a single container in the init phase takes a long time, it blocks the loading of everything else. I had a hunch that this might be the problem for Bacon, Live!, but to prove it, I had to figure out the right way to profile each container’s init stage.
Timing the init containers
Getting the boot time for a container turned out to be much more difficult than I expected. I didn’t want to go through and force Bacon, Live! to rebuild all their images for profiling (although they thought that was a fine idea, long term…), so something less intrusive was needed. After a bit of digging and trial-and-error, I realized that there might be a “good enough” solution using the kubectl get pod command:
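For a pod with two init containers (like the demo pod above), that looks something like this (output is representative, not captured):

```
kubectl get pod boot-order-demo
# NAME              READY     STATUS     RESTARTS   AGE
# boot-order-demo   0/2       Init:1/2   0          8s
```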
This returns the pod status, and when you have init containers defined, the STATUS field lists their progress. For example, a status of “Init:1/2” indicates that one of the two init containers has completed successfully.
This means that we can poll the get pod command every 10 ms or so and get a sense of how long each stage takes to boot up. Below is a small prototype I put together to highlight how this works:
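Here’s a rough sketch of that kind of prototype: it samples the STATUS column in a tight loop and logs a wall-clock timestamp whenever the phase changes. (In practice, each kubectl round-trip takes far longer than 10 ms, so the effective resolution is coarser.)

```
# Logs a timestamp every time the pod's STATUS phase changes,
# e.g. Init:0/2 -> Init:1/2 -> PodInitializing -> Running.
last=""
while true; do
  status=$(kubectl get pod boot-order-demo --no-headers | awk '{print $3}')
  if [ "$status" != "$last" ]; then
    echo "$(date +%s.%N)  $status"
    last="$status"
  fi
  [ "$status" = "Running" ] && break
  sleep 0.01
done
```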
In the example above, you can clearly see that the stats-gathering and DB-initialization containers are the ones taking the most time to get off the ground. Since these are init containers, all following services are delayed by about 4–12 seconds, because these stages block the initialization of the other work groups.
Timing the parallel containers
Timing the parallel containers is much trickier, since each one can start and stop in whatever order it wants, and each has its own definition of “up and running”. I tried a bunch of different ways to get this data in a reliable, don’t-touch-too-much-stuff fashion, and settled on the following technique:
For each container, expose a new endpoint called “/health”, which returns the string “Success”. Repeatedly polling this endpoint from an external location yields three different states:
- error — the service is not up and running yet (red bar below)
- 400 / 500 — the service is booting, but the endpoint is not available yet (yellow bar below)
- “Success” — the service is up and running
The goal is to add a /health endpoint to each container, and have a script poll each one in order to gather boot times. From that data we can chart how long each container took to start, and how long each stage lasted:
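Here’s a rough sketch of such a polling script; the container names, addresses, and ports are placeholders, and the classification mirrors the three states above:

```
# Polls a container's /health endpoint, logging a timestamp whenever its
# state changes between: down (connection error), booting (4xx/5xx),
# and ready (body == "Success").
poll_health() {
  local name=$1 url=$2 state="" code body s
  while true; do
    code=$(curl -s -o "/tmp/health_$name" -w '%{http_code}' --max-time 1 "$url") || code="error"
    body=$(cat "/tmp/health_$name" 2>/dev/null)
    if [ "$code" = "error" ] || [ "$code" = "000" ]; then s="down"
    elif [ "$body" = "Success" ]; then s="ready"
    else s="booting"; fi
    if [ "$s" != "$state" ]; then
      echo "$(date +%s.%N)  $name  $s"
      state="$s"
    fi
    [ "$s" = "ready" ] && return
    sleep 0.05
  done
}

# Placeholder addresses; poll every container in parallel.
poll_health app     http://10.0.0.12:8080/health &
poll_health sidecar http://10.0.0.12:9090/health &
wait
```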
The chart above reveals a few things. First, there can be some small (read: insignificant) delay in starting the parallel containers. Second, in this example, the init-container phase took the most time.
Cooking with profilers
Phew! There’s a lot of working nuance to Kubernetes pods and boot time, which makes a lot of sense: once you reach the stage where k8s results in significant ops savings for your application, you can bet there’s some baseline level of complexity you’ll have to get involved with.
At this point, we’re able to time the general pod boot, image fetching, and container init phases, which is good enough to start taking a look at Bacon, Live!’s performance woes. But since this article is already long enough, I’ll push that topic to the next one ;)