There’s a reason why the kubernetes project is the current crown jewel of the cloud native community, with attendance at KubeCon 2017 in Austin nearly four times that of last year’s conference in Seattle, and seemingly every major enterprise vendor perched behind a booth in the exhibit hall, eager to help attendees take advantage of the platform. The reason is that the advantages are significant, especially in those areas that matter most to developers and system engineers: application reliability, observability, controllability and lifecycle management. If Docker built the engine of the container revolution, then it was kubernetes that supplied the chassis and got it up to highway speed.
But driving at highway speed means keeping your hands on the wheel and obeying the rules of the road. Kubernetes has its own rules, and applications that adhere to best practices with respect to certain key touch points are much less likely to wipe out and take a few neighboring lanes of traffic with them. In this post I am going to briefly discuss five important design features that will affect how well your application behaves when running on kubernetes: configuration, logging, signal handling, health checks, and resource limits. The treatment of each topic will be necessarily high level, but I will provide links to more detailed information where it will be useful.
Configure from env vars
Configuration is data required by a process at startup that varies between deployments. Config should be loaded from environment variables with sane defaults. There are a few reasons to follow this advice. Reading env vars with defaults is easy and a basic feature of every general-purpose language. It’s also portable: there are standard ways to initialize a container’s environment whether you’re running on kubernetes, Docker Engine, Docker Compose, etc. On kubernetes you can interpolate env var values into templates using a tool like helm, or load them from ConfigMaps or Secrets previously installed into the cluster. Lastly, environment variables are observable in the dashboard or from the command line, and kubectl describe pods mypod is less work to type than kubectl exec mypod -- cat /etc/myapp/config.py.
The kubernetes documentation has a good section on setting environment variables for a container. For background see the config topic in the Twelve-Factor App manifesto, and don’t miss Kelsey Hightower’s excellent hands-on explanation.
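To make the advice concrete, here is a minimal sketch of loading config from env vars with defaults. The variable names (MYAPP_HOST, MYAPP_PORT, MYAPP_DEBUG) and the default values are hypothetical and only there for illustration; substitute whatever your app actually needs.

```python
import os


def load_config(env=os.environ):
    """Read settings from environment variables, falling back to defaults.

    The MYAPP_* names and the defaults are illustrative assumptions.
    """
    return {
        "host": env.get("MYAPP_HOST", "0.0.0.0"),
        # Env vars are always strings, so convert explicitly.
        "port": int(env.get("MYAPP_PORT", "8080")),
        "debug": env.get("MYAPP_DEBUG", "false").lower() in ("1", "true", "yes"),
    }


if __name__ == "__main__":
    print(load_config())
```

Passing the environment in as an argument keeps the function easy to exercise locally: call it with a plain dict and you never have to touch the real process environment.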
Log to stdout and stderr
Every server app produces a stream of log events which describe what it is doing, and sometimes what went wrong. A well-behaved kubernetes app writes these events to stdout and stderr, and does not concern itself with routing or managing log stream delivery. Logs written to the standard streams are observable on the console locally during development, and through the dashboard or command line when the container is running in the cluster. On Google Kubernetes Engine you can easily configure fluentd to capture log streams and deliver them to Stackdriver for long-term storage, searching and analysis. A well-behaved kubernetes app should specifically not write to local log files. Doing so creates state in the container filesystem that has to be managed, either by mounting a host volume at the log directory so the data is persistent, or by running a shipper like rsyslog inside the container.
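As a rough sketch of what this looks like in practice, the snippet below configures Python’s standard logging module to send routine events to stdout and warnings and errors to stderr, with no file handlers at all. The logger name and the message format are my own assumptions, not anything kubernetes requires.

```python
import logging
import sys


def build_logger(name="myapp"):
    """Return a logger that writes only to the standard streams.

    Records below WARNING go to stdout; WARNING and above go to stderr.
    """
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers.clear()   # avoid duplicate handlers on repeated calls
    logger.propagate = False  # don't also emit through the root logger

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")

    out = logging.StreamHandler(sys.stdout)
    out.setFormatter(fmt)
    out.addFilter(lambda record: record.levelno < logging.WARNING)

    err = logging.StreamHandler(sys.stderr)
    err.setLevel(logging.WARNING)
    err.setFormatter(fmt)

    logger.addHandler(out)
    logger.addHandler(err)
    return logger


if __name__ == "__main__":
    log = build_logger()
    log.info("listening")          # goes to stdout
    log.error("upstream timeout")  # goes to stderr
```

Once the container is running in the cluster, both streams show up together in kubectl logs mypod.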
Handle SIGTERM
Pods in kubernetes get restarted for any number of reasons, including the system shuffling resources around or someone simply running kubectl delete pods mypod. When the system wants to kill a pod it first sends SIGTERM and then waits a set number of seconds before sending SIGKILL. This period is known as the termination grace period; it is a property of the podSpec (terminationGracePeriodSeconds, 30 seconds by default) and can be overridden from the kubectl command line with --grace-period. If your process doesn’t implement a handler for SIGTERM then it gets no chance to clean up, and if it is still running when the grace period expires it will be SIGKILLed. Pods that are force-deleted are removed from etcd and the API immediately, without waiting for the process to actually terminate on the node. Bottom line: if you have anything you need to do to ensure graceful shutdown then you need to implement a handler for SIGTERM.
There are two other things that are very important to keep in mind. The first may seem obvious: if you have more than one container in a pod they will all be signaled on shutdown, and you need the right strategy for each depending on what it is doing. The second is less obvious, and is actually a bit of a trap: only the process running as PID 1 in a container is going to be signaled. If you use the shell form of CMD in your Dockerfile, for example, then your thing is started as /bin/sh -c thing; the shell is PID 1, your process isn’t, and it won’t get signaled. Instead use the exec form CMD ["/usr/local/bin/thing"] or use the exec form of ENTRYPOINT. For additional info see this good overview of pod termination in the kubernetes documentation.
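Here is a minimal sketch of a SIGTERM handler, assuming a simple single-threaded worker loop. The names shutdown, handle_sigterm and run are hypothetical, and the loop body is a placeholder for real work. The handler only sets a flag; the actual cleanup happens in the main loop once the current unit of work is done.

```python
import signal
import threading

# Set by the signal handler, polled by the main loop.
shutdown = threading.Event()


def handle_sigterm(signum, frame):
    """Request a graceful stop; keep the handler itself short."""
    shutdown.set()


def run():
    """Work until SIGTERM arrives, then clean up and return."""
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown.is_set():
        # Placeholder for one unit of real work; wait() returns early
        # as soon as the handler sets the event.
        shutdown.wait(timeout=1.0)
    # Placeholder for cleanup: finish in-flight requests, close
    # connections, flush buffers.
    print("SIGTERM received, shutting down cleanly", flush=True)
```

Call run() at the bottom of the script your container starts, and make sure that script is launched with the exec form of CMD or ENTRYPOINT so that it is PID 1 and actually receives the signal. Whatever cleanup you do has to finish inside the termination grace period.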
Implement readiness and liveness probes
Kubernetes works best when it knows the health status of every container in the system. The kubelet needs this information in order to restart containers that have failed, and to keep service endpoints up to date. To determine the health of a container kubernetes relies on two signals: liveness and readiness. Liveness measures whether a container is running or not, and readiness measures whether a running container is able to accept traffic. Specific tests for liveness and readiness can be configured for each container. The options range from executing a command inside the container to making TCP and HTTP requests.
If you don’t define liveness and readiness probes for your containers then the system will rely on the default signal, which is whether the container’s PID 1 still exists. If that process terminates then the kubelet finds out about it from the container runtime (e.g. Docker) and schedules a restart. For almost all long-running applications of any complexity this default behavior is insufficient. Your container’s PID 1 could be up but stuck in a loop, or blocked on some broken dependency and unable to do its job. You should always define both a liveness and a readiness probe, though for some apps the two probes may use the same test. See the kubernetes documentation for a good explanation of creating liveness and readiness probes.
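On the application side, HTTP probes need something to hit. The sketch below uses only the Python standard library and assumes two hypothetical paths, /healthz for liveness and /ready for readiness; neither name is mandated by kubernetes, and the port is also just an example. An HTTP probe counts any status from 200 up to but not including 400 as success, so returning 503 until startup work has finished keeps traffic away from a container that isn’t ready yet.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

# Set this once startup work (cache warmup, DB connection, ...) is done.
ready = threading.Event()


def probe_status(path):
    """Map a probe path to an HTTP status code (paths are assumptions)."""
    if path == "/healthz":
        return 200  # the process is up and able to answer requests
    if path == "/ready":
        return 200 if ready.is_set() else 503
    return 404


class ProbeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status = probe_status(self.path)
        body = b"ok\n" if status == 200 else b"not ok\n"
        self.send_response(status)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep probe traffic out of the app's log stream


def serve(port=8080):
    """Serve probe endpoints forever on the given port."""
    HTTPServer(("", port), ProbeHandler).serve_forever()
```

In a real app you would typically run serve() in a background thread, or add the two routes to the web framework you already use, and have the liveness check exercise enough of the app to notice a stuck main loop.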
Set resource limits
A big part of kubernetes’ job is scheduling pods to run on nodes. A node in a cluster is an environment with constrained resources: memory, CPU, and ephemeral storage all represent things that have to be shared by containers running on the node (host network ports too, but we won’t talk about them here). For each of these resource types kubernetes defines the concept of a request and a limit. A request tells the system how much of a particular resource a container uses normally, and is an input to scheduling. A limit specifies the maximum amount of the resource a container is allowed to use: CPU use above the limit is throttled, a container that exceeds its memory limit may be killed, and exceeding an ephemeral storage limit can cause the hosting pod to be evicted from the node.
The benefit of setting requests for every container is to allow kubernetes to do a better job of fitting pods onto nodes. The benefit of setting limits is as an additional health check: containers will not have the ability to go off the rails and consume all of the memory, CPU or storage on a node, thus forcing the system to start moving things around to find available resources. If you don’t define requests and limits your containers are subject to the defaults, which are under the control of your cluster administrator. Defaults can be set per namespace, making it possible to divide the total resources in a cluster between different teams or different applications. The kubernetes documentation contains an overview of configuring requests and limits, as well as a guide to managing the defaults.
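To show where requests and limits live in a pod definition, here is a small sketch that builds a manifest as a Python dict and prints it as JSON, which kubectl apply -f accepts as well as YAML. The resources.requests and resources.limits fields are the real ones; the pod name, image name and the specific quantities are made up for illustration and should come from measuring your own app.

```python
import json


def container_spec(name, image):
    """Container definition with illustrative requests and limits."""
    return {
        "name": name,
        "image": image,
        "resources": {
            # What the container normally uses; an input to scheduling.
            "requests": {"cpu": "250m", "memory": "64Mi"},
            # The most the container is allowed to use.
            "limits": {"cpu": "500m", "memory": "128Mi"},
        },
    }


# Hypothetical pod and image names.
pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "mypod"},
    "spec": {"containers": [container_spec("myapp", "example.com/myapp:1.0")]},
}

if __name__ == "__main__":
    print(json.dumps(pod, indent=2))
```

CPU quantities are expressed in millicores (250m is a quarter of a core) and memory in bytes with suffixes such as Mi.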
Twelve-Factor all the things
I’ll close by mentioning the Twelve-Factor App manifesto, which I linked to twice above. This document represents a set of design guidelines and patterns for implementing applications that are deployable, scalable, observable, and work well within cloud-native environments. If you’re intending to write applications to run on kubernetes you can’t go far wrong following the recommendations outlined in it.