Kubernetes Best Practices

I was chatting to a former Googler SRE who (correctly) pointed out that Kubernetes evolves very quickly (too quickly to maintain currency), uses (many) novel concepts and that there are (too) many ways to solve the same problem.

Much of this is true and not necessarily a bad thing nor necessarily different than any other technology. Where I disagree is that these factors have discouraged his adoption of Kubernetes. I would encourage you to dive in. Kubernetes is the success it is *despite* these (reasonable) concerns because it’s so very very good.

In this post, I’m going to give you some rudimentary best practices that I hope will help you grab this technology by its containers and dive in.

In no particular order:

1. Let someone else do the toil!

Use a Kubernetes service like Kubernetes Engine. Unless you are intellectually curious, a developer working on Kubernetes, or a platform provider whose customers are asking for a Kubernetes service, save the hassle and use a Kubernetes service. Did you build your own home and your car? Or do you like sleeping in a place the Wolf can’t blow down and driving a car that reliably takes you from A to B?

If you’ve read any of my other posts, you’ll know that I also recommend evaluating Regional Clusters, so you’re looking at something along the lines of:

gcloud beta container clusters create ${CLUSTER} ...
gcloud beta container clusters get-credentials ${CLUSTER} ...

And then, you’re ready to go with:

kubectl apply --filename=marvels.yaml

2. Try to think “Kubernetes”

This *may* be more of a challenge with Kubernetes Engine than in other platforms but, on Google Cloud Platform, you are forced to maintain an understanding of the state of your resources in Kubernetes (e.g. Nodes, Ingresses) and, at the same time, the underlying resources in Compute Engine (e.g. VMs, HTTP/S Load Balancers). This particle-wave duality problem is unfortunate. Thanks to Dale H. for first articulating this for me.

Where possible, try to stick to thinking in terms of Kubernetes’ resources and disregard the underlying GCE resources. Having now spent more than a year biasing my work to Kubernetes, it’s become easier to think purely in terms of “some number of” Pods exposed by Services (and Ingresses).

3. Namespaces, Namespaces, Namespaces

Update: Thanks to Michael Hausenblas for educating me on a best practice of *not* referencing namespaces from within Kubernetes YAML files. While you should always use namespaces, specifying these when you apply the file provides more flexibility and the ability to use the same YAML files against, e.g. different landscapes. See Michael’s article here.

Mike Altarace and I blogged moons ago about Namespaces in Kubernetes and where you should use them. Since then, I pretty much ignored my own advice thinking my use-cases were so minor that using Namespaces would be overwrought. I was wrong. Use Namespaces, always.

As containers are to processes, Namespaces are to Kubernetes projects. Quite apart from the security boundary that Namespaces convey, they’re an excellent way to partition your work and they yield an excellent way to reset or delete it:

kubectl delete namespace/$WORKING_PROJECT

The only downside is that, when using a non-default namespace, you will need to specify your working namespace with --namespace=$WORKING_PROJECT on kubectl commands which, in my opinion, can be a good safeguard practice. However, you can always use --all-namespaces or set a different namespace as your default (link).
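
For reference, the default namespace is recorded against a context in your kubeconfig file. A minimal sketch of the relevant fragment, assuming hypothetical context, cluster and user names (only the namespace field matters here):

```yaml
# Fragment of a kubeconfig file (typically ~/.kube/config).
# my-context, my-cluster and my-user are hypothetical placeholders.
contexts:
- name: my-context
  context:
    cluster: my-cluster
    user: my-user
    # kubectl uses this namespace whenever --namespace is omitted
    namespace: working-project
current-context: my-context
```

With a fragment like this in place, kubectl commands run against working-project unless you override it with --namespace.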

4. Too many ways to solve a problem

This is a discouraging concern. I think it’s probably untrue but, absent good guidance and best practices, it may well appear that there are too many similar ways to solve the same problem. Here’s the pattern I commonly use, summarized to encourage discussion:

  • YAML files are knowledge in cold storage (see #5)
  • Your containers should do one thing well (see “distroless”)
  • Always deploy Deployments (see #6)
  • If you want L7 aka HTTP/S Load-Balancing use Ingress (see #7 — ha!)
  • Manage credentials securely using Secrets (link)
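
To make the last bullet concrete, here is a minimal sketch of a Secret and a container that reads it. Every name and value is hypothetical, and the second document is a fragment of a Pod template rather than a complete resource:

```yaml
# Hypothetical Secret. Keep real values out of source control.
apiVersion: v1
kind: Secret
metadata:
  name: my-credentials
type: Opaque
stringData:
  api-key: replace-me
---
# Fragment of a Pod template: the container receives the Secret's
# value as an environment variable instead of a literal in the YAML.
containers:
- name: my-app
  image: gcr.io/my-project/my-app
  env:
  - name: API_KEY
    valueFrom:
      secretKeyRef:
        name: my-credentials
        key: api-key
```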

5. Bias to kubectl apply --filename over alternatives

It’s easy to get a quick hit with kubectl create namespace ${WORKING_DIR} but, after several such commands, you may wonder how you reached the current state and, more importantly, how to recreate it. I encourage you to create YAML files to describe your resources rather than use the equivalent kubectl create commands.

I encourage you to become familiar with the *excellent* Kubernetes API documentation (link, link and 1.10), which is exhaustive, accurate and easy to navigate (perhaps the perfect documentation!?). But, even with this powerful tool, it can seem a little challenging to take a kubectl command that works for you and convert it to YAML. It need not be:

kubectl get deployment/${MY_DEPLOYMENT} --output=yaml
kubectl get service/${MY_SERVICE} --output=yaml
kubectl get anything/${MY_ANYTHING} --output=yaml

Pipe the results to a file if you’d like, but use these as the basis for an equivalent (!) YAML file. You will need to drop any instance references (values that the cluster generated for that particular object).
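
As a rough guide to what to keep and what to drop, here is an abridged, hypothetical example of such output for a Deployment. The names and values are invented for illustration and real output contains more fields:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment                          # keep
  labels:                                      # keep
    app: my-app
  creationTimestamp: "2018-01-01T00:00:00Z"    # drop: set by the cluster
  generation: 1                                # drop: set by the cluster
  resourceVersion: "12345"                     # drop: set by the cluster
  uid: 00000000-0000-0000-0000-000000000000    # drop: set by the cluster
spec:                                          # keep: this is what you declared
  replicas: 1
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
      - name: my-app
        image: gcr.io/my-project/my-app
status:                                        # drop the whole block: observed state
  availableReplicas: 1
```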

Once you have crafted masterpiece.yaml, I encourage you to always apply to do the initial create, apply to do any subsequent updates and, if you must, delete. That’s it!

kubectl apply --filename=masterpiece.yaml
kubectl delete --filename=masterpiece.yaml

Minor insight: you don’t need to pull YAML files locally in order to deploy. You can provide kubectl apply --filename with URLs too and, as long as any dependent files are local references, the deployment will work.

Minor insight: the only place I see this used is in Kubernetes-land but it’s a legitimate practice: you may combine multiple YAML files into one YAML file using the --- (document start) and ... (document end) separators. So, instead of a namespace YAML, a deployment YAML and a service YAML, you may have one mega-YAML consolidating all three.
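
A minimal sketch of such a combined file, using hypothetical names and deliberately stubbed-out specs so that only the separators stand out:

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: my-namespace
...
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-deployment
# spec omitted in this stub
...
---
apiVersion: v1
kind: Service
metadata:
  name: my-service
# spec omitted in this stub
...
```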

This is valid YAML (copy-and-paste it into e.g. YAML Lint). It is *invalid* Kubernetes spec|YAML because each spec is incomplete but, if each were completed, this is perfectly fine YAML and a good way to keep resources associated.

See #B (Xsonnet) for one criticism.

6. Use Deployments

There’s a bunch of power in Deployments. Suffice it to say: use Deployments all the time, every time, even when you’re deploying your first single-Pod Nginx. Deployments are “First Class” travel for the price of Coach: you fall into rolling deployments for free and, if you make a mistake, you re-apply and Kubernetes takes care of killing the naughty Pods and replacing them with well-behaved ones.
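
A minimal sketch of such a Deployment, assuming a hypothetical nginx workload. The replica count, image tag and strategy values are illustrative choices, not recommendations:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  replicas: 3
  selector:
    matchLabels:
      app: nginx
  # Rolling updates are the default strategy; spelled out here for clarity.
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1         # at most one extra Pod during an update
      maxUnavailable: 0   # never drop below the desired replica count
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.25   # change this and re-apply to trigger a rollout
```

Editing the image and running kubectl apply --filename again is all it takes to roll forward (or to roll back a mistake).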

7. LoadBalancer and Ingress

These cause confusion. In my mind (and I may be wrong), when I create Services using --type=LoadBalancer, I want|get a Network LB. If I want HTTP/S (Layer 7) Load-Balancing, I need to create an Ingress. Ingress is a confusing Kubernetes resource. Suffice to say, L7==Ingress (and lots of configuration power as a result).

Kubernetes Engine manifests Ingress resources as GCE HTTP/S Load-Balancers. Christopher Grant does a very good job demystifying Ingress in his posts (here and here).
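
To contrast the two, here is a hedged sketch with hypothetical names. It uses the networking.k8s.io/v1 Ingress API, which is newer than the API that was current when this post was written, and it assumes a separate Service called my-service already exists (on Kubernetes Engine, typically a NodePort Service):

```yaml
# Layer 4: a Service of type LoadBalancer.
apiVersion: v1
kind: Service
metadata:
  name: my-service-l4
spec:
  type: LoadBalancer
  selector:
    app: my-app
  ports:
  - port: 80
    targetPort: 8080
---
# Layer 7: an Ingress that sends all HTTP traffic to my-service.
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: my-ingress
spec:
  defaultBackend:
    service:
      name: my-service
      port:
        number: 80
```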

8. NodePorts

I haven’t (ever?) directly created a ClusterIP. I’ve done things to Kubernetes that resulted in services being exposed by ClusterIPs. Mostly (!) I create NodePorts or I do things (e.g. create Ingress resources) that use NodePorts.

NodePorts are associated with Kubernetes Nodes and they’re Ports. The powerful facility they provide is that *every* node in the cluster exposes the same service on the same (node) port.

If I create a service that’s exposed on NodePort X, I can be assured that, if I access that port on *any* node in the cluster, I will access the service. This forms the basis of Kubernetes’ load-balancing capabilities because the cluster is able to route incoming requests for the service to this port on any node.
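
A minimal sketch of a NodePort Service, with hypothetical names and port numbers. If you omit nodePort, Kubernetes picks a free port from the NodePort range for you:

```yaml
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  type: NodePort
  selector:
    app: my-app
  ports:
  - port: 80          # the Service's own (cluster-internal) port
    targetPort: 8080  # the port the container listens on
    nodePort: 30080   # opened on every node in the cluster
```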

Google Cloud SDK (aka gcloud) includes an ssh client that makes it trivial to connect to Compute Engine VMs (which, as you’ll recall, are Kubernetes Cluster Nodes). ssh clients include a port-forwarding capability. So, if we want to connect to a Kubernetes Service and we can look up the service’s NodePort, then we can trivially (!) port-forward to that service by port-forwarding (using gcloud or any ssh client) to the port on any Node.

The following example uses kubectl to grab the 0th node in the cluster. The Kubernetes Engine Node name is the same as the Compute Engine VM name. Given a service called ${MY_SERVICE} in a namespace called ${MY_NAMESPACE}, we determine the service’s NodePort. Then, we switch to gcloud and use its built-in ssh to port-forward (using --ssh-flag="-L XXXX:localhost:XXXX").

NODE_HOST=$(\
kubectl get nodes \
--output=jsonpath="{.items[0].metadata.name}")
NODE_PORT=$(\
kubectl get services/${MY_SERVICE} \
--namespace=${MY_NAMESPACE} \
--output=jsonpath="{.spec.ports[0].nodePort}")
echo ${NODE_PORT}
gcloud compute ssh ${NODE_HOST} \
--ssh-flag="-L ${NODE_PORT}:localhost:${NODE_PORT}" \
--project=${YOUR_PROJECT}

What’s so powerful about this? Now, you may access the service as if it were local and without having to punch holes in a firewall.

NodePorts are high-numbered ports (~30,000–32,767).

9. Hacking kubectl: use JSON

Google’s Cloud SDK (aka gcloud) is really excellent but kubectl (Kubernetes CLI) is more betterer (sic). One powerful feature is formatting and filtering output. This permits non-coding (non-API-wrangling) ways to extend scripts and other tools with information from Kubernetes clusters.

All Kubernetes resource state is accessible through e.g. kubectl get (in my experience more useful for this purpose than kubectl describe) commands. Then, all that’s left is to find the needle in what can be a haystack of JSON.

The trick is to:

kubectl get [resource]/[resource-name] --output=json

And then eyeball the results to start building a query string:

kubectl get [resource] --output=jsonpath="{.items[*]}"

and iteratively refine the result set down until you have the item(s) that you seek. Here’s an example that should work with any cluster:

kubectl get nodes --output=json
kubectl get nodes --output=jsonpath="{.items[*]}"
kubectl get nodes --output=jsonpath="{.items[0]}"
kubectl get nodes --output=jsonpath="{.items[0].metadata.name}"

Lastly, there’s a decent argument (and a tenet in *nix) for learning one JSON parsing tool and applying that tool to all JSON parsing needs. In this case, is there any reasonable competitor to jq? I suspect not.

Plus jq has an excellent playground (jqplay.org).

A. Use Labels

It’s been a long-time coming but all-manner of software services now support the concept of arbitrary labeling of resources (usually key-value pairs). The reason this is powerful is that this metadata provides an open-ended, entirely user-defined way to query resources. Kubernetes uses this principle inherently; it’s an intrinsic capability and not an after-thought|bolt-on.

A Kubernetes Service exposes an arbitrary number of Kubernetes Pods. A Service does *not* expose Pods called “Henry” or Pods that comprise a ReplicaSet. Instead, a Service exposes Pods whose labels meet criteria defined in the Service’s specification, and these labels are, of course, user-defined.
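
Here is a sketch of the kind of configuration this describes, reconstructed from the names used in the notes that follow. The apiVersion values and the Service’s port numbers are assumptions added to make the manifests complete:

```yaml
---
apiVersion: v1
kind: Namespace
metadata:
  name: project-x
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: microservice-y
  namespace: project-x
spec:
  replicas: 100
  selector:
    matchLabels:
      app: publicname-a
  template:
    metadata:
      labels:
        app: publicname-a      # the label the Service selects on
    spec:
      containers:
      - name: grpc-proxy
        image: image-grpc-proxy
---
apiVersion: v1
kind: Service
metadata:
  name: service-p
  namespace: project-x
spec:
  selector:
    app: publicname-a          # matches Pods by label, not by name
  ports:
  - port: 80                   # assumption: not stated in the text
    targetPort: 8080           # assumption: not stated in the text
```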

NB In the above example, we’re using a namespace called project-x, and this namespace appears in the Namespace spec (when it’s created), in the Deployment (to define where the Deployment exists) and in the Service. The Deployment (which is called microservice-y) will create a ReplicaSet (implicitly specified here; it’s what Deployments create) that will maintain 100 Pods. Each Pod will have a label app: publicname-a and contain one container called grpc-proxy based on an image called image-grpc-proxy. The Service is called service-p. Crucially, the Service selects (!) Pods (in the project-x namespace only) that have a label app: publicname-a. The Service will select any Pod (in the project-x namespace) that has this label (key: value) pair, *not* only those Pods created by this Deployment. The Service does not reference Pods by their name (which is based on the Deployment name), their container name or their container image name, only by the labels associated with the Pod.

NB This is *not* a good practice but it proves the point. If you were to run a configuration similar to the above and then separately create e.g. a Pod running Nginx (in the project-x namespace) and then you added the label app: publicname-a to it, it would quickly be incorporated in the set of Pods that are aggregated by the service-p Service. If you were to remove the label from any of the Pods aggregated by the Service, the Pod would stop being included.

This feature is exemplified by rolling updates where the Deployment creates a new ReplicaSet comprising new Pods for version X’, distinct from the ReplicaSet and Pods for version X. The Service may be defined to expose the union of Pods running across both these versions because it is not defined in terms of the ReplicaSets or specific Pods but by user-defined labels (“selectors”) that are applied by you when the Pods are created.

That’s very powerful!

B. Use Jsonnet, possibly Ksonnet

Two challenges with all (!?) structured formats (YAML, JSON, XML, CSV ;-) are self-references and variables. As you craft marvellous.yaml specifications for your Kubernetes deployment, you’ll encounter this problem readily. You’ll find yourself using literals (e.g. for image names and digests) and, even with the esteemed Deployments, repeating names and selectors.

There’s no solution to these issues if you restrict yourself to YAML and JSON. Google created Jsonnet in part to address these problems. The clever Heptio folks have extended Jsonnet to Kubernetes with… Ksonnet.

Both are templating languages that address the problems outlined above (and more). I encourage you to consider Jsonnet. As with my recommendation to consider using jq, learn Jsonnet once and apply it everywhere you use JSON. Ksonnet is specific to Kubernetes and — in my limited (!) experience — I found the benefits gained from this specificity to not be outweighed by the learning-curve.

C. YAML or JSON

Kubernetes treats YAML and JSON mostly equally. Personally, I find YAML preferable for configuration files that I kubectl apply because YAML is more succinct than JSON, though I find YAML more difficult to write.

However, when it comes to understanding structure and parsing, I prefer --output=json, not least because of --output=jsonpath. I’m a huge Golang fan but Go templates aren’t as intuitive and I don’t use them.

Minor insight: YAML is a superset of JSON (link)… wait! what?

D. Downward Dog API and config

The “Downward API” to give it its correct albeit no-less confusing name, is a facility in Kubernetes by which Pods can gain a sense of the cluster around them. I assume, the normal flow is from the outside world up into the Pod and its containers *but* there are times when it’s useful for a container (!) to gain information on its environment, e.g. its Node name|IP, the Pod’s name(space)|IP.

The Downward API values are presented to the container through environment variables. Environment variables are used (as elsewhere) to provide config or other state to containers. One very nice consequence of using environment variables and the implementation of the Downward API (with a minor caveat) is that the container remains decoupled from Kubernetes.

Here’s an example from my buddy — Sal Rashid — that uses the Downward API to gather Node and Pod state and present these to the user:

https://github.com/salrashid123/istio_helloworld/blob/master/all-istio.yaml

NB See the sections beginning at lines 76, 80, 84, 88 where Pod name, namespace, IP and Node name are provided at runtime by the Downward API to the container named myapp-container.

The Downward API is the only practical way to gather this data for a container. So it’s more of an “only practice” than a “best practice”.
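
For readers who don’t want to follow the link, here is a minimal sketch of how a container spec typically requests these values. It is not copied from Sal’s file: the image and the environment variable names are hypothetical, while the fieldPath values are standard Downward API fields:

```yaml
# Fragment of a Pod template.
containers:
- name: myapp-container
  image: gcr.io/my-project/myapp   # hypothetical image
  env:
  - name: MY_POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: MY_POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: MY_POD_IP
    valueFrom:
      fieldRef:
        fieldPath: status.podIP
  - name: MY_NODE_NAME
    valueFrom:
      fieldRef:
        fieldPath: spec.nodeName
```

The container simply reads ordinary environment variables, which is why it stays decoupled from Kubernetes.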

In many of my posts, as I build solutions for Kubernetes, I test the process locally and outside of a container, then in a container (specifying environment variables), then on a Kubernetes cluster. The containerized mechanisms are consistent (see below) even though one is usually running on Docker and one on Kubernetes.

In a post I wrote recently on Google’s Container-Optimized OS, I demonstrate a container running locally under Docker, remotely under Container-Optimized OS and then on Kubernetes.

Here’s running under Docker locally. Notice how the environment variables (--env) are used to provide config to gcr.io/${PROJECT}/datastore.

docker run \
--interactive \
--tty \
--publish=127.0.0.1:8080:8080 \
--env=GCLOUD_DATASET_ID=${PROJECT} \
--env=GOOGLE_APPLICATION_CREDENTIALS=/tmp/${ROBOT}.key.json \
--volume=$PWD/${ROBOT}.key.json:/tmp/${ROBOT}.key.json \
gcr.io/${PROJECT}/datastore

Here’s the same result, wrapping the deployment into the creation of a Container-Optimized VM. This time, check out the values provided to the --container-env flag:

gcloud beta compute instances create-with-container ${INSTANCE} \
--zone=${ZONE} \
--image-family=cos-stable \
--image-project=cos-cloud \
--container-image=gcr.io/${PROJECT}/${IMAGE}@${DIGEST} \
--container-restart-policy=always \
--container-env=\
GCLOUD_DATASET_ID=${PROJECT},\
GOOGLE_APPLICATION_CREDENTIALS=/tmp/${ROBOT}.key.json \
--container-mount-host-path=\
mount-path=/tmp,\
host-path=/tmp,\
mode=rw \
--project=${PROJECT}

And, lastly, here’s the YAML snippet of the Deployment for Kubernetes:

containers:
- name: datastore
  image: gcr.io/${PROJECT}/datastore
  imagePullPolicy: Always
  volumeMounts:
  - name: datastore
    mountPath: /var/secrets/google
  env:
  - name: GOOGLE_APPLICATION_CREDENTIALS
    value: /var/secrets/google/datastore.key.json
  - name: GCLOUD_DATASET_ID
    value: ${PROJECT}
  ports:
  - name: http
    containerPort: 8080

I find this use of environment variables for config to be rather clumsy. There’s no clear intentional binding of specific environment variables to specific processes and you only realize they’re not configured correctly when things break. It’s easy to imagine environment variables conflicting in a non-containerized environment although this is a lesser problem with containers because, as above, we’re explicitly setting values for a specific container.

All that said, using environment variables this way is the best practice.

E. Sidecars and why Pods aren’t always synonymous with containers

5 cl cognac
2 cl triple sec
2 cl lemon juice
Preparation: Pour all ingredients into cocktail shaker filled with ice. Shake well and strain into cocktail glass.

Much of the time, you’ll create Kubernetes Pods that contain single containers and you’ll wonder why there’s all the overhead of a Pod when you only need one container. Pods are more analogous to a host environment that can run many containers. There are many times that you’ll then consider running multiple containers in a Pod…

…and only one time that you should :-)

Probably more than one, but let’s stick to just one time.

The anti-pattern (don’t do this) is to envisage your current configuration (let’s assume a web-server and a database backend) and jam both into a Pod. This is *not* a good idea *unless* each web-server instance must be inextricably and forever joined to a specific database instance. This is unlikely.

What’s more likely is that your web-server instances should scale by the aggregate frontend load and your database instances should scale (independently of this and) based upon their aggregate ability to deal with the frontend load. When you see aggregate, think Service and when you think Service, please try to envisage an arbitrary number of Pods (because it matters for your bill but, for most other purposes, it doesn’t matter how many Pods are needed as long as the number is just-right for serving the workload).

When should you consider multiple containers per Pod? The one time when this *always* makes sense is when you wish to complement, extend or enrich the behavior of the primary container in a Pod. Let’s revisit the web-server and database example from above. In this scenario, hopefully you’re now convinced that you’ll be deploying two Services (and two Deployments), one for the frontend and one for the backend.

It is a good and common practice to front your web-server instance itself with a reverse-proxy. Commonly this would be either Nginx or HAProxy, and it is becoming increasingly common to use Envoy (if you’re looking at proxies, I recommend considering Envoy; see #F Istio). A reverse-proxy provides consistency (we only use e.g. Envoy) even if you use different web-servers (e.g. Apache, Tomcat w/ Java etc.), even if you have a mix of HTTP, gRPC, Web Sockets etc. traffic, and even if you wish to direct some traffic to your web-server and some traffic to a cache (e.g. Varnish).

In all of the previous scenarios, it would make sense to employ the “sidecar” model. In this model, the primary container (your web-server) has ancillary, complementary containers (Envoy proxy, Varnish cache etc.). These must be tightly-coupled to a specific web-server instance *and* functionally, the combination is the “unit”.
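
A minimal sketch of the sidecar model as a Deployment, assuming hypothetical names, images and ports. The proxy’s own configuration (how it forwards to the web-server) is omitted:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
spec:
  replicas: 3
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      # Primary container: the web-server itself.
      - name: web-server
        image: gcr.io/my-project/web-server     # hypothetical image
        ports:
        - containerPort: 8080
      # Sidecar: a reverse-proxy in the same Pod. Containers in a Pod share
      # a network namespace, so it can reach the web-server on localhost:8080.
      - name: proxy
        image: gcr.io/my-project/envoy-proxy    # hypothetical image
        ports:
        - containerPort: 80
```

Because the two containers live in one Pod, they are scheduled, scaled and replaced together, which is exactly the “unit” described above.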

It is very common to see logging, monitoring, tracing and other infrastructural components delivered as sidecars too. A motivation here is to separate concerns: to give developers a consistent requirement that produces code that is ‘manageable’, and to give SRE the flexibility to choose preferred tools knowing that all code across the fleet will log, emit metrics, be traceable, apply auth consistently etc. This is the pattern that forms the foundation of service meshes (see #F Istio), the final best, albeit nascent, practice.

F. Use Istio

Use Istio carefully.

Istio (and other service meshes) are relatively nascent technologies born from companies (including Google) that have run containers at massive scale. Service meshes trivially place a universal (in Istio’s case, Envoy) proxy in every Pod in every Deployment in every Namespace in every cluster.

The result is a consistent management substrate that permits loose coupling of management (we’ll use Stackdriver Trace today but there’s a plan to migrate to Jaeger; we’re rolling our own Prometheus monitoring) and control services (we know all our services are secure; we’re routing 10% of traffic to the canary builds of services A, B and C).
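
As an illustration of the canary case, here is a hedged sketch using Istio’s VirtualService and DestinationRule resources. The service name, the subset names and the version labels are hypothetical, and the API version shown may differ from the one your Istio release expects:

```yaml
# Send 90% of traffic to the stable Pods and 10% to the canary Pods.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: service-a
spec:
  hosts:
  - service-a
  http:
  - route:
    - destination:
        host: service-a
        subset: stable
      weight: 90
    - destination:
        host: service-a
        subset: canary
      weight: 10
---
# Define the subsets in terms of Pod labels (labels again!).
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: service-a
spec:
  host: service-a
  subsets:
  - name: stable
    labels:
      version: stable
  - name: canary
    labels:
      version: canary
```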

I advise “carefully” because these technologies are new, have rough edges and are evolving at a rapid clip. But the advantages (flexibility, agility, future-proofing) the methodology provides likely far outweigh the costs. Most importantly, use service-mesh as your model with Kubernetes even if you don’t yet wish to adopt one of the service-mesh technologies.

That’s all, folks!