Imperative vs. Declarative — a Kubernetes Tutorial

Adrien Trouillaud · PayScale Tech · Apr 15, 2019

There are two basic ways to deploy to Kubernetes: imperatively, with the many kubectl commands, or declaratively, by writing manifests and using kubectl apply. The former is good for learning and interactive experimentation (analogous to a programming language’s REPL); the latter is good for reproducible deployments, i.e., for production — though you may still use some kubectl commands for debugging in production.

The Data Science team at PayScale was an early adopter of Kubernetes. I first wrote this tutorial to onboard fellow team members and help spread the use of Kubernetes to the rest of the company. We’re hoping it will help others too.

Prerequisites

This tutorial assumes you have access to a cluster. Nowadays, you can easily run a single-node cluster on your machine, or create a multi-node cluster in your favorite cloud. Make sure kubectl is installed on your machine and properly configured. The following command should succeed:

kubectl cluster-info

You should also have enough permissions in your default namespace (which may be different from the namespace literally named default). The edit role should be enough. If you created a cluster for this tutorial, you’re likely an admin, so you’re good to go. Otherwise, somebody may have prepared the ground for you. If you don’t understand this paragraph, just keep going :)
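
If you’d like to verify your permissions, kubectl can tell you whether you’re allowed to perform a given action. A quick sanity check (in whatever namespace your kubeconfig defaults to) might look like this:

kubectl auth can-i create deployments # should answer "yes"
kubectl auth can-i create services # should answer "yes"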

Optional: If you decide to build and push a custom container image (see below), rather than pull a public one, we assume you and the cluster have access to a container image registry. Again, somebody may have prepared the ground for you. GKE works well with GCR, AKS works well with ACR, etc. Otherwise, DockerHub and Quay are popular options, but if you don’t make your image public, you may need to configure the default service account in your namespace with an image pull secret.
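
As an illustration only, with a private DockerHub repository that setup might look something like this (regcred is an arbitrary secret name; fill in your own credentials):

kubectl create secret docker-registry regcred \
  --docker-server=https://index.docker.io/v1/ \
  --docker-username=<user-name> \
  --docker-password=<password> \
  --docker-email=<email>
kubectl patch serviceaccount default -p '{"imagePullSecrets": [{"name": "regcred"}]}'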

Optional: Build and Push

Whether we deploy imperatively or declaratively, we need a container image. You can skip this part and use an existing image if you’re in a hurry, e.g., nginx, but some of the steps in this tutorial are tailored for the application we’ll build in this section. Also, if you want to learn how to containerize an application, read on.

For the purpose of this tutorial, we’ll start with a simple web app’s source code. Here’s a sample Node.js app based on an example from the Node.js documentation (feel free to write an equivalent in your favorite programming language). In an empty folder, copy the code below to a file named app.js:

// app.js
const http = require('http');
const os = require('os');

const ip = '0.0.0.0';
const port = 3000;
const hostname = os.hostname();
const whoami = process.env['WHOAMI'] || 'Anonymous';

const server = http.createServer((req, res) => {
  res.statusCode = 200;
  res.setHeader('Content-Type', 'text/plain');
  res.end(`Hi, I’m ${whoami}, from ${hostname}.\n`);
});

server.listen(port, ip, () => {
  console.log(`Server running at http://${ip}:${port}/`);
});

We made a few changes to the base example:

  • Important! Serve on 0.0.0.0 rather than 127.0.0.1. The latter is loopback-only; in the cluster, the app must accept requests arriving on its Pod IP (e.g., forwarded from a Service’s cluster IP), which binding to 0.0.0.0 covers.
  • We also changed the “Hello World” message to include two variables: the hostname, which will be an indicator of which replica is responding, and the WHOAMI environment variable, to be set at deploy time, defaulting to “Anonymous”.

If you’ve installed Node.js, you can test the app locally:

node app.js # then open http://localhost:3000

Note: if you’re rolling your own, don’t bother with TLS termination or authentication, because those can be handled at the cluster’s edge by, e.g., Ambassador, and even between apps within the cluster in a service mesh like Istio, if you decide to go zero-trust.

Let’s package this app as a Docker image. Copy the code below to a file named Dockerfile:

# Dockerfile
FROM node:8
COPY app.js .
ENTRYPOINT ["node", "app.js"]

In the same folder, run:

docker build -t myrepo:mytag .

Depending on your container image registry, replace myrepo with something like gcr.io/project-name/image-name on GCR, or user-name/image-name on DockerHub (the default registry). Replace mytag with anything but latest. If you’re taking this tutorial with a group of people, make sure you use different tags. Don’t forget the final dot, making the current folder the build context.

Finally, push the image to your repository (from which Kubernetes will pull):

docker push myrepo:mytag

Imperative Configuration

Run

The shortest way to deploy to Kubernetes is to use the kubectl run command. Replace myapp with something unique if you’re taking this tutorial with other people and sharing a namespace; replace myrepo:mytag with what you picked in the previous step (or just use nginx):

kubectl run myapp --image myrepo:mytag --replicas 2

The command may look familiar if you’ve ever used docker run to start a container locally, but the similarity stops here. Here’s what happens under the hood (also illustrated in the diagram below):

  1. kubectl translates your imperative command into a declarative Kubernetes Deployment object. A Deployment is a higher-level API that allows rolling updates (see below).
  2. kubectl sends the Deployment to the Kubernetes API server, kube-apiserver, which runs in-cluster.
  3. kube-apiserver saves the Deployment to etcd (a distributed key-value store), which also runs in-cluster, and responds to kubectl.
  4. Asynchronously, the Kubernetes controller manager, kube-controller-manager, which watches for Deployment events (among others), creates a ReplicaSet from the Deployment and sends it to kube-apiserver. A ReplicaSet corresponds to one revision of a Deployment. During a rolling update, a new ReplicaSet is created and progressively scaled out to the desired number of replicas, while the old one is scaled down to zero.
  5. kube-apiserver saves the ReplicaSet to etcd.
  6. Asynchronously, kube-controller-manager creates two Pods (or more if we scale out) from the ReplicaSet and sends them to kube-apiserver. Pods are the basic unit of Kubernetes. They represent one or more containers sharing Linux namespaces and cgroups.
  7. kube-apiserver saves the Pods to etcd.
  8. Asynchronously, the Kubernetes scheduler, kube-scheduler, which watches for Pod events, updates each Pod to assign it to a Node and sends them back to kube-apiserver.
  9. kube-apiserver saves the Pods to etcd.
  10. Finally, the kubelet that runs on the assigned Node, always watching, actually starts the container.

Note: the controller, scheduler and kubelet also send status information back to the API server.

In summary, Kubernetes is a CRUD API with control loops.
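
If you’re curious, you can even talk to that CRUD API directly through kubectl; for example, this should return the Deployment we just created, exactly as stored by the API server (adjust the namespace if yours isn’t default):

kubectl get --raw /apis/apps/v1/namespaces/default/deployments/myapp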

Get, Describe

kubectl run created a Deployment, which rolled out a ReplicaSet of Pods. Where are they? Use kubectl get to list all Deployments, ReplicaSets and Pods in your default namespace:

kubectl get deployments # plural or singular, or deploy for short
kubectl get replicasets # or rs
kubectl get pods # or po

To get a single object, add its name as an argument, e.g.:

kubectl get deployment myapp

To see the object’s state as saved in etcd, use the --output option (or -o):

kubectl get deployment myapp -o yaml

To gather more details, including recent Events (e.g., errors) related to the object, use the kubectl describe command:

kubectl describe deployment myapp

Go ahead and run the same commands on the ReplicaSet and Pods.

Label

Labels are very useful. A label is a key-value pair of strings. All Kubernetes objects can be labeled and those labels can be used as selectors. The kubectl run command added the run=myapp label to our Deployment (and the controlled ReplicaSet and Pods) automatically. You saw it in the YAML outputs and descriptions. To see those labels in the regular table output, use the --show-labels option:

kubectl get deployments --show-labels
kubectl get replicasets --show-labels
kubectl get pods --show-labels

If you know what keys you’re interested in, use the --label-columns option (or -L) to show the values as columns, e.g.:

kubectl get replicasets -L run

Most importantly, to filter objects by label, use the --selector option (or -l), e.g.:

kubectl get pods -l run=myapp

You can also add labels manually, e.g.:

kubectl label deployment myapp foo=bar
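
Labels can also be changed or removed later, e.g.:

kubectl label deployment myapp foo=baz --overwrite
kubectl label deployment myapp foo- # a trailing dash removes the label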

Delete

The Kubernetes API is fundamentally declarative, which means that the controllers always work to reconcile the observed state with the desired state. Therefore, if we delete a Pod, the ReplicaSet controller will create a new one to replace it, to maintain the desired replica count. See for yourself:

kubectl delete pods -l run=myapp
# wait a bit
kubectl get pods -l run=myapp

Notice that the new Pods have different generated names than the old ones (the random suffix part).

The same is true for the ReplicaSet. If we delete it, the Deployment controller will create a new one to replace it. But if we delete the Deployment itself, our intent is really to delete the app; nothing controls the Deployment but us.
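
If you want to see that for yourself too, deleting the ReplicaSet should result in a replacement with a new generated name (its Pods are deleted and recreated along with it):

kubectl delete replicasets -l run=myapp
# wait a bit
kubectl get replicasets -l run=myapp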

Port-Forward

So far, we’ve seen the objects created by Kubernetes in response to kubectl run. Kubernetes reports the Pods as ready simply because the processes in the containers are running. We could define a more meaningful readiness or liveness probe, but for now, let’s make sure our app works as expected by manually testing it.
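
For reference, such a probe lives in the Pod template, next to the container’s image. A minimal sketch for our app (assuming the HTTP port 3000 used above) could look like the fragment below; we’ll see where it would fit when we write full manifests in the declarative section:

readinessProbe:
  httpGet:
    path: /
    port: 3000
  initialDelaySeconds: 3
  periodSeconds: 10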

kubectl port-forward proxies a local port (on your machine) to a Pod’s port, via the API server (secured with TLS over the Internet, between the proxy on your machine and the API server). It is a fairly flexible command. If you provide a Pod name and port as arguments, kubectl will simply port-forward to that Pod’s port. But because Pod names are usually generated, including random characters, it is often more convenient to provide the kind/name pair of a controlling object, e.g., a Deployment; kubectl will select a matching Pod for you:

kubectl port-forward deployment/myapp 3000

Then simply open http://localhost:3000/ in a browser (or curl in a different terminal) to see the greeting (“Hi, I’m Anonymous, from …”). The hostname shouldn’t change when you reload because we’re only port-forwarding to one Pod. Also, note that we didn’t set the WHOAMI environment variable when we ran kubectl run; we could have, with the --env option, but we’ll use this omission as an opportunity to learn how to edit a Deployment with kubectl set and kubectl patch in a later step.

To stop port-forwarding, press Ctrl+C. If the connection is inactive for some time, it will be closed automatically.

Note: you can use a different local port, e.g., if the Pod’s port is already taken on your machine:

kubectl port-forward deployment/myapp 5000:3000

Expose

Our app is working, but other deployed apps (their Pods) shouldn’t have to use port-forward. Pods have Pod IPs and DNS entries. We could use those, but Pods are short-lived and their IPs and names keep changing. Also, we need a way to talk to Deployments (or other kinds of Pod groups), not just single Pods. We need service discovery.

A Kubernetes Service routes traffic to a set of Pods matching the Service’s label selector. If multiple Pods match the selector, they all listen and receive traffic. To expose a Deployment, we can simply use the Deployment’s selector (in our case, run=myapp) as the Service’s selector.

The kubectl expose command automates Service creation from a Deployment, ReplicaSet, or even another Service or a single Pod. It looks up the selector from the given object automatically, unless otherwise specified with options:

kubectl expose deployment myapp --port 80 --target-port 3000 # let’s expose on the standard HTTP port, 80, which we couldn’t use on our dev machine

Check that the Service was created:

kubectl get service # or svc

You can see the Pod IPs of the listening Pods in the description of the Service, or in the companion Endpoints object (controlled by the Service). This can be useful to debug networking issues:

kubectl describe service myapp
kubectl get endpoints myapp

Let’s spin up a temporary Pod with an interactive terminal to actually call the Service:

kubectl run mytest -it --rm --image alpine # Alpine is tiny
# inside mytest:
apk add curl
curl http://myapp # “Hello World…”
exit

That deserves a few explanations: kubectl run created a Deployment as before, allocated a TTY (-t or --tty), kept stdin open (-i or --stdin), attached to it, and deleted the Deployment when we exited (--rm). Those options may look familiar as they exist for docker run as well.

Logs

If our app had issues, we may want to check its logs. Container logs are stored by the container runtime (typically the Docker daemon) on the nodes and automatically rotated, by default, when the log file exceeds 10MB. We could set up a logging agent to push the logs to a persistent backend; just remember kubectl logs only sees the logs left on the nodes.

Just like with port-forward, we can either request the logs of a specific Pod by supplying its name, or request the logs of a group of Pods by supplying the kind/name pair of a controlling object (e.g., a Deployment):

kubectl logs deployment/myapp

In our case, not much has been logged. However, a production app may be more verbose, especially when you actually need to check the logs (so many errors!). You can limit the output with the --since (e.g., 1m), --since-time (e.g., 2018-11-01T16:30:00) and --tail (e.g., 20 lines) options. Other useful options include --follow (or -f), --previous (when containers keep crashing), and --timestamps (if your app doesn’t log timestamps already). After that, it can be handy to write the output to a file and grep your way around it, e.g.:

kubectl logs deployment/myapp --since 5m > log.txt
grep error log.txt
# more grep

Exec, Copy

kubectl offers a couple of other tools to help debug running containers. kubectl exec executes a command in a container, and kubectl cp copies files and directories to and from containers. The two commands take explicit Pod names rather than Deployment names. Here’s a useful trick leveraging the --output jsonpath option to store a Pod name in a shell variable:

POD_NAME=$(kubectl get pods -l run=myapp -o jsonpath={.items[0].metadata.name})

We can then use that variable in the other commands:

kubectl exec $POD_NAME -it sh # opens an interactive shell in the container
# inside the container, just a few examples:
node --version
echo $WHOAMI
exit
# back on your machine:
kubectl cp $POD_NAME:app.js remote-app.js # again, just an example (there are better means to know what code currently runs in production)

Set, Scale, Patch

We still need to set the WHOAMI environment variable. We can do that with kubectl set or kubectl patch. While we’re at it, let’s also learn how to watch resource changes with kubectl get --watch (or -w), and observe rolling updates in real time:

kubectl get deployment myapp -w
# in a second terminal:
kubectl get replicasets -w -l run=myapp
# in a third terminal:
kubectl get pods -w -l run=myapp
# in a fourth terminal:
kubectl set env deployment/myapp WHOAMI="HAL 9000"

In the first three terminals, observe the creation of a new ReplicaSet, and the orchestrated creation/deletion of new/old Pods. You’ll also see lines corresponding to status changes (look at the numbers of Pods in the DESIRED, CURRENT, UP-TO-DATE, AVAILABLE/READY columns). You’ve just witnessed a rolling update. Let’s scale out our Deployment before we roll out another change:

kubectl scale --replicas 3 deployment myapp

The kubectl set command is limited to setting environment variables, images, resource requests/limits, selectors, ServiceAccounts, and subjects of RoleBindings (cf. Role-Based Access Control, or RBAC, which is outside the scope of this tutorial). The kubectl patch command is more general. It accepts JSON or YAML to replace or merge specific fields. As a simplistic example, we’re going to bypass the scheduler and force all Pods to run on one node:

First, see that our app’s replicas are currently running on different nodes:

kubectl get pods -l run=myapp -o wide # check the NODE column

Let’s just pick one and patch our Deployment with it:

NODE_NAME=$(kubectl get pods -l run=myapp -o jsonpath={.items[0].spec.nodeName})
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeName":"'$NODE_NAME'"}}}}'

Now all the Pods should be assigned to the same node:

kubectl get pods -l run=myapp -o wide

Note: there are smarter ways to influence the scheduler, like labeling certain nodes and using a node selector, affinities and anti-affinities, taints and tolerations, etc.
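
As a taste of the label-based approach (the mygroup=demo label is made up for this example), you could label the chosen node and switch the Deployment from nodeName to a nodeSelector; setting nodeName to null in the strategic merge patch clears the field we just set:

kubectl label node $NODE_NAME mygroup=demo
kubectl patch deployment myapp -p '{"spec":{"template":{"spec":{"nodeName":null,"nodeSelector":{"mygroup":"demo"}}}}}'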

Declarative Configuration

Congratulations! If you’ve made it this far, you’ve learned how to tell Kubernetes what to do. However, the power of Kubernetes is in its declarative API and controllers. You can just tell Kubernetes what you want, and it will know what to do. So, except for the read-only get, describe and logs, the debugging port-forward, exec and cp, and delete (it’s easier to replace Pods than fix them), you will rarely, if ever, use the other commands that we saw in the previous section (sorry, but they were useful to introduce you to some important concepts). Most of the time, you’ll just use kubectl apply and YAML (or JSON) manifests of the state to be saved by Kubernetes in etcd.

To scaffold manifests from running objects, you can simply save the output of kubectl get -o yaml --export:

kubectl get deployment myapp -o yaml --export > myapp-deployment.yaml
kubectl get service myapp -o yaml --export > myapp-service.yaml
# replicasets and pods are controlled and don’t need manifests (the deployment spec contains a pod template)

A manifest actually doesn’t need all of the saved state. Some of it is added by Kubernetes. The --export option removes the status section and some metadata (UID, creation timestamp, etc.), but you may want to remove even more, e.g., default values. We just need enough to specify what we want.

The documentation provides basic examples for Deployments and Services. In our case, the manifests boil down to this:

# myapp-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      run: myapp
  template:
    metadata:
      labels:
        run: myapp
    spec:
      containers:
      - name: myapp
        image: myrepo:mytag
        env:
        - name: WHOAMI
          value: "HAL 9000"

# myapp-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: myapp
spec:
  selector:
    run: myapp
  ports:
  - port: 80
    targetPort: 3000

To prove that they work, let’s delete everything first, then apply our manifests:

kubectl delete deployment myapp
kubectl delete service myapp
kubectl apply -f myapp-deployment.yaml -f myapp-service.yaml

If we hadn’t deleted the Deployment and Service, they would have been updated to match the manifests. The kubectl apply command is idempotent. We can reuse it after modifying the manifests. For example, try editing the number of replicas and re-run the last command.
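
If your kubectl version supports it, kubectl diff is a nice way to preview what apply would change before actually applying:

kubectl diff -f myapp-deployment.yaml -f myapp-service.yaml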

If you’re unsure about the schema or you’d like to explore your options, the Kubernetes API reference is a useful companion.
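
kubectl explain is also handy for discovering fields without leaving the terminal, e.g.:

kubectl explain deployment.spec.template.spec.containers
kubectl explain service.spec.ports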

For complex applications, with several environments, raw manifests can become difficult to manage. Tools like kustomize (now part of kubectl since v1.14), Helm, or Jsonnet can help. Jesse Suen wrote a great comparison. To oversee the entire build and deploy process, and inject tagged image names into manifests, skaffold is another useful tool.

Going Further

In this tutorial, we’ve worked with the basic building blocks of stateless apps on Kubernetes: Deployments, ReplicaSets, Pods, Services. There are alternatives to Deployments (not all apps support rolling updates): StatefulSets, DaemonSets, Jobs, CronJobs, etc. ConfigMaps and Secrets are useful to store application configuration. Also, we’ve only used Deployments in the most basic manner. In production, you should set CPU and memory requests and limits, and you may be interested in autoscaling, among other things.
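
To give you a head start (the values below are arbitrary), resource requests/limits and a basic CPU-based autoscaler can even be set imperatively, though in production you’d rather declare them in your manifests:

kubectl set resources deployment myapp --requests=cpu=100m,memory=64Mi --limits=cpu=200m,memory=128Mi
kubectl autoscale deployment myapp --min=2 --max=5 --cpu-percent=80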

There are a few good books to learn Kubernetes. The only one I’ve read and don’t hesitate to recommend is Kubernetes in Action (Manning). Of course, the official documentation is a great reference and rightly appears at the top of most Google searches.
