Customizing Kubernetes Logging (Part 1)

One of the most powerful tools at your disposal when debugging applications in production are the logs. They’re very easy to create, give you the freedom to output whatever you want, and relatively easy to use. However, every developer likes their own output format, which can make consuming logs in a structured or consistent way difficult for those responsible for troubleshooting or analytics across services, applications, and even Kubernetes clusters. Let’s take a very simple example — outputting a value.

I decided to use this opportunity to expand my meager Python skills and have created a simple app that does very little other than randomly generating logging messages. Specifically, it outputs a simple colon-delimited message that looks like this (as shown in the default Stackdriver Logging viewer without any modifications):

logging.error(“myError:” + randomString)

or

logging.warn("myWarning:" + randomString)

The actual message that I care about has two parts — the part before and after the colon. Specifically, I would like to know whether there is a difference in values for errors and warnings! Ideally, I just want to send all the logs to BigQuery and do my analysis there — but the entire payload is in a single field called textPayload, and my string parsing skills in SQL … don’t exist. So now what?

Well, thankfully, Stackdriver has recently started supporting structured logging, and you can read all about it here. There’s another blog post that does a great job of describing the overall value of structured logging, too. For our purposes here — we want to use structured logging to break up a single line into multiple fields using regular expressions. The bad news — the existing instructions are only for the logging agent running on virtual machines. Let’s see if we can make this work on Kubernetes?

0 — Set up cluster and deploy (“new”) logging

Deploying Stackdriver logging on a GKE cluster is easy and well documented here. I won’t rehash the instructions here, but I followed them precisely, and I suggest you do the same. I’m going to assume that you have created your cluster and set up your kubectl context and credentials from this point on.

1 — Deploy the application

Now that we have a cluster, we need something running on it to generate the logs of the kind I’m discussing here. I’ve made my (please don’t judge me too harshly) code available on GitHub. Let’s deploy it to our cluster.

First, we’ll clone the repo:

$ git clone https://github.com/yuriatgoogle/python-logging
$ cd python-logging

Next, we’ll build the container image using Cloud Build (after setting the project_id variable):

$ gcloud container builds submit --tag gcr.io/${project_id}/python-logging:latest

and validate that it worked by checking the Build History in Cloud Build -> Build History:

It worked!

Now, we are ready to deploy. I’ve provided the (very simple) .yaml file I’m using in the repo.

$ kubectl create -f ./python-logging.yaml
deployment "python-logging" created

Now, let’s make sure we can access the deployment by creating a load balancer in front of it and testing it:

$ kubectl expose deployment python-logging --type=LoadBalancer
service "python-logging" exposed
$ kubectl get services
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.47.240.1 <none> 443/TCP 47m
python-logging LoadBalancer 10.47.240.242 <your IP> 5000:32361/TCP 32s
$ curl http://<your IP>:5000
myDebug:4903403857

At this point, we can look at the logs to confirm the problem we’re looking to solve.

Select the logs
Everything is in textPayload

2 — Deploy Own Custom Logging

Now comes the fun part. We’re going to modify the configMap for the Stackdriver Logging deployment to include a new filter pattern to break up our log entry into separate fields.

First, let’s see what our Logging Agent deployment has created:

$ kubectl get ns
NAME STATUS AGE
default Active 11d
kube-public Active 11d
kube-system Active 11d
stackdriver-agents Active 3d
$ kubectl get cm -n stackdriver-agents
NAME DATA AGE
cluster-config 2 3d
google-cloud-config 1 3d
logging-agent-config 4 3d
$ kubectl get ds -n stackdriver-agents
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
stackdriver-logging-agent 3 3 3 3 3 <none> 3d
stackdriver-metadata-agent 3 3 3 3 3 <none> 3d

Now, we need to get the .yaml file that will let us modify this configuration. It’s available in the public Stackdriver GitHub repository here. As per the instructions there, you’ll want to recompile the agents.yaml file after making changes to the logging-agent.yaml file.

Another option would be to simply pull down the config file from the deployment by doing something like this:

$ kubectl get cm logging-agent-config -n stackdriver-agents -o yaml >> cm.yaml

and the deploymentSet:

$ kubectl get ds stackdriver-logging-agent -n stackdriver-agents -o yaml >> ds.yaml

This could potentially cause issues if the configuration doesn’t “round-trip” well, so my recommendation would be to use the files from the repository, instead.

We won’t need to modify the deploymentSet at all, but we will change the configMap to include a new filter to process our log entries. Modify the cm.yaml file to add the filter for Python output under `containers.input.conf: |-` (add the 3 lines in bold).

<filter reform.**>
@type parser
format multi_format
<pattern>
format /^(?<severity>.*):(?<logger>.*):(?<outputName>.*):(<outputValue>.*)/
</pattern>

<pattern>
format /^(?<severity>\w)(?<time>\d{4} [^\s]*)\s+(?<pid>\d+)\s+(<source>[^ \]]+)\] (?<log>.*)/
</pattern>
<pattern>
format none
</pattern>
reserve_data true # note that this doubles the size of the log, as both parsed and unparsed data would be kept. Set to “false” if you only care about the unparsed data.
suppress_parse_error_log true
key_name log
</filter>

Create the new configMap and redeploy agents:

$ kubectl replace -f ./cm.yaml --force && kubectl delete pods -n stackdriver-agents -l app=stackdriver-logging-agent

That’ll take about a minute. Once it’s done, you can try to refresh your application URL and check Stackdriver Logging again. Your logs should now look like this — note that we now have a jsonPayload field with the values we care about separated out.

That’s so much better!

3 — Now What?

So, our logs are now structured! What does this actually let me do? Well, let’s export these to BigQuery and see what we get there.

Two things to note there:

a. A dataset and table have been created — in my case, the table was created for the log and named with today’s date.

b. each of the fields in jsonPayload is a separate column in the table!

Now, we can do all sorts of fun things! I’m just going to show a basic BigQuery query to average all the values when the message is just a warning:

You can surely take it from here, but I wanted to show how it’s possible to customize the logging configuration on a GKE cluster and use Stackdriver Logging to produce structured logs!

In Part 2, my plan is to revisit this basic approach and investigate how to

  1. Deploy using a Helm chart
  2. Manage filters across a larger fleet of clusters and deployments

Wish me luck!

Resources

K8s documentation: https://kubernetes.io/docs/tasks/debug-application-cluster/logging-stackdriver/#deploying

Solution: https://cloud.google.com/solutions/customizing-stackdriver-logs-fluentd

Fluentd configurations: https://github.com/GoogleCloudPlatform/container-engine-customize-fluentd/tree/master/kubernetes

Stackdriver structured logging: https://cloud.google.com/logging/docs/structured-logging

Customizing Kubernetes monitoring: https://cloud.google.com/monitoring/kubernetes-engine/customizing