Google Cloud Platform — Kubernetes Logs to Stackdriver Graphs — Epiphanies

John Jung · Sep 26, 2019 · 5 min read

This post connects the dots from a log line you can see in Stackdriver Logging to a pretty-looking graph on a Stackdriver Monitoring dashboard.

We use Kubernetes on Google Cloud Platform’s GKE. And if you’ve got a crap load of pods, you need a centralized place to see those logs.

We used to use Elasticsearch-Logstash/Fluentd-Kibana (ELK), but recently switched to Stackdriver. The concepts here should apply to both platforms.

How do you go from stdout log output on a single pod to a graph?

First, whatever platform you’re using, make sure you can see the logs in Stackdriver Logging or Kibana.

The most confusing part of the GCP Logs Viewer is finding the right logs in the first place, because for GKE Kubernetes clusters everything is logged.

GKE Container -> {{your k8s cluster name}} -> {{ k8s namespace }}

Even just finding this is awesome, because you can do two very powerful things: see all your pods’ logs in a centralized place, and filter and search them. This is true of the ELK stack too, I just don’t have screenshots.

The above shows the sandbox-cluster GKE cluster, default namespace, autographer deployment. The awesome thing is you can filter by deployments, statefulsets, etc.

You can also search for very specific keywords and errors.
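Under the hood, those dropdowns just build an advanced log filter. Here’s a rough sketch of what that filter looks like; the field names assume the newer k8s_container resource type (older clusters use the legacy GKE Container resource with slightly different label names), and the cluster/namespace/container values are just the ones from the screenshot above:

resource.type="k8s_container"
resource.labels.cluster_name="sandbox-cluster"
resource.labels.namespace_name="default"
resource.labels.container_name="autographer"
"timeout"

Lines are implicitly ANDed together, and the bare quoted string at the end is a free-text search across the whole log entry.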

Here’s the first epiphany: logging in JSON allows you to filter and create metrics around your logs.

Python:

print('{"error_message": "this was some error", "latency_in_ms": 123132}')

or

log.info(json.dumps({"error_message": "this was some error", "latency_in_ms": 123132}))

JavaScript:

console.log({"error_message": "this was some error", "latency_in_ms": 123132})

or

debug({"error_message": "this was some error", "latency_in_ms": 123132})

or

log.info({"error_message": "this was some error", "latency_in_ms": 123132})

This means that in whatever language or stack you’re using, if you log the output as JSON, the entry goes from a textPayload to a jsonPayload. And when you do this, it becomes really powerful.

Yes, there are plenty of libraries that convert your current logs into JSON, but you really want to keep this in mind whenever you’re logging something you want to track, for example latency or response times.

This is what it looks like with a textPayload.
This is what you have with a jsonPayload. Notice the little carets that you can expand. This is structured, and it’s important for the next part.
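To make that concrete, the same message ends up in roughly these two shapes inside the log entry (fields trimmed down, values reused from the example above; the plain-string line is just illustrative):

"textPayload": "error_message=this was some error latency_in_ms=123132"

versus, once the line is valid JSON:

"jsonPayload": {
  "error_message": "this was some error",
  "latency_in_ms": 123132
}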

It doesn’t really matter what language or stack you’re using; if you’re already using a logging library, it usually supports JSON output. So when you output logs, structure them so that the metrics you want to filter by are their own fields. That also means: don’t put units inside the values, because then everything becomes a string. For example, latency in milliseconds.

Instead of:

{
"latency": "205ms"
}

Do this:

{
"latency_in_ms": 205
}
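Putting that together, here’s a minimal sketch of structured logging to stdout in Python, assuming GKE’s logging agent picks up stdout (log_event and its field names are just illustrative, not part of any Stackdriver library):

import json
import sys
import time

def log_event(**fields):
    # Emit one JSON object per line to stdout; on GKE the logging agent
    # forwards it to Stackdriver, which stores it as a jsonPayload.
    record = {"timestamp": time.time(), **fields}
    sys.stdout.write(json.dumps(record) + "\n")
    sys.stdout.flush()

# Numeric value, with the unit in the key name rather than the value.
log_event(severity="INFO", endpoint="/autograph", latency_in_ms=205, status_code=200)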

Stackdriver Logging has documentation for different languages, but like I said, at the end of the day you just want to be logging JSON so that it’s structured.

Second Epiphany: Turning your jsonPayload logs into metrics that become a graph on a dashboard

I’ll use the example above because we actually use this. Notice that there are a few attributes I wanted to track: response_duration as a number, status_code as an int, and request_payload, which is nested and contains whatever strings I want to track.

Let’s say I just want the average response_duration, or to see how many 200s or 400s I get in status_code.

The way to do this is to create a metric. There are two types of log-based metrics on pretty much any platform you use: counters or distributions.

In the GCP Stackdriver Logging console, you can create a metric. Note: you can also filter by labels, clusters, namespaces, deployments, etc.
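You can also do this from the command line. As a rough sketch (the metric name and filter here are made up for illustration), a simple counter metric looks like this; distribution metrics, like one over response_duration, are easier to set up in the console UI:

gcloud logging metrics create autographer_errors \
  --description="Count of error log lines from the autographer container" \
  --log-filter='resource.type="k8s_container" AND resource.labels.container_name="autographer" AND jsonPayload.error_message!=""'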

When you’re creating the metric, the difference between a textPayload and a jsonPayload really shows, because now you’re able to choose a very specific field.

While you’re at it, add a few labels for the attributes you want to pay attention to, so that you can easily find (and group by) them later in the graphing part.

Here you can see in the filters the specific metric you want to graph: response_duration.

As you can see, once you have a graph you can Save Chart to a Dashboard.

Once you’re in the Stackdriver Monitoring tab, you can go to Resources -> Metrics Explorer.

Here you can search for your metric name (user-defined log-based metrics show up under logging.googleapis.com/user/) and then filter by the attributes you wanted. The example above is the mean response_duration, grouped by service_name. There’s a boatload of filter syntax you can also use to get the graphs that you want, but that’s for another post.

Here’s just one more complex graph, where you count events per second as a stacked bar.

This one graphs 400 or 500 errors using status_code over time.
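Behind a chart like that is just a counter metric whose filter matches error responses, with status_code extracted as a label so the chart can stack by it. A hedged sketch of such a filter, reusing the field names from the example above:

resource.type="k8s_container"
jsonPayload.status_code>=400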

Sign up for GCP using the link below for an additional $50! ($350 total for new users).

https://gcpsignup.page.link/Nv9h
