New Stackdriver Monitoring for Kubernetes (Part 1)

Update (3/1/2018) — you can now watch me walk through this here:

Update (10/16/2018): the instructions for creating a cluster with the new monitoring functionality, and for updating an existing cluster, have been published now that this tooling is in beta. You can find installation instructions here.

New and shiny!

As you might have seen, the Stackdriver team announced brand-new support for Kubernetes monitoring at KubeCon a couple of weeks ago. Obviously, I couldn’t just let this go by without exploring it to the best of my ability. There are three main things I wanted to see at work: monitoring workloads on GKE clusters, monitoring non-GKE clusters, and the new Prometheus integration. Let’s get into the first part, monitoring GKE clusters!

Thankfully, the instructions in the documentation are pretty clear.

Monitoring GKE (alpha) clusters

That looks different!

At the time of writing, this support is only available for “alpha” clusters, as the cluster needs to be on Kubernetes 1.10. So, let’s create our alpha cluster first. Here’s the gcloud command from the docs, with the placeholders filled in with the values I used (the comment at the top maps each placeholder to its value).

# [CLUSTER_NAME]=alpha-cluster, [ZONE_WITH_1.10.2-gke.0]=us-west1-a, [PROJECT_ID]=stackdriver-kubernetes
CLOUDSDK_CONTAINER_USE_V1_API=false \
CLOUDSDK_API_CLIENT_OVERRIDES_CONTAINER=v1beta1 \
gcloud alpha container clusters create alpha-cluster \
--zone=us-west1-a \
--project=stackdriver-kubernetes \
--cluster-version=1.10.2-gke.0 \
--enable-stackdriver-kubernetes \
--enable-kubernetes-alpha

And voila!

We have an alpha cluster!

That’s actually all! Let’s first verify the installation on the cluster itself.

It’s alive!

We have the monitoring and logging agents, the metadata agent, and Heapster all running in the stackdriver-agents namespace. Let’s contrast that with an “older” cluster I have running,

where there’s no stackdriver-agents namespace at all,

and fluentd is running in the default namespace. So — the monitoring infrastructure is different, and that’s all fine. But what are we getting from this? Let’s go into Stackdriver and find out.
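If you’d rather make the same comparison from the command line than from the console, something like the sketch below should work. It assumes kubectl is installed and that its current context points at the cluster you want to check. The function names are mine (hypothetical helpers, not part of any tool), and the only thing it relies on is the stackdriver-agents namespace seen above.

```shell
#!/usr/bin/env bash
# Sketch: report whether the current cluster has the new Stackdriver agents.
# Assumption: kubectl's current context points at the cluster to inspect.

# Namespace observed on the alpha cluster in this post.
AGENT_NAMESPACE="stackdriver-agents"

# has_agent_namespace: succeeds if the namespace exists in the current cluster.
has_agent_namespace() {
  kubectl get namespace "$AGENT_NAMESPACE" >/dev/null 2>&1
}

# describe_monitoring: prints which monitoring setup the cluster appears to use,
# and lists the agent pods when the namespace is present.
describe_monitoring() {
  if has_agent_namespace; then
    echo "new monitoring: agents in namespace $AGENT_NAMESPACE"
    kubectl get pods --namespace "$AGENT_NAMESPACE"
  else
    echo "original monitoring: no $AGENT_NAMESPACE namespace"
  fi
}
```

Source the file and run describe_monitoring once per cluster, switching context in between (for example with gcloud container clusters get-credentials). A missing namespace only tells you the new agents are absent; it says nothing about where fluentd is running.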

The first and obvious thing is that we now have additional resources available called “Kubernetes Engine V2 BETA”. I think I might have triggered a bug here, as it should only appear once — but both of these take me to the same screen.

As you can see, my “logging-cluster” is using the “original” Kubernetes monitoring and doesn’t appear on this list. However, my alpha-cluster is here. Let’s check it out!

Drilling down from Infrastructure view

By starting on the Infrastructure tab, we can go down the cluster -> node -> pod -> container route.

Container view

We can select a container to get more information about it, like CPU and memory utilization, and even pull up the logs.

Workloads view

If we prefer, we can take the Workloads path, which will let us go cluster -> namespace -> workload -> pod -> container.

Finally, we can take the Services path to go cluster -> namespace -> service -> pod -> container.
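For a rough command-line counterpart to the Infrastructure path (cluster -> node -> pod -> container), kubectl can print the same hierarchy as flat rows. This is only a sketch of the idea, not how Stackdriver builds its views; the function name is hypothetical, and it assumes kubectl is pointed at the cluster in question.

```shell
# list_containers_by_node: prints one tab-separated line per pod, sorted by node:
#   node name, namespace, pod name, container names (space-separated)
list_containers_by_node() {
  kubectl get pods --all-namespaces \
    -o jsonpath='{range .items[*]}{.spec.nodeName}{"\t"}{.metadata.namespace}{"\t"}{.metadata.name}{"\t"}{.spec.containers[*].name}{"\n"}{end}' \
    | sort
}
```

Sorting on the first column groups pods under their node, which is roughly what the Infrastructure drill-down shows. Putting the namespace first instead would give something closer to the Workloads and Services paths.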

This is all very cool, and it works without having to do any of the instrumentation manually! Come back for parts 2 and 3, where I’ll try to make this work on Compute Engine and check out the Prometheus integration!