Monitoring Kubernetes Clusters on GKE (Google Kubernetes Engine)

Ariel Peretz
Google Cloud - Community
13 min read · Mar 27, 2019


Ctrl+Alt+Monitor :)

About this Guide

1. Introduction

The Kubernetes ecosystem contains a number of logging and monitoring solutions. These tools address monitoring and logging at different layers in the Kubernetes Engine stack.

  • GCP components (compute)
  • Kubernetes objects (cluster nodes)
  • Containerized applications (pods)
  • Application-specific metrics (e.g. a pod running an Apache HTTP server serving on port 8081)

This document describes some of these tools, what layer of the stack they address, as well as best practices for implementation including an example from the field, a quick start, and a demo project.

2. Tasks to do

To implement logging and monitoring in Stackdriver for Kubernetes Engine, we recommend the following tasks:

2.1 Monitor

Determine what’s important (metrics, resources, etc.) and decide what to monitor:

  • What isn’t measured can’t be monitored. It is important to collect as much information (as many metrics) as is reasonable.
  • Determining Service Level Indicators and Objectives (SLIs and SLOs) helps frame the monitoring discussion in the context of system reliability: what the users of the system care about.
  • Avoid the temptation to monitor everything. Monitoring too much is just as dangerous as monitoring too little.
  • Monitor just enough. Good decisions can’t be made with non-existent information. Make a “data-driven decision”.

2.2 Log

Plan a logging architecture that matches the deployment architecture and organizational responsibilities:

  • Will all Kubernetes Engine cluster metrics be aggregated into one Stackdriver project?
  • What additional metrics need to be monitored? What are the key concerns?

2.3 Review

  • Review planned and existing Kubernetes Engine clusters in GCP.
  • Ensure logging and monitoring are enabled for each cluster (a quick CLI check is sketched below).
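
A minimal sketch for that check, using only standard gcloud flags; when logging or monitoring is disabled, the corresponding column shows “none”:

# List all clusters in the current project with their logging/monitoring settings.
$gcloud container clusters list \
    --format="table(name,zone,loggingService,monitoringService)"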

2.4 Implement

  • Implement monitoring

3. The What? and How?

To scale an application and provide a reliable service, you need to understand how the application behaves when it is deployed. You can examine application performance in a Kubernetes cluster by examining the containers, pods, services, and the characteristics of the overall cluster. Kubernetes provides detailed information about an application’s resource usage at each of these levels. This information allows you to evaluate your application’s performance and identify where bottlenecks can be removed to improve overall performance.

3.1 What to monitor

This is potentially the most difficult question in planning operations for cloud workloads, let alone Kubernetes Engine. That said, identifying what to monitor is critical to the success of Kubernetes Engine workloads in the cloud. The following table makes some recommendations on a minimum set of important attributes to monitor. While the structure of organizations varies, the table simplifies the question to two perspectives: the application developer/operator and the Kubernetes Engine cluster administrator. The remainder of this guide will focus on these two perspectives.

(Table: recommended attributes to monitor, by perspective)

3.2 How to monitor?

Ideally, metrics are collected and information is displayed in a manner that is both convenient and concise. Stackdriver provides a single-pane-of-glass view of metrics, logs, and traces across Kubernetes Engine clusters and workloads.

The following diagram illustrates, at a high level, the relationship between Kubernetes Engine and Stackdriver. Logs, metrics, and other useful information are sent to Stackdriver from the Kubernetes Engine clusters.

3.3 Monitoring tools

The following table lists the tools discussed in this document and highlights the scope of each. A detailed description of each tool follows.

3.3.1 Stackdriver

Stackdriver aggregates metrics, logs, and events from infrastructure, giving developers and operators a rich set of observable signals that speed root-cause analysis and reduce mean time to resolution (MTTR). Stackdriver doesn’t require extensive integration or multiple “panes of glass,” and it won’t lock developers into using a particular cloud provider.

3.3.2 Prometheus

Prometheus, a Cloud Native Computing Foundation project, is a systems and service monitoring system. It collects metrics from configured targets at given intervals, evaluates rule expressions, displays the results, and can trigger alerts if some condition is observed to be true.

3.3.3 Kube-state-metrics

kube-state-metrics is a simple service that listens to the Kubernetes API server and generates metrics about the state of the objects.
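
If kube-state-metrics is already running in a cluster, its exposition endpoint can be inspected directly. A minimal sketch, assuming the service is named kube-state-metrics, listens on port 8080, and lives in the kube-system namespace (all assumptions; adjust to your deployment):

# Forward the kube-state-metrics port to the local machine (service name,
# namespace, and port are assumptions).
$kubectl -n kube-system port-forward svc/kube-state-metrics 8080:8080 &

# Sample a few of the object-state metrics it exposes.
$curl -s http://localhost:8080/metrics | grep -E 'kube_pod_status_phase|kube_pod_container_status_restarts_total' | head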

3.3.4 OpenCensus

OpenCensus is a single distribution of libraries that collect metrics and distributed traces from application services. OpenCensus provides observability by tracing requests as they propagate through services and captures critical time-series metrics. Metrics can be exported to a number of backends, including Stackdriver.

3.4 Choosing the right tool

3.5 Under the hood (Metrics and aggregating components)

Each node in a Kubernetes Engine cluster runs a process named Kubelet which is the primary process for container orchestration on that node. Embedded in the Kubelet process is a mechanism named cAdvisor. Via cAdvisor, each Kubelet offers an aggregated summary of information to external consumers in the cluster.

The primary consumer is Heapster, which is automatically deployed in a container on an available node of each Kubernetes Engine cluster. Heapster queries the cluster master to identify all the nodes of the cluster, then regularly polls each node’s Kubelet to collect the summary information from the Summary API.

When Prometheus is installed in a Kubernetes Engine cluster, the prometheus-to-sd-exporter component scrapes endpoints that expose content in the Prometheus format and forwards it to Stackdriver. This container is co-located with the FluentD and Event Exporter pods, as it is the component responsible for actually moving the data to Stackdriver.

These sources are scraped by the metrics-server component, which exposes the master metrics API. The discovery summarizer makes the master metrics API available to external clients.
Heapster becomes the cluster-wide repository for this information. Other services in the cluster can query Heapster rather than separately polling the Kubelet of each cluster node. For example, Stackdriver queries Heapster for real-time performance metrics. The Horizontal Pod Autoscaler also queries Heapster to determine when to scale pods.

Note: Starting from Kubernetes 1.8, resource usage metrics, such as container CPU and memory usage, are available in Kubernetes through the Metrics API.
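
The same resource-usage data can be sampled directly from the cluster; a quick sketch (requires the resource metrics pipeline, i.e. Heapster or metrics-server, to be running):

# Node- and pod-level CPU/memory usage from the resource metrics pipeline.
$kubectl top nodes
$kubectl top pods --all-namespaces

# The raw Metrics API (Kubernetes 1.8+) that backs these commands.
$kubectl get --raw "/apis/metrics.k8s.io/v1beta1/nodes"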

4. A step-by-step guide for logging and monitoring

Stackdriver logging and monitoring are enabled by default when deploying new Kubernetes Engine clusters. A Stackdriver dashboard should be used to visualize metrics generated in a single or multiple Kubernetes Engine clusters.

4.1 Prerequisites

The user must be the Owner of the project containing the Kubernetes Engine cluster (for IAM and resource-management reasons).

The following APIs must be enabled (a gcloud sketch for enabling them follows the list):

  • Stackdriver Logging API
  • Compute Engine API
  • Stackdriver Monitoring API
  • Kubernetes Engine API
  • Stackdriver Trace API
  • PubSub API
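
A minimal sketch, assuming the target project has already been selected with gcloud config set project:

# Enable the required APIs in the current project.
$gcloud services enable \
    logging.googleapis.com \
    compute.googleapis.com \
    monitoring.googleapis.com \
    container.googleapis.com \
    cloudtrace.googleapis.com \
    pubsub.googleapis.com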

The following command-line tools must be installed on the deployment host (Cloud Shell or any other host can be used); a quick version check is sketched after the list:

  • gcloud
  • git
  • kubectl
  • Terraform >= 0.11.7
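
A simple sanity check that the tools are installed and on the PATH:

$gcloud version
$git --version
$kubectl version --client
$terraform version   # should report >= 0.11.7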

4.2 Desired outcome

The steps in the succeeding sections cover:

  • 4.3 Deploying a cluster
  • 4.4 Configuring minimum monitoring
  • 4.5 OpenCensus and distributed tracing

4.3 Deploying a cluster

Note: It is strongly recommended that the following commands be executed in Google Cloud Shell.
Note: For this demo, you can use an existing Kubernetes Engine cluster. However, by instantiating a new Kubernetes Engine cluster, per the gke-tracing-demo example, you’ll be able to observe more interesting behavior.
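
If you choose to start from the gke-tracing-demo example, the project can be fetched as shown below (a sketch; follow the repository’s own documentation for the Terraform-based deployment):

# Clone the demo referenced throughout this guide and review its instructions.
$git clone https://github.com/GoogleCloudPlatform/gke-tracing-demo.git
$cd gke-tracing-demo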

  1. Logging and monitoring are enabled by default when deploying a Kubernetes Engine cluster. However, if a Kubernetes Engine cluster was not deployed with logging and monitoring, both can be enabled after the fact using the following command:

$gcloud beta container clusters update [CLUSTER_NAME] \
    --monitoring-service monitoring.googleapis.com \
    --logging-service logging.googleapis.com

2. Verify that logging and monitoring are enabled:
  • Cluster logging and monitoring can be verified by visiting the cluster details page for the cluster in the Cloud Console.

  • Alternatively, logging and monitoring can be verified with the following gcloud command:

$gcloud container clusters describe --zone [ZONE] [CLUSTER] | grep "monitoringService\|loggingService"

With the following output:

loggingService: logging.googleapis.com
monitoringService: monitoring.googleapis.com

4.4 Configuring minimum monitoring

The installation of the following metrics components is required, regardless of whether a new cluster was deployed as in section 4.3 or an existing Kubernetes Engine cluster is being used.

4.4.1 Install monitoring components

The following steps will deploy a Prometheus server, Prometheus Stackdriver sidecar, and kube-state-metrics into a Kubernetes Engine cluster.
Note: This deployment will only work with legacy Stackdriver monitoring. It will not behave as expected with the beta release of Stackdriver Kubernetes Monitoring.
Note: These steps are not covered by any SLA or deprecation policy.

  • Connect to the Kubernetes Engine cluster.

  • Ensure your user has the appropriate permissions to deploy Prometheus:

$kubectl create clusterrolebinding cluster-admin-binding \
    --clusterrole=cluster-admin \
    --user=$(gcloud info | grep Account | cut -d '[' -f 2 | cut -d ']' -f 1)

  • Clone the stackdriver-prometheus-sidecar project:

$git clone https://github.com/Stackdriver/stackdriver-prometheus-sidecar.git

  • Export the following variables:

$export GCP_PROJECT=…
$export GCP_REGION=…
$export KUBE_CLUSTER=…

  • Deploy the Prometheus server and kube-state-metrics:

$bash stackdriver-prometheus-sidecar/kube/full/deploy.sh
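
To confirm that the monitoring components came up, the pods can be listed; the exact namespace depends on the deploy script, so the sketch below searches all namespaces:

$kubectl get pods --all-namespaces | grep -E 'prometheus|kube-state-metrics'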

4.4.2 Observe Stackdriver Dashboards

  1. Create an app-centric dashboard and add the charts defined in section 5.1.
  2. Create a Kubernetes Engine cluster centric dashboard and add the charts defined in section 5.2.
  3. If the steps in section 4.3 were followed, then the trace-demo application will already be deployed. If the trace-demo application is to be run on an existing cluster, the following command can be used:

$kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/gke-tracing-demo/master/terraform/tracing-demo-deployment.yaml

4. It is important to note two items:

  1. A horizontal pod autoscaler will work only if the CPU resource limit is configured for a container.
  2. kube-state-metrics will collect pod CPU utilization metrics if the CPU resource requests are defined.

If default CPU resource requests and limits are not defined, the following command will set them for the current namespace:

$ echo "apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - default:
      cpu: 100m
    defaultRequest:
      cpu: 100m
    type: Container" | kubectl apply -f -

5. Delete the existing tracing-demo pod to apply the new default CPU limit.

$kubectl delete pods -l app=tracing-demo

6. Configure the trace-demo application to autoscale.

$kubectl autoscale deployment tracing-demo \
    --cpu-percent=50 \
    --min=1 \
    --max=10
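
To confirm that the autoscaler was created and to inspect its current state, the following standard commands can be used:

$kubectl get hpa tracing-demo
$kubectl describe hpa tracing-demo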

7. Test the load and monitor the application:
  • Generate load on the application by running a container with a loop that makes HTTP calls against the web app:

$kubectl run -i --tty load-generator --image=busybox /bin/sh --requests cpu=50m

Hit enter for command prompt

/ # while true; do wget -q -O- http://tracing-demo.default.svc.cluster.local; done > /dev/null
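
While the dashboards populate, the same utilization can be spot-checked from a second terminal; a quick sketch:

# Watch pod CPU usage and autoscaler activity while the load generator runs.
$kubectl top pods -l app=tracing-demo
$kubectl get hpa tracing-demo --watch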

After a few minutes, some interesting data appears in the following charts:

  • Pod’s Running Total
  • Pod’s Pending Total
  • Container CPU Utilization
  • Pod Autoscaler Current vs. Desired Replicas

Note: If the steps in section 4.3 were followed, it will be obvious that there is an issue. The CPU utilization per container is hovering around 100% even though a target of 50% has been specified. The autoscaler appears to be attempting to scale, but the next replicated pod is stuck in a pending state.

It may be confusing that the Pod Autoscaler Current vs. Desired replicas graph indicates two active replicas when only one is successfully deployed. This graph reflects the current state of the autoscaler where there is one successful replica and one pending replica.

8. The log-based metric created in section 5.2.4 results in a chart illustrating that there is, on average, one pod scheduling failure due to insufficient CPU resources in the default namespace.

9. This is derived from the Stackdriver “Kubernetes Engine Cluster Operations” logs for the cluster and there should be at least one entry resembling the following:

This shows that the pod that is pending (notice the pod name compared to the chart above) is stuck due to insufficient CPU. The reason is that the steps in section 4.3 deployed a cluster with a single node and without node autoscaling.
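
The same evidence can be pulled from the command line; a sketch using the filter that section 5.2.4 turns into a log-based metric:

# Show recent pod-scheduling failures caused by insufficient resources.
$gcloud logging read '
  resource.type="gke_cluster"
  jsonPayload.reason="FailedScheduling"
  jsonPayload.involvedObject.kind="Pod"
  jsonPayload.message:("Insufficient" OR "unavailable")' \
  --limit 5 --format json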

10. Create an additional nodepool to allow the trace-demo pod to scale and alleviate the load.

$gcloud container node-pools create scaling-pool \
    --cluster tracing-demo-space \
    --scopes=default \
    --enable-autoscaling \
    --max-nodes=6 \
    --min-nodes=1 \
    --machine-type=n1-standard-2 \
    --zone=ZONE
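
To verify the new node pool and watch the additional node(s) join the cluster:

$gcloud container node-pools list --cluster tracing-demo-space --zone ZONE
$kubectl get nodes --watch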

4.5 OpenCensus and distributed tracing

To observe traces through an application, the application must be appropriately instrumented. There exist a number of SDKs for major programming languages to enable this capability. This section focuses on the OpenCensus SDK for Python. The gke-tracing-demo application has been written with the Flask framework. OpenCensus provides integration with Flask affording an easy way to measure latency in an application.
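
For reference, instrumenting a Flask application this way typically requires the OpenCensus packages below (a sketch only; the gke-tracing-demo application already ships with its own dependencies):

# Python packages commonly used for OpenCensus tracing from Flask to Stackdriver.
$pip install opencensus opencensus-ext-flask opencensus-ext-stackdriver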

The gke-tracing-demo application has already been deployed per the steps in section 4.4.2. Simple HTTP GET requests were made against the application’s endpoint generating a light CPU load and a response. The application has more functionality worth exploring.

The following diagram illustrates the application’s architecture. Of note is the fact that, in addition to returning an HTTP response, the application publishes a message to a third service, Google Cloud Pub/Sub.

Note: Istio is a powerful add-on to Kubernetes Engine and affords similar tracing analysis capabilities, including graphs. Istio provides features beyond the scope of this document. Depending on the features desired, Istio is worth further investigation.

4.5.1 Investigating traces

Console

  • In the Google Cloud Console, navigate to the “Trace list” in Stackdriver Trace.

  • Investigating the plot, it’s clear that this application has received considerable traffic (from the load test performed in section 4.4.2).

  • Selecting any of the plot points will display the trace timeline in detail.

The top timeline signifies the root span; the HTTP GET request against the URL of the Kubernetes service. The bottom timeline shows the child span; the time spent publishing a message to the PubSub topic. This information is generated from the following code:

@app.route('/')
def template_test():
    […]
    tracer = execution_context.get_opencensus_tracer()
    # Trace Pub/Sub call using Context Manager
    with tracer.start_span() as pubsub_span:
        pubsub_span.name = '[{}]{}'.format('publish', 'Pub/Sub')
        […]
    […]

Beyond the name of the child span being defined, the OpenCensus context manager takes care of measuring the execution time and reporting to Stackdriver.

5. Stackdriver monitoring charts

Telemetry data is presented in Stackdriver dashboards in the form of graphs and charts.

5.1 Workload centric dashboard

Given the potentially large amount of data that may be charted, it is recommended to set the chart to “Outlier” mode.

5.1.1 CPU utilization by container

Resource type: Kubernetes Engine Container

Metric: CPU utilization

Filter: User metadata label (e.g. app)

Aligner: mean

Reducer: mean
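
The data behind this chart can also be queried through the Monitoring API. A rough sketch using curl in Cloud Shell, assuming legacy Stackdriver monitoring with the container.googleapis.com/container/cpu/utilization metric and the gke_container resource type (both assumptions; adjust for your setup), with the project ID in GCP_PROJECT:

# Pull the last 5 minutes of container CPU utilization time series.
$START=$(date -u -d '5 minutes ago' +%Y-%m-%dT%H:%M:%SZ)
$END=$(date -u +%Y-%m-%dT%H:%M:%SZ)
$curl -s -G -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  "https://monitoring.googleapis.com/v3/projects/${GCP_PROJECT}/timeSeries" \
  --data-urlencode 'filter=metric.type="container.googleapis.com/container/cpu/utilization" resource.type="gke_container"' \
  --data-urlencode "interval.startTime=${START}" \
  --data-urlencode "interval.endTime=${END}"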

5.1.2 Memory usage by pod

Resource type: Kubernetes Engine Container

Metric: Memory usage

Filter: User metadata label (e.g. app)

Aligner: mean

Reducer: sum

5.1.3 Container restarts total

Resource type: Kubernetes Container

Metric: kube_pod_container_status_restarts_total

Group By: container

Filter: namespace (e.g. default)

Aligner: rate

Reducer: sum

5.1.4 Kubernetes pod failures

Resource type: Kubernetes Container

Metric: kube_pod_status_phase

Filter: phase=Failed, namespace (e.g. default)

Aligner: mean

Reducer: sum

5.1.5 Kubernetes pods running

Resource type: Kubernetes Container

Metric: kube_pod_container_status_running

Filter: namespace (e.g. default)

Aligner: mean

Reducer: sum

5.1.6 Kubernetes pods pending

Resource type: Kubernetes Container

Metric: kube_pod_status_phase

Filter: phase=Pending, namespace (e.g. default)

Aligner: mean

Reducer: sum

5.1.7 Pod Autoscaler current vs. desired replicas

Metric 1

Resource type: Kubernetes Container

Metric: kube_hpa_status_current_replicas

Filter: namespace (e.g. default)

Group By: hpa

Aligner: mean

Reducer: sum

Metric 2

Resource type: Kubernetes Container

Metric: kube_hpa_status_desired_replicas

Filter: namespace (e.g. default)

Group By: hpa

Aligner: mean

Reducer: sum

5.2 Kubernetes Engine cluster centric dashboard

A Dashboard provides information on the state of Kubernetes resources in your cluster and on any errors that may have occurred.

5.2.1 CPU utilization by node

Resource type: GCE VM Instance

Metric: CPU utilization

Group By: instance_name

Aligner: mean

Reducer: sum

5.2.2 Memory utilization by node

Resource type: Kubernetes Engine Container

Metric: Memory utilization

Group By: instance_id

Aligner: mean

Reducer: sum

5.2.3 Kubernetes pods running vs. capacity by cluster

Metric 1

Resource type: Kubernetes Container

Metric: kube_pod_status_phase

Filter: phase=Running

Group By: cluster_name

Aligner: mean

Reducer: sum

Metric 2

Resource type: Kubernetes Container

Metric: kube_node_status_capacity_pods

Group By: cluster_name

Aligner: mean

Reducer: sum

5.2.4 Log Based Metrics Example — Failed Pod Scheduling

Creating a Log Based Metric
Stackdriver provides a mechanism to define metrics based on the number of log entries generated.

The following example defines a metric that will display the count of log entries stating that a pod is unschedulable due to insufficient resources.

  1. In the Google Cloud Console navigate to “Log-based metrics”

2. Click “Create Metric”.

3. To the right of the “Filter by label or text search” field, click the drop-down arrow and select “Convert to advanced filter”.

Paste the following filter into the advanced-filter text area.

resource.type="gke_cluster"
jsonPayload.reason="FailedScheduling"
jsonPayload.involvedObject.kind="Pod"
jsonPayload.message:("Insufficient" OR "unavailable")

The filter area should look like the following:

  4. In the metric editor, configure the metric as follows:

Name: pod_failed_schedule

Label 1

Name: project_id

Label type: String

Field name: resource.labels.project_id

Label 2

Name: namespace

Label type: String

Field name: jsonPayload.involvedObject.namespace

Label 3

Name: pod_name

Label type: String

Field name: jsonPayload.involvedObject.name

Label 4

Name: reason

Label type: String

Field name: jsonPayload.message

Extraction regular expression: :\W(.*)\.

Units: <leave empty>
Type: Counter
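
As an alternative to the console steps, the counter itself can be created from the CLI (a sketch; the custom labels configured above still need to be added in the console):

# Create the log-based counter metric with the same filter.
$gcloud logging metrics create pod_failed_schedule \
  --description="Pods that failed to schedule due to insufficient resources" \
  --log-filter='resource.type="gke_cluster"
jsonPayload.reason="FailedScheduling"
jsonPayload.involvedObject.kind="Pod"
jsonPayload.message:("Insufficient" OR "unavailable")'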

Pod Scheduling Failures (Chart)
Add a chart with the following properties:

Resource type: <leave blank>

Metric: logging/user/pod_failed_schedule

Group By

namespace

reason

Aligner: count

Reducer: sum

Authors: Ariel Peretz, Ken Evensen
