The “new” Stackdriver Monitoring for Kubernetes has been in public beta since around May of 2018, when this announcement was first published. I’ve spent quite a bit of time with it since then, and one of the areas I really wanted to understand better was its integration with Prometheus. I had also never really used Prometheus before, so I needed to acquire some basic knowledge of that tool as well — especially as I keep hearing about it from customers who often want to understand either how to keep using it as they move to GKE or how to replace and/or supplement it with Stackdriver. The following is my attempt to demonstrate “Prometheus on GKE with Stackdriver” — starting at basically zero. As with Profiler, I also wanted to see if I could understand how to instrument code in multiple languages. Let’s see what happens!
The first thing I needed to learn was — the first thing. What is Prometheus, how does it work, and, of primary importance to me in this exercise — how do metrics work? Luckily, someone else did a fantastic job of documenting what I needed. You can find their series here, but the key things that I needed to understand and learned from their posts are:
- There are 4 main types of metrics: counters (which only go up), gauges (which can go up and down), summaries, and histograms
- In order for Prometheus to receive metrics from the application, the application needs to expose a dedicated endpoint (often /metrics) with the metric values available there
- If an application writes metrics somewhere already (like a database or a file), you can create a Prometheus exporter to read those and then expose them
- You need to have a Prometheus server where the metrics are stored
I am fairly certain that there’s a lot I am missing. For example, I still don’t quite understand the concept of the registry, but this information was enough for me to make things work. Now that we have basics in hand, let’s actually get to it.
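To make those basics more concrete, here is a dependency-free sketch of the two simplest metric types and the plain-text format a /metrics endpoint serves. This deliberately avoids the real client library (which you should use in practice); the metric names are made up for illustration.

```python
# Toy stand-ins for Prometheus counter and gauge metrics, plus a renderer
# for the text exposition format that a /metrics endpoint serves.
# Illustration only -- real applications should use an official client library.

class Counter:
    """A counter only ever goes up (e.g. total requests served)."""
    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0
    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters cannot decrease")
        self.value += amount

class Gauge:
    """A gauge can go up and down (e.g. items currently in a queue)."""
    def __init__(self, name, help_text):
        self.name, self.help, self.value = name, help_text, 0.0
    def set(self, value):
        self.value = value

def render(metrics):
    """Render metrics as the text Prometheus scrapes from /metrics."""
    lines = []
    for m in metrics:
        kind = "counter" if isinstance(m, Counter) else "gauge"
        lines.append(f"# HELP {m.name} {m.help}")
        lines.append(f"# TYPE {m.name} {kind}")
        lines.append(f"{m.name} {m.value}")
    return "\n".join(lines) + "\n"

requests_total = Counter("hello_requests_total", "Total hits on the home page.")
queue_depth = Gauge("work_queue_depth", "Current depth of the work queue.")

requests_total.inc()
requests_total.inc()
queue_depth.set(7)

print(render([requests_total, queue_depth]))
```

The output is exactly what Prometheus expects to find when it scrapes the endpoint: a `# HELP` line, a `# TYPE` line, and the current sample value for each metric.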
Install Prometheus on GKE and validate
I set out to demonstrate how to export Prometheus metrics in a simple application and have them be picked up by Stackdriver to test the Prometheus integration. Thankfully, the Stackdriver team has done a great job of documenting the steps required to make this happen. Let’s walk through it, anyway.
Create GKE cluster
First, we need to create a cluster. We’ll want to make sure we opt into the “new” Stackdriver monitoring and logging configuration.
```
gcloud beta container --project "<your project ID>" \
  clusters create "prometheus-demo-cluster" \
  --zone "<desired zone>" \
  --cluster-version "latest" \
  --enable-stackdriver-kubernetes
```
Next, we need to install Prometheus in our cluster. As per the compatibility matrix, I wanted to use the latest version possible, which, at the time of writing, is 2.6.1. I had never really installed it before, but a minute of searching turned up this guide, which I followed. I am not sure how valuable it is to reproduce the steps here, but I want to capture my experience in case I ever need to do this again.
First, I need to set myself as an admin in the cluster:
```
ACCOUNT=$(gcloud info --format='value(config.account)')
kubectl create clusterrolebinding owner-cluster-admin-binding \
  --clusterrole cluster-admin \
  --user $ACCOUNT
```
Then, I created a dedicated namespace for Prometheus components:
kubectl create namespace prometheus
Then, I gave this namespace cluster reader permissions by creating a new role. I copied this file and simply changed the name of the namespace at the very end to “prometheus” instead of “monitoring”. I saved the file and created the role:
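For reference, the role and binding in that guide look roughly like the following. This is a sketch from memory of that style of guide, not a copy of the linked file — the important part is the read-only verbs on cluster resources and the binding pointing at the prometheus namespace:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: prometheus
rules:
- apiGroups: [""]
  resources: ["nodes", "nodes/proxy", "services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
- nonResourceURLs: ["/metrics"]
  verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus
subjects:
- kind: ServiceAccount
  name: default
  namespace: prometheus      # this is the part I changed from "monitoring"
```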
```
$ kubectl create -f clusterRole.yaml
clusterrole.rbac.authorization.k8s.io/prometheus created
clusterrolebinding.rbac.authorization.k8s.io/prometheus created
```
The next step is to create a configMap for the scrape and alerting rules. I copied this file, again replacing references to the “monitoring” namespace with “prometheus” and applied it:
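One fragment of that scrape configuration is worth a glance, because it matters later: the job that discovers pods and keeps only the ones annotated for scraping. Roughly (a fragment in the style of that guide, not the full file):

```yaml
scrape_configs:
  - job_name: 'kubernetes-pods'
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      # Only scrape pods that carry the prometheus.io/scrape: "true" annotation
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: true
```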
```
$ kubectl create -f configMap.yaml -n prometheus
configmap/prometheus-server-conf created
```
Next, as per the guide, I created a .yaml file for the Prometheus deployment, again replacing the namespace reference and (importantly!) updating the Prometheus version referenced to 2.6.1, and created the deployment:
```
$ kubectl create -f prometheus-deployment.yaml -n prometheus
deployment.extensions/prometheus-deployment created
```
Let’s check to make sure things are up and running:
```
$ kubectl get pods -n prometheus
NAME                                    READY   STATUS    RESTARTS   AGE
prometheus-deployment-7ddb99dcb-fkz4d   1/1     Running   0          1m
```
And finally, let’s check to make sure it’s working by forwarding the pod port to localhost:
```
$ kubectl port-forward prometheus-deployment-7ddb99dcb-fkz4d 8080:9090 -n prometheus
Forwarding from 127.0.0.1:8080 -> 9090
Forwarding from [::1]:8080 -> 9090
```
Now, when I go to http://localhost:8080, I can see the Prometheus UI:
Install Stackdriver Collector
Now that we have Prometheus running in the cluster, we need to install the Stackdriver Collector to get those metrics exported to the Stackdriver backend. This is clearly documented; first, I made a local copy of this file. Then, I set the required environment variables:
```
export KUBE_NAMESPACE=prometheus
export KUBE_CLUSTER=prometheus-demo-cluster
export GCP_REGION=<my region>
export GCP_PROJECT=<my project ID>
export DATA_DIR=/prometheus/
export DATA_VOLUME=prometheus-storage-volume
export SIDECAR_IMAGE_TAG=release-0.3.2
```
I ran the patch script:
```
$ sh ./patch.sh deployment prometheus-deployment
deployment.extensions/prometheus-deployment patched
```
Now, the Stackdriver collector is running as a sidecar container in the Prometheus pod:
```
$ kubectl get pods -n prometheus
NAME                                     READY   STATUS    RESTARTS   AGE
prometheus-deployment-744758f7cc-v6zqx   2/2     Running   2          1m
```
Finally, we’re ready to see if we have Prometheus metrics showing up in Stackdriver.
Note that, while support for this integration is in beta, there’s a limit of 1000 Prometheus metrics per project. You should be able to find a “quota exceeded” message in Logging if you do run into that.
I went to Metrics Explorer, filtered to the cluster I’m working with, and sure enough — they’re there!
Now, I was ready to do some instrumentation in my code.
Exporting Prometheus metrics in an app
I should note that, while I am starting from zero, most people will likely come to this having either already been running Prometheus (and in that case not needing any information on instrumentation) or looking for instrumentation options. In the latter case, we would probably steer them toward OpenCensus, rather than starting with Prometheus instrumentation, and I hope to get a chance to do a similar exercise with OpenCensus soon.
My intent here is pretty straightforward. I simply want to create a “Hello World” app that creates a random number every time the page is hit and have that number show up as a Prometheus metric in Stackdriver. I had to do a bit of research on how this is done, but I found two resources that were very helpful. The first is the Prometheus Python client itself. The second was a post by someone else explaining how to use the client in Flask. Armed with this information, I forged ahead.
First, I created a basic Flask app. I imported the Prometheus client and created a basic gauge metric that I set to a random value every time the home page is accessed. I then created a second Flask route for /metrics that exposed the metrics. You can see the full app.py file here.
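The app is small enough to sketch here. The real version uses Flask and prometheus_client’s Gauge; the dependency-free sketch below uses only the standard library, but the shape is the same — hitting / sets a gauge (called python_random_value here; the name is illustrative) to a random number, and /metrics exposes the current value in the text exposition format:

```python
import random
from http.server import BaseHTTPRequestHandler, HTTPServer

# Current value of our one illustrative gauge. The real app tracks this
# with prometheus_client's Gauge; this sketch tracks it by hand.
random_value = 0.0

def metrics_page():
    """Render the gauge in the Prometheus text exposition format."""
    return (
        "# HELP python_random_value A random value set on each page hit.\n"
        "# TYPE python_random_value gauge\n"
        f"python_random_value {random_value}\n"
    )

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        global random_value
        if self.path == "/":
            random_value = random.random()  # new value on every hit
            body = "Hello World!"
        elif self.path == "/metrics":
            body = metrics_page()           # what Prometheus scrapes
        else:
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.end_headers()
        self.wfile.write(body.encode())

def serve(port=8080):
    """serve(8080) would start the app; not called here."""
    HTTPServer(("", port), Handler).serve_forever()
```

With the real client library this is even shorter — the /metrics route just returns `generate_latest()` and the gauge is a `Gauge` object you call `.set()` on.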
Next, I needed to create a container image from this app that can run on GKE. I saved my Python package state using `pip freeze` and created the most basic Dockerfile I could:
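A minimal Dockerfile along those lines might look like the following — the base image, port, and filenames are assumptions rather than the exact file from the repo:

```dockerfile
FROM python:3.7-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install -r requirements.txt
COPY app.py .
EXPOSE 8080
CMD ["python", "app.py"]
```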
I built the image using Google Cloud Build:
```
gcloud builds submit --tag gcr.io/[project ID]/prometheus-demo-python .
```
which worked on the first try!
Before deploying it on GKE, I wanted to test it locally just to make sure everything was working. I ran the image using `docker run`:
```
$ docker run -p 8080:8080 gcr.io/[project ID]/prometheus-demo-python:latest
```
And sure enough, I was able to see Prometheus metrics on /metrics:
Now, I needed to define the deployment I was going to use to run it on GKE. I created a .yaml file for that referring to my new image. The key thing to note here is this section that tells Prometheus where to scrape for metrics:
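That section is a set of pod annotations, which the kubernetes-pods scrape job in the earlier configMap uses to decide what to scrape. A sketch of the relevant fragment (the annotation names are the conventional `prometheus.io/*` ones; the rest of the deployment is omitted):

```yaml
spec:
  template:
    metadata:
      annotations:
        prometheus.io/scrape: "true"   # opt this pod in to scraping
        prometheus.io/port: "8080"     # port where /metrics is served
```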
I created the deployment:
```
$ kubectl apply -f ./prometheus-demo.yaml
deployment.extensions/prometheus-demo-python created
```
And exposed it using a load balancer:
```
$ kubectl expose deployment prometheus-demo-python --type=LoadBalancer --port 8080
service/prometheus-demo-python exposed
```
Now, I can see the same /metrics page on the external IP I got by running `kubectl get services`. Finally, it’s time to see if the random value metric I created is showing up in Stackdriver by using Metrics Explorer:
And there it is!
This is very cool! Now, let’s see if I can do the same in Go.
As before, I needed to do a bit of looking around to understand how to instrument my (very basic) code with Prometheus. I created a basic app and found the Golang Prometheus client and a guide to instrumenting my application — both from Prometheus directly. This was enough for me to move forward. You can see the application code here. The thing of note is that there are two separate handlers for the / route and the /metrics route: the first is a plain handler function that writes to the http.ResponseWriter, whereas /metrics is served by promhttp.Handler(). This was interesting to me, at least!
Now that my code is written, I needed to build a container image from it so that I could deploy it to GKE. Here’s the Dockerfile I used to build the image using Cloud Build:
It took me a while to figure out that I needed to include the build step in the Dockerfile in order for the code to actually run. This post was very helpful, though I never did figure out how to make my image that small. Nevertheless, it built and ran!
I once again tested the image locally using `docker run` to make sure I was seeing /metrics and specifically the go_random_value metric I created:
Now, I was ready to deploy to GKE. I copied the deployment .yaml file I used for my Python deployment, taking care to modify the Prometheus scrape settings to refer to the port on which this service is running (8081 rather than 8080), replaced all references to python with go, and pointed it at the new go image. I then exposed the deployment using a load balancer on a different port (again 8081) and checked that I could hit the /metrics endpoint on that external IP. Finally, I was ready to see if the go_random_value metric was in Stackdriver — and it was!
The last language I wanted to attempt was Node.js. After some searching, I found that there is not an “official” client from Prometheus, but this one seems to be the standard. Again, I built a simple Hello World app using Express that sets a random value and made sure that it worked locally.
One thing of note here is that my “user” metric is the first one in the list, which certainly makes verification easier!
I then created a Dockerfile, built the image using Cloud Build, tested it locally using `docker run`, and deployed it to my cluster much in the same way as before. And sure enough — the node_random_value metric showed up in Stackdriver just like the others!
I really enjoyed this exercise, as it allowed me the opportunity to learn the (very) basics of Prometheus, forced me to recall (learn?) how to create Dockerfiles for different languages, and taught me how to get these Prometheus metrics to show up in Stackdriver. Later, I can see getting into managing time series labels, the other metric types (like histograms?), or how to actually operate Prometheus “in production”. But we’ll leave it there for now — thanks for sticking with it till the end! If you’re interested in the code or configuration files, I’ve made them available in this repo.