Hacking OpenShift: Leveraging Prometheus Cluster Metrics in your own Grafana

Øyvind Ødegård
Aug 2

OpenShift 3.11 comes with the Cluster Prometheus, which gathers metrics from many sources and provides cluster-level metrics such as Pod CPU and memory utilization, PVC disk metrics and more! It also ships with a Grafana with some decent dashboards. Both this Prometheus and this Grafana reside in the openshift-monitoring Namespace.


But! There is a big limitation: you cannot edit the resources in the openshift-monitoring Namespace, such as the Grafana dashboards. Wait… What? Yup:

The OpenShift Container Platform Monitoring stack ensures its resources are always in the state it expects them to be. If they are modified, OpenShift Container Platform Monitoring will ensure that this will be reset. (Red Hat OpenShift docs: Prometheus Cluster Monitoring)

So what do you do when you want to use some of these metrics in a Grafana dashboard? You use your own Grafana! If you don't already have your own custom one, set one up using this guide. Note! You need a recent version of Grafana for the datasource authentication to work, because it relies on the datasource sending a custom HTTP header (httpHeaderName1 and httpHeaderValue1 in step 4). The following setup is tested with Grafana 6.2.5.


Authenticating against the Cluster Prometheus from Grafana

To authenticate against the Cluster Prometheus in openshift-monitoring, the Prometheus datasource in Grafana must be configured to send valid credentials. First you create a ServiceAccount for your Grafana, then you grant it read access to cluster monitoring with a ClusterRoleBinding to the cluster-monitoring-view ClusterRole, then you obtain a token for that ServiceAccount, and finally you configure a Grafana datasource that sends this token with every request.

1. Create the ServiceAccount

Create a ServiceAccount associated with your Grafana, replacing the Namespace:

oc create serviceaccount grafana -n <your-grafana-namespace>

2. Create the ClusterRoleBinding

Bind the cluster-monitoring-view ClusterRole to the ServiceAccount so that requests made with its token are allowed to read from the Cluster Prometheus, replacing the Namespace:

oc create clusterrolebinding grafana-cluster-monitoring-view \
--clusterrole=cluster-monitoring-view \
--serviceaccount=<your-grafana-namespace>:grafana
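If you prefer to keep this setup in version control, steps 1 and 2 can also be expressed declaratively. The sketch below is a hypothetical shell helper, grafana_monitoring_rbac, that prints a ServiceAccount and a ClusterRoleBinding equivalent to the two commands above for a given Namespace. It assumes the ServiceAccount is named grafana, as above, and that the cluster serves the rbac.authorization.k8s.io/v1 API (OpenShift 3.11 does).

```shell
# grafana_monitoring_rbac NAMESPACE
# Hypothetical helper: prints a ServiceAccount named "grafana" in NAMESPACE and a
# ClusterRoleBinding that grants it the cluster-monitoring-view ClusterRole.
# The heredoc body is flush left so the YAML indentation stays intact.
grafana_monitoring_rbac() {
  cat <<EOF
apiVersion: v1
kind: ServiceAccount
metadata:
  name: grafana
  namespace: ${1}
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: grafana-cluster-monitoring-view
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-monitoring-view
subjects:
  - kind: ServiceAccount
    name: grafana
    namespace: ${1}
EOF
}
```

Usage, with my-grafana as an example Namespace: grafana_monitoring_rbac my-grafana | oc apply -f -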

3. Obtain a token for the ServiceAccount from step 1

The command below prints a long-lived token, which we will use in the next step.

oc sa get-token grafana -n <your-grafana-namespace>
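Before touching Grafana, it can be worth checking that the token actually grants access. The hypothetical helper below sends an instant query to the Prometheus HTTP API (/api/v1/query) with the token in an Authorization: Bearer header, which is the same header the Grafana datasource will send. It uses curl -k because, like the datasource configuration in step 4, it assumes you skip TLS verification of the route certificate.

```shell
# prom_query PROMETHEUS_URL TOKEN QUERY
# Hypothetical helper: runs an instant PromQL query against the Prometheus HTTP API,
# authenticating with a ServiceAccount token sent as a Bearer header.
# -k skips TLS verification (mirrors tlsSkipVerify: true in the datasource config).
# -G with --data-urlencode puts the URL-encoded query into the query string.
prom_query() {
  curl -ksS -G \
    -H "Authorization: Bearer $2" \
    --data-urlencode "query=$3" \
    "$1/api/v1/query"
}
```

Usage, with the route URL as a placeholder: prom_query https://prometheus-k8s-openshift-monitoring.route.to.cluster "$(oc sa get-token grafana -n <your-grafana-namespace>)" 'up'. A working token should return JSON containing "status":"success"; if the ClusterRoleBinding is missing, expect an authorization error instead.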

4. Configure Grafana with a Prometheus datasource with authentication

This is probably the most demanding step, but also the most rewarding when you complete it.

First, write a Secret containing the datasource definition as YAML and save it as openshift-monitoring-grafana-datasource.yaml, replacing the placeholder values:

apiVersion: v1
kind: Secret
metadata:
  name: openshift-monitoring-grafana-datasource
  namespace: <your-grafana-namespace>
stringData:
  datasources.yaml: |
    apiVersion: 1
    datasources:
      - name: "openshift-monitoring-datasource"
        type: prometheus
        access: proxy
        url: https://prometheus-k8s-openshift-monitoring.route.to.cluster
        basicAuth: false
        withCredentials: false
        isDefault: false
        jsonData:
          tlsSkipVerify: true
          httpHeaderName1: "Authorization"
        secureJsonData:
          httpHeaderValue1: "Bearer <GRAFANA_SA_TOKEN>"
        editable: true
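  • namespace: the Namespace your Grafana runs in
  • url: https:// followed by the host of the Cluster Prometheus route, which you can look up with oc get route prometheus-k8s -n openshift-monitoring
  • <GRAFANA_SA_TOKEN>: the token from step 3

Then create the Secret:

oc create -f openshift-monitoring-grafana-datasource.yaml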
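Instead of pasting the token into the file by hand, you can fill in the placeholders while creating the Secret. The sketch below is a hypothetical helper that substitutes the three placeholders used in openshift-monitoring-grafana-datasource.yaml with sed and prints the result. It assumes the placeholder strings are left exactly as written in the YAML and that none of the values contain the | or & characters (ServiceAccount tokens and route URLs normally do not).

```shell
# fill_datasource_secret FILE NAMESPACE PROMETHEUS_URL TOKEN
# Hypothetical helper: prints FILE with the Secret's placeholders replaced.
# "|" is used as the sed delimiter because the URL contains "/".
fill_datasource_secret() {
  sed -e "s|<your-grafana-namespace>|$2|g" \
      -e "s|https://prometheus-k8s-openshift-monitoring.route.to.cluster|$3|g" \
      -e "s|<GRAFANA_SA_TOKEN>|$4|g" \
      "$1"
}
```

Usage, with my-grafana and the route host as example values: fill_datasource_secret openshift-monitoring-grafana-datasource.yaml my-grafana https://prometheus-k8s-openshift-monitoring.apps.example.com "$(oc sa get-token grafana -n my-grafana)" | oc apply -f -. This keeps the token out of the file on disk.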

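Now mount this Secret into Grafana's datasource provisioning directory, replacing the Namespace. Adjust the mount path if your Grafana image provisions from a different directory (the official Grafana Docker image, for example, uses /etc/grafana/provisioning). Note that mounting the Secret there hides any other files already in that directory.

oc set volume dc/grafana \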
--add \
--name=openshift-monitoring-grafana-datasource-volume \
--type=secret \
--secret-name=openshift-monitoring-grafana-datasource \
--mount-path=/usr/share/grafana/conf/provisioning/datasources \
--namespace=<your-grafana-namespace>

Using the newly created openshift-monitoring datasource

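Make sure the Grafana Pod is restarted. Changing the volumes of a DeploymentConfig triggers a new rollout automatically if the DeploymentConfig has a config change trigger. On startup, Grafana reads its provisioning directory and adds the new datasource to the list of existing datasources, if any.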
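One way to confirm both the restart and the provisioning from the command line is sketched below with two hypothetical helpers. wait_for_grafana waits for the latest rollout of dc/grafana to finish; check_grafana_datasource asks Grafana's HTTP API (GET /api/datasources/name/<name>) for the provisioned datasource. The Grafana URL and the credentials are assumptions: use your own Grafana route and an account with Admin rights in the organization.

```shell
# wait_for_grafana NAMESPACE
# Hypothetical helper: blocks until the latest rollout of dc/grafana has completed.
wait_for_grafana() {
  oc rollout status dc/grafana -n "$1"
}

# check_grafana_datasource GRAFANA_URL USER:PASSWORD
# Hypothetical helper: fetches the provisioned datasource by name from the Grafana HTTP API.
# -f makes curl exit non-zero on HTTP errors such as 404 (datasource not provisioned).
check_grafana_datasource() {
  curl -fsS -u "$2" "$1/api/datasources/name/openshift-monitoring-datasource"
}
```

If the DeploymentConfig has no config change trigger, start a new rollout first with oc rollout latest dc/grafana -n <your-grafana-namespace>, then run wait_for_grafana <your-grafana-namespace>.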

Congrats! You can now use the datasource named openshift-monitoring-datasource when creating a query.

Example query: monitoring PVC usage in percent

The following query returns the percentage of used bytes for every PersistentVolumeClaim whose usage is above 80%, which makes it a good starting point for monitoring PVC disk usage.

sort_desc(kubelet_volume_stats_used_bytes * 100 / kubelet_volume_stats_capacity_bytes > 80)
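To see what the expression computes, here is the same arithmetic as two hypothetical shell helpers using awk, fed with example byte counts rather than real metrics. Note that > is a strict comparison: a volume at exactly 80% is not returned.

```shell
# pvc_usage_percent USED_BYTES CAPACITY_BYTES
# Hypothetical helper: same arithmetic as
# kubelet_volume_stats_used_bytes * 100 / kubelet_volume_stats_capacity_bytes
pvc_usage_percent() {
  awk -v used="$1" -v cap="$2" 'BEGIN { printf "%.1f\n", used * 100 / cap }'
}

# pvc_over_threshold USED_BYTES CAPACITY_BYTES THRESHOLD
# Hypothetical helper: exits 0 when usage is strictly above THRESHOLD percent,
# like the "> 80" filter in the query.
pvc_over_threshold() {
  awk -v used="$1" -v cap="$2" -v t="$3" 'BEGIN { exit !(used * 100 / cap > t) }'
}
```

For a 10 GiB volume (10737418240 bytes) with 9 GiB used, pvc_usage_percent 9663676416 10737418240 prints 90.0 and pvc_over_threshold reports it as above 80; with exactly 8 GiB used, the usage is 80.0 and it is not reported.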

Happy monitoring!

Faun

The Must-Read Publication for Aspiring Developers & DevOps Enthusiasts

Written by Øyvind Ødegård, DevOps Engineer @ Dfind Consulting
