Hacking OpenShift: Leveraging Prometheus Cluster Metrics in your own Grafana
OpenShift 3.11 comes with the Cluster Prometheus, which gathers metrics from many sources, enabling cluster-level metrics like Pod CPU and memory utilization, PVC disk metrics and more! It also ships with a Grafana instance with some decent dashboards. Both the Prometheus and the Grafana reside in the Namespace openshift-monitoring.
But! There is a big limitation. You cannot edit the resources in the Namespace openshift-monitoring, like the Grafana dashboards. Wait… What? — Yup,
The OpenShift Container Platform Monitoring stack ensures its resources are always in the state it expects them to be. If they are modified, OpenShift Container Platform Monitoring will ensure that this will be reset. — RH OpenShift Docs: Prometheus Cluster Monitoring
So what do you do when you want to use some of these metrics in a Grafana dashboard? You use your own Grafana! If you don’t already have your own custom one, set one up using this guide. Note! You need a newer version of Grafana for the datasource authentication to work. The following setup is tested with Grafana 6.2.5.
Authenticating against the Cluster Prometheus from Grafana
In order to authenticate against the Cluster Prometheus in openshift-monitoring, the Prometheus datasource in Grafana must be correctly configured with credentials. The steps are: first create a ServiceAccount associated with your Grafana, then grant that ServiceAccount access to the Cluster Prometheus with a role binding, then obtain a ServiceAccount token to use for the datasource authentication, and finally configure Grafana.
1. Create the ServiceAccount
Create a ServiceAccount associated with your Grafana, replacing the Namespace:
oc create serviceaccount grafana -n <your-grafana-namespace>
2. Create the RoleBinding
Create a role binding so the ServiceAccount associated with your datasource in Grafana can use the Cluster Prometheus, replacing the Namespace. The ClusterRole cluster-monitoring-view ships with OpenShift and grants read access to the monitoring stack:
oc create clusterrolebinding grafana-cluster-monitoring-view \
  --clusterrole=cluster-monitoring-view \
  --serviceaccount=<your-grafana-namespace>:grafana
3. Obtain a ServiceAccount token from step 1
The following command prints a long-lived token, which we will use in the next step:
oc sa get-token grafana -n <your-grafana-namespace>
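Before wiring the token into Grafana, it can be handy to verify that it actually grants access to the Cluster Prometheus. A minimal sketch, assuming a placeholder route host (get the real one with oc get route prometheus-k8s -n openshift-monitoring) and the token from above:

```shell
# Placeholder values: substitute your real route host and token
PROM_HOST="prometheus-k8s-openshift-monitoring.apps.example.com"
TOKEN="<token-from-step-3>"

# Prometheus expects the ServiceAccount token as a Bearer Authorization header
AUTH_HEADER="Authorization: Bearer ${TOKEN}"

# A JSON body with "status":"success" means RBAC and the token are working;
# -k skips TLS verification since the route may use a cluster-internal CA
curl -k -H "${AUTH_HEADER}" \
  "https://${PROM_HOST}/api/v1/query?query=up" \
  || echo "Query failed - check the RoleBinding and the token"
```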
4. Configure Grafana with a Prometheus datasource with authentication
This is probably the most demanding step, but also the most rewarding when you complete it.
First, create a Secret containing the datasource as YAML and save it as openshift-monitoring-grafana-datasource.yaml, replacing the highlighted values. The datasource fields follow Grafana's datasource provisioning format, with the token passed as an Authorization header:
apiVersion: v1
kind: Secret
metadata:
  name: openshift-monitoring-grafana-datasource
  namespace: <your-grafana-namespace>
stringData:
  openshift-monitoring-datasource.yaml: |-
    apiVersion: 1
    datasources:
      - name: "openshift-monitoring-datasource"
        type: "prometheus"
        access: "proxy"
        url: "https://<prometheus-route-host>"
        jsonData:
          httpHeaderName1: "Authorization"
          tlsSkipVerify: true
        secureJsonData:
          httpHeaderValue1: "Bearer <GRAFANA_SA_TOKEN>"
- namespace: The Namespace your Grafana runs in
- url: The address of the Cluster Prometheus Route, which can be obtained with
oc get route prometheus-k8s -n openshift-monitoring
- httpHeaderValue1: Should contain “Bearer <token-from-step-3>”
Then create the Secret from the file:
oc create -f openshift-monitoring-grafana-datasource.yaml
Now you need to mount this Secret into Grafana, replacing the Namespace. The mount path below is Grafana's default datasource provisioning directory:
oc set volume dc/grafana --add \
  --name=openshift-monitoring-datasource \
  --type=secret \
  --secret-name=openshift-monitoring-grafana-datasource \
  --mount-path=/etc/grafana/provisioning/datasources \
  -n <your-grafana-namespace>
Using the newly created openshift-monitoring datasource
Make sure the Grafana Pod is restarted. Grafana will now detect the new datasource and add it to the list of existing datasources, if any.
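Changing volumes on a DeploymentConfig normally triggers a redeploy on its own, but if it does not, you can force one. A sketch, assuming the DeploymentConfig is named grafana:

```shell
NS="<your-grafana-namespace>"

# Trigger a new deployment so Grafana re-reads /etc/grafana/provisioning/datasources,
# then wait for it to finish
oc rollout latest dc/grafana -n "$NS" \
  && oc rollout status dc/grafana -n "$NS" \
  || echo "Rollout failed - are you logged in with oc?"
```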
Congrats! You can now use the datasource named openshift-monitoring-datasource when creating a query.
Example query: monitoring PVC usage in percent
The following query will show the percentage of used bytes on all PersistentVolumeClaims where the usage is above 80%. This is useful if you want to monitor PVC disk usage.
sort_desc((kubelet_volume_stats_used_bytes * 100 / kubelet_volume_stats_capacity_bytes)) > 80
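You can also try the query outside Grafana, straight against the Prometheus HTTP API. A sketch using curl's --data-urlencode to escape the PromQL, with the same placeholder route host and token as in the earlier steps:

```shell
PROM_HOST="prometheus-k8s-openshift-monitoring.apps.example.com"  # placeholder
TOKEN="<token-from-step-3>"                                       # placeholder

QUERY='sort_desc((kubelet_volume_stats_used_bytes * 100 / kubelet_volume_stats_capacity_bytes)) > 80'

# -G sends the data as URL query parameters; --data-urlencode escapes the PromQL
curl -k -G -H "Authorization: Bearer ${TOKEN}" \
  --data-urlencode "query=${QUERY}" \
  "https://${PROM_HOST}/api/v1/query" \
  || echo "Query failed - check connectivity and the token"
```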