2 Mini HOW-TO guides for Prometheus on Openshift: Federation & Custom Infrastructure Alerting

Tommer Amber
8 min read · Aug 11, 2021


Let’s start with the problem

Recently, as part of my work at Red Hat, a customer reached out and asked for my help with editing monitoring-related configurations.

The customer wanted to do pretty simple things: generate new alerting rules related to the infrastructure, fetch infrastructure-related metrics, rename/relabel some of them, and so on.

Openshift 4.x comes with a built-in monitoring stack (Prometheus, Grafana & Alertmanager). The main Prometheus instance, which is responsible for scraping infrastructure-related metrics, lives in the openshift-monitoring namespace and is managed by the cluster-monitoring-operator.

Here comes the catch: this main Prometheus instance is almost uneditable. Because of security concerns (this Prometheus instance has elevated privileges, since it fetches metrics directly from nodes and other infrastructure components such as etcd), and because it is a crucial component that is tightly coupled with the Openshift dashboard and alerting mechanism, the Openshift developers “locked” its reconfigurability by making the cluster-monitoring-operator revert any changes you may want to make.

Furthermore, in Openshift 4.6+ we’ve got a new GA capability called “user-workloads-monitoring” that lets us deploy another Prometheus instance in a dedicated namespace (openshift-user-workload-monitoring) to monitor our own workloads. It is enabled by adding one line to a ConfigMap in the original openshift-monitoring namespace, and the operator does the rest for us, including aggregating data from both Prometheus instances into the same Grafana and Alertmanager that the Openshift platform ships with.
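For reference, enabling it boils down to one ConfigMap in the openshift-monitoring namespace, roughly like this (per the Openshift 4.6+ documentation):

$ cat << EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF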

However, this is not a proper solution, because this new instance does not monitor the infrastructure; even though it is more flexible, we are still facing the same problem.

Experts in Openshift/Kubernetes and Operators may challenge me and declare that by marking the Prometheus Operator as “unmanaged” we can make any changes to the objects it manages. That will probably work, BUT! The Openshift monitoring operators are linked to the Cluster-Version-Operator, and in order to label them as “Unmanaged” we would have to edit the Cluster-Version-Operator itself, which puts our entire cluster in a problematic, unsupported, non-upgradable state.

Refer to the documentation about that issue for more details.
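Just to make that warning concrete: putting the cluster-monitoring-operator out of the Cluster-Version-Operator’s control means adding an override to the ClusterVersion resource, roughly like the sketch below. It is shown only for illustration; do not apply it on a cluster you care about, as it makes the cluster unsupported and blocks upgrades.

# DO NOT apply this on a real cluster - shown only to illustrate the point above
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  overrides:
  - kind: Deployment                    # the operator's Deployment
    group: apps
    name: cluster-monitoring-operator
    namespace: openshift-monitoring
    unmanaged: true                     # this is what puts the cluster in an unsupported state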

So what do you suggest?

I’m very glad you asked.

Almost every solution I’ve found online about Prometheus aggregation & federation is related to Thanos. I completely agree that Thanos is a very powerful tool and it would definitely help us overcome this issue, BUT! I found that it comes with a learning curve to implement properly, and in small setups it is not necessarily required.

I looked for a simple Prometheus federation guide online, and since I did not find one simple enough, I decided to write it myself.

HOW-TO Prometheus Federation on Openshift

This guide is based on the following article: https://cloud.redhat.com/blog/federated-prometheus-with-thanos-receive. I edited the relevant parts so they work on newer versions of Openshift, and I also rewrote the federation ServiceMonitor because the old one did not work.

Generate a new project and deploy the Prometheus operator in it from the OperatorHub

$ oc new-project test
  • Install the Prometheus operator in the “test” namespace from the OperatorHub (a CLI alternative is sketched right below)
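If you prefer the CLI, the OperatorHub install boils down to an OperatorGroup plus a Subscription along the lines of the sketch below; the channel and package names are assumptions, so check what your catalog actually offers first (e.g. oc get packagemanifests -n openshift-marketplace | grep prometheus).

$ cat << EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-operatorgroup
  namespace: test
spec:
  targetNamespaces:
  - test
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: test
spec:
  channel: beta                          # assumption - verify against your catalog
  name: prometheus                       # community Prometheus operator package name
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF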

Apply Serving CA to our Prometheus Instance

Our Prometheus instance needs to connect to the Cluster Managed Prometheus instance in order to gather the cluster-related metrics. This connection uses TLS, so we will use the Serving CA to validate the target endpoints (the Cluster Managed Prometheus).

The Serving CA is located in the openshift-monitoring namespace; we will copy it into our own namespace so we can use it in our Prometheus instances (remember to strip the metadata.namespace, uid, and resourceVersion fields from the exported YAML before applying it in the test namespace):

$ oc get configmap serving-certs-ca-bundle \
-o yaml -n openshift-monitoring > serving-certs-ca-bundle.yaml
$ oc -n test apply -f serving-certs-ca-bundle.yaml
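If you would rather not hand-edit the exported manifest, an equivalent approach is to pull out just the CA certificate and recreate the ConfigMap in the test namespace (a sketch; it assumes the CA lives under the service-ca.crt key, which is the same key the ServiceMonitor below points at):

$ oc -n openshift-monitoring get configmap serving-certs-ca-bundle \
  -o jsonpath='{.data.service-ca\.crt}' > service-ca.crt
$ oc -n test create configmap serving-certs-ca-bundle --from-file=service-ca.crt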

RBAC

We are going to use the ServiceMonitor object to discover the Cluster Managed Prometheus instances and connect to them, so we need to grant specific privileges to the ServiceAccount that runs our Prometheus instances.

As you may know, the Cluster Managed Prometheus instances sit behind the oauth-proxy, which performs authentication and authorization. In order to authenticate, we need a ServiceAccount (prometheus-k8s) that can GET namespaces in the cluster. The token of this ServiceAccount will be used as a Bearer Token to authenticate our connections to the Cluster Managed Prometheus instances (it is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token and will be referenced in the ServiceMonitor object later on).

$ cat > rbac.yaml << EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    prometheus: federation-prometheus
  name: federation-prometheus-role
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  - endpoints
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: federation-prometheus-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: federation-prometheus-role
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: test
EOF
$ oc apply -f rbac.yaml -n test
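Note that nothing so far has created the prometheus-k8s ServiceAccount in the test namespace (the Prometheus operator does not create it for you), so create it if it does not exist yet; the second command is an optional sanity check that the ClusterRoleBinding took effect:

$ oc -n test create serviceaccount prometheus-k8s
$ oc auth can-i get namespaces \
  --as=system:serviceaccount:test:prometheus-k8s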

Deploy the Prometheus object + ServiceMonitor object for federation scraping

To deploy the Prometheus instance, we need to create a Prometheus object. On top of that, a ServiceMonitor will be created; the ServiceMonitor holds the required configuration for scraping the /federate endpoint of the Cluster Managed Prometheus instances.

Note! I’ve edited the params section so that the match[] expression works; in the original article it did not work as expected.

I’ve also trimmed the Prometheus object down to only the necessary content, avoiding the Thanos pieces described in the original article.

We will use the openshift-oauth-proxy to protect our Prometheus instances so that unauthenticated users cannot see our metrics, just as the main Cluster Managed Prometheus instance does.

Since we are protecting our Prometheus instances with the oauth-proxy, we need to generate a session secret and annotate the ServiceAccount that runs the pods, indicating which OpenShift Route will redirect to the oauth proxy.

$ oc create secret generic prometheus-k8s-proxy \
  --from-literal=session_secret=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c43) \
  -n test
$ oc annotate serviceaccount prometheus-k8s \
  serviceaccounts.openshift.io/oauth-redirectreference.prometheus-k8s='{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"federation-prometheus"}}' \
  -n test

Deploy the Prometheus object + ServiceMonitor (Federation) object

$ cat > prometheus.yaml << EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: federation-prometheus
  labels:
    prometheus: federation-prometheus
  namespace: test
spec:
  replicas: 2
  version: v2.8.0
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      app: federation-monitor
  configMaps:
  - serving-certs-ca-bundle
  containers:
  - args:
    - -provider=openshift
    - -https-address=:9091
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9090
    - -openshift-service-account=prometheus-k8s
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -skip-auth-regex=^/metrics
    image: quay.io/openshift/origin-oauth-proxy:latest
    name: oauth-proxy
    ports:
    - containerPort: 9091
      name: web-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-prometheus-k8s-tls
    - mountPath: /etc/proxy/secrets
      name: secret-prometheus-k8s-proxy
  secrets:
  - prometheus-k8s-tls
  - prometheus-k8s-proxy
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: prometheus-k8s-tls
  labels:
    prometheus: federation-prometheus
  name: prometheus-k8s
spec:
  ports:
  - name: web-proxy
    port: 9091
    protocol: TCP
    targetPort: web-proxy
  selector:
    app: prometheus
    prometheus: federation-prometheus
  type: ClusterIP
EOF
$ cat > serviceMonitor_Federation.yaml << EOF
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: federation-monitor
  name: federation-prometheus
  namespace: test
spec:
  endpoints:
  - interval: 30s
    scrapeTimeout: 30s
    port: web
    path: /federate
    honorLabels: true
    params:
      'match[]':
      - '{job!=""}'
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: prometheus-k8s.openshift-monitoring.svc.cluster.local
  namespaceSelector:
    matchNames:
    - openshift-monitoring
  selector:
    matchLabels:
      prometheus: "k8s"
EOF
$ oc apply -f prometheus.yaml -n test
$ oc apply -f serviceMonitor_Federation.yaml -n test
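Note that the oauth-redirect annotation we added to the ServiceAccount earlier refers to a Route named federation-prometheus, which none of the manifests above create. One minimal way to expose the proxied Service is sketched below; adjust the route name, host, and TLS termination to your environment:

$ oc -n test create route reencrypt federation-prometheus \
  --service=prometheus-k8s --port=web-proxy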

Replaceable / editable lines in prometheus.yaml:

  • spec → version (version: v2.8.0)
  • spec → replicas (replicas: 2)
  • oauth proxy container image (image: quay.io/openshift/origin-oauth-proxy:latest)

Testing

  1. Access the /federate endpoint of the main Cluster Managed Prometheus instance. Don’t forget to include the “match[]” expression at the end; for example:
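(A sketch; it assumes the default prometheus-k8s Route in the openshift-monitoring namespace and a logged-in user that is allowed to query it.)

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
$ curl -G -k -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'match[]={job!=""}' \
  "https://$HOST/federate" | head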

2. Verify that our new Prometheus instance’s pods are running properly


$ oc get pods -n test

3. Log in to the new Prometheus instance using Openshift authentication (we’ve deployed the pod with the openshift-oauth-proxy included), and go to the /targets endpoint.

As you can see, the first “Up” target is the main Openshift Prometheus instance located in the “openshift-monitoring” namespace, now being scraped by the new Prometheus instance we’ve deployed. The other targets should be ignored; they were there for testing purposes only.

Bonus HOW-TO! Prometheus Custom Infrastructure Alerts

Custom alerts related to our infrastructure are necessary to properly notify the cluster admins when things go south.

Unfortunately, if we try to edit an existing PrometheusRule, the Operator will revert it to its original form and our changes won’t take effect.

However, the solution is simple: copy an existing PrometheusRule, edit it as you want (but keep it attached to the main Openshift Prometheus instance), and apply it in the relevant namespace.

Note that the Prometheus Operator in the openshift-monitoring namespace won’t delete it, because it is an “independent” PrometheusRule object, unrelated to the managed objects that the operator is responsible for.

All the built-in alerting rules
I chose one of them as an example

Run:

$ oc get -o yaml prometheusrule/dns -n openshift-dns-operator > myrule.yaml

Edit:

  • Locate the relevant alert rule that you want to change
  • Edit it (the original value was 5m, I changed it to 300 for the example)
  • Delete all the other, irrelevant rules, and change the name of the object so it won’t replace the original (a trimmed-down example is sketched right after this list)
  • Apply it (the namespace is already included in the object itself):
oc apply -f myrule.yaml
  • In the Openshift Dashboard, navigate (in the “administrator” view) to Monitoring → Alerting and search for the relevant alert.
Notice that now we’ve got both the original one and our new one.
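For orientation, an edited copy usually ends up looking something like the sketch below. The alert name, expression, and annotations here are illustrative placeholders only (not the real contents of the dns rule); keep whatever metadata.labels the rule you copied carried, so the main Prometheus keeps selecting it.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dns-custom                      # renamed so it does not collide with the managed object
  namespace: openshift-dns-operator     # same namespace as the rule it was copied from
spec:
  groups:
  - name: dns-custom.rules
    rules:
    - alert: MyDNSAlert                 # hypothetical alert name
      expr: vector(1)                   # placeholder expression - use the one from the copied rule
      for: 5m
      labels:
        severity: warning
      annotations:
        message: Example custom infrastructure alert that the operator will not revert.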

FYI — You can also rename the alert entirely.
