2 Mini HOW-TO guides for Prometheus on Openshift: Federation & Custom Infrastructure Alerting

Tommer Amber
8 min read · Aug 11, 2021


Let’s start with the problem

Recently, as part of my work at Red Hat, a customer reached out and asked for my help with editing monitoring-related configurations.

The customer wanted to do pretty simple things: generate new alerting rules related to the infrastructure, fetch infrastructure-related metrics, rename/relabel some of them, and so on.

Openshift 4.x comes with a built-in monitoring stack (Prometheus, Grafana & Alertmanager). The main Prometheus instance, which is responsible for scraping infrastructure-related metrics, lives in the openshift-monitoring namespace and is managed by the cluster-monitoring-operator.

Here comes the catch: this main Prometheus instance is almost uneditable. Because of security concerns (this Prometheus instance has elevated privileges, since it fetches metrics directly from nodes and other infrastructure components such as etcd), and because it is a crucial component that is tightly coupled with the Openshift dashboard and alerting mechanism, the Openshift developers “locked” its reconfigurability by making the cluster-monitoring-operator revert any changes you may want to make.

Furthermore, in Openshift 4.6+ we’ve got a new GA capability called “user-workloads-monitoring” that lets us deploy another Prometheus instance in a dedicated namespace (openshift-user-workload-monitoring) to monitor our own workloads. It is enabled by adding one line to a ConfigMap in the original openshift-monitoring namespace, and the operator does the rest for us, including aggregating data from both Prometheus instances into the same Grafana and Alertmanager that the Openshift platform ships with.
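For reference, enabling it boils down to one ConfigMap in the openshift-monitoring namespace, roughly like this (per the Openshift 4.6+ documentation):

$ cat << EOF | oc apply -f -
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-monitoring-config
  namespace: openshift-monitoring
data:
  config.yaml: |
    enableUserWorkload: true
EOF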

However, this is not a proper solution, because this new instance does not monitor the infrastructure; even though it is more flexible, we are still facing the same problem.

Experts in Openshift/Kubernetes and Operators may challenge me and declare that by marking the Prometheus Operator as “unmanaged” we can make any changes to the objects it manages. That will probably work, BUT! The Openshift monitoring operators are linked to the Cluster-Version-Operator, and in order to label them as “Unmanaged” we would have to edit the Cluster-Version-Operator itself, which puts our entire cluster in a problematic, unsupported, non-upgradable state.

Refer to the documentation about that issue for more details.
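Just to make that warning concrete: putting the cluster-monitoring-operator out of the Cluster-Version-Operator’s control means adding an override to the ClusterVersion resource, roughly like the sketch below. It is shown only for illustration; do not apply it on a cluster you care about, as it makes the cluster unsupported and blocks upgrades.

# DO NOT apply this on a real cluster - shown only to illustrate the point above
apiVersion: config.openshift.io/v1
kind: ClusterVersion
metadata:
  name: version
spec:
  overrides:
  - kind: Deployment                    # the operator's Deployment
    group: apps
    name: cluster-monitoring-operator
    namespace: openshift-monitoring
    unmanaged: true                     # this is what puts the cluster in an unsupported state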

So what do you suggest?

I’m very glad you asked.

Almost every solution I’ve found online about Prometheus aggregation & federation is related to Thanos. I completely agree that Thanos is a very powerful tool and it would definitely help us overcome this issue, BUT! I found that it comes with a learning curve to implement properly, and in small setups it is not necessarily required.

I looked for a simple Prometheus federation guide online, and since I did not find one simple enough, I decided to write it myself.

HOW-TO Prometheus Federation on Openshift

This guide is based on the following article: https://cloud.redhat.com/blog/federated-prometheus-with-thanos-receive. I edited the relevant parts so they work on newer versions of Openshift, and I also rewrote the federation ServiceMonitor because the old one did not work.

Generate a new project and deploy the Prometheus operator in it from the OperatorHub

$ oc new-project test
  • Install the Prometheus operator in the “test” namespace from the OperatorHub (a CLI alternative is sketched right below)
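If you prefer the CLI, the OperatorHub install boils down to an OperatorGroup plus a Subscription along the lines of the sketch below; the channel and package names are assumptions, so check what your catalog actually offers first (e.g. oc get packagemanifests -n openshift-marketplace | grep prometheus).

$ cat << EOF | oc apply -f -
apiVersion: operators.coreos.com/v1
kind: OperatorGroup
metadata:
  name: test-operatorgroup
  namespace: test
spec:
  targetNamespaces:
  - test
---
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: test
spec:
  channel: beta                          # assumption - verify against your catalog
  name: prometheus                       # community Prometheus operator package name
  source: community-operators
  sourceNamespace: openshift-marketplace
EOF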

Apply Serving CA to our Prometheus Instance

Our Prometheus instance needs to connect to the Cluster Managed Prometheus instance in order to gather the cluster-related metrics. This connection uses TLS, so we will use the Serving CA to validate the target endpoints (the Cluster Managed Prometheus).

The Serving CA is located in the openshift-monitoring namespace; we will copy it into our own namespace so we can use it in our Prometheus instances (remember to strip the metadata.namespace, uid, and resourceVersion fields from the exported YAML before applying it in the test namespace):

$ oc get configmap serving-certs-ca-bundle \
-o yaml -n openshift-monitoring > serving-certs-ca-bundle.yaml
$ oc -n test apply -f serving-certs-ca-bundle.yaml
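If you would rather not hand-edit the exported manifest, an equivalent approach is to pull out just the CA certificate and recreate the ConfigMap in the test namespace (a sketch; it assumes the CA lives under the service-ca.crt key, which is the same key the ServiceMonitor below points at):

$ oc -n openshift-monitoring get configmap serving-certs-ca-bundle \
  -o jsonpath='{.data.service-ca\.crt}' > service-ca.crt
$ oc -n test create configmap serving-certs-ca-bundle --from-file=service-ca.crt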

RBAC

We are going to use the ServiceMonitor object to discover the Cluster Managed Prometheus instances and connect to them, so we need to grant specific privileges to the ServiceAccount that runs our Prometheus instances.

As you may know, the Cluster Managed Prometheus instances sit behind the oauth-proxy, which performs authentication and authorization. In order to authenticate, we need a ServiceAccount (prometheus-k8s) that can GET namespaces in the cluster. The token of this ServiceAccount will be used as a Bearer Token to authenticate our connections to the Cluster Managed Prometheus instances (it is mounted at /var/run/secrets/kubernetes.io/serviceaccount/token and will be referenced in the ServiceMonitor object later on).

$ cat > rbac.yaml << EOF
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  labels:
    prometheus: federation-prometheus
  name: federation-prometheus-role
rules:
- apiGroups:
  - ""
  resources:
  - namespaces
  - pods
  - services
  - endpoints
  verbs:
  - list
  - get
  - watch
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: federation-prometheus-role
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: federation-prometheus-role
subjects:
- kind: ServiceAccount
  name: prometheus-k8s
  namespace: test
EOF
$ oc apply -f rbac.yaml -n test
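Note that nothing so far has created the prometheus-k8s ServiceAccount in the test namespace (the Prometheus operator does not create it for you), so create it if it does not exist yet; the second command is an optional sanity check that the ClusterRoleBinding took effect:

$ oc -n test create serviceaccount prometheus-k8s
$ oc auth can-i get namespaces \
  --as=system:serviceaccount:test:prometheus-k8s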

Deploy the Prometheus object + ServiceMonitor object for federation scraping

To deploy the Prometheus instance, we need to create a Prometheus object. On top of that, a ServiceMonitor will be created; the ServiceMonitor holds the required configuration for scraping the /federate endpoint of the Cluster Managed Prometheus instances.

Note! I’ve edited the params section so that the match[] expression works; in the original article it did not work as expected.

I’ve also trimmed the Prometheus object down to only the necessary content, avoiding the Thanos pieces described in the original article.

We will use the openshift-oauth-proxy to protect our Prometheus instances so that unauthenticated users cannot see our metrics, just as the main Cluster Managed Prometheus instance does.

Since we are protecting our Prometheus instances with the oauth-proxy, we need to generate a session secret and annotate the ServiceAccount that runs the pods, indicating which OpenShift Route will redirect to the oauth proxy.

$ oc create secret generic prometheus-k8s-proxy \
  --from-literal=session_secret=$(head /dev/urandom | tr -dc A-Za-z0-9 | head -c43) \
  -n test
$ oc annotate serviceaccount prometheus-k8s \
  serviceaccounts.openshift.io/oauth-redirectreference.prometheus-k8s='{"kind":"OAuthRedirectReference","apiVersion":"v1","reference":{"kind":"Route","name":"federation-prometheus"}}' \
  -n test

Deploy the Prometheus object + ServiceMonitor (Federation) object

$ cat > prometheus.yaml << EOF
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: federation-prometheus
  labels:
    prometheus: federation-prometheus
  namespace: test
spec:
  replicas: 2
  version: v2.8.0
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    matchLabels:
      app: federation-monitor
  configMaps:
  - serving-certs-ca-bundle
  containers:
  - args:
    - -provider=openshift
    - -https-address=:9091
    - -http-address=
    - -email-domain=*
    - -upstream=http://localhost:9090
    - -openshift-service-account=prometheus-k8s
    - '-openshift-sar={"resource": "namespaces", "verb": "get"}'
    - -tls-cert=/etc/tls/private/tls.crt
    - -tls-key=/etc/tls/private/tls.key
    - -cookie-secret-file=/etc/proxy/secrets/session_secret
    - -skip-auth-regex=^/metrics
    image: quay.io/openshift/origin-oauth-proxy:latest
    name: oauth-proxy
    ports:
    - containerPort: 9091
      name: web-proxy
    volumeMounts:
    - mountPath: /etc/tls/private
      name: secret-prometheus-k8s-tls
    - mountPath: /etc/proxy/secrets
      name: secret-prometheus-k8s-proxy
  secrets:
  - prometheus-k8s-tls
  - prometheus-k8s-proxy
---
apiVersion: v1
kind: Service
metadata:
  annotations:
    service.alpha.openshift.io/serving-cert-secret-name: prometheus-k8s-tls
  labels:
    prometheus: federation-prometheus
  name: prometheus-k8s
spec:
  ports:
  - name: web-proxy
    port: 9091
    protocol: TCP
    targetPort: web-proxy
  selector:
    app: prometheus
    prometheus: federation-prometheus
  type: ClusterIP
EOF
$ cat > serviceMonitor_Federation.yaml << EOF
---
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    app: federation-monitor
  name: federation-prometheus
  namespace: test
spec:
  endpoints:
  - interval: 30s
    scrapeTimeout: 30s
    port: web
    path: /federate
    honorLabels: true
    params:
      'match[]':
      - '{job!=""}'
    scheme: https
    bearerTokenFile: /var/run/secrets/kubernetes.io/serviceaccount/token
    tlsConfig:
      caFile: /etc/prometheus/configmaps/serving-certs-ca-bundle/service-ca.crt
      serverName: prometheus-k8s.openshift-monitoring.svc.cluster.local
  namespaceSelector:
    matchNames:
    - openshift-monitoring
  selector:
    matchLabels:
      prometheus: "k8s"
EOF
$ oc apply -f prometheus.yaml -n test
$ oc apply -f serviceMonitor_Federation.yaml -n test
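Note that the oauth-redirect annotation we added to the ServiceAccount earlier refers to a Route named federation-prometheus, which none of the manifests above create. One minimal way to expose the proxied Service is sketched below; adjust the route name, host, and TLS termination to your environment:

$ oc -n test create route reencrypt federation-prometheus \
  --service=prometheus-k8s --port=web-proxy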

Replaceable / editable lines in prometheus.yaml:

  • spec → version (version: v2.8.0)
  • spec → replicas (replicas: 2)
  • oauth proxy container image (image: quay.io/openshift/origin-oauth-proxy:latest)

Testing

  1. Access the /federate endpoint of the main Cluster Managed Prometheus instance. Don’t forget to include the “match[]” expression at the end; for example:
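(A sketch; it assumes the default prometheus-k8s Route in the openshift-monitoring namespace and a logged-in user that is allowed to query it.)

$ TOKEN=$(oc whoami -t)
$ HOST=$(oc -n openshift-monitoring get route prometheus-k8s -o jsonpath='{.spec.host}')
$ curl -G -k -H "Authorization: Bearer $TOKEN" \
  --data-urlencode 'match[]={job!=""}' \
  "https://$HOST/federate" | head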

2. Verify that our new Prometheus instance’s pods are running properly


$ oc get pods -n test

3. Log in to the new Prometheus instance using Openshift authentication (we’ve deployed the pod with the openshift-oauth-proxy included), and go to the /targets endpoint.

As you can see, the first “Up” target is the main Openshift Prometheus instance located in the “openshift-monitoring” namespace, now being scraped by the new Prometheus instance we’ve deployed. The other targets should be ignored; they were there for testing purposes only.

Bonus HOW-TO! Prometheus Custom Infrastructure Alerts

Custom alerts related to our infrastructure are necessary to properly notify the cluster admins when things go south.

Unfortunately, if we try to edit an existing PrometheusRule, the Operator will revert it to its original form and our changes won’t take effect.

However, the solution is simple: copy an existing PrometheusRule, edit it as you want (but keep it attached to the main Openshift Prometheus instance), and apply it in the relevant namespace.

Note that the Prometheus Operator in the openshift-monitoring namespace won’t delete it, because it is an “independent” PrometheusRule object, unrelated to the managed objects that the operator is responsible for.

All the built-in alerting rules
I chose one of them as an example

Run:

$ oc get -o yaml prometheusrule/dns -n openshift-dns-operator > myrule.yaml

Edit:

  • Locate the relevant alert rule that you want to change
  • Edit it (the original value was 5m, I changed it to 300 for the example)
  • Delete all the other, irrelevant rules, and change the name of the object so it won’t replace the original (a trimmed-down example is sketched right after this list)
  • Apply it (the namespace is already included in the object itself):
oc apply -f myrule.yaml
  • In the Openshift Dashboard, navigate (in the “administrator” view) to Monitoring → Alerting and search for the relevant alert.
Notice that now we’ve got both the original one and our new one.
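For orientation, an edited copy usually ends up looking something like the sketch below. The alert name, expression, and annotations here are illustrative placeholders only (not the real contents of the dns rule); keep whatever metadata.labels the rule you copied carried, so the main Prometheus keeps selecting it.

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: dns-custom                      # renamed so it does not collide with the managed object
  namespace: openshift-dns-operator     # same namespace as the rule it was copied from
spec:
  groups:
  - name: dns-custom.rules
    rules:
    - alert: MyDNSAlert                 # hypothetical alert name
      expr: vector(1)                   # placeholder expression - use the one from the copied rule
      for: 5m
      labels:
        severity: warning
      annotations:
        message: Example custom infrastructure alert that the operator will not revert.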

FYI — You can also rename the alert entirely.
