Using the Operator Lifecycle Manager to deploy Prometheus on OpenShift
Have you ever needed to install Prometheus or some other piece of software on Kubernetes or OpenShift? In general it's not a very difficult thing, right? You just process a template, run an image… (at least to a certain extent). The real problems usually surface after what we call day 1. Day-2 operations are not obvious on day 1, but you can imagine some: upgrades, changes in the size or even the morphology of the deployment, etc.
In this article, I explain step by step how to deploy the Prometheus Operator and how to monitor applications that live in a different namespace.
I provide a step-by-step guide to deploying Prometheus on OpenShift using the Operator Lifecycle Manager (currently in Tech Preview in OpenShift).
I stress how to check whether your configuration is OK, and also how to set up the operator to monitor applications across namespaces.
What is the Prometheus Operator?
Operators were introduced by CoreOS as a class of software that operates other software, putting operational knowledge collected by humans into software. For further information around the Operator Framework please go here.
The Prometheus Operator serves to make running Prometheus on top of Kubernetes as easy as possible, while preserving Kubernetes-native configuration options.
The Operator Lifecycle Manager
The Operator Framework (currently in Technology Preview phase) installs the Operator Lifecycle Manager (OLM), which aids cluster administrators in installing, upgrading, and granting access to Operators running on their OpenShift Container Platform cluster.
The OpenShift Container Platform web console has also been updated so that cluster administrators can install Operators, as well as grant specific projects access to the catalog of Operators available on the cluster.
The operator we need for this lab is one of the Red Hat Supported Operators, which means you don't need to install the operator itself, only use it. As with any other operator, there is a set of objects (CRDs) we need to create to tell the operator how we want to install and operate Prometheus.
These are the objects we’ll need to create:
- Prometheus (the server deployment itself)
- ServiceMonitor (which Services to scrape)
- AlertManager (we won't use it in this lab)
The next image shows how they’re related. For further details please go here
In order to follow this lab you’ll need:
- An OpenShift cluster to play with
- A user who has been granted the cluster-admin role

The next commands show how to grant a user the cluster-admin role and create a project where we'll install Prometheus.
oc adm policy add-cluster-role-to-user cluster-admin <user_name>
oc new-project monitoring
End result of the lab
The aim of this lab is to deploy the following architecture.
- As you can see, we need to define a Prometheus server 'linked' to a set of ServiceMonitors through a serviceMonitorSelector rule; in this case we're interested in ServiceMonitors carrying the label k8s-app, no matter which value it contains.
- Additionally, we'll define a ServiceMonitor containing the required label k8s-app, which in turn will trigger the scanning of Services according to the rule defined in its selector section (matching label team with value backend).
- Finally, the port property in the endpoints section of our ServiceMonitor should match the port name defined in our target Service objects.
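Putting the three relationships together, a minimal sketch of the chain looks like the following; the names, label values and port number are illustrative, and the Exists operator is an assumption consistent with "no matter which value it contains":

```yaml
# Prometheus: selects ServiceMonitors that carry the k8s-app label (any value)
kind: Prometheus
spec:
  serviceMonitorSelector:
    matchExpressions:
    - key: k8s-app
      operator: Exists
---
# ServiceMonitor: carries k8s-app, selects Services labeled team=backend,
# and scrapes the port named "web"
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: backend-monitor
spec:
  selector:
    matchLabels:
      team: backend
  endpoints:
  - port: web
---
# Service: labeled team=backend, exposing a port named "web"
kind: Service
metadata:
  labels:
    team: backend
spec:
  ports:
  - name: web
    port: 8080
```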
Create a Prometheus subscription
Please go to the OpenShift Web console
Then go to the Cluster Console and open the Operators ➡ Catalog Resources menu on the left. There we'll create a Subscription, which is the way we manage the Prometheus Operator itself, not the servers.
Make sure that the monitoring project we created before is selected before proceeding!
Now it's time to create the Prometheus Operator subscription. Please scroll down and click on the Create Subscription button close to the Prometheus Operator.
Now you should be presented with a default/example subscription descriptor; pay attention to the namespace, it should be monitoring. Once you have checked the namespace, please click on Create.
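For reference, the default Subscription descriptor should look roughly like the following sketch; the channel and source names come from your catalog and may differ in your cluster:

```yaml
apiVersion: operators.coreos.com/v1alpha1
kind: Subscription
metadata:
  name: prometheus
  namespace: monitoring   # must match the project we created
spec:
  name: prometheus        # the operator package to subscribe to
  channel: preview        # channel name is an assumption; check your catalog
  source: rh-operators    # catalog source name is an assumption
```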
If everything goes as expected you should see something similar to this. You should see the upgrade status as Up to date; if that is the case, click on the link pointed at by the arrow, which should take you to the Cluster Service Versions area (menu on the left).
You should be able to see the description of, and links to the documentation for, the Prometheus Operator, along with a set of Create New commands, signaled with a red arrow.
Good job, you have deployed the operator in project monitoring. In fact, if you go to the OpenShift Application Console, in project monitoring you should see one instance of the prometheus-operator, as in the next picture.
Now let’s proceed with the deployment of the Prometheus server.
Deployment of the Prometheus Server
Go back to the Cluster Console, click on the Create New button and choose Prometheus.
The next screen shows an example descriptor of a Prometheus server; go ahead and change metadata ➡ name to server as in the image and click Create.
Pay attention to the section spec ➡ serviceMonitorSelector. That is where we define the match expression that selects which ServiceMonitors we're interested in; in this case we want ServiceMonitors carrying a label whose key is k8s-app. Also pay attention to spec ➡ replicas: if you go to the OpenShift Application Console you'll find a StatefulSet called prometheus-server, with exactly 2 replicas, in the monitoring namespace.
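Putting those settings together, a Prometheus descriptor matching the discussion above would look roughly like this sketch; the Exists operator is an assumption consistent with "no matter which value", and the service account name is the one the operator creates (it shows up later in the logs):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: server
  namespace: monitoring
spec:
  replicas: 2
  # service account created by the operator for the Prometheus pods
  serviceAccountName: prometheus-k8s
  serviceMonitorSelector:
    # select ServiceMonitors that carry the k8s-app label, whatever its value
    matchExpressions:
    - key: k8s-app
      operator: Exists
```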
Deploying a test application with monitoring enabled
We’ve borrowed the following example from the Getting Started Guide of the Prometheus Operator
Please follow the next steps to deploy a test application (3 pods) that exposes Prometheus metrics along with a Service that balances requests to the pods.
Let’s create a project for our application.
oc new-project monitored-apps
Let’s deploy the test application.
$ cat << EOF | oc create -n "monitored-apps" -f -
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
      - name: example-app
        # image from the upstream Prometheus Operator getting-started example
        image: quay.io/coreos/prometheus-example-app:v0.1.0
        ports:
        - name: web
          containerPort: 8080
EOF
Let’s check the status of those 3 pods.
$ oc get pod -n "monitored-apps"
NAME READY STATUS RESTARTS AGE
example-app-94c8bc8-jq5cr 1/1 Running 0 30s
example-app-94c8bc8-phfrv 1/1 Running 0 30s
example-app-94c8bc8-vfgr7 1/1 Running 0 30s
Now let's create a Service object to balance requests to these pods.
Pay attention to spec ➡ ports ➡ name: as we explained before, it should match the value of spec ➡ endpoints ➡ port in the ServiceMonitor.
$ cat << EOF | oc create -n "monitored-apps" -f -
apiVersion: v1
kind: Service
metadata:
  name: example-app
  labels:
    # label matched by the selector rule in our ServiceMonitor
    team: backend
spec:
  selector:
    app: example-app
  ports:
  # port name matched by the endpoints section of our ServiceMonitor
  - name: web
    port: 8080
EOF
Let's create a ServiceMonitor to scan our test Service.
Please go to the Cluster Console, to the Operators ➡ Cluster Service Versions area, click on Create New and select Service Monitor.
Remember that the project should be monitoring.
The next descriptor will deploy a ServiceMonitor which is compliant with the rule we defined in our Prometheus object, namely: having a label named k8s-app.
Attention: we are creating the ServiceMonitor in the same namespace as the Prometheus object.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: backend-monitor
  labels:
    # label required by the serviceMonitorSelector of our Prometheus object
    k8s-app: backend-monitor
spec:
  namespaceSelector:
    # scan all namespaces (see the tip below)
    any: true
  selector:
    matchLabels:
      team: backend
  endpoints:
  - port: web
    interval: 30s
TIP: namespaceSelector could also define exactly which namespaces you want to discover targets from.
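For example, to restrict discovery to the monitored-apps namespace only, the namespaceSelector fragment of the ServiceMonitor could be written as:

```yaml
spec:
  namespaceSelector:
    matchNames:
    - monitored-apps
```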
So far we have created a Prometheus server and a ServiceMonitor that points to a Service. Now we should check if everything is fine or not.
We can do this by checking the Prometheus server logs, but before we do that we need to locate one of the pods; the next command will help us here.
$ oc get pods -n monitoring
NAME READY STATUS RESTARTS AGE
prometheus-operator-7fccbd7c74-48m6v 1/1 Running 0 16h
prometheus-server-0 3/3 Running 1 3h
prometheus-server-1 3/3 Running 1 3h
Now that we know the name of the pods we’re looking for we can read the logs. Next command gets us the logs of container
prometheus in one of the target pods.
$ oc logs prometheus-server-0 -c prometheus -n monitoring
level=error ts=2019-02-12T10:57:12.739199828Z caller=main.go:218 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:289: Failed to list *v1.Pod: pods is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list pods in the namespace \"monitored-apps\": no RBAC policy matched"
level=error ts=2019-02-12T10:57:12.739190937Z caller=main.go:218 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:288: Failed to list *v1.Service: services is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list services in the namespace \"monitored-apps\": no RBAC policy matched"
level=error ts=2019-02-12T10:57:12.73929972Z caller=main.go:218 component=k8s_client_runtime err="github.com/prometheus/prometheus/discovery/kubernetes/kubernetes.go:287: Failed to list *v1.Endpoints: endpoints is forbidden: User \"system:serviceaccount:monitoring:prometheus-k8s\" cannot list endpoints in the namespace \"monitored-apps\": no RBAC policy matched"
Well… something is not OK… apparently the problem has to do with permissions over the namespace monitored-apps:
system:serviceaccount:monitoring:prometheus-k8s cannot list endpoints in the namespace “monitored-apps”
So what we have to do is grant the required permissions (view) to the Service Account created by the operator and used by Prometheus.
We could grant a cluster role to the service account, so that it can monitor any namespace, as in the next command.
oc adm policy add-cluster-role-to-user view system:serviceaccount:monitoring:prometheus-k8s
Or we can add permissions on a per-namespace basis, as in the next one.
oc adm policy add-role-to-user view system:serviceaccount:monitoring:prometheus-k8s -n monitored-apps
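For the record, the namespaced variant of that grant is equivalent to creating a RoleBinding like the following sketch (the binding name is arbitrary):

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: prometheus-k8s-view   # any name will do
  namespace: monitored-apps   # namespace where we grant access
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: view
subjects:
- kind: ServiceAccount
  name: prometheus-k8s        # the account used by the Prometheus pods
  namespace: monitoring
```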
Once you run one of the two commands, the errors should stop appearing.
Further checking would involve using the Prometheus console; in order to do so we first need to expose the Service, as in the next commands.
$ oc get svc -n monitoring
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
prometheus-operated ClusterIP None <none> 9090/TCP 13h
$ oc expose svc/prometheus-operated -n monitoring
Now please open the URL returned by the next command and navigate to Status ➡ Targets.
$ oc get route -n monitoring
NAME HOST/PORT PATH SERVICES PORT TERMINATION WILDCARD
prometheus-operated prometheus-operated-monitoring.apps.serverless-8d48.openshiftworkshop.com prometheus-operated web None
You should see something like this. There are three targets, one per pod.
Now if you navigate to Status ➡ Configuration you should be able to see that there's a scrape_config entry per ServiceMonitor object; in our case we only have one, called backend-monitor, after which the generated scrape_config is named.
See it in action
Now that we're sure that our target service is being monitored, we can go and see some graphs. To do so, navigate to Graph and start typing codelab in the Expression... text field. Then choose one of the available metrics and click on the Graph tab.
Congratulations, you've deployed Prometheus using the Operator Lifecycle Manager, deployed a service in a different namespace, tracked down a configuration error, fixed it, and finally checked that everything works… hopefully ;-)
This simple guide doesn't go deep into the configuration of Prometheus; it shows how easy it is to set Prometheus up using the Operator Lifecycle Manager. But remember: the real payoff starts with day-2 operations, so what's easy today should stay easy tomorrow.