Simple Management of Prometheus Monitoring Pipeline with the Prometheus Operator
Prometheus is an open-source monitoring and alerting toolkit originally developed by SoundCloud in 2012. Since then, the platform has attracted a vibrant developer and user community. Prometheus is now closely integrated into the cloud-native ecosystem and has native support for containers and Kubernetes.
In the earlier tutorial, you learned how to configure and deploy Prometheus to monitor your Kubernetes applications. However, configuring Prometheus is not a trivial task: it requires domain-specific knowledge, including the Prometheus configuration format and the Kubernetes auto-discovery settings. Acquiring this knowledge takes time and effort.
In this tutorial, we show how you can dramatically simplify the deployment and management of your Prometheus instances with the Prometheus Operator developed by CoreOS. We discuss how the Prometheus Operator can benefit your monitoring pipeline, and then we walk you through setting up a working Prometheus Operator to collect Prometheus-format metrics from your applications.
What are Operators?
The concept of software operators was introduced by CoreOS back in 2016. In a nutshell, an operator is any application-specific or domain-specific controller that extends the Kubernetes API to simplify deployment, configuration, and management of complex stateful applications on behalf of Kubernetes users.
Under the hood, operators abstract basic Kubernetes APIs and controllers and automate common tasks for specific applications (e.g., Prometheus). Thanks to this abstraction, users can easily configure complex applications even with little knowledge of their domain-specific configuration and language. In addition, operators can be useful for a broad array of other tasks including safe coordination of app upgrades, service discovery, TLS certificate configuration, disaster recovery, backup management, etc.
Building upon the definition above, the Prometheus Operator may be defined as a piece of software on top of Kubernetes that enables simpler management of Prometheus instances, including their configuration and service discovery. It allows the user to easily launch multiple instances of Prometheus, to configure Prometheus versions, as well as to manage retention policies, persistence, and replicas.
In addition, the Prometheus Operator can automatically generate monitoring target settings based on Kubernetes label queries. Users can just refer to services and pods they want to monitor in Prometheus Operator’s manifest, and the Operator will take care of inserting appropriate Prometheus configuration for the Kubernetes auto-discovery.
To implement this functionality, the Prometheus Operator introduces additional resources and abstractions defined as Custom Resource Definitions (CRDs). These include:
- A Prometheus resource that describes the desired state of the Prometheus deployment.
- ServiceMonitors that describe and manage monitoring targets to be scraped by Prometheus. The Prometheus resource connects to ServiceMonitors using a serviceMonitorSelector field. This way, Prometheus knows which targets (apps) have to be scraped.
- An Alertmanager resource to define, configure, and manage the Prometheus Alertmanager.
In this article, we explore only the Prometheus resource and ServiceMonitors, which are the minimum needed to configure the Prometheus Operator to monitor your Kubernetes cluster.
To complete the examples below, you’ll need the following prerequisites:
- A running Kubernetes cluster. See Supergiant documentation for more information about deploying a Kubernetes cluster with Supergiant. As an alternative, you can install a single-node Kubernetes cluster on a local system using Minikube.
- The kubectl command-line tool installed and configured to communicate with the cluster. See the Kubernetes documentation for installation instructions.
With this environment set, we are going to monitor a simple web application exporting Prometheus-format metrics. Let’s get started!
Step 1: Create a Prometheus Operator
The Prometheus Operator has to access the Kubernetes API, nodes, and cluster components, so we should grant it some permissions. We can do this via the ClusterRole resource that defines an RBAC policy. The ClusterRole contains rules that represent a set of permissions. These permissions are purely additive (there are no deny rules), so every permission the Operator needs must be listed explicitly. We use a ClusterRole because it can grant permissions on resources across the entire cluster, as opposed to a Role, which is namespace-scoped.
- apiGroups: [""]
- apiGroups: [""]
  verbs: ["list", "delete"]
- apiGroups: [""]
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  verbs: ["list", "watch"]
- apiGroups: [""]
The above manifest grants the Prometheus Operator the following cluster-wide permissions:
- read access to pods, nodes, and namespaces.
- read/write access to services and their endpoints.
- full access to Secrets, ConfigMaps, StatefulSets, Prometheus-related resources (alert managers, service monitors, etc.), and other third-party resources.
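A ClusterRole covering these permissions could be sketched as follows. This is a minimal reconstruction under stated assumptions, not the exact manifest: the name prometheus-operator is taken from the kubectl output later in this step, while the resource names under each rule and the rbac.authorization.k8s.io/v1 API version are our assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1   # assumption: current RBAC API version
kind: ClusterRole
metadata:
  name: prometheus-operator
rules:
# Full access to the CRDs and to the Prometheus-related custom resources.
- apiGroups: ["apiextensions.k8s.io"]
  resources: ["customresourcedefinitions"]
  verbs: ["*"]
- apiGroups: ["monitoring.coreos.com"]
  resources: ["alertmanagers", "prometheuses", "servicemonitors"]
  verbs: ["*"]
# Full access to StatefulSets, ConfigMaps, and Secrets.
- apiGroups: ["apps"]
  resources: ["statefulsets"]
  verbs: ["*"]
- apiGroups: [""]
  resources: ["configmaps", "secrets"]
  verbs: ["*"]
# The remaining rules mirror the verbs shown above; the resource
# assigned to each rule is an assumption.
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["list", "delete"]
- apiGroups: [""]
  resources: ["services", "endpoints"]
  verbs: ["get", "create", "update"]
- apiGroups: [""]
  resources: ["nodes"]
  verbs: ["list", "watch"]
- apiGroups: [""]
  resources: ["namespaces"]
  verbs: ["list"]
```

Because RBAC rules are additive, a permission that is missing from this list is simply denied, so it is worth checking the rules against the Operator version you actually deploy.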
Next, we need to provide an identity for our Prometheus Operator. This can be done with a service account.
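A minimal ServiceAccount sketch might look like this. The name prometheus-operator comes from the kubectl output below; the default namespace is an assumption.

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: prometheus-operator
  namespace: default   # assumption: the tutorial does not name a namespace
```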
Now that we have a ClusterRole and a ServiceAccount, we need to bind the list of permissions defined in the ClusterRole to the Prometheus Operator. The ClusterRoleBinding allows associating a list of users, groups, or service accounts with a specific role. We are going to bind our ClusterRole to the Prometheus Operator’s ServiceAccount.
- kind: ServiceAccount
The roleRef.name should match the name of the ClusterRole created in the first step, and the subjects.name should match the name of the ServiceAccount created in the second step.
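The binding described here can be sketched as follows. The names are taken from the kubectl output below; the default namespace and the rbac.authorization.k8s.io/v1 API version are assumptions.

```yaml
apiVersion: rbac.authorization.k8s.io/v1   # assumption: current RBAC API version
kind: ClusterRoleBinding
metadata:
  name: prometheus-operator
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: prometheus-operator      # must match the ClusterRole name
subjects:
- kind: ServiceAccount
  name: prometheus-operator      # must match the ServiceAccount name
  namespace: default             # assumption: namespace of the ServiceAccount
```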
We are going to create these resources in bulk, so put the above manifests into one file (e.g., authorize.yml), separating the manifests with the --- delimiter. Then run:
kubectl create -f authorize.yml
clusterrolebinding.rbac.authorization.k8s.io "prometheus-operator" created
clusterrole.rbac.authorization.k8s.io "prometheus-operator" created
serviceaccount "prometheus-operator" created
Great! Now we have all permissions required by the Prometheus Operator to manage Prometheus instances and monitor applications. Let’s create a one-replica deployment for the Prometheus Operator:
- containerPort: 8080
There are a few important things that this manifest does:
- Defines several arguments for the prometheus-operator container to run with. In particular, we load the configmap-reload image so that Prometheus can be reloaded dynamically when its configuration changes.
- Runs the Prometheus Operator as a non-root user with the user ID 65534.
- Associates the deployment with the service account created in the step above.
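A Deployment with these properties could be sketched as below. Only the container port 8080, the user ID 65534, the single replica, and the service account come from the text; the labels, the port name, the image tags, and the --config-reloader-image flag (as we recall it from older Operator releases) are assumptions, so substitute the values documented for the Operator version you intend to run.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-operator
  labels:
    k8s-app: prometheus-operator          # assumption: label key and value
spec:
  replicas: 1
  selector:
    matchLabels:
      k8s-app: prometheus-operator
  template:
    metadata:
      labels:
        k8s-app: prometheus-operator
    spec:
      # Identity created earlier in this step.
      serviceAccountName: prometheus-operator
      # Run as a non-root user with the user ID 65534.
      securityContext:
        runAsNonRoot: true
        runAsUser: 65534
      containers:
      - name: prometheus-operator
        # Assumption: example image tag from an older release.
        image: quay.io/coreos/prometheus-operator:v0.17.0
        args:
        # Assumption: flag name and reloader tag from older releases.
        - --config-reloader-image=quay.io/coreos/configmap-reload:v0.0.1
        ports:
        - name: http                      # assumption: port name
          containerPort: 8080
```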
Now, let’s save this spec in prometheus-deployment.yml and create the deployment:
kubectl create -f prometheus-deployment.yml
deployment.extensions "prometheus-operator" created
Verify that the deployment’s pods are running:
kubectl get pods
NAME                                   READY   STATUS    RESTARTS   AGE
prometheus-operator-77648fb66c-skjqp   1/1     Running   0          1m
Step 2: Deploy the App Shipping Prometheus-format Metrics
At this point, the Prometheus Operator has no apps to monitor. Thus, before defining the ServiceMonitor and the Prometheus resource, we need to deploy an app that ships Prometheus-format metrics. For this purpose, we used an example application from the Go client library that exports fictional RPC latencies of some service. To deploy the application in the Kubernetes cluster, we containerized it with Docker and pushed the image to a Docker Hub repository. Let’s deploy this example app, which serves metrics at the /metrics endpoint that Prometheus scrapes by default. Below is the deployment manifest we used:
- name: rpc-app-cont
- name: web
Please note the containerPort 8081, which is the port defined in the application code.
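A manifest along these lines could look as follows. The deployment name, the app=rpc-app label, the two replicas, the container name, and the named port web on 8081 come from the surrounding text and command output; the image name example/rpc-app:latest is a hypothetical placeholder, since the actual Docker Hub repository is not given here.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rpc-app-deployment
  labels:
    app: rpc-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: rpc-app
  template:
    metadata:
      labels:
        app: rpc-app
    spec:
      containers:
      - name: rpc-app-cont
        # Hypothetical image name; replace with your own build of the example app.
        image: example/rpc-app:latest
        ports:
        - name: web              # named port, referenced by name later on
          containerPort: 8081    # port the application listens on
```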
Save this manifest in rpc-app-deployment.yml and create the Deployment:
kubectl create -f rpc-app-deployment.yml
deployment.apps "rpc-app-deployment" created
Let’s verify that our deployment successfully launched two pod replicas of our app:
kubectl get pods -l app=rpc-app
NAME                                  READY   STATUS    RESTARTS   AGE
rpc-app-deployment-698bd8658d-glj6f   1/1     Running   0          1m
rpc-app-deployment-698bd8658d-xsdd4   1/1     Running   0          1m
To let the Prometheus Operator access this deployment, we need to expose a service. This service can then be discovered by the ServiceMonitor using label selectors. We need to create a service that selects pods by their app label and its rpc-app value. Let’s take a look at this service manifest:
- name: web
Also, note that we specified a targetPort for this service, which refers to the port on the service’s backend pods. If the targetPort value is not specified, Kubernetes sets it to the same value as the service’s port field, but we included the field explicitly to highlight its importance.
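A Service matching this description might be sketched like this. The service name, the pod selector, and the web port on 8081 come from the text and command output; the app: rpc-app label on the Service itself is an assumption, added because a ServiceMonitor selects Services by their labels.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: rpc-app-service
  labels:
    app: rpc-app          # assumption: label the ServiceMonitor can select on
spec:
  selector:
    app: rpc-app          # selects the pods of the rpc-app Deployment
  ports:
  - name: web
    port: 8081            # port exposed by the Service
    targetPort: 8081      # port on the backend pods
    protocol: TCP
```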
Let’s save this spec in a file (e.g., rpc-app-service.yml) and create the service:
kubectl create -f rpc-app-service.yml
service "rpc-app-service" created
You can now verify that the service successfully discovered the deployment’s endpoints and configured the right ports:
kubectl describe svc rpc-app-service
Name: rpc-app-service
Port: web 8081/TCP
Session Affinity: None
Step 3: Create a ServiceMonitor
Prometheus Operator uses ServiceMonitors to auto-detect target pods based on the label selectors and associate them with the Prometheus instances. Let’s take a look at the manifest below:
- port: web
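The ServiceMonitor defined above selects Services labeled app:rpc-app using the spec.selector.matchLabels field. Note that this selector is matched against the labels of the Service rather than the labels of the pods, so rpc-app-service must carry the app:rpc-app label for the ServiceMonitor to find the corresponding endpoints of the Deployment.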
Also, we defined the env:production label for the ServiceMonitor. This label will be used by the Prometheus resource to find the ServiceMonitor. Finally, because we named the port “web” in both the container and the Service, we can refer to it by name in the ServiceMonitor without specifying the port number. This allows us to change the port number later without affecting the integrity of other resources.
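Putting these pieces together, a ServiceMonitor of this kind can be sketched as follows. The name rpc-app is taken from the kubectl output below, and the labels, selector, and port name come from the text; we assume the monitoring.coreos.com/v1 API version.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: rpc-app
  labels:
    env: production       # label the Prometheus resource selects on
spec:
  selector:
    matchLabels:
      app: rpc-app        # matched against Service labels
  endpoints:
  - port: web             # name of the Service port to scrape
```

With no path set on the endpoint, the default /metrics path is scraped, which is where the example app serves its metrics.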
Let’s save this manifest in service-monitor.yml and create the ServiceMonitor:
kubectl create -f service-monitor.yml
servicemonitor.monitoring.coreos.com "rpc-app" created
Step 4: Create a Prometheus Resource
The next step is to create a Prometheus resource. Its manifest defines the serviceMonitorSelector that associates ServiceMonitors with the Prometheus instance. The value of this field should match the label env:production specified in the ServiceMonitor manifest above. Using ServiceMonitor labels makes it easy to dynamically reconfigure Prometheus.
Also, note that the manifest should refer to the service account created in Step 1 above. Without it, the Prometheus pods won’t be permitted to access the cluster resources and APIs. This tiny detail was addressed in issue #1272 on GitHub.
In addition, if RBAC authorization is enabled in your cluster, you must create RBAC rules for both Prometheus and the Prometheus Operator. Refer to the chapter “Enable RBAC rules for Prometheus Pods” of the official CoreOS documentation to find the required RBAC resource definitions.
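A Prometheus resource along these lines could be sketched as below. The name prometheus comes from the kubectl output that follows, and the selector label and service account come from the text; the replica count and the monitoring.coreos.com/v1 API version are assumptions.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: prometheus
spec:
  replicas: 1                              # assumption: a single instance
  # Service account from Step 1, used by the Prometheus pods.
  serviceAccountName: prometheus-operator
  # Pick up every ServiceMonitor labeled env: production.
  serviceMonitorSelector:
    matchLabels:
      env: production
```

Any ServiceMonitor created later with the env: production label is picked up by this Prometheus instance without editing the Prometheus resource itself.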
Now, let’s save this manifest in prometheus-resource.yml and create the Prometheus resource:
kubectl create -f prometheus-resource.yml
prometheus.monitoring.coreos.com "prometheus" created
Finally, we need to create a Prometheus Service of the NodePort type to expose Prometheus to the external world. That way, we can access the Prometheus web interface.
- name: web
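A NodePort Service of this kind might look as follows. The service name and the node port 30900 are taken from the command output below; the port 9090 (the Prometheus default) and the prometheus: prometheus pod selector, which relies on the label we expect the Operator to put on the pods of a Prometheus resource named prometheus, are assumptions.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: prometheus
spec:
  type: NodePort
  selector:
    prometheus: prometheus   # assumption: label set by the Operator on Prometheus pods
  ports:
  - name: web
    port: 9090               # assumption: default Prometheus port
    targetPort: web
    nodePort: 30900          # port exposed on every cluster node
    protocol: TCP
```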
Save this spec in prometheus-service.yml and create the Service:
kubectl create -f prometheus-service.yml
service "prometheus" created
You can now access the Prometheus dashboard from your browser. If you are running your cluster with Minikube, you can find the Prometheus IP and port with the following command:
minikube service prometheus --url
http://192.168.99.100:30900
Enter this address in your browser to open the Prometheus dashboard.
If you go to the /targets endpoint, you’ll see the list of the current Prometheus targets. Each Deployment replica is treated as a separate target, so you’ll see two targets in your dashboard. You can also find each target’s labels and the time of the last scrape.
The Prometheus Operator automatically created a working Prometheus configuration with kubernetes_sd_configs for the auto-discovery of Kubernetes service endpoints. This is a really useful feature because it frees you from having to learn the Prometheus-specific configuration language. You can see the automatically generated Prometheus configuration under the Status -> Configuration tab.
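To give a feel for what the Operator writes on your behalf, here is a heavily simplified sketch of a scrape job of this kind. It is an illustration under assumptions, not the Operator’s exact output: the job name is hypothetical, the default namespace is assumed, and the real configuration contains many more relabeling rules.

```yaml
scrape_configs:
- job_name: default/rpc-app/0            # hypothetical job name
  kubernetes_sd_configs:
  - role: endpoints                      # discover targets from Service endpoints
    namespaces:
      names:
      - default                          # assumption: namespace of the app
  relabel_configs:
  # Keep only endpoints whose Service carries the label app=rpc-app.
  - action: keep
    source_labels: [__meta_kubernetes_service_label_app]
    regex: rpc-app
  # Keep only the endpoint port named "web".
  - action: keep
    source_labels: [__meta_kubernetes_endpoint_port_name]
    regex: web
```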
Finally, we can visualize RPC time series generated by our example app. To do this, go to the Graph tab where you can select the metrics to visualize.
In this example, we visualized the rpc_durations_histogram_seconds metrics. We used the “stacked” option for the time series visualization, but you can opt for simple lines, of course. You can play around with other RPC metrics and native Prometheus metrics as well. The web interface also supports the Prometheus query language, PromQL, to select and aggregate the metrics you need. PromQL has rich functional semantics that allow you to work with time series, instant and range vectors, scalars, and strings. To learn more about PromQL, check out the official documentation.
As you’ve now learned, the Prometheus Operator for Kubernetes offers useful abstractions for configuring and managing your Prometheus monitoring pipeline. Using the Operator means you no longer need to manually configure Kubernetes auto-discovery settings, which would require learning the Prometheus configuration format. All you need to define is a ServiceMonitor with label selectors for the services from which to scrape metrics, and a Prometheus resource that automates configuration and links ServiceMonitors to running Prometheus instances. Along with these features, the Prometheus Operator supports fast configuration of Prometheus alert managers. All these features dramatically simplify the management of your Prometheus monitoring pipeline while retaining flexibility and control if needed.
Originally published at supergiant.io.