Bozobooks.com: Fullstack k8s application blog series
Observability Metrics: Prometheus, Grafana, Alert Manager & Slack Notifications
Chapter 9: Set up metrics observability and configure Alertmanager to send alerts to our Slack channel when something goes wrong. Also, configure Grafana as our single pane of glass.
--
Observability is one of the key aspects of distributed application architectures, as it is important for us to get real-time alerts on any failures that may happen in the cluster. It consists of visibility and monitoring of the following:
- Metrics (We will be covering Metrics with Prometheus in this blog)
- Logs (We will be covering Log aggregation with Loki in Chapter 10)
- Traces (We will be covering traces with Zipkin in Chapter 11)
In chapter 12, we will also explore OpenTelemetry, which is turning out to be the new standard for Observability.
In this blog, we will set up the Prometheus stack to collect metrics from the nodes, PostgreSQL, Redis, and our Quarkus application. We will configure Alertmanager to send alerts to Slack and build Grafana dashboards to track these metrics.
Let's get started with deploying the Prometheus stack in our cluster.
Install Prometheus Stack
We will use the Prometheus community helm charts to deploy the stack. Let's first add the helm repo and update it with the following commands:
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
Before we deploy the Prometheus stack, let's first update our values file. (Please refer to this link for the complete values.yaml)
Alert Manager Configuration
We will be using the Alertmanager component of the Prometheus stack to trigger alerts and send them to our Slack channel for notifications. Let's first set up a Slack channel and create a webhook for alerts. We will be using the same Slack app that we created in Chapter 4 and will add another channel called "monitoring", where we will be posting the alerts.
- Go to the Slack application that was created in Chapter 4
- Configure a new incoming webhook by selecting the "Incoming Webhooks" option
- Click on "Add New Webhook to Workspace" and select the "monitoring" channel. If the channel does not exist, please create one. The following is the screenshot where you can copy the webhook.
You should get a message on the Slack channel that the webhook is integrated. The following screenshot shows the typical message.
Now let's configure the alertmanager part of the values.yaml. The following screenshot shows how the Slack channel is configured for alerts. It is self-explanatory: we configure the slack_api_url with the webhook, create a receiver called slack-notifications, and route the alerts to it. We also provide the channel name and the template to use when publishing the message.
I had to put a name: 'null' receiver to overcome a bug (or is it a feature :-D). If we don't add it, the Alertmanager configuration does not get updated when we install the helm chart. Hope this will be fixed soon.
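For reference, a minimal sketch of the alertmanager section of the values.yaml could look like the following; the webhook URL is a placeholder, and the channel name and message templates are assumptions, so adjust them to your setup.

alertmanager:
  config:
    global:
      # Placeholder: replace with the webhook URL copied from Slack
      slack_api_url: 'https://hooks.slack.com/services/XXX/YYY/ZZZ'
    route:
      receiver: 'slack-notifications'
    receivers:
      # The empty 'null' receiver mentioned above, kept so the configuration gets applied
      - name: 'null'
      - name: 'slack-notifications'
        slack_configs:
          - channel: '#monitoring'
            send_resolved: true
            title: '{{ template "slack.default.title" . }}'
            text: '{{ template "slack.default.text" . }}'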
Let's now configure Grafana.
Grafana Configuration
As part of the Prometheus stack helm chart, we also get Grafana. We have to configure a volume for persistence; otherwise, every time we restart Grafana, we will lose all our dashboard configurations. The following is the screenshot of the configuration.
- Line 20 — We enable Grafana so that the helm chart installs it
- Lines 22–29 — We set up persistence by creating a persistent volume claim
- Lines 31–39 — We enable the default dashboards and data sources, which configures "Prometheus" as the default data source (see the sketch after this list)
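As a rough guide, the grafana section of the values.yaml can look like the sketch below; the storage class and volume size are assumptions, so adjust them to your cluster.

grafana:
  enabled: true
  persistence:
    enabled: true
    type: pvc
    storageClassName: hostpath   # assumption: pick a storage class available in your cluster
    accessModes:
      - ReadWriteOnce
    size: 2Gi
  defaultDashboardsEnabled: true
  sidecar:
    datasources:
      enabled: true
      defaultDatasourceEnabled: true   # configures Prometheus as the default data source
    dashboards:
      enabled: true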
Let's now configure Prometheus.
Prometheus Configuration
Later in the blog, we will configure our microservices (Quarkus applications) to also provide metrics. These microservices will expose their metrics at the /q/metrics URL.
Let's configure these endpoints in Prometheus. The following screenshot shows two targets configured to scrape the application metrics.
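A sketch of such targets, defined under additionalScrapeConfigs in the prometheus section of the values.yaml, might look like this; the job names, service names, namespace, and port are assumptions for illustration.

prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      # Assumption: illustrative service names, namespace, and port
      - job_name: 'bookinfo'
        metrics_path: /q/metrics
        static_configs:
          - targets: ['bookinfo.bookinfo.svc.cluster.local:8080']
      - job_name: 'bookdetails'
        metrics_path: /q/metrics
        static_configs:
          - targets: ['bookdetails.bookinfo.svc.cluster.local:8080']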
Let's also define some basic rules. The following screenshot shows a standard Prometheus rule that triggers an alert when an instance is down for more than a minute. When the alert fires, Alertmanager picks it up and notifies us over Slack.
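A sketch of such a rule, expressed through additionalPrometheusRulesMap in the values.yaml, is shown below; the group name, severity label, and annotations are illustrative.

additionalPrometheusRulesMap:
  custom-rules:
    groups:
      - name: instance-availability
        rules:
          - alert: InstanceDown
            # Fires when a scrape target has been unreachable for more than a minute
            expr: up == 0
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: 'Instance {{ $labels.instance }} is down'
              description: 'Instance {{ $labels.instance }} of job {{ $labels.job }} has been down for more than 1 minute.'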
Install Prometheus stack
Now that we have the complete values.yaml (you can find it on my GitHub here), let's install the helm chart with the following command:
helm upgrade -i --create-namespace -n monitoring prometheus prometheus-community/kube-prometheus-stack --values ./values/kube-prometheus-stack-helm-values.yaml
This installs the complete stack, including the node exporter.
Bug: I found another bug while using Docker Desktop (which does not occur with Rancher Desktop): the node exporter does not start. To fix that, we have to execute the following command. Thanks to jamesbright, I found this workaround at https://github.com/prometheus-community/helm-charts/issues/467#issuecomment-957091174
kubectl patch ds -n monitoring prometheus-prometheus-node-exporter --type "json" -p '[{"op": "remove", "path": "/spec/template/spec/containers/0/volumeMounts/2/mountPropagation"}]'
Let's now install the Redis exporter and the PostgreSQL exporter.
Install Redis exporter
To collect metrics from Redis, we will have to deploy a Redis exporter. The following screenshot is the values.yaml for the prometheus-community/prometheus-redis-exporter chart.
- Line #2, we provide the URL of our Redis instance.
- Line #5, we define the label release: prometheus. This is an important label, as the Prometheus instance picks up every exporter that carries this label to scrape and collect metrics.
- Lines 7–14, we define the ServiceMonitor, once again with the label release: prometheus, so that Prometheus can collect the metrics (a sketch follows this list).
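The corresponding values can look roughly like the following; the Redis service address is an assumption, so point it to your own Redis instance.

# Assumption: Redis is reachable at this in-cluster address
redisAddress: redis://redis-master.redis.svc.cluster.local:6379
serviceMonitor:
  enabled: true
  labels:
    release: prometheus   # lets the Prometheus instance discover this exporter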
Let's now install the Redis Exporter with the following command
helm upgrade -i --create-namespace -n monitoring redis-exporter prometheus-community/prometheus-redis-exporter --values ./values/redis-exporter-helm-values.yaml
Install Postgres exporter
To collect metrics from PostgresSQL, we will have to deploy a Postgres exporter. The following screenshot is the values.yaml
for the prometheus-community/prometheus-postgres-exporter
.
We can install the exporter with the following helm command.
Best Practice: The database password can be stored in a Kubernetes secret or injected as a secret from Vault. The values.yaml allows us to use a Kubernetes secret key instead of providing the password directly.
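As a rough sketch, the Postgres exporter values can look like this; the host, database, user, and secret name/key are assumptions for illustration.

config:
  datasource:
    # Assumption: illustrative connection details, adjust to your PostgreSQL deployment
    host: postgres.postgres.svc.cluster.local
    user: bookinfo
    port: "5432"
    database: bookinfo
    sslmode: disable
    # Read the password from a Kubernetes secret instead of putting it here in plain text
    passwordSecret:
      name: postgres-credentials
      key: password
serviceMonitor:
  enabled: true
  labels:
    release: prometheus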
helm upgrade -i --create-namespace -n monitoring postgres-exporter prometheus-community/prometheus-postgres-exporter --values ./values/postgres-exporter-helm-values.yaml
Configure ServiceMonitor for Nginx
We will also configure our Nginx server to provide metrics to Prometheus; this will give us the HTTP metrics. We will update the ingress-nginx/ingress-nginx helm chart to also expose a ServiceMonitor by making the following updates to the values.yaml.
The configuration is self-explanatory. The key thing to remember is to provide the label release: prometheus.
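A minimal sketch of that section of the values.yaml, using the controller metrics and ServiceMonitor options of the ingress-nginx chart:

controller:
  metrics:
    enabled: true
    serviceMonitor:
      enabled: true
      additionalLabels:
        release: prometheus   # required so that Prometheus discovers this ServiceMonitor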
Let's upgrade the ingress deployment for the changes to take effect.
helm upgrade -i --create-namespace -n ingress-nginx ingress-nginx ingress-nginx/ingress-nginx --values ./values/ingress-nginx-values.yaml
We should see Nginx show up in the Prometheus targets, as shown below.
Now that we have all the service monitors configured, let's modify our microservices to expose their metrics.
Configure Quarkus application for monitoring
To let our Quarkus application expose metrics on /q/metrics, we will use the quarkus-micrometer extension (typically together with a registry extension such as quarkus-micrometer-registry-prometheus for the Prometheus format). To get it working, we will have to update the pom.xml with the following dependency.
We then have to update our application.properties with the following line to enable the Micrometer JSON export:
quarkus.micrometer.export.json.enabled=true
We can now import and use the Micrometer classes to record custom metrics. The following code shows how to import and inject the Micrometer library. Please refer to the Micrometer documentation for more details.
// Import Micrometer classes
import io.micrometer.core.instrument.MeterRegistry;
import io.micrometer.core.instrument.Tags;

// Inside the service class, inject the meter registry
@Inject
MeterRegistry registry;

// Inside the method, we can define custom metrics with the following code
registry.counter("searchByKeyword() Counter", Tags.of("keyword", query)).increment();
The following screenshot shows the updated BookInfoService code
Once we build and deploy this code, we should be able to see the metrics at the /q/metrics path. The following is a screenshot of bookinfo after making the changes.
Now we can configure dashboards in Grafana by accessing the Grafana service (via port-forward). The following are some screenshots of the dashboards.
The following is a screenshot of the alerts notified on Slack.
That's all for now. In the next chapters, we will look at configuring Loki for logs and Zipkin for traces. This will give us complete observability of our cluster.
I hope this was useful. Please leave your feedback and comments.
Take care…have fun ;-)