Bozobooks.com: Fullstack k8s application blog series
Observability: Log Aggregation with Loki & Grafana Alerts Integration with Slack
Chapter 10: Configuring Grafana Loki to aggregate all the logs across the Kubernetes cluster and set Grafana alerts to send notifications on Slack
Hey everyone, it’s been more than six months since I found some peaceful time to get back to my series. I have been getting a lot of great feedback and comments from people who have been reading my blogs. Thanks for all your feedback; that is what motivated me to finish this series. I’ve been super busy exploring (or lost in??:-D) the new AI world — GenAI, Neural Networks, Transformers… reskilling myself… :-D
In this blog, we will continue our journey of implementing the second key pillar of observability: logging. We will implement logging with Loki and integrate it with the Slack channel we created in Chapter 9, Observability Metrics: Prometheus, Grafana, Alert Manager & Slack Notifications.
Log management is a crucial aspect of any application or system monitoring strategy. Grafana Loki is an open-source log aggregation system that offers an efficient and cost-effective solution. With its distributed architecture and innovative log stream compression technique, Loki tackles the challenges of high-volume log data storage and retrieval. Seamlessly integrated with Grafana, Loki empowers users to visualize and explore their log data with ease. Whether you’re troubleshooting issues, tracking performance, or gaining insights, Grafana Loki proves to be a valuable tool for efficient log aggregation and analysis.
Here are the steps on how to implement logging with Loki and integrate it with our Slack channel:
- Install Loki with Helm.
- Configure Loki as a data source in Grafana.
- Configure Grafana alert rules on the Loki logs.
- Set up the Slack contact point and notification policy.
Once you have completed these steps, you will have successfully implemented log aggregation with Loki and integrated it with your Slack channel. You can now start collecting and visualizing your log data, and troubleshooting issues more effectively.
Let's start by installing and setting up the stack.
Step 1: Install Grafana Loki
Let's add the Grafana repo (if not added already; we already did this in Chapter 9) and run a repo update:
helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
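Optionally, we can quickly verify that the Loki chart is now visible from the repo (just a sanity check; the chart version shown will vary):
helm search repo grafana/loki-stack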
Let's set up the values for a simple deployment on the local machine. (For a production deployment, we would use a different, scalable configuration; refer to the Grafana documentation for more details on the various deployment modes.)
The following values.yaml shows the configuration for a simple deployment, followed by installing the Helm chart.
global:
  namespace: monitoring
loki:
  enabled: true
  persistence:
    enabled: true
    accessModes:
      - ReadWriteOnce
    size: 10Gi
    annotations: {}
promtail:
  enabled: true
  config:
    lokiAddress: http://loki-loki-distributed-gateway/loki/api/v1/push
helm install loki grafana/loki-stack -n loki --create-namespace --values=./values/loki-values.yaml
Here is the screenshot of the output
We can now check if all the pods and services are up and running
kubectl get pods -n loki
kubectl get svc -n loki
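If you want an extra sanity check beyond the pod status, you can port-forward the Loki service and hit its readiness endpoint. This is just a sketch; it assumes the chart created a service named loki listening on port 3100 (adjust the names if yours differ).
kubectl port-forward -n loki svc/loki 3100:3100
# in another terminal; Loki answers "ready" once it is up
curl http://localhost:3100/ready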
Before we proceed further, make sure that the Loki pods are all running without any errors. Let's now configure Grafana to show the Loki logs.
Step 2: Configure Loki in Grafana
We should be able to add Loki as a data source in Grafana. The following screenshot shows the typical configuration. Since I have installed Loki in the loki namespace, the URL to access the service is http://loki.loki.svc.cluster.local:3100. We can save and test.
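If you prefer to keep the data source in code rather than clicking through the UI, Grafana also supports file-based data source provisioning. Here is a minimal sketch; the file name and the way it is mounted into Grafana are assumptions that depend on how you deployed Grafana (for example, via the kube-prometheus-stack chart from Chapter 9).
# datasources/loki.yaml — Grafana data source provisioning file (sketch)
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    # Loki service installed in the loki namespace (same URL as in the UI above)
    url: http://loki.loki.svc.cluster.local:3100
    isDefault: false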
We can now create a panel in the dashboard. The following screenshot shows a typical query to check if Loki is working fine.
The following screenshot shows a query to look at the error-rate trend, using the query
rate({namespace="bozo-book-library-dev"} |~ `(?i)error` [1m])
We will also use this rate query to generate alerts. The following screenshot shows the number of errors (I generated some error conditions by adding the same book to the library again and again, which produces error logs).
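If you want to slice the errors differently, LogQL lets you aggregate the same stream selector in other ways. The queries below are illustrative sketches only; the app label is an assumption and depends on which labels Promtail attaches in your cluster.
# total error rate across the namespace, averaged over 5 minutes
sum(rate({namespace="bozo-book-library-dev"} |~ `(?i)error` [5m]))
# error rate broken down per application label
sum by (app) (rate({namespace="bozo-book-library-dev"} |~ `(?i)error` [1m]))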
Step 3: Configure Grafana Alerts
Before we get notifications on our Slack channel, we need to set up alerts and configure the rules for when they should fire. To configure the alerts, click on the Alert tab, and you will see three sections by default:
- Section A: Configure the rule in the form of a LogQL query. In our case, we will provide a query that checks the rate of errors over the last 1 minute.
rate({namespace="bozo-book-library-dev"} |~ `(?i)error` [1m])
- Section B: Helps configure the condition. In our case, we will use a classic condition to evaluate whether the average error rate goes above 2.
- Section C: Helps configure the threshold, which we will not set right now.
The screenshot below shows the configuration I used.
We can preview to see if the alert fires (to make it fire, I once again generated error logs by adding books that are already in the library). You can see in the screenshot below that the alert fires, as we have more than 2 errors in the last 1 minute.
Now we have the alert rules configured, and we have tested that the alert fires when the condition is satisfied. In the next step, we will configure the Slack contact point.
Step 4: Set up the Slack contact point
Grafana allows us to configure various types of contact points to publish alert notifications (including email, various chat systems, Alertmanager, etc.). To configure the contact point, go to Alerting -> Contact points and select the “+ Add Contact Point” button.
In our case, let's select Slack and provide the webhook URL that we configured in Chapter 9. The following screenshot shows the configuration.
Let's test if the contact point works by clicking the “Test” button. We should see the alert notification on our Slack channel. The screenshot below shows the test message on my Slack channel.
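For reference, contact points can also be provisioned declaratively with Grafana's alerting provisioning files instead of through the UI. The snippet below is only a sketch: the contact point name and UID are made-up examples, and the webhook URL is a placeholder for the one you created in Chapter 9.
# alerting/contact-points.yaml — Grafana alerting provisioning (sketch)
apiVersion: 1
contactPoints:
  - orgId: 1
    name: slack-bozo-alerts          # example name, pick your own
    receivers:
      - uid: slack-bozo-alerts
        type: slack
        settings:
          url: https://hooks.slack.com/services/XXXX/YYYY/ZZZZ  # your Chapter 9 webhook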
Now that we have the alerts configured and the Slack contact point created, we need a notification policy that routes alerts to our Slack contact point when they fire. To do that, we go back to the alert rule configuration and provide a label (name-value pair) under “Notifications” that the notification policy will use for matching. Here is the screenshot of the name-value pair configuration I used.
We will use this label in the next step to identify the alerts that should go to our Slack contact point.
Step 5: Set Notification Policy
To configure the notification policy, go to the “Alerting->Notification Policies” menu option. Select the “New Nested Policy” button, and provide the matching label and the contact point. In our case, we provide the name-value label we created in the alert configuration and select our Slack contact point. The following screenshot shows my configuration.
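Notification policies can also be captured declaratively with the same alerting provisioning mechanism. The sketch below is illustrative only: the team=bozo-books matcher and the slack-bozo-alerts receiver are hypothetical stand-ins for the label and contact point you actually configured.
# alerting/notification-policies.yaml — Grafana alerting provisioning (sketch)
apiVersion: 1
policies:
  - orgId: 1
    receiver: grafana-default-email   # default route
    routes:
      - receiver: slack-bozo-alerts   # our Slack contact point
        object_matchers:
          # route only alerts carrying this label to Slack
          - ["team", "=", "bozo-books"]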
You should now start seeing the alerts notified in the Slack channel. The following screenshot shows my Slack notifications.
As you can see, the alert does not display many details, such as the actual value or links to the Grafana dashboard. In the next step, we will fix that.
Step 6: Passing the exact alert values and URLs to the Slack notifications
To include specific values, we can use annotations and reference Grafana's template variables (such as {{ $labels }} and {{ $values }}) in the description. We can add these annotations in the alert rule configuration. Here is a screenshot of what I have configured to capture more details about the error.
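As an illustration, a summary/description annotation pair could look something like the following. This is a sketch only: the refID B must match the reduce/condition expression in your own rule, and the namespace label is only available because our LogQL query selects on it.
# example annotation values on the alert rule (sketch)
summary: High error rate in namespace {{ $labels.namespace }}
description: The error rate is {{ $values.B }} errors/second over the last minute. Check the Bozo Book Library logs in Grafana Explore.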
Refer to the Grafana documentation for more details on which annotations are available and how to parameterize custom values in custom annotations.
That's a quick walkthrough of how to get Loki working, visualize the logs on a Grafana dashboard, and set up alerts and notifications to Slack.
In the next chapter, we will move on to distributed tracing, which is another critical observability pillar.
I hope this was useful. Please leave your feedback and comments.
Take care…have fun ;-)