How Grafana became “the tip of the iceberg”

Laurent Bel
Published in Pernod Ricard Tech
6 min read · Mar 22, 2021

When we discovered Grafana, we did not realize its potential and underestimated the importance it would take on in our monitoring strategy. Here is our story.

Monitoring scope

We distinguish four types of monitoring, each with different sources and purposes:

1. Technical Monitoring

This comes down to classic metrics that I often describe as “CPU and RAM type of metrics”. That is clearly a reductive description, but people usually see immediately what I mean. Since we rely heavily on cloud services, this means getting metrics from Azure, Amazon Web Services or Google Cloud Platform.

If you are using Azure, AWS or GCP, most of the metrics you are interested in are already made available to you. You just need to leverage Azure Monitor, AWS CloudWatch or Google Cloud Monitoring (formerly Stackdriver), respectively.

2. Application Performance Monitoring

This type of monitoring is tightly related to your application and the framework you are using (e.g. Node.js, Python, Java, .NET, …). It consists of installing a small agent alongside your application to collect “insider metrics”.

This helps diagnose internal problems such as slow database queries or slow sections of code, and much more.

We mainly use New Relic and Azure App Insights as APM solutions on our side, but there are many other solutions such as Dynatrace, AppDynamics, AppOptics (SolarWinds), …

3. Logging information

These could be web server logs (Apache, IIS, …), error logs from PHP, or even custom application logs. All those logs are eventually centralized, in one place or in several. In our case, we mainly use two logging platforms:

  • Loki: like Prometheus, but for logs!
  • Loggly: a SaaS solution, acquired a few years ago by SolarWinds, that provides great logging centralization from many different sources.
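
To make the “Prometheus, but for logs” idea a bit more concrete, here is a minimal sketch of how a log query against Loki's HTTP API could be assembled. The host name, the `app` label and its value are assumptions made up for illustration, not our actual setup; the sketch only builds the URL and does not send any request.

```python
from urllib.parse import urlencode


def build_loki_query_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a URL for Loki's query_range endpoint (no request is sent)."""
    params = {
        "query": logql,      # LogQL expression: label selector plus optional filters
        "start": start_ns,   # start of the time range, Unix epoch in nanoseconds
        "end": end_ns,       # end of the time range, Unix epoch in nanoseconds
        "limit": limit,      # maximum number of log lines to return
    }
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Hypothetical host and label: select logs of one application containing "error".
url = build_loki_query_url(
    "http://loki.example.internal:3100",
    '{app="webshop"} |= "error"',
    start_ns=1616400000000000000,
    end_ns=1616403600000000000,
)
print(url)
```

As with Prometheus, the selector in curly braces picks streams by label, which is why the two tools feel so similar to use.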

4. End user Monitoring

It aims to monitor what end users are experiencing by looking at things from their angle. Instead of looking at CPU or RAM, which end users do not see, we look at the web pages they load, ideally from their location. You might also hear about RUM (Real User Monitoring).

If you ask me, this type of monitoring is the most important one, because it reflects what your end users see and how they feel about your application.

We mainly use Site24x7 in our case, but there are many other solutions available on the market.

Mixing those sources

The data is there, available in various places. One central question remains:

How do you mix this data to provide nice-looking dashboards?

Until recently, we had to look in various places to get the information we wanted. We would go into AWS CloudWatch to look at a specific service and see some metrics, then jump into another tool to get another metric. We managed to get the job done in the end, but it was not very efficient.

This is where Grafana came into play and helped us “centralize” all the data in one place where it could be mixed, visualized and analyzed.

The beauty of all this is that wiring everything into Grafana turned out to be quite simple, and I am still amazed today by how easy it is to onboard a new cloud subscription or a new data source.
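
To give a rough idea of what “wiring” a source can look like, here is a minimal sketch that builds the kind of JSON body Grafana's HTTP API accepts when creating a data source (`POST /api/datasources`). The helper name, the host, the region and the `jsonData` values are illustrative assumptions, not our real configuration, and the sketch deliberately stops short of sending anything.

```python
import json


def datasource_payload(name, ds_type, url=None, json_data=None):
    """Hypothetical helper: build the JSON body describing one data source."""
    payload = {"name": name, "type": ds_type, "access": "proxy"}
    if url:
        payload["url"] = url
    if json_data:
        payload["jsonData"] = json_data
    return payload


# Illustrative sources only; hosts, regions and auth settings are assumptions.
sources = [
    datasource_payload("Loki", "loki", url="http://loki.example.internal:3100"),
    datasource_payload(
        "AWS CloudWatch",
        "cloudwatch",
        json_data={"authType": "default", "defaultRegion": "eu-west-1"},
    ),
]

for source in sources:
    # Each body could then be sent to /api/datasources with an API token.
    print(json.dumps(source, indent=2))
```

Onboarding one more subscription or source then amounts to adding one more entry of this shape, whether through the UI, the API or a provisioning file.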

From that moment, we were able to easily build nice-looking dashboards for various projects and applications.

The other thing we did not anticipate was the adoption.

User adoption

It is always easier to promote things that people like from the start. Grafana made adoption very easy: the effort on our side is limited, because it sells itself. We identified the following stakeholders interested in Grafana:

  • Product owners & Business users
    Even though some metrics are a bit technical, you can still provide “business” oriented dashboards that Product owners and Business users will like. This gives them a control tower where they can view, for example, how many users or active connections are on their platform.
  • IT Developers
    During the build phase, developers might ask for detailed application logs and performance metrics. If these are not available through self-service, your developers will end up very frustrated.
  • Internal IT Operations
    IT Operations will be keen on having metrics to monitor the infrastructure and set up alerts.
  • External IT Operations
    You may outsource “run” activities to a third party. In many cases, such providers come with their own monitoring solution that you quickly become dependent upon (yes, it is sometimes part of a lock-in strategy). What we now tell them is that we won’t let them deploy anything: they will just use what we have in place. If they want to, they can set up new Grafana dashboards, but there is no reason to deploy another monitoring solution with whatever agents on the servers…
  • IT Management
    Last but not least, the management in our organization, who are mostly interested in nice-looking dashboards that show things are monitored and under control. From time to time, usually right after an outage, I am asked about the monitoring in place: opening Grafana usually leaves the audience speechless.

Killer Features

Grafana comes with many features. Here are my favorite ones:

  • Alerting

Alerting is a basic feature. What I actually love in Grafana is the way you set up alerts and visualize them. Simply intuitive. It also comes with many notifiers (Email, MS Teams, PagerDuty, Slack…) to get the alerts distributed wherever you want.

  • Explore

The “Explore” section is really intuitive and gives you access to all metrics in one place. Whether you are looking for CPU metrics or application logs, it is a great place to start your investigations.
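
Coming back to alerting for a moment: a typical rule boils down to “alert when the average of the last few samples is above a threshold”. The sketch below illustrates that idea in plain Python. It is a conceptual illustration only, not how Grafana implements alert evaluation, and the function name, state names and sample values are all made up.

```python
from statistics import mean


def evaluate_threshold(samples, threshold, window):
    """Return a state based on the average of the last `window` samples."""
    if window <= 0:
        raise ValueError("window must be a positive number of samples")
    recent = samples[-window:]
    if not recent:
        return "no_data"  # nothing to evaluate, e.g. the source returned no points
    return "alerting" if mean(recent) > threshold else "ok"


# Made-up CPU percentages, oldest first.
cpu_percent = [42.0, 55.0, 91.0, 88.0, 95.0]

print(evaluate_threshold(cpu_percent, threshold=80.0, window=3))
print(evaluate_threshold(cpu_percent, threshold=80.0, window=5))
```

Averaging over a window rather than reacting to a single point is what keeps one short spike from paging anyone; the same data can alert or stay quiet depending on the window you choose.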

Our stack!

Here is our current Monitoring stack. As you can see, Grafana became a central component.

Conclusion

Monitoring & Alerting is like an iceberg. Positioning Grafana at the tip lets you hide the technical wiring below the surface and present nice dashboards and monitoring on top.

Disclaimer

  • I do not work for Grafana.
  • This article reflects my own independent view on the topic.

Laurent Bel leads the IT Architecture & Innovation team at Pernod Ricard and is interested in IT technology in general.