Grafana OnCall™ logo via https://grafana.com/, siren and Pi device licensed via Shutterstock

Managing Home Lab Escalations with Grafana OnCall

An Open Source Solution for Peace of Mind

Mike Riley
4 min read · Aug 1, 2023

I have been using the free cloud tier of Grafana to monitor my home lab, which consists mostly of various Raspberry Pi models and old PCs. I ran the open source on-prem Grafana instance for years before migrating to the free cloud-hosted tier so I could take advantage of the integrations that are available only in the cloud edition. One integration that has proved surprisingly useful is Grafana OnCall.

Grafana OnCall is an open-source incident response management tool that is definitely geared more toward large, multi-disciplinary enterprises than small home labs, but the tool itself is remarkably effective for a whole range of use cases. Better yet, it is automatically installed and pre-configured as part of the free Grafana cloud tier instance, so it is just begging to be used.

Insurance Against Temperature Extremes

Winters in Chicago can be brutal, and old furnaces have a tendency to fail whenever the temperature drops below freezing. Summers can be just as punishing in the other direction, with temperatures soaring into triple-digit territory and burning out air conditioning units at a prodigious rate. Either extreme puts a home and its home lab in peril: burst water pipes in the winter, overheated equipment in the summer. And Murphy’s Law dictates that these events will occur when the occupants are away, unaware that their home is in a precarious situation.

I have several Raspberry Pi Pico W units deployed in my home doing various tasks, one of which is monitoring room temperature via the Pico’s on-board temperature sensor. These sensors are connected to an on-prem Prometheus server that feeds into my cloud Grafana instance.
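To make the sensor side concrete, here is a minimal sketch of how a raw reading from the Pico's on-board sensor could be turned into a Fahrenheit value and formatted as a Prometheus text-format sample. It is an illustration under assumptions rather than my exact firmware: it assumes MicroPython, where the sensor is read as a 16-bit value from ADC channel 4, and it uses the conversion formula from the RP2040 datasheet. The metric name and the room label are hypothetical.

```python
# Sketch only: pure functions so they run on a desktop Python as well.
# On a Pico running MicroPython, the raw value would typically come from
# machine.ADC(4).read_u16() (assumption: MicroPython firmware).

ADC_REF_VOLTS = 3.3   # ADC reference voltage on the Pico
ADC_MAX = 65535       # read_u16() scales readings to 16 bits


def adc_to_celsius(raw):
    """Convert a raw 16-bit ADC reading to degrees Celsius.

    Uses the RP2040 datasheet formula: 27 - (V - 0.706) / 0.001721.
    """
    volts = raw * ADC_REF_VOLTS / ADC_MAX
    return 27 - (volts - 0.706) / 0.001721


def celsius_to_fahrenheit(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def prometheus_line(room, temp_f):
    """Format one sample in the Prometheus text exposition format.

    The metric name and label are hypothetical, chosen for illustration.
    """
    return 'room_temperature_fahrenheit{room="%s"} %.1f' % (room, temp_f)


if __name__ == "__main__":
    raw = 14021  # example raw reading, roughly 0.706 V
    temp_f = celsius_to_fahrenheit(adc_to_celsius(raw))
    print(prometheus_line("office", temp_f))
```

The on-board sensor measures the chip's own temperature, so treat its readings as an approximation of room temperature that may need a calibration offset.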

My Monitoring Setup

Setting up this pathway is straightforward if you follow the step-by-step instructions on Grafana’s website. Once the metrics are being sent to and ingested by Grafana Cloud, you can create dashboards to render the time series of metrics being collected. Finally, you can configure alert conditions that fire whenever defined thresholds are exceeded.

For example, my Prometheus server polls my Picos every 10 seconds, and pushes those metrics to Grafana. If the temperature exceeds 95 degrees Fahrenheit or falls below 55 degrees Fahrenheit, Grafana will send an email notifying me of this alert condition.
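The alert condition itself is just a range check. The sketch below expresses that logic in plain Python using the two thresholds above. It is not Grafana's alert rule syntax, and the function name and state labels are hypothetical.

```python
# Sketch only: the thresholds come from the article; the names are illustrative.
HIGH_F = 95.0  # alert when the temperature exceeds this value
LOW_F = 55.0   # alert when the temperature falls below this value


def alert_state(temp_f, low=LOW_F, high=HIGH_F):
    """Classify a Fahrenheit reading against the alert thresholds."""
    if temp_f > high:
        return "too_hot"
    if temp_f < low:
        return "too_cold"
    return "ok"


if __name__ == "__main__":
    for reading in (40.0, 72.5, 101.3):
        print(reading, alert_state(reading))
```

A reading of exactly 95 or 55 degrees counts as normal here, matching the "exceeds" and "falls below" wording.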

Why Use OnCall?

In today’s world of multiple notification inputs, email is typically the last form of messaging I check. Most of the time I check email only once or twice a day, which could be disastrous in the meltdown scenario I painted earlier.

That’s where OnCall comes to the rescue.

Since OnCall is already integrated into Grafana Cloud, setting up escalation chains and assigning them to escalation routes is easy. In addition to email, OnCall messaging endpoints can be Slack channels, Telegram DMs, Microsoft Teams, and even computer-voiced phone calls and SMS. Best and simplest of all, Grafana offers a free Grafana OnCall mobile client for Android and iOS. The mobile app is my preferred OnCall messaging endpoint, and it delivers notifications almost instantly when an alert condition is triggered.

Alert Multiple People

Having multiple notification pathways is extremely helpful, but OnCall’s real power comes into play when notifying multiple parties of an incident and tracking the resolution status in real time. For my home lab monitoring, I have added my family members to the OnCall notification policies. That way I am covered if a situation requires urgent attention while I am unavailable. If I have not acknowledged the alert within 30 minutes, OnCall escalates the notification to my family members. If they do not acknowledge the notification on their phones within a further 20 minutes, OnCall cascades the message to our family Telegram channel. That said, the alert is typically addressed before that final notification endpoint is reached.
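The timing of that chain is easier to see as a small model. The following sketch is plain Python, not OnCall's configuration format or API. It only reproduces the wait times described above, and the step labels and function names are hypothetical.

```python
# Sketch only: a toy model of the escalation timing, not OnCall configuration.
# Each step is (minutes to wait after the previous step, who gets notified).
ESCALATION_STEPS = [
    (0, "my phone (OnCall mobile app)"),
    (30, "family members' phones"),
    (20, "family Telegram channel"),
]


def escalation_schedule(steps=ESCALATION_STEPS):
    """Return (minutes since the alert fired, target) for each step."""
    schedule = []
    elapsed = 0
    for wait_minutes, target in steps:
        elapsed += wait_minutes
        schedule.append((elapsed, target))
    return schedule


def notified_by(minutes_unacknowledged, steps=ESCALATION_STEPS):
    """List every target notified if nobody has acknowledged by this time."""
    return [
        target
        for start, target in escalation_schedule(steps)
        if minutes_unacknowledged >= start
    ]


if __name__ == "__main__":
    for start, target in escalation_schedule():
        print("t+%d min: notify %s" % (start, target))
```

In this model the Telegram channel is reached 50 minutes after the alert fires, and only if nobody has acknowledged it along the way.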

OnCall’s escalation and built-in incident response management features have saved my home lab (and my home in general) from potentially dire outcomes on several occasions. If you have a home lab or are a home automation enthusiast, Grafana OnCall can provide an additional layer of protection and peace of mind that, in my experience, few other free, open-source solutions offer.
