Living on the edge — How Screenly monitors edge IoT devices with Prometheus
This is a guest post from Viktor Petersson (@vpetersson), who discusses how Screenly uses Prometheus to monitor the thousands of Raspberry Pis powering its digital signage network.
At Screenly, we are long-time Kubernetes fans and, like many (if not most) Kubernetes teams, we use Prometheus to monitor and troubleshoot our infrastructure. Over the years, we have found Prometheus to be extremely versatile, and we have expanded our use of it to include business intelligence metrics. The one problem we have experienced is that long-term storage of metrics is painful with Prometheus. Weaveworks, however, solves that pain point with its hosted Prometheus as a Service (Cortex) within Weave Cloud. At Screenly, we use Weave Cloud to store business metrics and use them as part of our troubleshooting toolkit.
How we monitor IoT edge devices
To provide a bit of context, Screenly is the most popular digital signage platform for Raspberry Pi. With Screenly, you can use a Raspberry Pi to display 1080p Full HD images, videos, and websites on a TV or monitor.
If you have a fleet of monitors in your offices across the globe, you can use Screenly for remote display management and to control your screens from anywhere. Screenly also allows teams to schedule content, such as metric dashboards or internal communication announcements.
Monitoring thousands of Raspberry Pis
With thousands of Raspberry Pis running in the wild, performance monitoring becomes an issue. For instance, how do you know that the latest software update you just pushed out does not eat up too much memory or CPU? Enter Prometheus.
Prometheus and visualizing resource consumption
Prometheus added ARM support back in 2016 (kudos to Julius Volz for helping us out). Around 2017, we adopted Prometheus in our QC rig (which we are in the process of overhauling with a proper QC system running in a data center).
With Prometheus, we can visualize the resource consumption of our QC fleet. This ability allows us to release a new version of our digital signage player software to the staging channel and monitor performance implications before promoting the release to production.
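To make the staging check concrete, here is a minimal sketch of a promotion gate in Python. It is an illustration only, not Screenly's actual tooling: the function name `exceeds_budget`, the 10% threshold, and the sample values (memory in MB) are all hypothetical. It compares resource samples from the current production release against samples from the staging candidate.

```python
from statistics import mean


def exceeds_budget(baseline: list[float], candidate: list[float],
                   max_increase: float = 0.10) -> bool:
    """Return True if the candidate's mean usage is more than
    `max_increase` (a fraction, e.g. 0.10 for 10%) above the baseline mean.

    Hypothetical helper: both lists hold samples of the same metric,
    for example memory in MB collected from the QC fleet.
    """
    if not baseline or not candidate:
        raise ValueError("both sample lists must be non-empty")
    return mean(candidate) > mean(baseline) * (1 + max_increase)


# Made-up sample values, for illustration only.
production_mb = [400, 410, 390]
staging_mb = [450, 460, 455]
hold_release = exceeds_budget(production_mb, staging_mb)
```

A mean comparison is the simplest possible rule. A real gate would probably also look at peaks and at CPU, and would allow for noise between devices.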
The way this monitoring process works is fairly simple. We add the Prometheus ARM binary into our Ubuntu Core Snap. We then periodically run Prometheus in conjunction with the Prometheus Node Exporter. That way, we get both custom metrics and resource metrics. Finally, we use the remote_write functionality to push those metrics to Weave Cloud.
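As a rough sketch of what such a setup could look like, the Python snippet below renders a minimal Prometheus configuration that scrapes a local Node Exporter and forwards samples with remote_write. It is not Screenly's actual configuration. The endpoint URL, the token, the 60-second scrape interval, and the use of basic auth are placeholder assumptions. Port 9100 is the Node Exporter default.

```python
def render_prometheus_config(remote_write_url: str, token: str,
                             scrape_interval: str = "60s",
                             node_exporter_target: str = "localhost:9100") -> str:
    """Render a minimal prometheus.yml as a string.

    All argument values are placeholders chosen for illustration;
    how the remote endpoint authenticates is an assumption.
    """
    lines = [
        "global:",
        f"  scrape_interval: {scrape_interval}",
        "",
        "scrape_configs:",
        "  # Resource metrics from the Node Exporter on the same device.",
        "  - job_name: node",
        "    static_configs:",
        f"      - targets: ['{node_exporter_target}']",
        "",
        "# Push scraped samples to a remote, long-term store.",
        "remote_write:",
        f"  - url: {remote_write_url}",
        "    basic_auth:",
        f"      password: {token}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical endpoint and token.
config_text = render_prometheus_config(
    "https://metrics.example.com/api/prom/push", "REPLACE_ME"
)
```

Writing `config_text` to a file and starting Prometheus with `--config.file` pointing at it would be enough for a local experiment.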
Since our first deployment in 2017, we have expanded our use of Prometheus even further. Today, we also use Prometheus to spot-check performance across our production deployment. Because we can toggle Prometheus reporting remotely, we can monitor performance across our deployments. Moreover, we now actively use Prometheus as part of our troubleshooting toolkit. For instance, if a customer reports performance issues, we can gain detailed insights into the performance of that particular digital signage player (in addition to system logs, etc.).
With PromQL, we can perform powerful queries to find patterns across our fleet. We can then integrate those metrics and put them in front of our team via Grafana dashboards.
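For example, a fleet-wide query might average memory usage per software release. The sketch below builds such a PromQL expression and the matching URL for the Prometheus HTTP API instant-query endpoint (`/api/v1/query`). The metric names are standard Node Exporter metrics. The `release` label and the base URL are hypothetical, and the label would have to be attached to the series on each device (for example through external labels).

```python
from urllib.parse import urlencode


def memory_used_ratio_by(label: str) -> str:
    """PromQL: average fraction of memory in use, grouped by `label`."""
    return (
        f"avg by ({label}) ("
        "1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes"
        ")"
    )


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# `release` is a hypothetical label; the base URL is a placeholder.
query = memory_used_ratio_by("release")
url = instant_query_url("http://localhost:9090", query)
```

The same expression can be pasted straight into a Grafana panel that uses a Prometheus data source.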
P.S. If you are looking for an easy way to put dashboards (such as Grafana) on your wall screens, Screenly makes displaying dashboards super easy. We can even take care of the authentication for you. You no longer need to attach a keyboard and mouse (or use VNC) to your wall screens.
Final thoughts
In this post, we discussed how Screenly takes advantage of Prometheus to monitor and manage resource consumption across its Raspberry Pi digital signage platform. For more information on how Prometheus can help your team manage Kubernetes, sign up for our live Q&A session on the Weave Kubernetes Platform (WKP).
Originally published at https://www.weave.works on October 6, 2020.