We all know that it is important to monitor your servers and services, so you can spot issues before they become problems. I personally have spent a lot of time configuring nagios to email me about issues and I have recently been configuring various different alerts in Azure.
My old boss has this idea that I should have a big monitor screen displaying all the vital stats of my servers and services, I personally disagree with this idea and think that notifications on my phone and email alerts are sufficient. He will no doubt correct my thinking when he reads this, but I believe part of his thinking is to make the monitoring of your infrastructure move visible and make it obvious to anyone that walks past that you have your eye on everything.
For the purpose of this blog post lets assume he has convinced me and I have convinced my actual boss to spend money on the required technology to do this (No easy feat). What exactly would I display on this screen?
I have Google Chromecast that I use for streaming various things to my TV, this is a relatively cheap bit of technology that could allow a TV or monitor to display a web page with the required stats displayed.
The two main sources of information that I want to display are New Relic for monitoring my azure websites and Nagios for monitoring my internal servers. New Relic allows you to easily export live performance data as iframes so I quickly threw together a web page full of these graphs. However if you have a static screen on the wall you don’t want to have to scroll to see different information so I needed to come up with another way to display this information.
Now what information wants to be included in a script like this? Showing too much performance data can almost be as bad as not doing it at all as problems gets drowned out in the noise. For me I have performance of my websites, followed by Nagios problems, followed by the azure status page, followed by memory usage of all my servers and lastly showing number of connections to my databases. Another question to consider is what time scales do you want to graph over, too long and you don’t see what is happening now, but too short and you may only worry about an intermittent issue?