“Optimizing for Developer Happiness” by Chad Dickerson was one of the talks that influenced me most when I was starting out with Continuous Delivery. It argues that making it “easier to ship” makes the team happier and, in turn, brings a steady flow of value to customers.
Continuous delivery gives the opportunity for faster feedback about:
- Usage of the new features deployed
- Issues caused by the deployments
To gather this feedback, we need adequate monitoring and alerting. Monitoring and alerting are usually taken to mean system monitoring, and watching system behaviour (I/O, memory, disk space, etc.) is essential. But it is equally important to watch application metrics.
Identify the key features that need tracking, and keep watching them after every release to make sure that there are no surprises. This post talks about how Etsy uses Graphite for monitoring and alerting on key business metrics.
“We spend a lot of time gathering metrics for our network, servers, and many things going on within the code that drives…” (codeascraft.com)
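To make "tracking a key feature" concrete, here is a minimal sketch of emitting an application counter. It assumes a StatsD-style daemon is listening on UDP port 8125 and forwarding to Graphite; the host, port, and the metric name `checkout.completed` are illustrative assumptions, not details from the post above.

```python
import socket


def format_counter(name: str, value: int = 1) -> bytes:
    """Build a StatsD counter datagram of the form <name>:<value>|c."""
    return f"{name}:{value}|c".encode("ascii")


def send_counter(name: str, value: int = 1,
                 host: str = "127.0.0.1", port: int = 8125) -> bytes:
    """Send a counter over UDP (host and port are assumed defaults).

    Metrics are fire-and-forget: a failure to send must never break
    the feature being measured, so socket errors are swallowed.
    """
    payload = format_counter(name, value)
    try:
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
    except OSError:
        pass
    return payload


if __name__ == "__main__":
    # Hypothetical metric name: count one completed checkout.
    send_counter("checkout.completed")
```

Calling `send_counter` at the point where the feature succeeds gives a per-release trend line to compare before and after each deployment.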
For Good Karma, we’ve been using Metabase, which also lets us query the data with SQL. It can send alerts when specific criteria are met, and it offers a REST API too.
It is easy to set up an alert system; using the information it produces to improve the product is another matter. The crucial part of operability is how well these systems help the team arrive at insights quickly.
John Allspaw has written an insightful article on this topic, about the importance of “design” for alert systems. Here are a few questions that help us gauge the “usefulness” of an existing alert system:
* Who has ever gotten an alert and ignored it? (/me looks at alert, says “oh, it’ll probably recover, no need to look further”)
* How many alerts were received in the past week that were not actionable? (no human action was required)
* How many alerts were received in the past week as a result of known work being done (expected) but alerts were not silenced during that period?
* How many alerts were received as a result of a previously silenced alert (because work was being done) that was mistakenly un-silenced?
“In the past month or two, I've spoken on the topic of alert design. There's a video of my giving the talk (at…” (www.kitchensoap.com)
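The first three questions above can be turned into numbers if the team keeps even a simple log of past alerts. The sketch below is one way to do that; the `Alert` record and its field names are hypothetical, and the data would have to come from whatever alerting tool is in use.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    name: str
    acknowledged: bool         # did anyone look into it?
    action_taken: bool         # was human action actually required?
    during_planned_work: bool  # fired during known work, not silenced


def audit(alerts: list[Alert]) -> dict[str, float]:
    """Return the share of alerts that were ignored, not actionable,
    or noise from planned work. An empty log yields all zeros."""
    total = len(alerts)
    if total == 0:
        return {"ignored": 0.0, "not_actionable": 0.0,
                "planned_work_noise": 0.0}
    return {
        "ignored": sum(not a.acknowledged for a in alerts) / total,
        "not_actionable": sum(not a.action_taken for a in alerts) / total,
        "planned_work_noise":
            sum(a.during_planned_work for a in alerts) / total,
    }
```

Tracking these ratios week over week shows whether the alert system is getting more useful or just noisier. The fourth question (alerts mistakenly un-silenced) is left out because it needs silencing history, which this record does not model.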
Below is an open letter by John Allspaw asking companies to stop giving the false impression that any tool can magically solve every problem.