This was extracted from my piece on counting production incidents to reduce its footprint.
Unfortunately for die-hard metrics folks, the reality is that the higher-ups really do want a single figure that fits nicely into one cell of a spreadsheet. Here’s a rough methodology that can be a starting point, but do take liberty in tuning for specific needs:
- As a rule of thumb I just invented, each system should have around 5–10 quality metrics. Find the acceptable target threshold for each metric. Use data to drive this discovery. For example, if there is strong evidence that unrecoverable request errors are linked to an abnormally high rate of aborted user sessions, then there is also likely a Pareto threshold for diminishing returns of improvement. Hitting this threshold exactly should map to a score of 80% for that quality metric.
- Weight each quality metric between 0.0 and 1.0, with the constraint that all of the weights must add up to 1.0. This constraint forces discipline. Use data to drive these weights as much as feasible. The ratio of nastygrams to churn that the support team sees for different customer segments could inform this weighting. I realize that it is a tedious exercise to group support requests in a way that correlates to quality metrics, but the alternative is shooting in the dark.
- Peer review. Get the other people who are doing the same thing in a room and do a calibration session across several services. Be critical but constructive. Be rigorous. Ask for the data.
- Score in aggregate using a weighted average across each quality metric’s score. To avoid re-inventing the wheel, the grading system commonly used by the US education system is good enough: A for ≥90%, B for ≥80%, C for ≥70%, D for ≥60%, and F for everything below. So hitting the target threshold exactly for every quality metric yields a B. Almost everyone will get what a B means immediately: good, but could be better.
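
The weighting and grading steps above can be sketched in Python. Treat this as a rough illustration under stated assumptions, not a definitive implementation: the metric names, weights, and scores are hypothetical, each metric's score is assumed to be already normalized to a 0–100 scale on which 80 means the target threshold was hit exactly, and the rounding before grading is an added guard against floating-point noise at the grade boundaries.

```python
from math import isclose

# Letter cutoffs from the text: A >= 90, B >= 80, C >= 70, D >= 60, else F.
GRADE_CUTOFFS = [(90.0, "A"), (80.0, "B"), (70.0, "C"), (60.0, "D")]


def aggregate_score(metrics):
    """Weighted average of per-metric scores.

    `metrics` maps a metric name to a (weight, score) pair, where the
    score is assumed to be on a 0-100 scale and 80 means "target hit".
    """
    weights = [weight for weight, _ in metrics.values()]
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each weight must be between 0.0 and 1.0")
    # The weights must add up to 1.0; this is the constraint that forces discipline.
    if not isclose(sum(weights), 1.0, abs_tol=1e-9):
        raise ValueError("weights must add up to 1.0")
    return sum(weight * score for weight, score in metrics.values())


def letter_grade(score):
    """Map an aggregate 0-100 score to a letter grade."""
    score = round(score, 6)  # assumption: absorb float noise near a cutoff
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical service with four made-up quality metrics: (weight, score).
example_metrics = {
    "unrecoverable_request_errors": (0.4, 95.0),
    "p99_latency": (0.3, 80.0),
    "failed_logins": (0.2, 70.0),
    "stale_cache_reads": (0.1, 60.0),
}

total = aggregate_score(example_metrics)
print(f"{total:.1f} -> {letter_grade(total)}")
```

Rejecting weights that do not sum to 1.0, rather than silently renormalizing them, keeps the calibration discipline visible: adding a new metric forces an explicit decision about which existing weights shrink.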
This is by no means a slam dunk that will irreversibly link a team’s self-perception to the actual customer experience. It takes constant vigilance. Learnings will happen. The model will need continuous refinement. But at least the metrics can tie directly to the customer’s experience, and that’s really what matters in the end.