Measure twice, cut once
Make sure you measure enough to avoid mistakes
When I was a youngster learning practical skills such as woodworking from my Dad, and “helping” with various DIY tasks around the home, he always repeated the mantra “measure twice, cut once”, both to educate me in the importance of double-checking things and, I suspect, as a reminder to himself. It is one of those phrases that has stuck with me, and I now find myself saying it to my children.
More recently I have been working to introduce and improve ways of working, with a particular focus on DevOps principles. One task that underpins all three DevOps principles is measurement.
Systems thinking (and value-stream mapping) uses metrics to identify the highest-value waste and delays to target. How can you improve feedback loops if you don’t know the current state? How can you evaluate whether experiments and innovations are worth keeping if you can’t compare how things were before and after?
In my experience many organisations don’t measure enough, or don’t measure the right things. Recently I received feedback that teams were impeded by “some issues with environments”. There was an underlying expectation that I would leap to my feet and “fix” the problems (whatever they were). Further discussion confirmed there were some issues, but the cost of delay, the frequency and the impact could not be measured, so it was difficult to prioritise the work above other tasks in the backlog.
Goodhart’s Law is formally stated as:
Any observed statistical regularity will tend to collapse once pressure is placed upon it for control purposes
It is more often phrased as:
When a measure becomes a target, it ceases to be a good measure.
Goodhart’s Law can be overlooked when introducing metrics, as people focus on providing numbers and pretty graphs for management reports. That is not to say measurement is a bad thing, but relying on the metrics alone, without looking at the underlying reasons or considering the effect on culture and behaviour, can be counter-productive.
There are plenty of metrics that can be captured that are not technical or money-related: for example, staff engagement, motivation, and overall productivity (lead times, cycle times and so on). Some of the simpler metrics I have seen used as a magic number, either to provide some comfort or to make people look like they know what they are doing, are:
Velocity
Teams measure velocity (whether in stories, time or story points doesn’t really matter) as a tool to identify areas for discussion and improvement, and as a way of tracking past performance in order to forecast the future.
I have seen velocity used to compare teams, causing teams to game their story points in order to be seen as better (have a higher velocity number) than other teams. Velocity has also been used to compare the story points committed in a sprint to the story points completed at the end of it. There is value in the team discussing those differences*. However, when it is used as a reporting metric for senior management, teams start moving stories to “done” to claim the points, rather than being open and transparent when stories cannot be completed fully.
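There is nothing magical about the forecasting side of velocity. A minimal sketch, with invented numbers (a real team would forecast with ranges rather than a single mean):

```python
import math

# Hypothetical recent sprint velocities, in story points completed per sprint.
recent_velocities = [21, 18, 24, 20]
mean_velocity = sum(recent_velocities) / len(recent_velocities)  # 20.75

# Rough forecast: how many sprints to burn through the remaining backlog?
backlog_points = 166
sprints_remaining = math.ceil(backlog_points / mean_velocity)

print(f"mean velocity {mean_velocity}, ~{sprints_remaining} sprints remaining")
```

The same numbers the team discusses in retrospectives can feed this forecast; the trouble described above starts when the mean itself becomes the target.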
Availability
Availability makes sense as a metric to measure. If changes are made to a system that affect availability (positively or negatively) there are lessons that can be learned, either to address a negative impact or to make sure positive impacts are maintained.
Only measuring availability, though, can lead to a risk-averse culture where changes are not made in case availability is negatively affected. This may be a valid approach if you are building something safety-critical, for example.
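For reference, availability is usually reported as the percentage of a period the system was up. A minimal sketch, with invented downtime figures:

```python
def availability(downtime_minutes: float, period_minutes: float) -> float:
    """Percentage of the period the system was available."""
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

minutes_in_30_days = 30 * 24 * 60  # 43,200 minutes

# Roughly 43 minutes of downtime in a month is "three nines";
# ten times that downtime is "two nines".
print(round(availability(43.2, minutes_in_30_days), 3))   # 99.9
print(availability(432.0, minutes_in_30_days))            # 99.0
```

The single number hides everything interesting: one 43-minute outage and forty-three one-minute blips score the same, which is exactly why the number alone should prompt discussion rather than end it.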
Deadlines & budgets
Deadlines & budgets are always present. Deadlines (and progress towards them) should not be the only metric measured, as that leads to drops in quality, increases in technical debt, and (unintended) consequences for engagement and motivation.
There is value in tracking them though. If deadlines and budgets are at risk of being missed, then stakeholders, teams and customers need to discuss the best approach: is the work stopped, the scope changed, the deadlines and budgets reviewed, or the risks of deliberately reducing quality or increasing technical debt openly discussed and agreed?
Test coverage
Many DevOps articles and webinars hail automation as a way of increasing quality and throughput. Often the follow-up statement is that test coverage needs to be measured. We have discussed whether the CD pipeline should fail builds that don’t meet a test coverage threshold as a way to “encourage” teams to write tests.
That is not encouragement but bullying. The hardest part of a DevOps transformation is the cultural work: showing the benefits of practices like unit testing, and getting engagement and buy-in from teams so that they regard tests as a fundamental part of completing a story, not something that can be skipped. A fixed test coverage metric also does not allow for a risk-based approach to development.
I have seen a team game test coverage by writing an unused class with high coverage to meet the arbitrary target, allowing their code to be deployed.
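That pattern is easy to reproduce. A minimal sketch (the class and the numbers are hypothetical) of why a line-coverage gate can pass without any real behaviour being tested:

```python
class UnusedHelper:
    """Dead code: never called from production, exists only to be 'covered'."""

    def combine(self, a: int, b: int) -> int:
        total = a + b
        doubled = total * 2
        return doubled


def test_unused_helper() -> None:
    # Executes every line of UnusedHelper, so a line-coverage tool marks
    # them all as covered, although no production behaviour is verified.
    assert UnusedHelper().combine(1, 2) == 6


def line_coverage(covered: int, total: int) -> float:
    """Line coverage as a percentage."""
    return 100.0 * covered / total


test_unused_helper()
# Suppose the real module has 100 lines with 70 covered: a fixed 80% gate
# fails. Add 60 dead-but-'tested' lines and the same gate passes.
print(line_coverage(70, 100))        # 70.0
print(line_coverage(70 + 60, 160))   # 81.25
```

The coverage tool is not wrong; every one of those lines really did execute. The gate measures execution, not verification, which is what makes it gameable.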
The danger of over-measurement
As another example of Goodhart’s Law, it is easy to be blinded by too many metrics, looking purely at the numbers and not discussing the reasons behind changes in them. I have seen teams jump to metrics such as velocity or test coverage and report them as fact to senior management, who misinterpret the numbers, which then become the focus rather than the delivered outcomes.
For me, the recent Accelerate: State of DevOps 2018 report contains a sensible set of metrics to start with for productivity and availability. Engagement metrics are, in my view, better tailored to the culture.
Don’t be afraid to change them, challenge them, add new ones, drop ones that are not useful anymore.
This brings me back to the childhood memories of helping my Dad with DIY woodworking: when you are about to make changes to your software development processes, it is still important to make sure you are measuring the right things before you get started.
Just don’t get stuck on the numbers.
*Differences between committed and completed stories are worth discussing whatever the direction. The obvious case is when the team completes fewer stories than it committed to, as the retrospective can look at the reasons for that. It is also worth looking at when more stories are completed than committed, as there may be lessons in how that happened that can be applied to improve the team.