Maker’s Metrics, Manager’s Metrics
Paul Graham’s 2009 essay Maker’s Schedule, Manager’s Schedule was a big influence on me as I was shifting from full-time software engineer to managing software projects (while also trying to find time to write code daily).
Just like schedules change drastically based on role, metrics undergo a similar shift.
A key principle of Agile is “self-organizing teams”, where the makers have a high degree of autonomy. Managers play an outward-facing role, acting as a conduit between teams and the organization while focusing on coaching, mentorship, process improvement, and removal of impediments.
In a previous post, I outlined important Agile metrics. These are geared towards people who look holistically at teams and the organization they work in. In other words, they are “manager’s metrics”.
A different sort of “metrics” emerge organically from makers in Agile teams. Often, these are not explicitly viewed as “metrics” at all, but they are nonetheless crucial components to the delivery of working software. If you are a maker, be sure you understand what your key metrics are and discuss them periodically with your team. If you are a manager (PM, ScrumMaster, coach, etc.), make sure you understand what metrics bring value to your teams.
Story Point Accuracy
Story point accuracy shows how well work is estimated. Were story estimates a reflection of reality, or does the team tend to over- or underestimate?
When things are going well and teams are delivering working functionality each sprint, it’s easy to assume that everything was estimated correctly. That isn’t always the case, as sometimes people will not want to admit that their estimates were inaccurate. Heroic efforts could be going on silently behind the scenes just to meet commitments and maintain velocity.
If underestimation becomes the norm, it can lead to unsustainable development practices. Eventually, the amount of work required to get over the estimation gap can snowball and cause breakdowns and burnout. While this can’t be easily quantified with a number, it can be gauged by talking about it in a retrospective.
On the flip side, overestimation can skew numbers and cause issues with future planning. This typically shows up as large, unexpected swings in velocity. For example, if a team that reliably completes 40 points per sprint suddenly finishes 100 points, it’s a sign that stories may have been overestimated. Smaller swings in velocity are perfectly normal, and a rolling average velocity smooths them out over time.
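As a rough illustration of that smoothing effect (the sprint numbers below are invented), a rolling average taken over the last few sprints dampens a one-off spike without hiding it entirely:

```python
# Hypothetical sprint velocities, including one overestimated outlier sprint.
velocities = [38, 42, 41, 100, 44, 40]

def rolling_average(values, window=3):
    """Average each sprint with up to window - 1 preceding sprints."""
    averages = []
    for i in range(len(values)):
        recent = values[max(0, i - window + 1): i + 1]
        averages.append(round(sum(recent) / len(recent), 1))
    return averages

print(rolling_average(velocities))
# [38.0, 40.0, 40.3, 61.0, 61.7, 61.3] -- the 100-point sprint still stands out,
# but planning against the rolling number is less volatile than planning
# against the raw 100.
```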
If over- or underestimation becomes an issue, revisit your estimation process. Are the same people estimating every time, or does the group change? Is the team estimating too much in a single session, getting fatigued and not giving each item enough attention? Are stories too broad or too vague? These are all excellent retrospective topics.
Peer Review Cycle Time
Do your teams have a collaborative code review process (peer review) as part of your Definition of Done for stories? Having two or more sets of eyes on code tends to surface edge cases and stylistic issues. Additionally, talking through what was built helps all parties understand it better. This reduces your team’s bus factor and leads to learning and growth.
Peer review cycle time is the amount of time from the beginning to end of the code review/peer review cycle for a story. Even if you do not track this precise number, team members will typically have a sense of how long this process takes.
As a general rule of thumb, if peer review cycle time is measured in days rather than hours, the process is probably taking too long. Items that sit for days create bottlenecks that delay other work, and the details grow stale in everyone’s mind.
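If you want to put a rough number on it, the arithmetic is trivial. Here is a minimal sketch (the timestamps and review records are hypothetical, stand-ins for whatever your code review tool exports):

```python
from datetime import datetime

# Hypothetical (review requested, review approved) timestamps for two stories.
reviews = [
    ("2024-03-04 09:15", "2024-03-04 14:45"),
    ("2024-03-04 16:00", "2024-03-07 10:45"),  # sat untouched for days
]

FMT = "%Y-%m-%d %H:%M"
for requested, approved in reviews:
    elapsed = datetime.strptime(approved, FMT) - datetime.strptime(requested, FMT)
    hours = elapsed.total_seconds() / 3600
    note = "  <-- measured in days; worth raising in the retrospective" if hours > 24 else ""
    print(f"peer review cycle time: {hours:.1f} hours{note}")
```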
If this happens near the end of a sprint or project, it puts the ability to ship working functionality in jeopardy, as there may not be enough time to complete peer reviews. Alternatively, it may result in low quality, cursory peer reviews or skipping them altogether, leading to defects and knowledge gaps.
Build Status
Tracking the build status is a best practice of Continuous Integration (CI) and encourages teams to have software that is always in a potentially shippable state. CI tools make it possible to automate and visualize this in real time. In the absence of tools, this can be done manually.
Maintaining a nightly baseline build is essential to ensuring that the team is delivering working software not just every sprint, but every day.
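In that manual spirit, even a hand-kept log of nightly results can make problems visible. A minimal sketch with an invented log:

```python
# Hypothetical nightly build log: one entry per night, True means the build passed.
nightly_builds = {
    "2024-03-04": True,
    "2024-03-05": True,
    "2024-03-06": False,
    "2024-03-07": False,
}

# Consecutive failures suggest the codebase is drifting away from a
# potentially shippable state and that integration debt is accumulating.
streak = 0
for day, passed in sorted(nightly_builds.items()):
    streak = 0 if passed else streak + 1
    if streak >= 2:
        print(f"{day}: build has been broken for {streak} nights in a row")
```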
Projects with working builds that can be integrated back into the production codebase tend to have more favorable outcomes: fewer integration issues and fewer mad scrambles to get things working late in the project.
CI is a worthwhile investment, but like any capability, it does not have to be an all-or-nothing proposition. Teams should keep a pulse on how their CI capabilities are improving over time.
Test Coverage
A pragmatic approach to testing is crucial. Celebrating high percentages of test coverage is just about as useful as celebrating how many total lines of code there are (which is to say, not useful at all). What is being tested is more important than how much test coverage there is.
Every team starts out with 0% of code covered by tests, so moving to 10%, 20%, 50%, 75%, and so on is better than where they started. Improving test coverage can happen progressively and be an evolving capability as teams learn how to balance testing and business priorities.
Test coverage is almost exclusively a maker’s metric. Teams should set their goals for test coverage and reflect on how helpful and meaningful their tests are. They should not be influenced by managers who are pushing for higher and higher percentages of test coverage, as this can lead to a pointless scenario where low-quality tests cover mandated high percentages of code.
Defects
Defect Criticality Index (DCI) and Defect Removal Efficiency (DRE) help measure overall software quality. These are ideal metrics for both makers and managers to understand the impact of investments in CI and testing.
In this model, bugs are assigned a score based on severity, from 1 (Low) to 4 (Critical). Note that bugs created during the development process are not assigned story points, so they do not count toward velocity.
Let’s assume that ten bugs are found during a project. Five are “Low”, two are “Medium”, two are “High”, and one is “Critical”. The total development DCI would be 19, which is the sum of the criticality score for all bugs.
Let’s also assume that after the project launches, five bugs are found. Four are “Low”, zero are “Medium”, one is “High”, and zero are “Critical” for a total production DCI of 7.
DRE is the share of total defect criticality that was removed before release, computed as follows:
(Development DCI) / (Development DCI + Production DCI) = DRE
(19) / (19 + 7) ≈ 0.73
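The same arithmetic as a small sketch (the severity weights and bug counts mirror the example above):

```python
# Severity weights from the model above: Low = 1 through Critical = 4.
WEIGHTS = {"Low": 1, "Medium": 2, "High": 3, "Critical": 4}

def dci(bug_counts):
    """Defect Criticality Index: the severity-weighted sum of bugs found."""
    return sum(WEIGHTS[severity] * count for severity, count in bug_counts.items())

development = {"Low": 5, "Medium": 2, "High": 2, "Critical": 1}
production = {"Low": 4, "Medium": 0, "High": 1, "Critical": 0}

dev_dci = dci(development)   # 19
prod_dci = dci(production)   # 7
dre = dev_dci / (dev_dci + prod_dci)
print(f"Development DCI: {dev_dci}, Production DCI: {prod_dci}, DRE: {dre:.2f}")
# Development DCI: 19, Production DCI: 7, DRE: 0.73
```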
Not much is gleaned from this number by itself. “Good” DCI/DRE numbers are only established over time and vary widely based on business, environment, and other constraints.
Makers can track their team over time and see trends in velocity, test coverage, and development DCI. Managers can look across teams and see patterns and correlations in velocity and DRE.
For example, a team that is working at a consistent velocity, only slightly improving their test coverage, and significantly improving their DRE is probably investing their time into the most valuable tests.
On the other hand, a team that significantly improves their test coverage but sees their DRE trend in the wrong direction may be writing poor-quality tests. A drop in DRE may also reduce velocity, as the team spends more time fixing bugs than moving the project forward.
Both makers and managers should keep a close eye on correlations and trends in velocity, test coverage, and defects.
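Even a plain side-by-side table of these numbers per release makes such correlations easier to spot. A rough sketch with invented numbers:

```python
# Hypothetical per-release snapshots of the three numbers discussed above.
releases = [
    {"release": "1.0", "velocity": 40, "coverage": 0.35, "dre": 0.70},
    {"release": "1.1", "velocity": 41, "coverage": 0.38, "dre": 0.82},
    {"release": "1.2", "velocity": 33, "coverage": 0.55, "dre": 0.74},
]

print(f"{'release':<10}{'velocity':>10}{'coverage':>10}{'DRE':>8}")
for r in releases:
    print(f"{r['release']:<10}{r['velocity']:>10}{r['coverage']:>10.0%}{r['dre']:>8.2f}")

# Release 1.1 looks like valuable tests (coverage up a little, DRE up a lot);
# release 1.2 looks like lower-value tests (coverage up a lot, DRE and velocity down).
```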
Summary
Metrics differ for makers and managers in an Agile environment. Both groups should understand each other’s key metrics and know which metrics they share.
Employ a practical approach to metrics. Don’t try to institute too much too soon; allow time for metrics to emerge organically based on team and organization needs. Similarly, don’t view capability-based metrics, like CI, test coverage, and DCI, as “all-or-nothing” items. Instead, use them to measure continuous improvement and team growth.