Metrics and KPIs
From the “Software Engineering Cookbook” Series — How to improve your team’s performance through objective observation
I am back with another article from the “Software Engineering Cookbook” Series, targeted at growing engineering teams. This time around, we will tackle the oh-so-exciting topic (right?) of performance measurement. Let’s go!
Unikorn Inc. — A case study (inspired by real events)
What are we doing? Are we moving in the right direction? It looks like there is always more stuff in the backlog than we can work on, and it keeps on growing!
We used to finish working at 8 pm, then that became 8:30 pm, then 9 pm. Before we knew it, we had Friday night sessions going until 11 pm on a good sprint and 3 am on a bad one.
Was our code really that bad? Did we bite off more than we could chew? Wait, why was John, who had two young kids waiting for him at home, not going back to have dinner with his family anymore?
Such was life at Unikorn, a startup with many ambitions, producing software at a breakneck pace as the world of computing was becoming increasingly more mobile.
The team was very talented, headed by a CTO who was himself a smart and skilled software engineer. Yet they lacked a method to channel that raw energy into efficient software production: estimates were consistently off, and crunch time came far too often.
Part of the problem was the lack of a simple method to measure progress objectively and to identify areas of weakness to work on.
Checking the performance of the team is, on the surface, very straightforward. You can just see how good and how stable your software is at the end of a sprint. You can then call a meeting and pat everyone’s back if all is good and move on. Right?
Not quite. “Good” and “Stable” have different meanings to different people, as they are subjective concepts, much like “Beautiful” and “Useful”.
In order to measure performance objectively, you first want to establish metrics that are objective (that is, not subject to anybody's interpretation or "gut feel") and measurable with numbers. Some good examples are:
- Number of bugs reported in the first week after release
- Number of hours spent on fixing bugs vs. new feature development
- Number of errors in every 1000 API invocations
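Each of these reduces to simple arithmetic over counters you already collect. As a trivial illustration, here is how the third metric might be computed; the function name and inputs are hypothetical, and the raw counts would come from your logging or monitoring stack:

```javascript
// Errors per 1,000 API invocations for a given period.
// `errors` and `invocations` are raw counts from logs/monitoring (assumed inputs).
function errorsPerThousand(errors, invocations) {
  if (invocations === 0) return 0; // no traffic, no error rate
  return (errors / invocations) * 1000;
}

errorsPerThousand(3, 1500); // 2 errors per 1,000 calls
```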
You can track these metrics on a sprint-by-sprint basis and analyze how the team’s performance progresses over time. They are also a good conversation starter for an end-of-sprint meeting where the main goal is to learn from specific events that happened during the sprint, to avoid history repeating itself.
It is also useful to set goals for each metric, and to get the team’s buy-in to achieve these goals. This is what focuses the team on improving performance, and it’s a good topic of conversation during performance reviews.
An example goal would be to have less than 25% time spent fixing bugs in a sprint, which can be achieved through better quality practices (reducing the number of issues introduced every sprint).
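A goal like this is easy to check mechanically. A minimal sketch, assuming you can pull hours spent on bug fixes and total hours out of your time tracking (both inputs are hypothetical):

```javascript
// Fraction of sprint time spent fixing bugs, and a check against a target.
// `hoursOnBugs` and `totalHours` are assumed to come from your time tracking.
function bugFixRatio(hoursOnBugs, totalHours) {
  return hoursOnBugs / totalHours;
}

function meetsQualityGoal(hoursOnBugs, totalHours, target = 0.25) {
  return bugFixRatio(hoursOnBugs, totalHours) < target;
}

meetsQualityGoal(20, 100); // true: 20% of time on bugs, under the 25% target
```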
Key Performance Indicators
If you sit down and think about all of your team’s activities, you can probably come up with more than a dozen of these metrics, some tied directly to each engineer (e.g. average number of hours per story point), some more team-wide (e.g. number of bugs per sprint).
These metrics are good and helpful for identifying issues within the team, but they do not show how the team is performing overall against the company's strategic goals.
This is where the Key Performance Indicators (KPIs) come in.
KPIs are high-level metrics that can be directly associated with these strategic goals, and they serve two main purposes:
- UP: They give an idea to the CEO / Board / Exec team on how well the team is helping the company to achieve its objectives
- DOWN: They communicate the company’s objectives to the team in a way that is more relatable to their day-to-day work
Generally speaking, there are two approaches that you can use to come up with a set of KPIs:
- Single out: select the most important of your team’s metrics, where there is a good match between function being measured and company objective
- Roll up: combine several metrics into a single KPI
The single-out approach results in KPIs which are easier to produce but might not give the full picture of the team’s performance, whereas the roll-up one provides better “coverage” at the cost of a more complex scoring system.
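A roll-up KPI can be as simple as a weighted average of the underlying metric scores, once each metric has been normalized to a common scale. A minimal sketch; the metric names and weights below are purely illustrative:

```javascript
// Weighted average of normalized metric scores into a single KPI.
// Assumes every score is already on the same scale (e.g. 0 to 5).
function rollUpKpi(scores, weights) {
  let total = 0;
  let weightSum = 0;
  for (const [metric, weight] of Object.entries(weights)) {
    total += scores[metric] * weight;
    weightSum += weight;
  }
  return total / weightSum;
}

// Hypothetical "quality" KPI built from three metric scores.
const qualityKpi = rollUpKpi(
  { bugsAfterRelease: 4, apiErrorRate: 3, bugFixTime: 2 },
  { bugsAfterRelease: 2, apiErrorRate: 1, bugFixTime: 1 }
);
// (4*2 + 3*1 + 2*1) / 4 = 3.25
```

Weighting lets you say which metrics matter most for the goal without losing the rest, which is exactly the "coverage" the roll-up approach buys you.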
Each KPI should have an interpretation key to help the reader unequivocally determine how well the team is doing. For instance, if you pick “Number of bugs reported in the first week after release” as a KPI, is 4 good or bad?
My suggestion is to build the KPIs so that they produce a score between 0 and 5, where 0 is “Bad” and 5 is “Excellent”, though this may vary in your organization as you might have guidelines from the rest of the exec team for a company-wide scorecard. For the “Number of bugs…” metric, a scoring system might look like this:
- 10 bugs or more >> 0 points
- 5–9 bugs >> 1 point
- 3–4 bugs >> 2 points
- 2 bugs >> 3 points
- 1 bug >> 4 points
- No bugs >> 5 points
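Encoded as code, an interpretation key like this is just a mapping from the raw metric value to a score. A sketch of the tiers above:

```javascript
// Map "bugs reported in the first week after release" to a 0-5 score,
// following the tiers listed above.
function bugScore(bugCount) {
  if (bugCount >= 10) return 0;
  if (bugCount >= 5) return 1;
  if (bugCount >= 3) return 2;
  if (bugCount === 2) return 3;
  if (bugCount === 1) return 4;
  return 5; // no bugs
}

bugScore(4); // 2 — so "4 bugs" now has an unambiguous interpretation
```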
As with the metrics, it is important to define and communicate a target score for each KPI to help the team achieve its performance goal. Perhaps 3 points is a reasonable target in the example above.
Doing the work
It might take some time to come up with a set of metrics and related KPIs that accurately reflect the performance of your team, but it is a worthy endeavor as it gives you a list of actionable items that the team can focus on resolving.
My suggestion is to try a few different metrics and see what works for your specific scenario. I once defined 7 KPIs for one of the teams I led, only to end up tracking 4 on a regular basis, as those were really the "key" ones. The other three were stable with very little fluctuation, so they eventually got dropped.
In terms of execution, you might initially want to collate these KPIs manually. This works for smaller teams, but it quickly becomes time-consuming and increasingly error-prone.
A better approach is to employ tools that make it easier to gather metrics, and define metrics based on the data available within such tools. For instance:
- If you use JIRA or Pivotal for task tracking, you can collect timestamps for state transitions to determine time spent on a specific story
- You can use the log of your Source Control system to determine how many commits were bug fixes vs. new features in a given sprint / branch
- If you have a Code Review system in place, you can easily determine the average number of revisions for each commit
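To make the source-control idea concrete: if your team follows a commit-message convention, classifying a sprint's commits takes only a few lines. The sketch below assumes Conventional Commits-style `fix:` / `feat:` prefixes (that convention is my assumption; adapt it to whatever your team actually uses), fed with subject lines from something like `git log --format=%s`:

```javascript
// Count bug-fix vs. feature commits from a list of commit subject lines.
// Assumes a "fix:" / "feat:" message convention (adapt to your own).
function classifyCommits(subjects) {
  const counts = { fix: 0, feature: 0, other: 0 };
  for (const subject of subjects) {
    const msg = subject.trim().toLowerCase();
    if (msg.startsWith('fix:') || msg.startsWith('fix(')) counts.fix += 1;
    else if (msg.startsWith('feat:') || msg.startsWith('feat(')) counts.feature += 1;
    else counts.other += 1;
  }
  return counts;
}

classifyCommits(['fix: null check in login', 'feat: password reset', 'chore: bump deps']);
// { fix: 1, feature: 1, other: 1 }
```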
Over the years, I have built a number of scripts (the latest revision using NodeJS) that would pull data from these various systems and convert them into a CSV. I would then import this CSV into a spreadsheet (Google Sheets, but Excel works as well) which uses pivot tables to collate the data and generate metrics and KPI reports (with some manual intervention, mainly around formatting).
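The CSV step in a pipeline like that can stay very small. A minimal sketch (the field names are illustrative, not from any real system) that turns per-sprint metric records into spreadsheet-ready CSV text:

```javascript
// Convert an array of flat metric records into CSV text.
// Assumes all records share the keys of the first one and values need no quoting.
function toCsv(records) {
  if (records.length === 0) return '';
  const headers = Object.keys(records[0]);
  const rows = records.map((r) => headers.map((h) => r[h]).join(','));
  return [headers.join(','), ...rows].join('\n');
}

toCsv([
  { sprint: 12, bugs: 4, points: 38 },
  { sprint: 13, bugs: 2, points: 41 },
]);
// "sprint,bugs,points\n12,4,38\n13,2,41"
```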
For a team of 20-30 people, it would take me about 1 hour every week to generate these reports and their related analysis (going through the data and extracting topics of interest for discussion with my Dev Managers on one side, and the Exec Team on the other).
Performance management is not the most glamorous topic, therefore I am grateful that you stuck around till the end of this article.
I hope you enjoyed reading it, especially if this is the first time you have set up a system to gather metrics, or if you have just been asked to come up with a set of KPIs for your team.
Before we go, I would like to share a short list of example metrics that you might find useful as a starting point to define your own, categorized by function / aspect:
- Functional quality: number of bugs, bugs vs. features, bugs discovered during each phase of development (implementation, integration, QA, user acceptance test)
- Code quality: number of revisions per commit, number of refactoring stories per sprint
- Velocity: number of stories per day / sprint / project / engineer, number of points per day / sprint / project / engineer
- Documentation: number of amendments to specs during implementation (for lack of detail)
- Operations: planned downtime, unplanned downtime, release-induced downtime
As usual, feel free to share your experience and leave your thoughts below on the topic of performance measurement for software engineering teams and I will be more than happy to join the conversation.
Meanwhile, good luck on your journey and keep on making good things!