The Dark Horse Metric: A Case for Using Throughput

Published in The Startup · 6 min read · Sep 10, 2020

I’ve not come across many teams that use throughput, let alone know much about it. For the most part, velocity rules the roost when it comes to capacity planning. However, throughput can be just as useful (and you don’t need to estimate with story points… or at all… if that’s your thing).

Velocity vs Throughput

Similar to how Fahrenheit and Celsius both measure temperature, throughput and velocity are different measures of a team’s capacity.

Velocity tells us capacity via the amount of completed effort (as estimated in story points). Throughput tells us capacity via the number of completed tasks.

Both are used in pretty much the same way. During sprint planning, for example, the team will look back at its average (or average range) of completed story points or tickets to get a feel for roughly how much it can pull into the next sprint.
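As a rough illustration of that look-back, the sketch below averages past sprint counts and derives an "average range". The ticket counts are made up, the function name `planning_range` is hypothetical, and reading the range as one standard deviation either side of the average is an assumption rather than a rule.

```python
from statistics import mean, stdev

def planning_range(completed_per_sprint):
    """Return (low, average, high) ticket counts to consider for the next sprint.

    Needs at least two sprints of history, because stdev() does.
    """
    avg = mean(completed_per_sprint)
    spread = stdev(completed_per_sprint)  # sample standard deviation
    low = max(0.0, avg - spread)          # a team cannot complete fewer than 0 tickets
    return low, avg, avg + spread

# Hypothetical ticket counts, for illustration only
history = [7, 9, 11, 8, 10]
low, avg, high = planning_range(history)
print(f"Plan for roughly {low:.0f}-{high:.0f} tickets (average {avg:.0f})")
```

The same function works for velocity if you pass completed story points instead of ticket counts.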

The problem with velocity

The biggest issue I usually encounter with velocity is that it is based on an arbitrary measurement system (story points) that has a tendency to cause confusion. This confusion generates debate and discussion around “frequently asked questions” such as “Shouldn’t we estimate spikes and bugs? Otherwise it’s work that won’t show in our velocity.” or “Shouldn’t we re-estimate the story points of unfinished stories? Otherwise it feels like the work we did doesn’t matter.” or “What do you mean a story point equates to effort? Effort means what?” While normally I would consider debate and discussion a good thing, we need to be mindful that time spent dwelling on how to estimate and getting lost in velocity is time not spent delivering value to our customer.

When it comes to process we want to be as lean as possible. By simply tracking the completed tasks (regardless of type), throughput can eliminate confusion and complications that are so common with velocity and story pointing. That being said, throughput is also not perfect.

Using Throughput

Perhaps throughput’s main “quirk” is that while effort is inherently captured within the number of completed tasks, it is not explicit and this can be misleading.

Task count might be a predictable indicator in an environment where work items are somewhat homogeneous or repetitive, like the number of car doors fitted or turkey sandwiches packed, but when we are not certain about the effort involved in each individual task we cannot be certain about the overall effort that throughput represents.

Why is this a problem? Imagine if your team completed only 5 tasks in sprint A, and then 20 tasks in sprint B. On the face of things, this looks like quite a large variation in capacity. However, if you knew that the 5 tasks were all rather large, and the 20 were mostly small, then they might end up equating to around the same amount of overall effort. How then are you supposed to know how many tickets to plan for?

When using throughput in a complex and therefore uncertain environment, it can be particularly helpful to try to minimize the variation of effort in your tasks.

The best way to do this is to continue estimating — but you don’t need to use story points. It can be t-shirt sizes, animals or any other relative scale. Estimating will help your team break up the larger work items that could otherwise increase your variance. However, this is not something that should be considered as ‘extra work’ just because you use throughput. I would recommend estimating regardless of whether you use velocity or throughput, not only for consistency but for reducing (known) risk.

Note: Estimating and breaking down work items is not something to get obsessed with. For the most part your team is likely to find a rhythm, i.e. a standard task size that it tends to break most stories into. But even then sizing is an estimation activity — you’re probably going to be wrong more than you’d like. You’re never going to be perfect in this, so don’t try too hard.

Remember that peaks and troughs are normal in complex environments. We just want to try to keep the variation between them down as much as is reasonably possible and focus instead on the average (or average range) that starts to emerge rather than individual sprint throughput (like that 5 or 20).

Unlike velocity, which relies on estimating in story points, throughput still works if you don’t estimate at all: tracking it over a period of time will reveal trends that can inform your team and stakeholders of your probable capacity (as we will see below).

If you’ve time, look at your coefficient of variation.

Without making this too complicated or time-consuming: the coefficient of variation (CoV) is the standard deviation of your throughput divided by its mean, i.e. how widely the data points spread around the average, relative to the average itself. Very simply, the higher this number, the higher your variation, and the less ‘reliable’ your throughput is likely to be as an indicator of capacity. There is no ‘magic’ number here; however, I would very loosely say that the higher your average and the higher your CoV, the more likely it is that a little investigation is warranted. For example, if your average throughput is 9 and your CoV is 100%, your average range (one standard deviation either side of the average) is 0–18, which you might consider quite significant.
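For anyone who wants to compute this themselves, here is a minimal sketch of the coefficient of variation (standard deviation divided by the mean) and the average range it implies. The function names are hypothetical, the sprint counts are invented purely to reproduce the 9 / 100% example, and using the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def coefficient_of_variation(throughputs):
    """Standard deviation divided by the mean, as a fraction (1.0 == 100%)."""
    avg = mean(throughputs)
    if avg == 0:
        raise ValueError("CoV is undefined when the average throughput is 0")
    # pstdev treats the sprints as the whole population; swap in stdev()
    # if you prefer the sample standard deviation.
    return pstdev(throughputs) / avg

def average_range(avg, cov):
    """Range one standard deviation either side of the average, floored at 0."""
    spread = avg * cov
    return max(0, avg - spread), avg + spread

# Hypothetical sprint counts, chosen only to reproduce the 9 / 100% example
sprints = [0, 18, 0, 18]
cov = coefficient_of_variation(sprints)
low, high = average_range(mean(sprints), cov)
print(f"CoV {cov:.0%}, average range {low:.0f}-{high:.0f}")
```

Whether you use the population or the sample standard deviation matters little for this purpose; what matters is using the same one every time so the trend is comparable.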

Examples of using throughput

Below I’ve shared a few graphs of how throughput can still be used for many of the same things you might traditionally use velocity and story points for. This data is taken from a real team that doesn’t estimate, yet as mentioned above, hopefully these visuals show that by tracking throughput we can still obtain valuable insights to start discussions.

A typical line graph showing the distribution of throughput over the course of 17 sprints reveals:

The emerging average is about 9 tickets, which can be used as a starting point for planning.
However, their average throughput range is around 5–13 tickets (a CoV of roughly 44%, if that range is read as one standard deviation either side of the average), which could probably be tightened a bit.
We do notice a trend of increasing throughput; however, whether this is due to a larger number of smaller tasks in the sprints, or simply more work/effort being completed, is something that would need to be investigated.

You can also go further and retrieve stats on how many unplanned vs planned tickets make up the throughput. In this graph we can clearly see scope creep in nearly every single sprint — 31% on average. This again might be something to look into.
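A scope creep percentage like this can be derived from nothing more than two counts per sprint. The sketch below assumes the figure is the unplanned share of completed tickets, averaged across sprints; the article's team may have calculated it differently. The function name and the data are hypothetical.

```python
def unplanned_share(planned_done, unplanned_done):
    """Fraction of a sprint's throughput that was not planned at sprint start."""
    total = planned_done + unplanned_done
    if total == 0:
        return 0.0  # nothing completed, so no scope creep to report
    return unplanned_done / total

# Hypothetical (planned, unplanned) completed-ticket counts per sprint
sprints = [(6, 3), (7, 2), (5, 4)]
shares = [unplanned_share(p, u) for p, u in sprints]
average_share = sum(shares) / len(shares)
print(f"Average scope creep: {average_share:.0%}")
```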

Another graph that looks at the ratio of throughput (completed tickets) vs not completed tickets essentially shows the extent of over-commitment. This team is, on average, only completing about 45% of what they thought they would. This could be due to not understanding the effort involved and again might need to be checked out.
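The ratio behind such a graph is simple arithmetic. The sketch below assumes that "what they thought they would" complete equals completed plus not completed tickets; the function name and the sprint figures are hypothetical.

```python
def completion_ratio(completed, not_completed):
    """Share of the committed tickets that were actually completed."""
    committed = completed + not_completed
    if committed == 0:
        raise ValueError("no tickets were committed to this sprint")
    return completed / committed

# Hypothetical sprint: 9 tickets done, 11 left unfinished
ratio = completion_ratio(9, 11)
print(f"Completed {ratio:.0%} of the committed tickets")
```

A ratio well below 100% sprint after sprint is the over-commitment signal described above.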

Throughput is helpful, but not the goal.

Because velocity is a product of story pointing, and story points themselves are frequently misunderstood, you might want to choose your battles. You can try to invest more time in training and gaining experience in the use of story points, but as long as you feel you gave it a good go, there is no shame in cutting your losses and trying throughput. While not perfect, it can make your development process leaner by cutting out some unnecessary discussion and worry. However, take note that regardless of how you measure it, capacity is a measure of output, not outcomes.