Calculating Velocity from Story Points is a Category Error

A category error is when we ascribe properties to a thing that can’t possibly have them: for example, your breath smells a funny colour, or the number 9 is slippery and sly.

The poor old story point has been hurt and abused since its invention. In Agile we understand story points as a means of abstracting the relative size of a user story or task under consideration, a move on from estimating task time in hours (since we know that the actual time taken may vary due to a whole range of factors which may have nothing to do with the task in question). The idea is simple: as a team we form an understanding of the smallest unit of work we might be asked to do (and track as a work item), call that 1 point, and then use it as a yardstick for comparing the other things we’re asked to do.

Poker face

Many of us have participated in the Scrum Planning Poker game (I have several packs of cards for this) where the team sits down, goes through the backlog (hopefully just the items coming up for a sprint) and assigns a point value to each item. The idea is that handing out a pack of cards containing numbers such as 0, 1, 2, 3, 5, 8, 13, 27 or whatever means that each individual’s answer is revealed simultaneously, rather than going round the room and being subject to social biases, e.g. deferring to whatever the most senior, experienced, loud or highly paid person says. That said, the subsequent discussion can often revert to this.

If there is a significant difference between the highest and lowest point scores, then the meeting has great value in discussing the reason for the difference. In the beginning it is about coming to a shared understanding of what the points represent, but over time it allows folks to uncover differences in what they think is involved in completing the task.
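As a rough sketch of that "significant difference" check: the snippet below flags a round of votes for discussion when the lowest and highest cards are more than one step apart in the deck. The deck is the list of card values mentioned above; the function name `needs_discussion` and the one-step threshold are my own assumptions, not part of any Scrum rule. Note that it compares positions in the deck rather than subtracting the face values.

```python
# Hypothetical deck, using the card values mentioned in the text.
DECK = [0, 1, 2, 3, 5, 8, 13, 27]


def needs_discussion(votes, max_steps=1):
    """Flag a round when the lowest and highest cards are more than
    max_steps apart in the deck (an assumed threshold, tune to taste)."""
    positions = [DECK.index(v) for v in votes]
    return max(positions) - min(positions) > max_steps


print(needs_discussion([3, 5, 5, 3]))   # everyone picked neighbouring cards
print(needs_discussion([2, 13, 5, 3]))  # lowest and highest are far apart
```

In practice a team would simply eyeball the cards; the point is only that the trigger for a conversation is the gap in ranking, not a numeric total.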

This is what makes the meeting useful: it is a proxy or instrument to get the team to talk about the task and iron out differences in approach, dependencies, complexity etc. ScrumMasters then take the sum of the points for the tasks in a sprint and use that number as the team’s velocity. Once the team have come to a common understanding of a point (it can take about 3 iterations or more) then you ought to see a stable team increase in velocity (i.e. their capacity to get through work in a defined time period).

Show me the numbers

Beyond the ScrumMaster there is also a risk of consequent managerialising of the points. I am as guilty of this as anyone. When I was a Development Manager I was keen to track the total number of points per sprint per team on a nice spreadsheet, and show it to the senior management team, splitting out averages over time, points not completed per sprint, points per product area etc. Management loves numbers, and they love seeing them improve over time. I even had the debate as to why one team’s points were so much higher than another’s (i.e. what one team scored as a 2 the other would likely call an 8) and what we could do about bringing them into alignment.

There is one small flaw with the story point approach. It’s cobblers.

The reason is this: story points are ordinal, not cardinal, numbers. They are not exact representations of levels of effort; they are relative to each other, identifying which item is bigger or smaller than another within the bounds of our consideration. What you cannot say is that the time taken to do a 2 point story and a 3 point story is equivalent to the time taken to do one 5 point story. The numbers we assign to stories are ordered labels indicating that a 5 is bigger than a 3 and much smaller than a 27. We could easily have given them A, B, C etc. as labels (and indeed the T-Shirt sizing approach does this).
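To make the ordinal point concrete, here is a minimal Python sketch. The `Size` labels, the story names and the helper `can_add` are all made up for illustration. The labels can be compared and sorted, but no addition is defined for them, which is exactly the property story points share once you stop pretending the labels are quantities.

```python
from enum import Enum


class Size(Enum):
    """Hypothetical ordinal labels for story size; only the order matters."""
    XS = 1
    S = 2
    M = 3
    L = 4
    XL = 5

    def __lt__(self, other):
        # Comparison is meaningful: one label ranks below another.
        if isinstance(other, Size):
            return self.value < other.value
        return NotImplemented


def can_add(a, b):
    """Return True if a + b is defined for these labels, False otherwise."""
    try:
        a + b
    except TypeError:
        return False
    return True


# Made-up backlog items, each tagged with a size label.
stories = {
    "login page": Size.M,
    "fix typo": Size.XS,
    "new billing engine": Size.XL,
}

# Ranking works: we can order stories from smallest to largest.
ranked = sorted(stories, key=stories.get)
print(ranked)

# Arithmetic does not: "S + M" has no defined meaning for these labels.
print(can_add(Size.S, Size.M))
```

Swapping the labels for 1, 2, 3, 5, 8 makes the addition run without complaint, but it does not make the result any more meaningful.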

Check out Ben Northrop’s blog on this subject. There is an example there that uses the median hours spent per point level to illustrate, but my argument that story points are ordinal does not depend on hours != points.

So while there is a relationship between point size and capacity/velocity — i.e. a bunch of high point value stories will take longer than a bunch of low point value ones — it is not a mathematical one that bears any significant analysis or aggregation.

Story points do not have the property of being able to have arithmetic done on them, therefore to add them together to calculate velocity is a category error.

What do we do instead?

If you must count capacity (I personally prefer flow-based metrics and consideration of cycle time instead) then at least be more empirical. Once you have a stable team and a rough understanding of what a “typical” user story might look like, try to ensure your stories from then on are of an approximately similar size. Then count the number of stories your team gets through per sprint/release. You can then work out what you are likely to get done in a future sprint based on the data of previous ones.
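A minimal sketch of that counting approach, assuming you have already recorded how many similarly sized stories were finished in each past sprint. The numbers and the function name `forecast_range` are invented for illustration; the output is a rough range to plan against, not a commitment.

```python
from statistics import median

# Hypothetical history: similarly sized stories completed in each past sprint.
completed_per_sprint = [6, 8, 7, 5, 8, 7]


def forecast_range(history):
    """Return (low, typical, high) story counts seen in past sprints."""
    return min(history), median(history), max(history)


low, typical, high = forecast_range(completed_per_sprint)
print(f"Next sprint: around {typical} stories, likely between {low} and {high}")
```

Counting finished stories is a legitimate cardinal measure, so unlike summed points the arithmetic here actually means something, provided the stories really are of similar size.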

As Deming said, “It is totally impossible for anybody or any group to perform outside a stable system … if a system is unstable, anything can happen.” The complexity and change a new team is going through do not lend themselves to typical managerial scrutiny. Instead look at outcomes until the team (system) is stable, and then you can start tracking metrics more closely. Ones that actually mean something.

Like what you read? Give James Hull a round of applause.
