Are Story Points Helping Your Team?

David Johnson
The Pragmatic Agilists

Story Points have been around for a couple of decades now. There are countless articles and videos demonstrating their usage. Many organizations mandate their usage, and teaching them is part of the standard approach for consulting organizations.

For as long as I’ve known about Story Points (SP) and the associated velocity, far too many teams have carried incomplete work over Sprint after Sprint. Teams spend endless hours haggling over point estimates and still get them wrong much of the time. Many teams perform elaborate theater at the end of the Sprint, renegotiating SP estimates for work that was harder than anticipated, or splitting incomplete work and then trying to decide where to place the Story Points.

Yet teams never re-estimate downward the work that was easier than expected. Funny, that.

So why this article? The vast majority of the existing literature shows you how to use SP and velocity but falls short in helping you evaluate how useful and effective they are within your teams.

Let’s dig in and learn how to evaluate the usefulness of Story Points using your team’s data.

The SP Theory

The original intent of SP was to obfuscate time. Their usage hid that developers were estimating in Ideal Days (which never really happened then, nor do they often happen today). They were additive, such that a 3 point story and a 5 point story should be about the same effort as an 8 point story.

Not so these days. SP are now used to estimate relative size but they are not expected to be additive. They’re watered down in that respect.

Elaborate techniques are now common, such as Planning Poker. While the conversation they promote is very useful to create a shared understanding of the work, such techniques are not necessary for a team to discuss upcoming efforts.

An Experiment

Export your team’s data from your ALM (Agile Lifecycle Management) tool for all items completed in the last 90 to 120 days. In the export include the date the work started, the date the work completed and the SP estimate.

In Lean terms, you’re looking to calculate the Cycle Time for each item from its dates: date work completed - date work started + 1. The + 1 counts both the start day and the completion day, so an item started and finished on the same day has a Cycle Time of 1 day. Cycle time per item and its SP estimate are the only things you need, but if the team also uses different work item types or categories of work, those can be useful too.

Use a tool like Excel or Numbers to sort the data by SP estimate, low to high. You should wind up with a grouping for every SP value used by the team. A group of 1s, 2s, 3s, 5s, 8s and whatever else they use.

Now calculate the average cycle time for each group of SP values and create a chart showing each SP value and its associated Average Cycle Time.
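If you would rather script the grouping and averaging than do it in a spreadsheet, here is a minimal Python sketch of the steps above. It assumes each exported item has been reduced to an SP estimate, a start date and a completion date. The function names and the sample rows are made up for illustration and do not come from any particular ALM tool.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def cycle_time_days(started: date, completed: date) -> int:
    """Cycle time in days, counting both the start day and the completion day."""
    return (completed - started).days + 1


def average_cycle_time_by_sp(items):
    """Group (sp, started, completed) tuples by SP value and average the cycle times."""
    groups = defaultdict(list)
    for sp, started, completed in items:
        groups[sp].append(cycle_time_days(started, completed))
    # Sort by SP value, low to high, as in the spreadsheet version.
    return {sp: mean(times) for sp, times in sorted(groups.items())}


# Made-up sample rows (SP estimate, date started, date completed).
sample_items = [
    (1, date(2024, 3, 4), date(2024, 3, 5)),    # 2 days
    (1, date(2024, 3, 4), date(2024, 3, 11)),   # 8 days
    (3, date(2024, 3, 6), date(2024, 3, 8)),    # 3 days
    (3, date(2024, 3, 11), date(2024, 3, 15)),  # 5 days
    (5, date(2024, 3, 12), date(2024, 3, 18)),  # 7 days
]

averages = average_cycle_time_by_sp(sample_items)
for sp, avg in averages.items():
    print(f"{sp} SP: {avg:.1f} days average cycle time")
```

The resulting dictionary maps each SP value to its average cycle time, which is the data you need for the chart. Note that this sketch counts calendar days; whether to exclude weekends is a choice the article leaves to you.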

The chart should look like this.

Interpreting the Results

If you are using SP as an estimate of relative size, you would expect the average cycle time per SP value to reflect the relationship between the SP values. For example, 5 point items should take noticeably longer on average than 2 point items.

But I bet it doesn’t. At least not to the degree you thought should be happening.

Here is an example of what you are likely seeing in your data. Your average cycle time values will be different but for most teams there is little separation between them at the various SP values. Certainly not the relative differences you would expect.
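One rough way to put a number on that lack of separation is to express each group as a multiple of the smallest SP group, once by estimate and once by average cycle time. The sketch below does that in Python. Reading the SP numbers literally as ratios is an assumption made here for illustration (many teams only expect a rough ordering), and the averages are invented, not real team data.

```python
def relative_to_smallest(avg_by_sp):
    """Return {sp: (estimate multiple, cycle time multiple)} relative to the smallest SP group."""
    base_sp = min(avg_by_sp)
    base_avg = avg_by_sp[base_sp]
    return {
        sp: (sp / base_sp, avg / base_avg)
        for sp, avg in sorted(avg_by_sp.items())
    }


# Hypothetical average cycle times in days per SP value (illustration only).
avg_by_sp = {1: 4.0, 2: 5.0, 3: 5.0, 5: 6.0, 8: 8.0}

ratios = relative_to_smallest(avg_by_sp)
for sp, (estimated, observed) in ratios.items():
    print(f"{sp} SP: estimate implies {estimated:.1f}x, cycle time shows {observed:.2f}x")
```

If the second multiple stays close to 1 while the first climbs, the estimates are not telling you much about how long the work takes.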

Examine the Outliers

Look closer at your data in the various SP groups. Are the largest variations in cycle time for the smaller SP values?

The majority of the teams I have worked with who did this exercise found that the biggest surprises were in the smaller SP estimates. Slam dunks that weren’t. Rarely were teams consistently way off on the large estimates. They knew those items were big, so they discussed them and/or broke them down into smaller items before starting the work.
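To see where the surprises sit in your own data, you can summarise the spread of cycle times within each SP group. The Python sketch below reports the minimum, the maximum and the coefficient of variation (standard deviation divided by the mean) per group. The choice of the coefficient of variation as the measure is an assumption of this sketch, and the sample numbers are invented.

```python
from statistics import mean, pstdev


def spread_by_sp(cycle_times_by_sp):
    """Summarise the spread of cycle times (in days) within each SP group."""
    summary = {}
    for sp, times in sorted(cycle_times_by_sp.items()):
        summary[sp] = {
            "min": min(times),
            "max": max(times),
            # Coefficient of variation: spread relative to the group's own average.
            "cv": pstdev(times) / mean(times),
        }
    return summary


# Hypothetical cycle times in days, grouped by SP estimate (illustration only).
cycle_times_by_sp = {
    1: [1, 1, 2, 12],   # one "slam dunk" that wasn't
    8: [8, 10, 9, 9],
}

spread = spread_by_sp(cycle_times_by_sp)
for sp, stats in spread.items():
    print(f"{sp} SP: min {stats['min']}, max {stats['max']}, cv {stats['cv']:.2f}")
```

A small SP group with a high coefficient of variation is a prompt to look at the individual items behind it and ask what the team missed.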

Regardless of the technique you use for estimating or counting, gaining a team-level shared understanding of each work item is essential. Make sure the team doesn’t skip discussing the smallest items because they are “so small nothing could go wrong.”

“It ain’t what you don’t know that gets you into trouble. It’s what you know for sure that just ain’t so.”

Mark Twain

Avoid the Bottomless Pit

A natural reaction when seeing these results is to conclude that teams need more training in estimating. That they don’t understand the proper usage of Story Points.

We’ve been conditioned, over many years, to think this way. Estimating is the expected, accepted method to create a schedule and any Project Manager can teach you to estimate and create that schedule.

Don’t fall into this trap. It’s the same trap it was before Story Points; the only difference is the unit (SP vs hours). Trying to get better at estimating is the same bottomless pit. It’s a red herring, an ineffective way to proceed in a situation full of uncertainty, unknowns and dependencies.

Getting better at estimating means spending more time and effort before starting the work, when little is known about the actual work necessary. You do not know ahead of time which assumptions and decisions about the work will prove to be incorrect.

To offset the effects of uncertainty and unknowns, teams will pad their estimates. Sometimes they pad a little, but more often a lot in an environment with high uncertainty, high WIP (Work-In-Progress) and dependencies.

So rather than an actual estimate you are getting a risk hedge. The thinking goes “This would likely take me 60–80 hours with no interruptions. But I know I will get at least 5 other things assigned to me, I have these other 3 things I still need to finish, I’m not exactly sure how to code this and that other team needs to do this one thing before my work can finish. So I’ll estimate 200 hours.”

I know this happens because it’s what I and my teams did as developers for decades. My approach was something like double it and add 20 (hours, days or percent depending on the situation).

I’m sure you know this happens too.

A Better Approach

My long answer is here: Count Stories, Not Points

For the purpose of loading a Sprint, using the average throughput should be fine. Calculate the average number of work items completed per Sprint over the last 5 or 6 Sprints and plan that many work items for the next Sprint.

Understand that there will be variation in the actual results Sprint over Sprint. Average throughput, and average velocity, are not great predictors of the result of any single Sprint. No average is. Average throughput should be thought of as a range, not a single value.

But for the purpose of loading a Sprint it’s OK to be off by a work item or two. It does average out over a few Sprints. Don’t stress over it, keep it as lightweight as possible.
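The throughput calculation described above can be sketched in a few lines of Python. The function name and the Sprint history are hypothetical, and using the lowest and highest recent Sprint as the range is an assumption of this sketch, since the article does not prescribe how to express the range.

```python
from statistics import mean


def throughput_forecast(completed_per_sprint, window=6):
    """Average and range of work items completed over the most recent Sprints."""
    recent = completed_per_sprint[-window:]
    return {
        "average": mean(recent),
        "low": min(recent),
        "high": max(recent),
    }


# Hypothetical count of work items completed in each Sprint, oldest first.
history = [5, 7, 9, 6, 8, 10, 8]

forecast = throughput_forecast(history)
print(
    f"Plan about {forecast['average']:.0f} items; "
    f"recent Sprints ranged from {forecast['low']} to {forecast['high']}."
)
```

Planning to the average while keeping the range in view matches the lightweight approach: expect to land a work item or two either side of the plan.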

Hopefully, using the information in this article, you can now better assess your team’s usage of Story Points. If your team is like most teams I’ve encountered, you will find that it is spending a lot of time on something that is not providing much benefit.

If you found this article helpful, please click on 👏 and follow me for more valuable information on Agile, Lean and Book Reviews.

Until next time!

David Johnson
The Pragmatic Agilists

Dave is an Agile Coach with nearly 40 years of experience developing software and helping teams & organizations improve their value delivery.