The two graphs above are common metrics used to measure the performance of a team over time. The one on the left shows how many units of work were completed in a given time period (throughput), and the one on the right shows the average time it took to complete one of these units of work (cycle time). At first glance, I’m sure you’re all thinking that this team is performing poorly. However, in this article I will show you how, by using more granular data, we can reach a totally different, and more accurate, conclusion about the performance of this team.
The problem with averages:
A mean average takes the sum of all data points in a range and divides it by the number of data points in that range. This is what we see being used in the cycle time graph. To be considered a continuously improving team you would want to see this average going down over time, so you’d be forgiven for assuming the team that produced the graph above is facing some challenges. However, the problem with averages is that they do not tell the whole story: when it comes to team performance, they hide a very important attribute, variability. Variance measures how far a set of data points spreads away from its mean. In the software engineering world, having a low variance is important for two reasons:
- Low variance means the team are very predictable. If you give them a piece of work, you can say with a higher level of confidence how long that piece of work will take to complete. This is very useful for estimations.
- A low variance makes it easy for teams to spot bottlenecks in their system, implement changes and measure whether these have been successful.
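To make the difference between the two measures concrete, here is a minimal sketch using Python's standard `statistics` module. The two lists of cycle times are invented for illustration and do not come from any real team. They share the same mean but have very different variances.

```python
from statistics import mean, pvariance

# Hypothetical cycle times in days for two imaginary teams (not real data).
steady = [6, 7, 7, 8, 7]
erratic = [1, 15, 2, 14, 3]

for name, times in (("steady", steady), ("erratic", erratic)):
    # mean: sum of the data points divided by how many there are
    # pvariance: average squared distance of the data points from that mean
    print(name, "mean:", mean(times), "variance:", pvariance(times))
```

Both lists average out to 7 days, so an average cycle time graph could not tell them apart. Only the variance shows that the second team is far less predictable.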
Let’s demonstrate the problem with an example of two teams who report their metrics every 2 days. Below are the average cycle time graphs for the two teams.
At first glance, it appears Team A is very gradually reducing its cycle time whereas Team B’s is increasing quite considerably, so it’s easy to conclude that Team A are very predictable and gradually improving. However, if we now look at the variance using a cycle time scatterplot, we get a very different story.
Now we see that Team A are very unpredictable. It would be really difficult for them to identify where the bottlenecks in their workflow lie or to estimate how long a piece of work would take to complete. Team B, on the other hand, can clearly see they have just 3 outliers. They can dig deeper into those 3 outliers to identify the bottleneck and fix it, and they can say with a much higher level of confidence how long it will take them to complete a unit of work.
More granular metrics:
In this section I will show some more granular metrics that can be used in place of averages. The tool being used is Actionable Agile, which is a plugin for JIRA. The metrics shown are not unique to this tool; anyone can produce them. The tool just makes it easy to import your data from JIRA and quickly display the graphs, with some nice features for digging a little deeper.
As you can see above, cycle time scatterplots offer much more insight into a team’s performance. You can learn more about how to use them here. Returning to the real example at the top of the page, let’s now use them to dig into the issues potentially facing the team.
As you can see, this is far from being the most predictable team in the world. However, things are not as bad as the average graph led us to believe. We can see a cluster of dots between the 5 and 10 marks, and we’ve got some clear outliers that we can investigate to figure out what went wrong. So let’s start with hypothesis one, and it’s a very easy assumption to make: higher story points = longer cycle time.
Now we see the same scatterplot with the dots coloured to represent story points, and this hypothesis is very quickly disproved. There is clearly no correlation between story points and cycle time, with 2s and 3s regularly having longer cycle times than 5s.
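In the article I judged this by eye from the coloured scatterplot. If you wanted a number to back it up, one option is a correlation coefficient. The sketch below computes a Pearson correlation by hand. The `pearson` helper and the ticket data are my own illustration, not something the tool produces and not the team's real data.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences.

    Returns a value between -1 and 1; values near 0 suggest no linear
    relationship. Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical tickets: (story points, cycle time in days).
tickets = [(2, 14), (3, 12), (5, 6), (1, 5), (3, 7), (5, 8), (2, 9), (8, 7)]
points = [p for p, _ in tickets]
days = [d for _, d in tickets]

r = pearson(points, days)
print(f"story points vs cycle time: r = {r:.2f}")
```

A value close to 1 would support the hypothesis that bigger stories take longer. A value near zero, or a negative one, would not.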
As a quick aside, this is an unsurprising revelation. Story points are useful if you follow Scrum: once you have established a velocity, they help you plan how many tickets to bring into a sprint. However, as a measurable metric they are useless because they don’t represent a real unit of time; they are a product of our feelings and imagination. They are also useful when refining a ticket because they indicate how well understood it is. For example, if everyone scores it a 3 the ticket is well understood, whereas if the scores are 1, 3 and 8 then clearly the ticket is not understood by the team. But for reviewing performance and making estimations we should use real historical time data instead.
So let’s try a second hypothesis: blockages are causing the high cycle time outliers.
The red dots indicate units of work that have had flags put on them, which indicates a blockage. This is slightly more interesting: we can see that the majority of the high cycle time units have been blocked. This is where qualitative data becomes important. Why have these tickets been blocked? In this example, it was due to waiting for another team to complete a task. So now we can generate our first action: what can we do to mitigate our dependency on this team? This is a great example of being a data-led team: identify a problem using metrics, use qualitative data to explain why it is happening, and generate an action off the back of it to help solve it. Admittedly this action is more of a question, but once the team answer it and implement some changes they can come back to this graph in the future to see whether the number of blocked units with high cycle times has reduced.
I think you’ll agree that blocked items don’t seem to be the whole problem, as there are still units of work with high cycle times that weren’t blocked and some blocked units with low cycle times. So let’s try a third hypothesis: one of the workflow stages is a bottleneck.
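If you want to track this without re-reading the scatterplot each time, a rough sketch like the one below compares how often slow and fast items were blocked. The item data, the 13 day cut-off and the `blocked_share` helper are all assumptions made for illustration.

```python
# Hypothetical completed items: (cycle time in days, was it ever flagged as blocked?)
items = [
    (4, False), (6, False), (7, True), (9, False),
    (15, True), (18, True), (21, False), (24, True),
]

THRESHOLD_DAYS = 13  # assumed cut-off for what counts as a "high" cycle time

def blocked_share(rows):
    """Fraction of rows that were flagged as blocked (0.0 for an empty list)."""
    if not rows:
        return 0.0
    return sum(1 for _, blocked in rows if blocked) / len(rows)

slow = [row for row in items if row[0] > THRESHOLD_DAYS]
fast = [row for row in items if row[0] <= THRESHOLD_DAYS]

print(f"blocked among slow items: {blocked_share(slow):.0%}")
print(f"blocked among fast items: {blocked_share(fast):.0%}")
```

Running the same comparison after the team has acted on its dependency gives a simple before and after measure.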
Here is a cumulative flow diagram that shows us how units of work move through the workflow. You can read more about how to use them here. Immediately we can see where units are spending most of their time. The cycle time for “ready for dev” is 5.15 out of a system total of 9, so over half the time (roughly 57%). Why is this? Again, qualitative data tells us that there is a Work In Progress (WIP) limit of 5 units on that column, which means 4 units have to be completed before that unit can leave the stage. So we can now generate our second action: let’s reduce the WIP limit on the “ready for dev” column from 5 to 4. The team are just beginning this experiment in real life, so hopefully when they come to review this chart in a few weeks’ time we will see a reduction in cycle time for this column and therefore for the system as a whole.
As I mentioned above, low variance helps us with two things: identifying and resolving bottlenecks to make the team predictable, and making estimations. We have covered some ways to identify bottlenecks; now we will focus on making estimations.
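The same idea can be sketched in a few lines if you have the time each ticket spent in each stage. In the example below only the "ready for dev" stage name and the 5.15 and 9 figures come from this article. The other stage names, the ticket data and the `average_days_per_stage` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical days each ticket spent in each workflow stage.
tickets = [
    {"ready for dev": 5.0, "in dev": 2.0, "review": 1.0},
    {"ready for dev": 6.0, "in dev": 3.0, "review": 1.0},
    {"ready for dev": 4.0, "in dev": 1.0, "review": 1.0},
]

def average_days_per_stage(rows):
    """Average days per stage, assuming every ticket passes through every stage."""
    totals = defaultdict(float)
    for row in rows:
        for stage, days in row.items():
            totals[stage] += days
    return {stage: total / len(rows) for stage, total in totals.items()}

averages = average_days_per_stage(tickets)
system_total = sum(averages.values())
shares = {stage: avg / system_total for stage, avg in averages.items()}
bottleneck = max(shares, key=shares.get)
print(f"largest share: {bottleneck} ({shares[bottleneck]:.0%})")

# The figures quoted in the article: 5.15 days out of a system total of 9.
article_share = 5.15 / 9
print(f"article's 'ready for dev' share: {article_share:.0%}")
```

The stage with the largest share is the first place to look for a bottleneck, which is how "ready for dev" stood out in the diagram.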
I have added pace percentiles to the scatterplot. These tell us that 85% of work is completed in 13 days or less, which means that when someone asks how long it will take to complete a piece of work, we can say with 85% probability that it will take 13 days or less. This then becomes the team’s Service Level Expectation (SLE) and they should hold themselves accountable to it. If they think 13 days is too high for their SLE then they should aim to reduce this number by using tactics to identify bottlenecks and eradicate them, similar to the ones I have mentioned above. Once we have an SLE, how do we hold ourselves accountable to it? By the time we come to review a cycle time scatterplot, it is too late to remedy any units of work that may miss it. This is where the ageing work in progress graph can be used. You can learn more about how to use these here.
This graph shows us the live progress of units of work as they move through the system. Any tickets that fall above the yellow band are likely to miss the SLE, and action should be taken to remedy this. By reviewing this graph in stand-up every day we can ensure all our tickets remain within the SLE by prioritising units that look like they are going to slip. Therefore, over time the probability of us hitting this SLE will increase, making the team even more predictable.
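As a rough illustration of where a figure like this comes from, here is a sketch that derives an 85th percentile from historical cycle times using the nearest-rank method. I am not claiming this is how the tool calculates its percentile lines. The `percentile_nearest_rank` helper and the sample data are assumptions, chosen so that the result happens to match the 13 days above.

```python
import math

def percentile_nearest_rank(values, pct):
    """Smallest value such that at least pct% of the values are <= it."""
    if not values:
        raise ValueError("need at least one value")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based position in the sorted list
    return ordered[max(rank, 1) - 1]

# Hypothetical cycle times in days for 20 completed units of work.
cycle_times = [2, 3, 4, 5, 5, 6, 6, 7, 7, 8, 8, 9, 9, 10, 11, 12, 13, 16, 21, 30]

sle = percentile_nearest_rank(cycle_times, 85)
within = sum(1 for t in cycle_times if t <= sle) / len(cycle_times)
print(f"85th percentile: {sle} days ({within:.0%} of items finished within it)")
```

With this sample, 17 of the 20 items finished in 13 days or less, so 13 days would be the SLE at 85% confidence.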
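A daily check like this can also be sketched in code. The example below flags in-progress items whose age is approaching the SLE. The ticket keys, the dates, the `at_risk` helper and the 70% warning ratio are all hypothetical, and the ratio is only a stand-in for whatever the yellow band represents on your own chart.

```python
from datetime import date

SLE_DAYS = 13        # the 85th percentile quoted in the article
WARNING_RATIO = 0.7  # assumed: warn once an item has used 70% of the SLE

def at_risk(in_progress, today, sle_days=SLE_DAYS, ratio=WARNING_RATIO):
    """Return (key, age in days) for items at or past the warning level, oldest first."""
    flagged = []
    for key, started in in_progress.items():
        age = (today - started).days
        if age >= sle_days * ratio:
            flagged.append((key, age))
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)

# Hypothetical ticket keys and the dates work started on them.
in_progress = {
    "TEAM-101": date(2024, 3, 1),
    "TEAM-102": date(2024, 3, 4),
    "TEAM-103": date(2024, 3, 12),
}

flagged = at_risk(in_progress, today=date(2024, 3, 14))
for key, age in flagged:
    print(f"{key} has been in progress for {age} days")
```

Here the first two tickets are flagged because they are 13 and 10 days old, while the third is only 2 days old and is left alone.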
We’ve come a long way from our basic averages at the beginning of this article. As you can see it would have been easy for us to quickly jump to a conclusion that the team was struggling and needed to make large scale changes to improve their performance. We also didn’t really have any data points to help us understand what the issues were. However, by using the more granular metrics shown on the cycle time scatterplot, cumulative flow diagram and ageing work in progress graph, combined with qualitative data to explain why the graphs showed us what they did, we managed to identify exactly what the problems were, come up with solutions and have a way to measure the results of those solutions in a very precise manner. Not only that but we also learnt how we can use these graphs to estimate work and hold ourselves accountable to these estimations. Overall this will lead the team to become very predictable and highly performant.