How you can forecast the future in software development

Thijs Morlion
In The Pocket Insights
May 10, 2022

Software development is a tricky business. Especially when you need to make forecasts. In a previous post, you learned how to make more precise predictions for single work items. But what if you want to give that estimation for a larger chunk of work consisting of many items? Let’s tackle that question!

You already know your way around several metrics by now. By measuring cycle time, you spot trends in your historical data. The work item age might suggest blocked or difficult work items. To predict the future for a bunch of items, we’ll use yet another metric: throughput. As work progresses through your pipeline, it flows to the ‘done’ state. The definition of throughput is the number of completed work items per unit of time. If you display the throughput per day over a certain period of time, you get a throughput histogram.
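As a minimal sketch of how such a histogram could be built, assume you have one completion date per finished work item. The function names and the sample dates below are hypothetical and only serve as an illustration:

```python
from collections import Counter
from datetime import date, timedelta


def daily_throughput(completion_dates, start, end):
    """Return the number of items completed on each day from start to end, inclusive."""
    done_per_day = Counter(completion_dates)
    days = (end - start).days + 1
    # Days without completions count as 0; they are part of the data too.
    return [done_per_day[start + timedelta(days=i)] for i in range(days)]


def throughput_histogram(throughputs):
    """Map each daily throughput value to the number of days it occurred."""
    return dict(sorted(Counter(throughputs).items()))


# Hypothetical completion dates, one entry per finished work item.
completed = [
    date(2022, 5, 2), date(2022, 5, 2), date(2022, 5, 3),
    date(2022, 5, 5), date(2022, 5, 5), date(2022, 5, 5),
]
per_day = daily_throughput(completed, date(2022, 5, 2), date(2022, 5, 6))
print(per_day)                        # [2, 1, 0, 3, 0]
print(throughput_histogram(per_day))  # {0: 2, 1: 1, 2: 1, 3: 1}
```

Note that the days on which nothing was finished are kept as zeros. Leaving them out would make the team look faster than it really is.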

A throughput histogram

Some days, your team completes a lot of work items. Other days, you finish none. Let’s put this data to work to predict the future. If you know how many items the desired feature has, you can use this data to create probabilities. For the sake of simplicity, say we have a feature that consists of 100 items. Your client asks when it will be delivered. You already know that it is never a good idea to give an exact date, nor to communicate probabilities of 100%. A good forecast always contains two things: a range and a probability. You tell your client that there is an 85% chance you will deliver the items on or before a certain date. The 85% is the probability. ‘On or before that date’ is the range.

To start forecasting, take the number of items you want a forecast for and pick a starting date. Then select a random value from your throughput histogram, that is, the throughput of a randomly chosen day from your history. Subtract that throughput from the number of remaining items and move the date forward by one day. Repeat this until you have no work items left, and record the day you end up on in a new histogram. Then repeat this whole process a couple of thousand times. The resulting histogram will look a lot like this:
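The steps above could be sketched roughly as follows. This is an illustration under a few assumptions, not a reference implementation: the throughput history is a plain list with one entry per historical day, every calendar day counts as a working day, and the function names, the sample history and the start date are made up.

```python
import random
from collections import Counter
from datetime import date, timedelta


def simulate_completion(items, start, throughput_history, rng):
    """Run one simulation and return the day the last item is finished."""
    remaining = items
    day = start
    while remaining > 0:
        # Sample the throughput of one random historical day.
        remaining -= rng.choice(throughput_history)
        day += timedelta(days=1)
    return day


def monte_carlo(items, start, throughput_history, runs=10_000, seed=None):
    """Repeat the simulation `runs` times and count how often each date occurs."""
    if not any(t > 0 for t in throughput_history):
        raise ValueError("history needs at least one day with completed items")
    rng = random.Random(seed)
    return Counter(
        simulate_completion(items, start, throughput_history, rng)
        for _ in range(runs)
    )


# Hypothetical daily throughput values, including days with no finished items.
history = [0, 2, 1, 0, 3, 4, 0, 1, 2, 5]
results = monte_carlo(100, date(2022, 5, 10), history, seed=42)
for day in sorted(results):
    print(day, results[day])
```

The printed counts are the bars of the new histogram: one line per completion date, with the number of simulation runs that ended on that date.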

Monte Carlo estimations can help you forecast the future

On the x-axis you’ll see the various completion dates of the 100 work items. The y-axis displays the occurrence of each date in the histogram. On the graph, you can also spot the percentiles. For instance, the red box indicates the 85th percentile. This means that 85% of all the occurrences, the yellow area, happen before or on the 24th of June. That means there’s a probability of 85% that we will complete those 100 work items on or before the 24th of June. And all of this is based on historical data. Data which we collect by completing work items.
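Reading a percentile off such a histogram comes down to a cumulative count. The sketch below assumes the simulation results are available as a mapping from completion date to number of occurrences. The function name and the counts are hypothetical, chosen so that the 85th percentile lands on the 24th of June as in the graph:

```python
from collections import Counter
from datetime import date


def forecast_date(completion_counts, probability=0.85):
    """Return the earliest date by which at least `probability`
    of the simulated runs had finished."""
    total = sum(completion_counts.values())
    finished = 0
    for day in sorted(completion_counts):
        finished += completion_counts[day]
        if finished >= probability * total:
            return day
    raise ValueError("completion_counts is empty")


# Hypothetical outcome of 100 simulation runs.
simulated = Counter({
    date(2022, 6, 20): 10,
    date(2022, 6, 21): 30,
    date(2022, 6, 22): 25,
    date(2022, 6, 23): 15,
    date(2022, 6, 24): 10,
    date(2022, 6, 27): 10,
})
print(forecast_date(simulated))        # 2022-06-24
print(forecast_date(simulated, 0.50))  # 2022-06-22
```

In this made-up data, 90 of the 100 runs finish on or before the 24th of June, which is the first date that covers at least 85% of the runs. A lower probability gives an earlier date, a higher one a later date.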

It is important to repeat these forecasts over and over again, because both the scope (number of work items) and the historical data (throughput) will change. As soon as you have a new context to work in, your previous forecasts aren’t valid anymore. These kinds of simulations are called Monte Carlo simulations. You use them to estimate the probability of certain outcomes when random variables are present. To achieve this, the algorithm relies on random sampling, in our case of historical data. Exactly what we did above.

Selecting the correct historical data as input is more of an art than it is a science. Sometimes, people go on vacation or fall ill. From that point on, that part of the historical data might no longer be representative of the future. In any case, don’t overthink this. You still need to interpret what you see. Use it as a discussion tool to uncover potential underlying challenges. Challenges such as scope creep. You will find it to be a very powerful way to see the wood for the trees. Especially in the uncertain context of software development.
