EXPEDIA GROUP TECHNOLOGY — SOFTWARE

Monte Carlo Forecasting in Software Delivery

Bringing science into the art of software forecasts

Bart Masters

Published in Expedia Group Technology · 6 min read · May 26, 2020

Casino chips on a roulette betting table — Software forecasting shouldn’t be a gamble. Photo by Kay on Unsplash

“When is it going to be ready?” We’re often asked that at Expedia Group™, and one way we answer it is scientifically — with Monte Carlo forecasting.

What’s wrong with just using velocity?

One common approach when using Scrum is to use the team’s average per-sprint velocity. Take that number, divide the size of the product backlog by it, and you get the number of sprints required to complete the backlog. As a basic approach, that works well enough.

However, it is crude — it assumes your velocity and backlog sizes will remain consistent, and it provides a hard end date when your work will be finished. Velocity often changes from sprint to sprint, and the backlog will often change in size. And the chance of hitting a specific date months in the future is negligible. We need an approach that reflects the unpredictability of the future.
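For reference, the velocity-based calculation boils down to a single division. The sketch below uses made-up velocity and backlog numbers, and `sprints_needed` is a hypothetical helper name rather than part of any tool mentioned here.

```python
import math
from statistics import mean


def sprints_needed(backlog_points, velocities):
    """Naive forecast: backlog size divided by average velocity, rounded up."""
    average_velocity = mean(velocities)
    return math.ceil(backlog_points / average_velocity)


# Illustrative numbers only, not taken from a real team.
velocities = [10, 12, 14]
print(sprints_needed(90, velocities))  # 90 / 12 = 7.5, so 8 sprints
```

Note that the result is one number with no indication of how likely it is, which is exactly the limitation described above.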

Enter Monte Carlo forecasting

The Monte Carlo method is a computational technique developed by nuclear weapons researchers at Los Alamos. It takes a set of source data and draws many random samples from it to produce probability-based results. Wikipedia has a good writeup of its history.

It becomes useful in software by taking a range of historical data (completed story points or user stories per sprint) and a range of outstanding work (a product backlog). It can then provide a probability-based forecast on the date range when the work will be complete.

Monte Carlo run-through and attribution

I learnt this technique from reading the works of Focused Objective and their founder Troy Magennis. If you find this interesting, I strongly recommend investigating their work.

One tool created by Focused Objective is this Excel spreadsheet, which is a great tool for running Monte Carlo forecasts. I use it extensively here at Expedia Group, and it is the core of this demo.

Step 1: Collect historic data

The first step in any data-driven approach is to get your data — your team’s historic throughput. In the example below, I’m forecasting for a team who are on 2-weekly sprints and have most of their stories estimated using story points.

I start by collecting completed story point stats for the past seven sprints and enter them into the spreadsheet on the Throughput Samples tab. That's the 11, 9, 19, and so on in the orange cells.

If you don’t use story points or sprints, you can still use Monte Carlo. You could use number of stories per month, number of JIRA tickets per week, or whatever you have as a regular cadence.

Inputting historic data into the spreadsheet

Step 2: Input your outstanding work

The main Forecast tab is where you key in the scenario you want to forecast.

In this scenario, I’m doing some planning for the team for the next quarter. So I plug in the scenario data:

Start Date — March 13th.
Low and high range for the number of story points in your scenario. In this scenario, we are estimating 80–90 story points need to be worked on. This is an excellent spot for capturing any uncertainty you have with the amount of outstanding work.
Low and high range for how many stories are created or split during the work. In this example, I’m being pessimistic, and estimate that when the team comes to plan their sprints, in some cases they are going to have to double the number of stories/story points. So we have an effective range of 80–180 story points for this scenario.
Length of delivery cadence, in this case, 2 weeks.
The last 2 orange squares are where you can put in guesses on your velocity, if you don’t have any historic data. I’ve found Monte Carlo not particularly useful without real data, so my recommendation is to hold off until you have at least 2–3 samples of useful data before trying any forecast.
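To make the effective range concrete, here is a small sketch of the scenario inputs as a Python data structure. The `Scenario` class and its field names are hypothetical and are not taken from the spreadsheet. The start date is left out because the example does not state a year.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical container for the inputs on the Forecast tab."""
    points_low: int      # low estimate of outstanding story points
    points_high: int     # high estimate of outstanding story points
    split_low: float     # lowest growth multiplier from story splitting
    split_high: float    # highest growth multiplier from story splitting
    sprint_weeks: int    # length of the delivery cadence in weeks

    def effective_range(self):
        """Smallest and largest amount of work the scenario can contain."""
        return (self.points_low * self.split_low,
                self.points_high * self.split_high)


scenario = Scenario(points_low=80, points_high=90,
                    split_low=1.0, split_high=2.0, sprint_weeks=2)
print(scenario.effective_range())  # (80.0, 180.0)
```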

Step 3: Forecast!

And here is the forecast for our example scenario:

The spreadsheet has run 500 times through a Monte Carlo simulation (more details on what this looks like next) to forecast how long it will take to complete the work in the scenario. Out of the 500 simulations, the work was completed by April 29th 5 times. 5 out of 500 is a 1% chance, which is not great odds of the work being completed by that date.

The simulation had the work completing the week of May 13th 30 times. That means 30 + 5 = 35 out of 500 simulations, or 7% chance of the work being complete by May 13th. Still not great odds.

And the simulation keeps going: you can see it gives you 50/50 odds of being complete on or before June 24th and an 85% chance of completing by August 5th. If things go pretty badly wrong, the work won’t be ready until September 30th.
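The percentages in this forecast are running totals: the chance of finishing by a given date is the share of simulations that finished on or before it. A minimal Python sketch of that arithmetic, using only the first two counts quoted in the example (`cumulative_chance` is a hypothetical helper name):

```python
from itertools import accumulate


def cumulative_chance(counts, total_runs):
    """Turn per-date completion counts into a running percentage chance."""
    return [100 * running / total_runs for running in accumulate(counts)]


# 5 runs finished by April 29th and 30 more finished the week of May 13th,
# out of 500 runs in total.
print(cumulative_chance([5, 30], 500))  # [1.0, 7.0]
```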

So now you have a probability range about when the work will be complete. Do you want to take a gamble on the work being complete by June? Or take the safe route and plan for August?

Where did this forecast come from?

This is the heart of Monte Carlo forecasting. The spreadsheet takes the data you have entered and:

Randomly selects how many story points you need to build, based on the low and high estimate you gave it (in the example above, the range is 80–90).
Multiplies that by a random value within the range of story splitting you gave it (in the example, 1.0–2.0), which determines how many story points are needed to complete the scenario.
It then randomly picks one of the historic throughput samples you entered and uses that to burn down the required work for a sprint.
Then it selects another random throughput sample to burn down the required work for another sprint.
And so on, until the number of remaining story points reaches zero. That determines how many sprints were required in this randomly-generated scenario, and thus a delivery date. One forecast is complete.
Repeat the above 499 times.
Group up the results, and you have your Monte Carlo forecast: 500 different randomly-generated simulations of your forecast scenario, grouped together to show how likely each completion date is.
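The steps above can be sketched in a few lines of Python. This is an illustration of the idea and not the spreadsheet's actual implementation: it assumes both ranges are sampled uniformly, and all function names are hypothetical. Only three of the seven throughput samples (11, 9, 19) are quoted in this article, so the sketch uses just those and its output will not match the forecast shown earlier.

```python
import math
import random


def simulate_once(samples, points_low, points_high, split_low, split_high, rng):
    """Run one simulated delivery and return the number of sprints it took."""
    # Pick the amount of work, then scale it by a random split multiplier.
    remaining = rng.uniform(points_low, points_high) * rng.uniform(split_low, split_high)
    sprints = 0
    while remaining > 0:
        # Burn down one sprint using a randomly chosen historic sample.
        remaining -= rng.choice(samples)
        sprints += 1
    return sprints


def forecast(samples, points_low, points_high, split_low, split_high,
             runs=500, seed=None):
    """Repeat the simulation and return the sprint counts, sorted ascending."""
    if not samples or max(samples) <= 0:
        raise ValueError("need at least one throughput sample above zero")
    rng = random.Random(seed)
    return sorted(
        simulate_once(samples, points_low, points_high, split_low, split_high, rng)
        for _ in range(runs)
    )


def sprints_at_confidence(results, confidence):
    """Smallest sprint count within which `confidence` of the runs finished."""
    index = max(math.ceil(confidence * len(results)) - 1, 0)
    return results[index]


results = forecast([11, 9, 19], 80, 90, 1.0, 2.0, runs=500, seed=1)
for confidence in (0.5, 0.85):
    sprints = sprints_at_confidence(results, confidence)
    print(f"{confidence:.0%} of runs finished within {sprints} sprints")
```

Multiplying a sprint count by the cadence length (2 weeks in the example) and adding it to the start date turns each result into a delivery date.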

What's next?

That is up to you. Based on your inputs, the forecast has shown a range of various probabilities for completion. You can now have informed discussions with your stakeholders on their appetite for risk for committing to various release dates, or techniques like backlog refinement to tighten the amount of outstanding work.

I hope this has been a useful intro to Monte Carlo forecasting. As before, I owe Troy Magennis a great debt for introducing me to this technique, and Focused Objective has a lot more interesting tools and techniques for data-driven forecasting.

And if you have any other tips or tricks — please let me know below. Happy forecasting!