Shake It Up — A First Attempt At Building a Monte Carlo Simulator On Google Sheets

In the “Shake It Up” series I take my readers with me on a journey towards understanding Monte Carlo simulations, and then show how I turned spreadsheets into forecasting tools.

Matthew Croker
8 min read · Jun 5, 2023

If you want a quick summary of Monte Carlo simulators, check out the first article of this series, Shake It Up!

TL;DR: Implementing a Monte Carlo simulator for project forecasting can be done with popular spreadsheet tools. In this article you can see my approach.

Let’s Start!

It seems like, at some point, all my journeys lead to Google Sheets 😆. The software’s popularity and universal availability across so many platforms always attract me in my quest to build shareable tools to support fellow Agile coaches, consultants, Product Owners, and everyone else in between.

Troy Magennis has a forecasting tool available online that allows users to work out project forecasts based on estimates, historical data, and some variables. True to his style, the tool is simple and a great way to kick-start a discussion on how to plan for success.

My motivation, at this point, is to learn from this spreadsheet and try to replicate it using Google Sheets. By doing so, I will increase my knowledge about Monte Carlo simulations and possibly look at further enhancements and integrations in the future.

Here are the aspects of Troy Magennis’ tool I like best.

  1. I like the idea of date ranges according to probability. Forecasting is not only about generating a tentative date, but especially about the challenge of communicating this tentative date.
    This leads us to two important facts of life relevant to forecasting: a) projects come with dates, whether shared or not, and b) there is no way to stop time. Joining these two facts together, and showing them in a way that enables discussions on options, is important for the educational impact on those viewing the forecast.
  2. I also like the idea of having modifiable variables that influence the outcome of the forecast. This is a necessary coaching feature: a forecast should not come as good or bad news (like the typical RAG status updates, where a project is reported as Green for months and then switches to Red a week before its deadline). On the contrary, a forecast is presented to trigger discussions and to consider options for success.

The following question stuck in my mind:

How does Google Sheets allow me to build a Monte Carlo simulator for project burndown forecasting?

Clear Thoughts — Random Behavior

To build a Monte Carlo simulator I need one essential feature: the ability to generate random numbers within a given range.

For this, Google Sheets has RAND() and RANDBETWEEN(). The two built-in functions have different applications, but in terms of generating random behavior, we can confirm that Google Sheets is up to the task.
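To illustrate the difference, here is what each function returns (the sample values are just one possible draw, since every recalculation produces new ones):

    =RAND()             a random decimal between 0 and 1, e.g. 0.4273
    =RANDBETWEEN(1, 5)  a random whole number between 1 and 5 inclusive, e.g. 3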

For my first simulator, the user flow I had in mind was as follows:

  1. The user provides a set of historical (weekly) throughput data and the backlog size
  2. Google Sheets generates a number of simulations, hypothetically burning against the backlog size
  3. Every time the burndown reaches zero items left, the number of weeks is recorded
  4. A percentile on the number of weeks is taken, and translated into a date

My First Simulator

Now that the idea has been laid out, in the following sections I will share how I implemented my first simulator. It is worth pointing out that I restricted myself to default and basic formulas in this first implementation, so as to remove any unnecessary complications from the explanation of my approach.

I envisioned the simulator as having two sheets, to keep it user friendly:

  1. Sheet 1 would be where the user would insert their settings, i.e. Backlog Size and Historical Data, which we will call the Settings Sheet.
  2. Sheet 2 would have the forecasting process and summary of data, let’s call it the Forecaster Sheet.

The Settings Sheet

The Settings Sheet is meant to be a very simple sheet in which the main focus is to make it clear for the user where they should enter their data.

In this forecaster I will require two pieces of information:

  1. The original backlog size
  2. A history of throughput data

The backlog size can be just a labeled cell, perhaps hooked up to a Named Range to make the formulae readable across the whole spreadsheet. In this case, I decided to name the cell Backlog_Size (how original).

When randomly generating throughput based on historical data, it is crucial to reflect not only the numerical values but also their distribution as it happened in history. Let’s say a team has historical throughput data of 1, 2, 1, 4, 2, 1, 1, 5. Generating random throughput with RANDBETWEEN(1,5) would not fairly represent this team’s pattern, as it would give equal probability to all values in the range. A solution is to treat the table as a numbered list, and then randomly generate an index into that list.
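As a sketch of this approach, assuming the historical values live in a named range called Historical_Data (a name I am choosing here for illustration):

    =INDEX(Historical_Data, RANDBETWEEN(1, COUNT(Historical_Data)))

RANDBETWEEN(1, COUNT(Historical_Data)) picks a random position in the list, and INDEX() returns the throughput recorded at that position. Values that occur more often in history, like the 1 in the example above, are therefore drawn more often, preserving the team’s real distribution.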

The Forecaster Sheet

The Forecaster Sheet would need to do most of the heavy lifting. At first I thought of having two columns, as shown in the image below.

In the first column, Remaining Backlog, I would have the burn down. Starting from the original Backlog Size, the column would keep subtracting the randomly generated throughput until the remaining backlog hits zero or a negative number. At that point, the column would start all over again from the backlog size, signifying a complete simulation.
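A minimal sketch of such a formula, assuming the Remaining Backlog sits in column A with the previous week in A2 and the current week in A3 (the cell layout and the Historical_Data range are my own choices for illustration, not necessarily those of the actual sheet):

    =IF(A2 <= 0, Backlog_Size, A2 - INDEX(Historical_Data, RANDBETWEEN(1, COUNT(Historical_Data))))

If the row above has burned down to zero or below, the cell resets to the full Backlog_Size; otherwise it subtracts a fresh random throughput draw, one row per simulated week.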

Achieving this required me to mostly apply basic arithmetic, which is ideal for the aim of this first simulator. At this point, however, I realized I might need an additional column to keep track of the outcomes.

The approach was once again a simple one: I just increment the value from the row above every time the backlog refreshes.
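Assuming this outcome counter lives in column B next to the Remaining Backlog, the formula for B3 could look like this (again a sketch, with my own cell references):

    =IF(A3 = Backlog_Size, B2 + 1, B2)

Whenever the Remaining Backlog resets to the full Backlog_Size, a new simulation has started, so the counter goes up by one; otherwise the row inherits the counter from the row above.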

With all of that in place, the remaining bit was to summarize the insights generated from the simulations.

First, I used the outcome number to understand:

a) how many simulations the sheet generated

b) the duration of every simulation

…and populated everything in a table.
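Both insights can be derived from the outcome counter with basic formulas, for instance (assuming the counter is in column B and the summary table lists simulation numbers in column D; these references are illustrative):

    =MAX(B:B)          the total number of simulations generated
    =COUNTIF(B:B, D2)  the duration in weeks of the simulation whose number is in D2

COUNTIF() works here because each simulated week occupies one row, so counting the rows that share an outcome number gives that simulation’s length.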

To finalize the simulator, I needed to do two last things based on the summary table:

  1. Calculate percentiles — What are the 50%, 85% and 95% percentiles of the list of simulated durations?
  2. Draw a probability distribution — How often did an outcome (a duration in weeks) feature in the list of randomly generated burn downs? For example, how often did the simulator return a duration of 16 weeks? The answer is a list of tallies, from which we can also plot a histogram. Both steps are sketched right after this list.
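Google Sheets covers both needs with built-in functions. Assuming the per-simulation durations are collected in a named range called Durations (a name of my own choosing):

    =PERCENTILE(Durations, 0.5)    the median duration, i.e. the 50% mark
    =PERCENTILE(Durations, 0.85)   the duration below which 85% of the simulations finished
    =COUNTIF(Durations, 16)        how many simulations lasted exactly 16 weeks

Repeating COUNTIF() for every distinct duration produces the list of tallies from which the histogram can be plotted.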

An Extra Step

Up until this point we can already claim to have a Monte Carlo simulator for forecasting a project or feature burn down. As I looked at the outcomes, however, I realized that every so often the summary was showing an abnormally short burn down. While the minimum was usually around 11–12 weeks, occasionally I would observe something like a 3 or 5 week burn down.

The reason behind this was that the last simulation was often being cut short, i.e. the sheet was running out of rows, and therefore the summary was capturing a new — very short (incomplete) — iteration.

To fix this I decided to add a label every time a burn down was concluded. Basically, if the backlog value of the next row is equal to the original backlog size, then this iteration can be deemed complete.
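A sketch of such a label formula, say in C2 with the Remaining Backlog in column A (my own cell layout, for illustration):

    =IF(A3 = Backlog_Size, "Burn Down Complete", "")

Because the sheet restarts a burn down by resetting the next row to the full Backlog_Size, seeing that reset in the row below proves the current simulation actually reached zero.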

When summarizing, I then filtered out those outcome numbers that did not have the “Burn Down Complete” label.
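With the label in column C, the FILTER() function can keep only the completed simulations (the cell ranges, as before, are illustrative):

    =FILTER(B2:B1000, C2:C1000 = "Burn Down Complete")

This returns the outcome numbers of rows that carry the label, i.e. one entry per completed simulation, so the truncated final burn down never reaches the summary table.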

Pros and Cons and Conclusion

Once I pictured the mechanics, building this first Monte Carlo simulator in a Google Spreadsheet was very simple. Additionally, I felt like I deepened my knowledge of Monte Carlo simulators as I was putting it together, which I consider quite important for my job as a Team Process and Data Coach.

On the downside I could notice the following points:

  1. Limited simulations — The artifact I built could generate a very limited number of simulations. Just to give an idea, for a backlog size of 105 I needed 1000 rows to generate roughly 60 simulations. The power of Monte Carlo simulators comes in strong when hundreds or thousands of simulations are generated, as it is then that the true patterns of the system emerge.
  2. Unrealistic burn down — Despite basing the burn down simulations on historical throughput data, project or feature burn downs are not only impacted by throughput but also by other factors like scope creep and bugs. This was one thing I really liked in Troy Magennis’ spreadsheet, but I left it out of the equation for the time being to focus on the basic mechanics.
  3. Performance — I took the simplest approach in implementing this artifact, which also means that I used cell-by-cell functions rather than more performant (and easier to debug) array functions like LAMBDA(), MAP(), REDUCE(), and ARRAYFORMULA() (I used this last one only once…).

In the next article I will aim to reproduce a more realistic burn down simulation by introducing other variables, while also improving performance and generating more simulations.

Matthew Croker is a Team Process and Data Coach, specialized in the software development industry. Through his work he helps teams within companies focus on learning how to work best together by analyzing their data, simplifying the setup of processes and creating habits that boost their productivity.
