Shake It Up — Turbo Charging Spreadsheets for Simulations

In the “Shake It Up” series I am taking my readers with me on a journey towards understanding Monte Carlo simulations and how I turned spreadsheets into forecasting tools.

Matthew Croker
7 min read · Jun 17, 2023

TL;DR Monte Carlo simulation is strongest when it generates large volumes of permutations. We are talking hundreds or thousands of outcomes. In this article I will:

  1. Introduce how RANDARRAY and REDUCE functions can help build a scalable solution
  2. Share my idea of turbo charging the simulator
  3. Briefly share my thoughts on introducing interruptions to the new structure

Previously in the “Shake It Up” series

The experiment in building a Monte Carlo simulation throughput forecaster has helped us appreciate the role different variables play in shaping a realistic simulation of a future event. We have tried to simulate the behavior of technical debt and scope creep as variables that influence the project forecast. Check out the previous article if you missed it.

The next frontier was a crucial factor for a respectable Monte Carlo simulator: the number of repetitions. The simulators I have managed to build so far in this series were good for understanding the underlying concepts of a Monte Carlo simulator. However, they fell short in the number of outcomes they could generate. While Google Sheets allows the addition of more rows, I restricted myself to 1000 rows on purpose, to find ways to maximize the use of a spreadsheet.

With 1000 rows for a 100-work-item backlog and a low level of interruptions, the simulator built in the last article managed around 50 permutations. The tool by troy.magennis we had referred to earlier in this series generates 500 simulations. I would look at 500 simulations as the minimum for my next simulator.

How can I upgrade my simulator to generate 500+ simulations?

At My Signal, Unleash Random Generation

To move towards that goal without extending the number of rows in the spreadsheet, I realized I needed to change my approach to generating simulations. The following was my plan:

  1. Generate in bulk — Rather than generating throughput values as I simulate the burndown, I generate a bulk of permutations and then think about consuming them.
  2. Maximize utilization — Once the bulk was generated, my assumption was that I might end up with redundant (though still valid) figures. My focus needed to turn towards leveraging as many of the randomly generated numbers as possible and collecting them as valid simulations.

Here I introduce two new Google Sheets formulae that helped me achieve this goal:

  1. RANDARRAY
  2. REDUCE

RANDARRAY

In my previous articles I was using a mix of the RAND() and RANDBETWEEN() functions to generate my random throughput. Doing so for all permutations, however, required me to have a function in each and every cell featuring a random number. Generating in bulk required me to be more efficient.

My approach in V1 and V2 of the Monte Carlo Throughput Forecaster
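In other words, every throughput cell in those versions carried its own random call, along the lines of this sketch (the bounds 3 and 9 are placeholders, not my actual throughput range):

  =RANDBETWEEN(3, 9)

Repeated across every cell of the simulation, that meant thousands of separate formulas.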

Enter RANDARRAY.

The RANDARRAY function allows you to generate a matrix of random numbers. When paired with the ARRAYFORMULA feature, the generated numbers can also be manipulated in bulk (for example, multiplying the whole matrix by a given factor). I required a way to generate in bulk, then scale and descale as required while building and testing, without too much configuration, and RANDARRAY fit my needs like a glove.
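As an illustration, a single formula can fill a whole matrix at once. A minimal sketch (the 500 x 100 shape and the scaling to a 3 to 9 throughput range are placeholder assumptions, not the exact values from my sheet):

  =ARRAYFORMULA(ROUND(RANDARRAY(500, 100) * (9 - 3) + 3))

RANDARRAY(500, 100) produces 500 rows by 100 columns of random numbers between 0 and 1, and the ARRAYFORMULA wrapper applies the scaling and rounding to every cell of the matrix in one go.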

I used the matrix generated by RANDARRAY to reference the throughput data, as I was doing in earlier versions of my simulator. Eventually, I could even look into further modifying the matrix to introduce other variables to the simulation.
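One way of doing that referencing, sketched under the assumption that 30 weeks of historical throughput live in H2:H31 (MAP and LAMBDA are introduced in the next section), is to map every random number onto a row of the history:

  =MAP(RANDARRAY(500, 100), LAMBDA(r, INDEX($H$2:$H$31, 1 + FLOOR(r * 30))))

Each random value between 0 and 1 is turned into a row index between 1 and 30, so every cell of the matrix becomes a randomly sampled historical throughput figure.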

In the third version, I generate a matrix of TP (throughput) values and use it later

REDUCE

As I was experimenting with Google Sheets functions I discovered array and lambda functions in its default set of spreadsheet formulas. For those of us with a background in programming, this is good news. For the rest, here’s a little background.

Array formulas are formulas that allow you to manipulate a whole data set based on simple rules. The good old FILTER formula, provided as a default function in all popular spreadsheet solutions, is a good example of how array formulas work.

To understand what lambda functions are, I found it most effective to think about data as a set, not as individual units. Then ask the question:

How do I want to transform this data set?

  • Do I need to keep some values but not others, based on some criteria? …then you need to FILTER
  • Do I need to keep the set, but transform it? …then you need to MAP
  • Do I need to summarize the set? …then you need to REDUCE

The lambda is the action I want to perform on the data, expressed mathematically:

  • When filtering, the lambda is the filtering criteria (e.g. FILTER(A2:A1000, A2:A1000 >= 10) keeps only the values of at least 10)
  • When mapping, the lambda is the transformation of the data (e.g. MAP(A2:A1000, LAMBDA(x, x*2)) returns all values in the range multiplied by 2)
  • When reducing, the lambda is the act of combining and reducing the data (e.g. REDUCE("", A2:A1000, LAMBDA(accumulator, x, CONCAT(accumulator, x))) starts from an empty string and concatenates all the values in the range into a single string)

Data sets often need a chain of these interventions to reach an end goal. This was the case for my simulator.
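As a toy example of such a chain, the following sketch sums only the weeks with a throughput of at least 10, assuming throughput values in A2:A1000:

  =REDUCE(0, FILTER(A2:A1000, A2:A1000 >= 10), LAMBDA(accumulator, x, accumulator + x))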

In its essence, a backlog burn-down is a reduction of an array of throughput values: each value is subtracted, in turn, from the initial backlog size until the remainder reaches zero or goes negative (hence I definitely needed REDUCE).

This reduction, however, needed to stop as soon as it reached zero or a negative number. My solution was to keep a memory of the burn-down inside a string, which I passed down as the accumulator in the reduce function:
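The formula in my sheet is scarier than this, but a minimal sketch of the idea looks as follows (assuming the backlog size sits in B1 and one row of random throughput values in C2:CX2; the accumulator string records the remaining backlog after each week, and once it drops to zero or below, the remaining throughput values are ignored):

  =REDUCE(B1 & "", C2:CX2, LAMBDA(acc, tp,
     LET(parts, SPLIT(acc, ","),
         last, INDEX(parts, 1, COLUMNS(parts)),
         IF(last <= 0, acc, acc & "," & (last - tp)))))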

…don’t worry, it worked.

Turbo Charging

As expected, I soon realized that my simulations were making use of only a fraction of the randomly generated throughput numbers in the matrix. My matrices were 500 x 100 values in size (500 rows of 100 cells each). When a burn-down consumed 8 of those cells (i.e. a burn-down of 8 weeks), the remaining 92 values in that row went to waste. Even when the backlog size was considerably large, for example 150 items, the burn-down still left redundant throughput values.

To solve this, I thought of creating a system similar to how turbo charged combustion engines work. Turbo engines make use of the exhaust stream to generate further energy. They practically make a second use of something which, before the advent of turbo engines, was considered waste.

I basically repeated a version of that (scary 😱) formula shared earlier. By doing so I suddenly unlocked the potential to not only double, but at times triple, the number of simulations without adding further rows.

The spreadsheet would seek to generate a maximum of 3 burn-downs out of the same string of randomly generated numbers; let's call them 1st Pass, 2nd Pass and 3rd Pass. The simulator would then validate each of the three passes: if a pass did not burn the project to the ground, i.e. the backlog never reached zero, that whole pass would be deemed invalid and discarded.
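As a sketch of that validation (assuming a pass's accumulator string sits in D2), checking the last token of the string tells us whether the backlog actually reached zero or below, and how many weeks it took:

  =LET(parts, SPLIT(D2, ","),
       last, INDEX(parts, 1, COLUMNS(parts)),
       IF(last <= 0, COLUMNS(parts) - 1, "invalid"))

A number means the pass is a valid simulation (its burn-down length in weeks); "invalid" means the pass ran out of throughput values before the backlog hit zero.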

This way my simulator would be generating at least 500 simulations, with the possibility of even elevating that to 1500! 🙌

Adding Interruptions

Introducing interruptions to this format proved to be a bit harder than expected.

Adding all the interruptions started to make my spreadsheet go crazy (remember the Christmas tree in the previous article? 🎄 — very similar). For the sake of this article, therefore, I decided to limit myself to a minimal working simulator. In my case, I managed to integrate the Bugs variability without any problem, as this was a simple multiplier applied to the generated throughput value.

Lessons Learned

At this point I believe I can start summarizing the lessons learned on building a Monte Carlo simulator for project forecasting using Google Sheets.

Lesson #1 — It can be done. The idea of building a model and simulating a future event sounds daunting. By getting my hands dirty and leveraging default spreadsheet functions, I have demonstrated that your first steps in this fascinating world are not that scary after all.

Lesson #2 — Modelling behavior. A Monte Carlo simulator's success is intrinsically tied to the quality of the data it is fed to shake up. Initially I relied heavily on historical data, but then I realized that this might not always be possible and instead started venturing into the world of mathematical functions. This opened up a new horizon, whereby the focus shifts to the behavior of the system rather than to a particular data set.

Lesson #3 — The importance of quantity. In Monte Carlo simulations, the gains and patterns are seen in the long run, and the longer the run, the clearer they come to the surface. No matter how restricted your resources might be when building a simulator (case in point: I stuck to spreadsheets and spreadsheet functions!), quantity will definitely help.

These three lessons apply to any Monte Carlo simulation that I would be interested in building using a spreadsheet. In my next and last article in this series I will share my approach to building simulators for different applications other than project burn-down.

Matthew Croker is a Team Process and Data Coach, specialized in the software development industry. Through his work he helps teams within companies focus on learning how to work best together by analyzing their data, simplifying the setup of processes and creating habits that boost their productivity.
