Monte Carlo simulation made easy with Lazycarlo

Walid Daboubi
Analytics Vidhya
Published in
9 min readMay 23, 2020

--

No one, especially from the scientific/enginnering community, doubts about how powerful is Monte Carlo random sampling method, particularly when it is about simulating an unknown phenomena or making experimental calculation when time/resources are in shortage.

Despite its high usefulness in a wide range of scientific and technical fields, this method keeps somehow mysterious and time consuming from implementation point of view. In most of the cases you need to be a Microsoft Excel expert or an experimented programmer to implement it the right way.

In this article, we will start by making a practical presentation of Monte Carlo method by going through two concrete Excel implemtnation examples: calculating two dices throwing outcomes and estimating cost/duration for a house construction project.

In the second part, we will present Lazycarlo a user interface based Monte Carlo engine by using it the simulate the already seen house construction project.

Monte Carlo simulation

Problem definition

Taking the time to well define the problem you want to study is a key step to implement a Monte Carlo simulation. A problem could be defined as a set of events that contribute to a unique final outcome. An intuitive example that we usually talk about when invoking Monte Carlo method is dices throwing.

Let’s suppose that we have two dices, and we want to experimentally calculate the probability of each of the possible outcomes when throwing both of them. Starting from the above, the problem in this case could be represented as the following two events:

  • Throwing Dice 1 - possible outcomes: [1, 2, 3, 4, 5, 6]
  • Throwing Dice 2 - possible outcomes: [1, 2, 3, 4, 5, 6]

Probability distribution

One of the most important steps when implementing the simulation is to choose the probability distribution of the phenomena/problem you want to simulate could follwow. For our two dices throwing example, the probability distribution is uniform for both events, which mean that every face of the dice has the same chance to happen, P=1/6 in this case.

The choice of the probability distribution could change according to the studied problem. The most common distributions are the following:

  • Uniform distribution
  • Normal distribution
  • Lognomral distrinution

In the case of dices throwing and knowing that it is following a uniform distribution, the outcomes could be calculated as the follwoing:

  • Event1_outcome = rand(1,6)
  • Event2_outcome = rand(1,6)
  • Problem_outcome = rand(1,6) + rand (1,6)

Which mean that for each of the dices we are taking a random value between 1 and 6 and summing up the result to get the problem outcome.

Simulation

Now that the problem is defined and the possible outcomes of each event are known, we can run the simulation. This step consists on simply calculating the outcome, we previously defined, a number of times (epochs) and getting the frequency of each value (2 to 12). This frequency represents the probability of the corresponding outcome.

Implementation in Microsfot Excel

Step1

We start by simply assigning a random value between 1 and 6 for each of the dices and summing them up.

Step2

In this step we will repeat step1 one a number of times (epochs), let’s say 1000 times. We start by adding a data series of 1000 entries.

  1. We click on the cell where the data series should start

2. We click on Fill -> Series

3. We choose Columns, 1 as Step value and 1000 as stop value

This will genearte a list of 1000 data entries starting from the cell we initially selected

4. We now can make each of the entries generating random number as we did in step1.

We first attribute the value of the two dices outcome we previous calculated in D6 as the follwoing

We then select all the 1000 both columns

And click on Data -> What if analysis -> Data table

We choose Column input cell and select any empty cell (will be used in background calculus)

Once done, all 1000 data entries will be filled with random value calculted as D6.

Step3

We now calculate the frequency of each outcome. This could be done using countif function as showed below. The Cumul line contains the cumulation of each of all precedent percentages for each value.

At this stage, we can say that we successfully implemented a Monte Carlo simulation. We can visualise the result, we get something like the following.

We can see that when throwing two dices, the most probable value is 7 with a probability of around 16%

When drawing the cumulative line, we can see that we have around 70% of probability to get an outcome of 8 or below, and around 92% to get a value of an outcome of 10 or less.

Project management with Monte Carlo

Now that we have a good practical understanding about how useful is Monte Carlo simulation and how to implement it in Excel, we can move to something more interesting. Let’s suppose that you are building a house and you want to estimate the costs and how long time you will have to wait until moving in. We will use a Monte Carlo Simulation to get the answer!

Problem definition

The house construction project could be divided into several steps. For each step we will need to defined a minimum and a maximum value for both cost and duration.

Here is a the list of the tasks and their minimum and maximum estimated values. These values will be the simulation inputs.

What Probability distribution

Since we have an upper bound and a lower bound for both of the metrics we want to simulate (cost and duration), and there is no sense to say that the probability of each of the value between LB and UB could be the same, we will use a normal distribution. We will assume that the probabiltiy is following the normal law, and based on this assumption we will use an inverse normal function to calculte values from probabilities. The parameters we will need to make this implementation are:

  • The mean of the nomal distribion: could be calulted using the formula mean = (max + min)/2
  • The standard deviation of the nomal distribution: could be calculted using the formlula standard_deviation = (max - min)/3.29

Base on this parameters, a cost/duration value could be calculated starting from a random probability, this could be done using Excel Norminv() function as the follwoing

=NORMINV(RAND(),mean,standard_deviation)

The RAND() here represent the current probability.

Let’s calculate to cost/duration of each tasks based on the above function. Here is the result.

Now that we can calculate the cost/duration of each task based on a random probability, and thus we can calculte the project cost/duration as a sum, we can move the the simulation.

Simulation

The same principle used for dice throwing could be applied here. We will make the same cost/duration calculus 1000 times using different random probabilities and get the frequency of each resut.

We obtain the following:

Cost frequencies

As the cost results are not discrete values, we will use ranges of 5000$. Knowing that the minum project is 145000$ and the maxiumum is 192000 we will caclulate the frequencies of the following ranges:

  • 145000 -> 150000
  • 150000 -> 155000
  • 190000 -> 195000

This could be done using Excel COUNTIF() fuction as the following

When applying the same calculs on the all defined 10 ranges we get the following simulation result

According to this result:

  • The most likely cost value is 170000$ (between 165000 and 170000)
  • According to the cumulative curve, there is 80.5% of probability that the cost will not exceed 175000$

Duration frequencies

For the duration we will just convert the simulation outcome into integers and proceed as for the dices. We obtain the following result:

According to simulation result:

  • The most likely duration is 46 days
  • According to the cumulative curve, there is 86.4% of probability that the duration will not exceed 48 days

Fun time - Automation with Lazycarlo

Yes, it’s fun time. As you may complain, it’s too time consuming to run a Monte Carlo simulation with Excel. Lazycarlo is a Monte Carlo engine developed in Ruby and could be used through a very simple user interface which only requires the project name and the metrics to simulate as inputs. Let’s try to make the same house construction simulation using Lazycarlo.

Step1 - Adding the project

We first need to login to Lazycarlo and click on new porject, this will lead us the the following view, where we fill the different inputs and click on create project.

Once the project create, we click on add new task to add the tasks we already defined when making the Excel implementation.

The simulation will keep updating as you add new tasks. Once all the tasks are added you will end up with the following view.

You will see and an initial simulation if you scroll down, you can run a new one by choosing the number of epochs and the probability distribution. The follwoing in the simulation result with 5000 epochs and normal as probability distrinution.

As you can notice the result are quite similar to the ones obtained with Excel!

Conclusion

Based on the treated two examples, we can see how useful is Monte Carlo method. It could be applied to solve the unsolvable and make life easier for a wide range of specialists. We can also notice that it is a bit tricky and long to implment it in Excel. A simple user interface Lazycarlo is there to help you make it more fun and less time consuming. Enjoy!

--

--