Monte Carlo Simulation in Python

Wendy Hu
6 min read · Apr 5, 2023


Pi simulated by Monte Carlo simulation. Source: https://demonstrations.wolfram.com/ApproximatingPiByTheMonteCarloMethod/

Introduction

We covered resampling methodologies (the bootstrap, cross-validation parts 1/2/3, and the permutation test) in previous posts. Since resampling is a special case of Monte Carlo simulation, it’s now time to zoom out from resampling to Monte Carlo simulation and look at the big picture.

Monte Carlo simulation

Monte Carlo simulation draws samples from known or assumed probability distributions. Choosing an appropriate distribution relies on domain knowledge.

Use cases

  • Simulate potential stock prices to help users understand whether their portfolio will be enough to support the standard of living they want after retirement.
Simulating stock prices. Source: https://www.interviewqs.com/blog/intro-monte-carlo
  • Simulate the potential financial losses a bank is willing to stomach so that it can better manage its risks.
Simulating Value at Risk (VaR). Source: https://wikibanks.cz/value-at-risk-var/

Pros

  • Provides a range of potential outcomes
  • Provides the likelihood of each potential outcome
  • Provides what-ifs under different scenarios

Cons

  • Output is only as good as the input
  • Probability of extreme events is often underestimated

Processes

  1. Define the input
  2. Choose a probability distribution
  3. Simulate the input by sampling from the probability distribution
  4. Perform a deterministic calculation on the simulated input
  5. Summarize the results

Python implementation

Let’s use Monte Carlo simulation to estimate pi, denoted as π. We will follow the process introduced above.

Calculate pi using the Monte Carlo simulation. Source: https://campus.datacamp.com/courses/monte-carlo-simulations-in-python/foundations-for-monte-carlo?ex=3
  1. Define the input
  • Points (x, y) in a 2-d Cartesian coordinate system
A point (-3, 2) in a Cartesian coordinate system
# import libraries
import numpy as np

# initialize variables
n_simulations = 100000
n_points_circle = 0
n_points_square = 0

2. Choose a probability distribution

  • X ~ continuous uniform (-1, 1)
  • Y ~ continuous uniform (-1, 1)
A uniform distribution. Source: https://en.wikipedia.org/wiki/Continuous_uniform_distribution

3. Generate points by sampling from the probability distribution

  • Generate points (both blue and red) that fall within the square
Both blue and red points are within the square
# create lists to store x and y values
l_xs = []
l_ys = []

# loop n_simulations times
for _ in range(n_simulations):

    # x is randomly drawn from a continuous uniform distribution
    x = np.random.uniform(-1, 1)
    # store x in the list
    l_xs.append(x)

    # y is randomly drawn from a continuous uniform distribution
    y = np.random.uniform(-1, 1)
    # store y in the list
    l_ys.append(y)

4. Perform a deterministic calculation on the simulated input

  • Check whether each point falls within the circle (red) or not
Red points are within the circle (and the square)
# loop n_simulations times
for i in range(n_simulations):

    # calculate the squared distance between the point and the origin
    dist_from_origin = l_xs[i] ** 2 + l_ys[i] ** 2

    # if the squared distance is at most 1, the point is inside the unit circle
    if dist_from_origin <= 1:
        n_points_circle += 1

    # every sampled point falls inside the square by construction
    n_points_square += 1

5. Summarize the results

  • Since the ratio of the circle’s area to the square’s area is π/4, estimate pi as 4 times the fraction of points that landed in the circle
# estimate the value of pi
pi = 4 * n_points_circle / n_points_square
print(pi)

The simulated pi we got is 3.1434. Accurate to the second decimal place — pretty good!

Putting everything together, this is the complete code to estimate pi using a Monte Carlo simulation:

# import libraries
import numpy as np

# initialize variables
n_simulations = 100000
n_points_circle = 0
n_points_square = 0

# create lists to store x and y values
l_xs = []
l_ys = []

# loop n_simulations times
for _ in range(n_simulations):

    # x is randomly drawn from a continuous uniform distribution
    x = np.random.uniform(-1, 1)
    # store x in the list
    l_xs.append(x)

    # y is randomly drawn from a continuous uniform distribution
    y = np.random.uniform(-1, 1)
    # store y in the list
    l_ys.append(y)

# loop n_simulations times
for i in range(n_simulations):

    # calculate the squared distance between the point and the origin
    dist_from_origin = l_xs[i] ** 2 + l_ys[i] ** 2

    # if the squared distance is at most 1, the point is inside the unit circle
    if dist_from_origin <= 1:
        n_points_circle += 1

    # every sampled point falls inside the square by construction
    n_points_square += 1

# estimate the value of pi
pi = 4 * n_points_circle / n_points_square
print(pi)
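The loop above can also be vectorized with NumPy. Here is a sketch of an equivalent (and much faster) implementation; the seeded `default_rng` generator is my choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)
n_simulations = 100_000

# draw all x and y coordinates at once from Uniform(-1, 1)
xs = rng.uniform(-1, 1, n_simulations)
ys = rng.uniform(-1, 1, n_simulations)

# count points whose squared distance from the origin is at most 1
n_points_circle = np.sum(xs ** 2 + ys ** 2 <= 1)

# every sampled point lies in the square, so the ratio estimates pi / 4
pi_estimate = 4 * n_points_circle / n_simulations
print(pi_estimate)
```

Replacing the Python loops with array operations gives the same estimator while letting NumPy do the heavy lifting in compiled code.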

Generate discrete random variables

Discrete uniform distribution

  • Models a random variable that takes on a given number of values with equal probability.
  • Example: when a spinner with ten equal sections is spun once, the result follows a discrete uniform distribution.
A spinner. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/uniform-discrete-distribution/1/
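As a quick sketch, the spinner example can be simulated with NumPy's `integers` sampler (labeling the sections 1 through 10 is my assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# spin a ten-section spinner 10,000 times; each section 1..10 is equally likely
spins = rng.integers(low=1, high=11, size=10_000)

# each outcome should appear roughly 10% of the time
values, counts = np.unique(spins, return_counts=True)
print(dict(zip(values.tolist(), (counts / len(spins)).round(3).tolist())))
```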

Bernoulli distribution

  • Models a random variable with exactly two outcomes, conventionally called a “success” and a “failure” (the labels are not meant literally).
  • Example #1: when a fair coin (50% heads and 50% tails) is flipped once, the result follows a Bernoulli distribution.
  • Example #2: when an unfair coin (80% heads and 20% tails) is flipped once, the result follows a Bernoulli distribution.
A coin flip. Source: https://www.cuemath.com/data/tossing-a-coin/
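NumPy has no separate Bernoulli sampler; a Bernoulli trial is a binomial with n = 1. A sketch for the unfair coin in example #2:

```python
import numpy as np

rng = np.random.default_rng(0)

# flip an unfair coin (80% heads) 10,000 times; 1 = heads, 0 = tails
flips = rng.binomial(n=1, p=0.8, size=10_000)

# the fraction of heads should be close to 0.8
print(flips.mean())
```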

Binomial distribution

  • Models the number of successes in a fixed number of independent, identical Bernoulli trials.
  • Example: when a fair coin is flipped 100 times, the number of heads we get follows a binomial distribution.
100 coin flips. Source: https://nationaltoday.com/flip-coin-day/
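A sketch of the 100-flip example, repeated many times so the distribution of head counts becomes visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# number of heads in 100 fair-coin flips, repeated 10,000 times
heads = rng.binomial(n=100, p=0.5, size=10_000)

# the average head count should be near n * p = 50
print(heads.mean())
```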

Geometric distribution

  • Models the number of trials needed to get the first success in a sequence of Bernoulli trials (some texts instead count the number of failures before the first success).
  • Example: the number of times a die needs to be rolled to get the first “6” follows a geometric distribution.
A die roll. Source: https://www.online-stopwatch.com/chance-games/roll-a-dice/
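NumPy's `geometric` sampler counts the number of trials up to and including the first success, which matches the die example:

```python
import numpy as np

rng = np.random.default_rng(0)

# number of rolls needed to see the first "6" (success probability 1/6),
# simulated 10,000 times
rolls = rng.geometric(p=1/6, size=10_000)

# the expected number of rolls is 1 / p = 6
print(rolls.mean())
```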

Poisson distribution

  • Models the number of events occurring over a fixed time period, assuming the events happen independently at a constant average rate.
  • Example: the number of cars traveling on the Golden Gate Bridge in a given hour follows a Poisson distribution.
Cars traveling on the Golden Gate Bridge. Source: https://sf.curbed.com/2018/2/6/16979096/san-francisco-traffic-ranking-cars
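A sketch of the bridge example; the average rate of 500 cars per hour is a hypothetical figure of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# cars crossing in one hour, with a (hypothetical) mean rate of 500,
# simulated 10,000 times
cars = rng.poisson(lam=500, size=10_000)

# for a Poisson distribution the mean and the variance both equal lam
print(cars.mean(), cars.var())
```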

Generate continuous random variables

Normal distribution

Normal distribution. Source: https://brilliant.org/courses/probability_ii/continuous-distributions-2/normal-distribution/1/
  • Models a random variable whose distribution is symmetric about the mean and bell-shaped. Values near the mean occur more frequently than values far from the mean.
  • Example: the heights of people in San Francisco are approximately normally distributed.
Heights of people. Source: https://gigazine.net/gsc_news/en/20160802-comparing-heights/
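A sketch of the heights example; the mean of 170 cm and standard deviation of 10 cm are hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# heights in cm, assuming a (hypothetical) mean of 170 and std of 10
heights = rng.normal(loc=170, scale=10, size=10_000)

# roughly 68% of draws should fall within one standard deviation of the mean
within_one_sd = np.mean(np.abs(heights - 170) <= 10)
print(heights.mean(), within_one_sd)
```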

Exponential distribution

  • Models the time between events in a Poisson process.
  • Example: the time between two lightning bolts during a storm follows an exponential distribution.
A lightning bolt. Source: https://mashable.com/article/worlds-longest-lightning-bolt
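A sketch of the storm example, assuming (hypothetically) an average gap of 2 minutes between bolts:

```python
import numpy as np

rng = np.random.default_rng(0)

# minutes between lightning bolts, assuming a (hypothetical) mean gap of 2
gaps = rng.exponential(scale=2.0, size=10_000)

# the sample mean should be close to the scale parameter
print(gaps.mean())
```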

Gamma distribution

  • Models the waiting time until the k-th event occurs in a Poisson process (the exponential distribution is the special case k = 1).
  • Example: the waiting time until death occurs, aka lifespan, is often modeled with a Gamma distribution.
Lifespan. Source: https://guardian.ng/life/life-features/human-lifespan-may-not-have-peaked-after-all-study/
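A sketch of sampling waiting times from a Gamma distribution; the shape of 3 and scale of 2 are illustrative parameters, not taken from any lifespan data:

```python
import numpy as np

rng = np.random.default_rng(0)

# waiting times with (illustrative) shape k = 3 and scale theta = 2
waits = rng.gamma(shape=3.0, scale=2.0, size=10_000)

# the mean of a gamma distribution is shape * scale = 6
print(waits.mean())
```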

Log-normal distribution

  • Models a random variable whose logarithm is normally distributed.
  • Example: the income of a random group of people in the US follows a log-normal distribution.
Income. Source: https://iconscout.com/icon/income-1446505
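A sketch verifying the defining property: taking the log of log-normal draws recovers a normal distribution. The log-mean of 10 and log-std of 0.5 are illustrative, not real income parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# incomes drawn from a log-normal with (illustrative) log-mean 10, log-std 0.5
incomes = rng.lognormal(mean=10.0, sigma=0.5, size=10_000)

# the logs should be approximately normal with mean ~10 and std ~0.5
logs = np.log(incomes)
print(logs.mean(), logs.std())
```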

Conclusion

Picking the appropriate distribution for the use case is important. This process sits at the intersection of art (data intuition) and science (performance measurement).
