Monte Carlo Simulation in Python

Wendy Hu
6 min read · Apr 5, 2023


Pi simulated by Monte Carlo simulation. Source: https://demonstrations.wolfram.com/ApproximatingPiByTheMonteCarloMethod/

Introduction

We covered resampling methodologies (the bootstrap, cross-validation parts 1/2/3, and the permutation test) in previous posts. Since resampling is a special case of Monte Carlo simulation, it’s now time to zoom out from resampling to Monte Carlo simulation and look at the big picture.

Monte Carlo simulation

Monte Carlo simulation draws samples from known or assumed probability distributions. Choosing an appropriate distribution relies on domain knowledge.

Use cases

  • Simulate potential stock prices to help users understand whether their portfolio will be enough to support the standard of living they want after retirement.
Simulating stock prices. Source: https://www.interviewqs.com/blog/intro-monte-carlo
  • Simulate the potential financial losses a bank is willing to stomach so that it can better manage its risks.
Simulating Value at Risk (VaR). Source: https://wikibanks.cz/value-at-risk-var/

Pros

  • Provides a range of potential outcomes
  • Provides the likelihood of each potential outcome
  • Provides what-ifs under different scenarios

Cons

  • Output is only as good as the input
  • Probability of extreme events is often underestimated

Processes

  1. Define the input
  2. Choose a probability distribution
  3. Simulate the input by sampling from the probability distribution
  4. Perform a deterministic calculation on the simulated input
  5. Summarize the results

Python implementation

Let’s use Monte Carlo simulation to estimate pi, denoted as π. We will follow the process introduced above.

Calculate pi using the Monte Carlo simulation. Source: https://campus.datacamp.com/courses/monte-carlo-simulations-in-python/foundations-for-monte-carlo?ex=3
  1. Define the input
  • Points (x, y) in a 2-d Cartesian coordinate system
A point (-3, 2) in a Cartesian coordinate system
# import libraries
import numpy as np

# initialize variables
n_simulations = 100000
n_points_circle = 0
n_points_square = 0

2. Choose a probability distribution

  • X ~ continuous uniform (-1, 1)
  • Y ~ continuous uniform (-1, 1)
A uniform distribution. Source: https://en.wikipedia.org/wiki/Continuous_uniform_distribution

3. Generate points by sampling from the probability distribution

  • Generate points (both blue and red) that fall within the square
Both blue and red points are within the square
# create lists to store x and y values
l_xs = []
l_ys = []

# loop n_simulations times
for _ in range(n_simulations):

    # x is randomly drawn from a continuous uniform distribution
    x = np.random.uniform(-1, 1)
    # store x in the list
    l_xs.append(x)

    # y is randomly drawn from a continuous uniform distribution
    y = np.random.uniform(-1, 1)
    # store y in the list
    l_ys.append(y)

4. Perform a deterministic calculation on the simulated input

  • Check whether each point falls within the circle (red) or not
Red points are within the circle (and the square)
# loop n_simulations times
for i in range(n_simulations):

    # calculate the squared distance between the point and the origin
    dist_from_origin = l_xs[i] ** 2 + l_ys[i] ** 2

    # if the squared distance is at most 1, the point is inside the unit circle
    if dist_from_origin <= 1:
        n_points_circle += 1

    # every sampled point falls inside the square by construction
    n_points_square += 1

5. Summarize the results

  • Since the ratio of the circle’s area to the square’s area is π/4, estimate pi as 4 times the fraction of points that landed in the circle
# estimate the value of pi
pi = 4 * n_points_circle / n_points_square
print(pi)

The simulated pi we got is 3.1434. Accurate to the second decimal place — pretty good!

Putting everything together, this is the complete code to estimate pi using a Monte Carlo simulation:

# import libraries
import numpy as np

# initialize variables
n_simulations = 100000
n_points_circle = 0
n_points_square = 0

# create lists to store x and y values
l_xs = []
l_ys = []

# loop n_simulations times
for _ in range(n_simulations):

    # x is randomly drawn from a continuous uniform distribution
    x = np.random.uniform(-1, 1)
    # store x in the list
    l_xs.append(x)

    # y is randomly drawn from a continuous uniform distribution
    y = np.random.uniform(-1, 1)
    # store y in the list
    l_ys.append(y)

# loop n_simulations times
for i in range(n_simulations):

    # calculate the squared distance between the point and the origin
    dist_from_origin = l_xs[i] ** 2 + l_ys[i] ** 2

    # if the squared distance is at most 1, the point is inside the unit circle
    if dist_from_origin <= 1:
        n_points_circle += 1

    # every sampled point falls inside the square by construction
    n_points_square += 1

# estimate the value of pi
pi = 4 * n_points_circle / n_points_square
print(pi)
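The loop above can also be vectorized with NumPy. Here is a sketch of an equivalent (and much faster) implementation; the seeded `default_rng` generator is my choice for reproducibility:

```python
import numpy as np

rng = np.random.default_rng(42)
n_simulations = 100_000

# draw all x and y coordinates at once from Uniform(-1, 1)
xs = rng.uniform(-1, 1, n_simulations)
ys = rng.uniform(-1, 1, n_simulations)

# count points whose squared distance from the origin is at most 1
n_points_circle = np.sum(xs ** 2 + ys ** 2 <= 1)

# every sampled point lies in the square, so the ratio estimates pi / 4
pi_estimate = 4 * n_points_circle / n_simulations
print(pi_estimate)
```

Replacing the Python loops with array operations gives the same estimator while letting NumPy do the heavy lifting in compiled code.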

Generate discrete random variables

Discrete uniform distribution

  • Models a random variable that takes on a given number of values with equal probability.
  • Example: when a spinner with ten equal sections is spun once, the result follows a discrete uniform distribution.
A spinner. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/uniform-discrete-distribution/1/
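As a quick sketch, the spinner example can be simulated with NumPy's `integers` sampler (labeling the sections 1 through 10 is my assumption):

```python
import numpy as np

rng = np.random.default_rng(0)

# spin a ten-section spinner 10,000 times; each section 1..10 is equally likely
spins = rng.integers(low=1, high=11, size=10_000)

# each outcome should appear roughly 10% of the time
values, counts = np.unique(spins, return_counts=True)
print(dict(zip(values.tolist(), (counts / len(spins)).round(3).tolist())))
```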

Bernoulli distribution

  • Models a random variable with exactly two outcomes, conventionally called a “success” and a “failure” (the labels are not meant literally).
  • Example #1: when a fair coin (50% heads and 50% tails) is flipped once, the result follows a Bernoulli distribution.
  • Example #2: when an unfair coin (80% heads and 20% tails) is flipped once, the result follows a Bernoulli distribution.
A coin flip. Source: https://www.cuemath.com/data/tossing-a-coin/
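NumPy has no separate Bernoulli sampler; a Bernoulli trial is a binomial with n = 1. A sketch for the unfair coin in example #2:

```python
import numpy as np

rng = np.random.default_rng(0)

# flip an unfair coin (80% heads) 10,000 times; 1 = heads, 0 = tails
flips = rng.binomial(n=1, p=0.8, size=10_000)

# the fraction of heads should be close to 0.8
print(flips.mean())
```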

Binomial distribution

  • Models the number of successes in a fixed number of independent, identical Bernoulli trials.
  • Example: when a fair coin is flipped 100 times, the number of heads we get follows a binomial distribution.
100 coin flips. Source: https://nationaltoday.com/flip-coin-day/
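A sketch of the 100-flip example, repeated many times so the distribution of head counts becomes visible:

```python
import numpy as np

rng = np.random.default_rng(0)

# number of heads in 100 fair-coin flips, repeated 10,000 times
heads = rng.binomial(n=100, p=0.5, size=10_000)

# the average head count should be near n * p = 50
print(heads.mean())
```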

Geometric distribution

  • Models the number of trials needed to get the first success in a sequence of Bernoulli trials (some texts instead count the number of failures before the first success).
  • Example: the number of times a die needs to be rolled to get the first “6” follows a geometric distribution.
A die roll. Source: https://www.online-stopwatch.com/chance-games/roll-a-dice/
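NumPy's `geometric` sampler counts the number of trials up to and including the first success, which matches the die example:

```python
import numpy as np

rng = np.random.default_rng(0)

# number of rolls needed to see the first "6" (success probability 1/6),
# simulated 10,000 times
rolls = rng.geometric(p=1/6, size=10_000)

# the expected number of rolls is 1 / p = 6
print(rolls.mean())
```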

Poisson distribution

  • Models the number of events occurring over a fixed time period, assuming the events happen independently at a constant average rate.
  • Example: the number of cars traveling on the Golden Gate Bridge in a given hour follows a Poisson distribution.
Cars traveling on the Golden Gate Bridge. Source: https://sf.curbed.com/2018/2/6/16979096/san-francisco-traffic-ranking-cars
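A sketch of the bridge example; the average rate of 500 cars per hour is a hypothetical figure of my choosing:

```python
import numpy as np

rng = np.random.default_rng(0)

# cars crossing in one hour, with a (hypothetical) mean rate of 500,
# simulated 10,000 times
cars = rng.poisson(lam=500, size=10_000)

# for a Poisson distribution the mean and the variance both equal lam
print(cars.mean(), cars.var())
```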

Generate continuous random variables

Normal distribution

Normal distribution. Source: https://brilliant.org/courses/probability_ii/continuous-distributions-2/normal-distribution/1/
  • Models a random variable whose distribution is symmetric about the mean and bell-shaped. Values near the mean occur more frequently than values far from the mean.
  • Example: the heights of people in San Francisco are approximately normally distributed.
Heights of people. Source: https://gigazine.net/gsc_news/en/20160802-comparing-heights/
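A sketch of the heights example; the mean of 170 cm and standard deviation of 10 cm are hypothetical parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# heights in cm, assuming a (hypothetical) mean of 170 and std of 10
heights = rng.normal(loc=170, scale=10, size=10_000)

# roughly 68% of draws should fall within one standard deviation of the mean
within_one_sd = np.mean(np.abs(heights - 170) <= 10)
print(heights.mean(), within_one_sd)
```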

Exponential distribution

  • Models the time between events in a Poisson process.
  • Example: the time between two lightning bolts during a storm follows an exponential distribution.
A lightning bolt. Source: https://mashable.com/article/worlds-longest-lightning-bolt
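A sketch of the storm example, assuming (hypothetically) an average gap of 2 minutes between bolts:

```python
import numpy as np

rng = np.random.default_rng(0)

# minutes between lightning bolts, assuming a (hypothetical) mean gap of 2
gaps = rng.exponential(scale=2.0, size=10_000)

# the sample mean should be close to the scale parameter
print(gaps.mean())
```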

Gamma distribution

  • Models the waiting time until the k-th event occurs in a Poisson process (the exponential distribution is the special case k = 1).
  • Example: the waiting time until death occurs, aka lifespan, is often modeled with a Gamma distribution.
Lifespan. Source: https://guardian.ng/life/life-features/human-lifespan-may-not-have-peaked-after-all-study/
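A sketch of sampling waiting times from a Gamma distribution; the shape of 3 and scale of 2 are illustrative parameters, not taken from any lifespan data:

```python
import numpy as np

rng = np.random.default_rng(0)

# waiting times with (illustrative) shape k = 3 and scale theta = 2
waits = rng.gamma(shape=3.0, scale=2.0, size=10_000)

# the mean of a gamma distribution is shape * scale = 6
print(waits.mean())
```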

Log-normal distribution

  • Models a random variable whose logarithm is normally distributed.
  • Example: the income of a random group of people in the US follows a log-normal distribution.
Income. Source: https://iconscout.com/icon/income-1446505
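A sketch verifying the defining property: taking the log of log-normal draws recovers a normal distribution. The log-mean of 10 and log-std of 0.5 are illustrative, not real income parameters:

```python
import numpy as np

rng = np.random.default_rng(0)

# incomes drawn from a log-normal with (illustrative) log-mean 10, log-std 0.5
incomes = rng.lognormal(mean=10.0, sigma=0.5, size=10_000)

# the logs should be approximately normal with mean ~10 and std ~0.5
logs = np.log(incomes)
print(logs.mean(), logs.std())
```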

Conclusion

Picking the appropriate distribution for the use case is important. This process sits at the intersection of art (data intuition) and science (performance measurement).
