Monte Carlo simulation in R (Detailed code explanation)— Lottery Winner

Kiel Dang
Dec 1, 2022

Monte Carlo simulation has become one of the most important tools in research. The method generates pseudorandom numbers and transforms them to replicate samples from different distributions. Statistics computed on those samples are then used to solve a particular problem.

In this project, I use this approach in R to solve a lottery problem.

Photo by Mick Haupt on Unsplash

Problem statement: To win a certain lotto, a person must spell the word big. Sixty percent of the tickets contain the letter b, 30% contain the letter i, and 10% contain the letter g. Find the average number of tickets a person must buy to win the prize. (Repeat the experiment 30 times.)

Solution: I solve this problem with a Monte Carlo simulation based on random numbers, following the five steps below.

Step 1 — List all possible outcomes.

There are three possible outcomes for each ticket: b, i, and g.

Step 2 — Determine the probabilities.

Probabilities are given in the problem statement and listed in the following table:

Outcome   Probability
b         0.6
i         0.3
g         0.1

Step 3 — Set up a correspondence between the random numbers and the outcome.

With the given probabilities, the table below matches each outcome with its corresponding random numbers:

Outcome   Probability   Random numbers
b         0.6           0, 1, 2, 3, 4, 5
i         0.3           6, 7, 8
g         0.1           9
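The correspondence in Step 3 can be checked with a short sketch. The helper name digit_to_letter is hypothetical and not part of the original code; it only restates the same mapping (0 to 5 for b, 6 to 8 for i, 9 for g) so the shares can be verified.

```r
# Illustrative helper (not in the original code): map a digit 0-9 to a letter.
digit_to_letter <- function(digit) {
  # Breaks at -1, 5, 8, 9 give the intervals (-1, 5], (5, 8], (8, 9]
  as.character(cut(digit, breaks = c(-1, 5, 8, 9), labels = c("b", "i", "g")))
}

digits <- 0:9
letters_for_digits <- digit_to_letter(digits)
print(data.frame(digit = digits, letter = letters_for_digits))

# Share of the ten digits assigned to each letter
share <- table(letters_for_digits) / length(digits)
print(share)
```

Because each digit from 0 to 9 is equally likely, six digits for b, three for i, and one for g reproduce the probabilities 0.6, 0.3, and 0.1.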

Step 4 — Conduct an experiment.

I create a function so that the process can be repeated as many times as we want. The logic of the code is as follows:

Firstly, I create an empty vector to store random numbers.

Secondly, I set every letter to FALSE by default and the initial number of picks to 0.

Thirdly, I start a while (condition) {command} loop in R. The loop keeps running as long as the condition is TRUE. The condition !b | !i | !g turns FALSE, and the loop stops, only when all three values (b, i, g) are TRUE, because the expression then reads FALSE or FALSE or FALSE. If any letter is still missing, at least one term is TRUE, and a single TRUE in an "or" combination makes the whole expression TRUE.
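The behaviour of this condition can be listed for every combination of the three flags. This is a small sketch for illustration only; the column names keep_going and de_morgan are assumptions, not names from the original code.

```r
# All eight combinations of the three flags
flags <- expand.grid(b = c(FALSE, TRUE), i = c(FALSE, TRUE), g = c(FALSE, TRUE))

# The loop condition: TRUE means "keep buying tickets"
flags$keep_going <- !flags$b | !flags$i | !flags$g

# Equivalent form by De Morgan's law: not (b and i and g)
flags$de_morgan <- !(flags$b & flags$i & flags$g)

print(flags)
```

Only the row where b, i, and g are all TRUE has keep_going equal to FALSE, which is the moment the word is complete.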

Next, inside the loop I count the pick, generate a random integer from 0 to 9, and record it in the empty vector created earlier.

Lastly, I match the number to a letter using if, else if, and else to update its status, and combine the list of random numbers and the number of picks into a data frame.

# Create a function that simulates one game and returns the result
Lottery <- function()
{
  Random_Number_List <- c() # Empty vector to record the random numbers drawn
  b <- FALSE # No letter has been collected yet
  i <- FALSE
  g <- FALSE
  Total_Pick <- 0 # Number of picks starts at 0
  # Keep picking while at least one of b, i, g is still FALSE
  while (!b || !i || !g)
  {
    Total_Pick <- Total_Pick + 1 # Count this pick
    Random_Number <- sample(0:9, 1) # Draw one random integer from 0 to 9
    Random_Number_List <- append(Random_Number_List, Random_Number) # Record the random number
    if (Random_Number <= 5) { b <- TRUE } # Numbers 0 to 5 stand for b
    else if (Random_Number <= 8) { i <- TRUE } # Numbers 6 to 8 stand for i
    else { g <- TRUE } # Number 9 stands for g
  }
  # Collapse the numbers into one string so that each game fits in a single row
  Result <- data.frame(Random_Number_List = paste(Random_Number_List, collapse = ", "),
                       Total_Pick = Total_Pick)
  return(Result)
}

Lottery()

Running the function several times for testing confirms that it maps the numbers correctly and counts the picks accurately. As Figure 1 shows, every result contains a 9, which stands for the letter g needed to complete the word.

Figure 1: Function testing

However, as we need to simulate the process many times, I create another function that repeats the process n times. The logic of the code is as follows:

Firstly, I create an empty data frame to record the results of each trial.

Next, I build a function that loops over a vector of trials from 1 to n and runs the Lottery function built above once per trial.

Finally, I append each result to the initial data frame and return a list with the results, the sum of total picks, and the average number of picks in the experiment.

Experiment <- function(n) # n is the number of trials
{
  df <- data.frame(Random_Number_List = character(),
                   Total_Pick = integer(),
                   stringsAsFactors = FALSE) # Empty data frame to record the result of each trial
  for (k in seq_len(n)) # Repeat the game n times
  {
    Result <- Lottery() # Run the Lottery function once
    df <- rbind(df, Result) # Append the result of this trial
  }
  Sum_Trial <- sum(df$Total_Pick) # Sum of picks over all trials
  Average_Pick <- Sum_Trial / n # Average number of picks per trial
  return(list(data = df, # Return the results as a list
              Sum_Trial = Sum_Trial,
              Average_Pick = Average_Pick))
}

Running the function with n = 30 gives the result in Figure 2. The minimum number of picks is 3, which is the smallest possible number of tickets that can contain all three letters b, i, and g. The 13th trial is the longest, with 36 random numbers.

Figure 2: Result of experiment for 30 trials
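As a rough check on how often the minimum of 3 picks should appear, the probability of finishing in exactly three tickets can be worked out by hand. This assumes, as the simulation does, that tickets are independent and the probabilities stay fixed. The three letters can arrive in any of 3! orders, and each order has the same probability:

$$
P(T = 3) = 3! \times 0.6 \times 0.3 \times 0.1 = 6 \times 0.018 = 0.108
$$

Under that assumption, an experiment with 30 trials would contain about 30 × 0.108 ≈ 3 games that end after only three picks, so seeing the minimum in a run of this size is not unusual.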

Step 5 — Conclusion.

The average over the 30 trials is 13 picks, so the simulation estimates that a person must buy about 13 tickets on average to win the prize.

Note: The experiment is conducted with only 30 trials. As n increases, the average gives a better estimate of the expected value and moves closer to the theoretical result. Below, I conduct four more experiments with 1000 to 5000 trials and assess the output. From this outcome, we can predict that the theoretical value is around 11 tickets.
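The value of about 11 can also be derived directly, assuming independent tickets with fixed probabilities. Let T be the number of tickets needed to collect all three letters. The waiting time for the first ticket from any group of letters is geometric with mean 1 divided by the group's total probability, and inclusion-exclusion over the three letters gives:

$$
\begin{aligned}
E[T] &= \frac{1}{0.6} + \frac{1}{0.3} + \frac{1}{0.1}
      - \frac{1}{0.6 + 0.3} - \frac{1}{0.6 + 0.1} - \frac{1}{0.3 + 0.1}
      + \frac{1}{0.6 + 0.3 + 0.1} \\
     &= 15 - \frac{635}{126} + 1 = \frac{1381}{126} \approx 10.96
\end{aligned}
$$

This agrees with the simulated averages settling around 11 as n grows.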

Figure 3: Different n for the simulations
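The effect of n can be reproduced with a compact sketch. The helper tickets_until_big is hypothetical: it applies the same rule as the Lottery function but draws letters directly with their probabilities and returns only the count. The seed and the trial sizes are assumptions chosen for illustration, since the exact sizes used for Figure 3 are not stated in the text.

```r
set.seed(1) # Arbitrary seed, used only so the run can be repeated

# Hypothetical helper: buy tickets until b, i and g have all appeared,
# then return the number of tickets bought.
tickets_until_big <- function() {
  seen <- c(b = FALSE, i = FALSE, g = FALSE)
  picks <- 0
  while (!all(seen)) {
    letter <- sample(c("b", "i", "g"), size = 1, prob = c(0.6, 0.3, 0.1))
    seen[letter] <- TRUE # Mark this letter as collected
    picks <- picks + 1
  }
  picks
}

# Assumed trial sizes; the original figure may use different ones
trial_sizes <- c(30, 100, 1000, 5000)
averages <- sapply(trial_sizes, function(n) mean(replicate(n, tickets_until_big())))
print(data.frame(n = trial_sizes, Average_Pick = averages))
```

Averages from small n can land noticeably above or below 11, while the larger runs should vary much less from one seed to the next.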

Note:

The example above can be applied to real business contexts. Given the same probabilities, suppose that about 13 tickets must be sold on average before someone wins the prize. The business owner should then sell more than 13 tickets per session, say 20. They add up the operating cost, the prize cost, and the expected profit, then divide the total by 20 to get the final price per ticket. If the game is run for a large enough number of sessions, there is a high chance that the business will be profitable.

This is merely one simple example of how simulation can work in a real-world setting. The same approach can be applied to many scenarios in a wide range of fields.

For the code, please visit my GitHub at: https://github.com/kirudang/Monte_Carlo_simulation_R
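The pricing rule in this note is simple arithmetic, sketched below. Only the 20 tickets per session comes from the text; every amount is hypothetical and chosen just to show the calculation.

```r
# All amounts below are hypothetical, chosen only to illustrate the arithmetic
tickets_per_session <- 20 # From the text: sell 20 tickets per session
operating_cost <- 100     # Hypothetical cost of running one session
prize_cost <- 500         # Hypothetical value of the prize
expected_profit <- 200    # Hypothetical profit target per session

# Price per ticket = (operating cost + prize cost + expected profit) / tickets sold
price_per_ticket <- (operating_cost + prize_cost + expected_profit) / tickets_per_session
print(price_per_ticket)
```

With these made-up figures the total to recover is 800, so each of the 20 tickets would be priced at 40.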

