Using Python to visualize probability questions

Stephen Godfrey
Stephen Godfrey’s blog
Mar 11, 2019

Python and some of its libraries offer a great way to visualize, compare and understand probability distributions. Combining statistical functions with plotting packages lets us easily see the shape of probability density functions and answer probability thought experiments. These tools help both in understanding fundamental distributions and in addressing complex problems. This blog post looks at some of the common ways this is done.

Let’s start by considering the following game offered at your local casino. The rules are:

  • A player rolls two dice and sums them
  • The player then draws that many cards from a standard deck
  • The player is paid $25 for each ace she draws, $2 for each king and nothing for any other card
  • Playing cards are not replaced after they are drawn

The question is: how much should she pay to play? To answer it, let’s use Python to simulate the game and find its expected value. The expected value, estimated as the average payout over our simulated games, is the fair price of playing.

One roll of the dice

To start, we will look at rolling a single die. We can simulate a roll with the NumPy random integer generator, randint. So that the results can be replicated in the future, we’ll pick a seed for the random number generator.
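
The step just described might look like the minimal sketch below. It assumes NumPy's legacy np.random.seed and np.random.randint interface; the seed value 42 and the helper name roll_die are placeholders rather than the author's choices, so the printed numbers will not necessarily match any output quoted in this post.

```python
import numpy as np

# Hypothetical seed: any fixed integer makes the run repeatable.
np.random.seed(42)


def roll_die(n_rolls):
    """Return n_rolls simulated rolls of a fair six-sided die."""
    # randint's upper bound is exclusive, so (1, 7) yields values 1..6.
    return np.random.randint(1, 7, size=n_rolls)


sample = roll_die(10)
print(sample)
```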

If we run this for 10 samples, we get the following output: array([4, 2, 6, 4, 2, 6, 1, 5, 1, 6]), where each number is the result of rolling a six-sided die. Now we can run it again with a sample size of 10,000 and plot the results in a histogram using Seaborn’s distplot function. Running the code below will generate this graph.

A simulated roll of one die

Game simulation

Now we’ll use the same function to simulate the game. In this case, we write code that

  • rolls the two dice
  • sums their values
  • picks that many cards from a deck without replacement
  • calculates the payout

We’ll store this payout in a list called game_record and use it to (a) determine the expected value of the game and (b) plot a histogram of simulated payouts using Seaborn’s distplot() function.
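
The four steps above can be sketched as follows. This is one possible implementation, not necessarily the original: it assumes the payout applies to every ace and king drawn, represents the deck as an array of per-card payouts, and uses a placeholder seed. The names DECK_PAYOUTS and play_game are hypothetical.

```python
import numpy as np

np.random.seed(42)  # hypothetical seed

# The deck as per-card payouts: 4 aces ($25), 4 kings ($2), 44 other cards ($0).
DECK_PAYOUTS = np.array([25] * 4 + [2] * 4 + [0] * 44)


def play_game(replace=False):
    """Simulate one game and return its total payout in dollars."""
    dice = np.random.randint(1, 7, size=2)  # roll the two dice
    n_cards = dice.sum()                    # sum their values (2..12)
    # Draw that many cards; replace=False means no card is drawn twice.
    hand = np.random.choice(DECK_PAYOUTS, size=n_cards, replace=replace)
    return hand.sum()                       # calculate the payout


game_record = [play_game() for _ in range(10_000)]
print(f"Average payout: ${np.mean(game_record):.2f}")
```

The list game_record can then be passed to a plotting function to draw the payout histogram.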

Simulations of the game

The average payout over these 10,000 simulations is $14.47, giving us an estimate of the fair value of the game. Looking below we can see a histogram of the payout distribution. Payouts are concentrated at $0 when no aces or kings are drawn, $2 when one king is drawn, $25 when one ace is drawn and $50 in the rare case of two aces.

Once the code is complete, we can often modify it easily to address other thought experiments. For example, we can ask what happens if we allow replacement when cards are drawn. That can be answered by changing the replace parameter to True in the np.random.choice function. If we run it again, we find an average payout of $14.65. This may seem surprising at first, since the value is so close to the result without replacement, but it follows from linearity of expectation: each card drawn has the same expected payout of (4 × $25 + 4 × $2) / 52 ≈ $2.08 whether or not cards are replaced. Since the expected sum of two dice is seven, both versions of the game have an expected value of about 7 × $2.08 ≈ $14.54, and the gap between the two simulated averages is sampling noise. Replacement changes the spread of the payouts, not their mean. Such findings underscore the usefulness of simulations in addressing probability problems.
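
To check the comparison, the sketch below runs both variants and computes the expected value directly. It is an illustration under stated assumptions: payouts apply per ace and king drawn, the seed is arbitrary, and the names DECK_PAYOUTS and simulate are hypothetical. It uses np.random.RandomState so that each run has its own generator.

```python
import numpy as np

# Per-card payouts: 4 aces ($25), 4 kings ($2), 44 other cards ($0).
DECK_PAYOUTS = np.array([25] * 4 + [2] * 4 + [0] * 44)


def simulate(n_games, replace, seed):
    """Return an array of payouts from n_games simulated games."""
    rng = np.random.RandomState(seed)  # seed value is an arbitrary choice
    totals = np.empty(n_games)
    for i in range(n_games):
        n_cards = rng.randint(1, 7, size=2).sum()  # sum of two dice
        hand = rng.choice(DECK_PAYOUTS, size=n_cards, replace=replace)
        totals[i] = hand.sum()
    return totals


without_repl = simulate(10_000, replace=False, seed=0)
with_repl = simulate(10_000, replace=True, seed=0)

# Expected sum of two dice (7) times the mean payout per card (108 / 52).
exact = 7 * DECK_PAYOUTS.mean()

print(f"Without replacement: mean {without_repl.mean():.2f}, std {without_repl.std():.2f}")
print(f"With replacement:    mean {with_repl.mean():.2f}, std {with_repl.std():.2f}")
print(f"Exact expected value: {exact:.2f}")
```

Both simulated means should land near the exact value, while the standard deviations show how replacement affects the spread.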

Simulations with replacement of the cards

Conclusion

In this blog post, we have seen an example of using Python with the NumPy and Seaborn libraries to analyze a probability problem. Simulating probability events to build a distribution can be a useful way to gain insight and is well worth the programming effort. For more information, check out the following links.

References


Stephen Godfrey is an experienced technical product manager with deep expertise in quantitative analysis and strategic planning.