Distributions

Wendy Hu
18 min read · Apr 22, 2023


Normal vs. paranormal distributions. Source: https://www.pinterest.com/pin/776941373204583883/

Discrete distributions

Spinner. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/uniform-discrete-distribution/1/

The discrete uniform distribution describes a random variable that takes each of a finite number of values with equal probability.

Models:

  • Roll a fair 6-sided die
  • Spin a spinner with 4 equal sections

Probability mass function: P(X = x) = 1/n for each of the n possible values x.

A spinner with 10 sections. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/uniform-discrete-distribution/1/

Example #1. What is the mean of a random variable described by a standard uniform distribution with 10 possible values?

The standard uniform distribution with 10 values takes on values 1, 2, …, 10. Since each value occurs with equal probability, the mean of the random variable is 1/10 * (1 + 2 + 3 + 4 + 5 + 6 + 7 + 8 + 9 + 10) = 5.5.
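The arithmetic above can be checked with a short Python sketch. The helper name `uniform_mean_var` is made up for this illustration, and exact fractions are used so that no rounding creeps in; the same helper also gives the variance, using Var[X] = E[X²] - E[X]².

```python
from fractions import Fraction


def uniform_mean_var(n):
    """Mean and variance of a uniform distribution on 1, 2, ..., n.

    Hypothetical helper written for this illustration.
    """
    values = range(1, n + 1)
    p = Fraction(1, n)                               # every value is equally likely
    mean = sum(p * v for v in values)                # E[X]
    second_moment = sum(p * v * v for v in values)   # E[X²]
    return mean, second_moment - mean ** 2           # Var[X] = E[X²] - E[X]²


mean_10, var_10 = uniform_mean_var(10)
print(mean_10, var_10)  # 11/2 33/4
```

Calling `uniform_mean_var(6)` models a fair 6-sided die in the same way.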

A die. Source: https://www.dreamstime.com/photos-images/die.html

Example #2. What is the variance of the value resulting from rolling a fair 6-sided die?

  • E[X] = 1/6 * (1 + 2 + 3 + 4 + 5 + 6) = 3.5
  • E[X²] = 1/6 * (1² + 2² + 3² + 4² + 5² + 6²) = 91/6
  • Var[X] = E[X²] - E[X]² = 91/6 - 3.5² = 35/12

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

The Bernoulli distribution describes a random variable that has exactly two outcomes: “success” and “failure”.

Models:

  • Flip a coin
  • Guess a true/false question

Probability mass function: P(X = 1) = p and P(X = 0) = 1 - p, where p is the probability of success.

A die. Source: https://www.dreamstime.com/photos-images/die.html

Example #3. Consider an unfair 6-sided die, with the following probability distribution for X, the number rolled. Let Y=1 be success, the event that the die lands on an odd number. Let Y=0 be failure, the event that the die lands on an even number. What is the probability of success?

The probability of success is the sum of the probability of getting 1, the probability of getting 3 and the probability of getting 5.

  • P(Y=1) = P(X=1) + P(X=3) + P(X=5) = 1/8 + 3/16 + 5/32 = 15/32

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

Example #4. What is the mean of a Bernoulli distribution with probability p=0.3 of success?

The mean of the Bernoulli distribution is the expected value.

  • E[X] = p * 1 + (1 - p) * 0 = p = 0.3

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

Example #5. What is the variance of a Bernoulli distribution with probability p=0.3 of success?

  • E[X] = p * 1 + (1 - p) * 0 = p = 0.3
  • E[X²] = p * 1² + (1 - p) * 0² = p = 0.3
  • Var[X] = E[X²] - E[X]² = 0.3 - 0.3² = 0.21

3 coin flips. Source: https://brilliant.org/courses/probability_ii/expected-value-4/expected-value-linearity-of-expectation/1/

The binomial distribution describes the number of successes in a sequence of n identical, independent Bernoulli trials.

Models:

  • Flip a coin 10 times
  • Guess a true/false question 100 times

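Probability mass function: P(X = k) = C(n, k) * p^k * (1 - p)^(n - k) for k = 0, 1, …, n, where n is the number of trials, p is the probability of success on each trial and C(n, k) is the binomial coefficient “n choose k”.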

Example #6. A fair coin is flipped 10 times. What is the probability that an even number of heads appear?

The probability that an even number of heads appears is the sum of the probabilities of getting 0 heads, 2 heads, and so on up to 10 heads: P(even) = [C(10, 0) + C(10, 2) + … + C(10, 10)] / 2¹⁰ = 512/1024 = 1/2.
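As a rough check, the sum over even head counts can be evaluated exactly in Python. The function name `binomial_pmf` is an illustrative choice, not something from the original article.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials); illustrative helper."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


# Sum the probabilities of 0, 2, 4, ..., 10 heads in 10 fair flips
p_even = sum(binomial_pmf(k, 10, Fraction(1, 2)) for k in range(0, 11, 2))
print(p_even)  # 1/2
```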

Widget — defective vs. non-defective. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/binomial-distribution/1/

Example #7. A manufacturer knows that 20% of the widgets he produces are defective. He needs to sell at least 10 (non-defective) widgets to meet his quota. How many times does he need to produce a (possibly defective) widget so that he has at least a 90% chance of meeting his quota?

The probability of meeting the quota is the sum of the probabilities of getting 10 non-defective widgets out of n widgets, 11 non-defective widgets out of n widgets, and so on up to n non-defective widgets out of n widgets: P(X ≥ 10) = Σ C(n, i) * 0.8^i * 0.2^(n - i) for i from 10 to n. Set this probability to be larger than or equal to 0.9.

Try different values of n (10, 11, 12, etc.) to get the cumulative probabilities.

Calculator output. Source: https://www.wolframalpha.com/input?i=summation+from+i+%3D+10+to+n+of+%28%28n+choose+i%29+%280.8%29%5Ei+%280.2%29%5E%28n-i%29%29

Since n = 15 is the first value that achieves a probability larger than or equal to 0.9, the manufacturer needs to produce at least 15 widgets to meet the quota with a 90% probability.
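The trial-and-error search can also be scripted instead of using an online calculator. This is a minimal sketch; `prob_at_least` and `smallest_n` are hypothetical helper names.

```python
from math import comb


def prob_at_least(successes, n, p):
    """P(at least `successes` good widgets out of n); illustrative helper."""
    return sum(comb(n, i) * p ** i * (1 - p) ** (n - i)
               for i in range(successes, n + 1))


def smallest_n(quota=10, p_good=0.8, target=0.9):
    """Smallest production run that meets the quota with probability >= target."""
    n = quota  # fewer than `quota` widgets can never meet the quota
    while prob_at_least(quota, n, p_good) < target:
        n += 1
    return n


print(smallest_n())  # 15
```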

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

Example #8. A weighted coin has a probability p=0.10 of showing heads. What is the expected number of heads resulting from flipping the coin 100 times?

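Let X be the number of heads in the 100 flips. X follows a binomial distribution with n = 100 and p = 0.10, so E[X] = n * p = 100 * 0.10 = 10.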

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

Example #9. A weighted coin has probability p=0.10 of showing heads. What is the variance in the number of heads resulting from flipping the coin 100 times?

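Let X be the number of heads in the 100 flips. X follows a binomial distribution with n = 100 and p = 0.10, so Var[X] = n * p * (1 - p) = 100 * 0.10 * 0.90 = 9.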
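The binomial shortcuts E[X] = n * p and Var[X] = n * p * (1 - p) used in Examples #8 and #9 can be cross-checked by brute force from the probability mass function. The sketch below is only a sanity check, with variable names chosen for this illustration.

```python
from fractions import Fraction
from math import comb

n, p = 100, Fraction(1, 10)

# Exact probability of k heads for every k from 0 to 100
pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

mean = sum(k * q for k, q in enumerate(pmf))                       # E[X]
variance = sum(k * k * q for k, q in enumerate(pmf)) - mean ** 2   # E[X²] - E[X]²
print(mean, variance)  # 10 9
```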

A binomial test is a hypothesis test that checks whether observed data are consistent with a binomial distribution with a specified probability of success. The p-value is calculated from the binomial distribution under the null hypothesis, and the null hypothesis is rejected in favor of the alternative hypothesis when the p-value falls below the chosen significance level.
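One way such a test might be sketched in plain Python is shown below. It uses one common two-sided convention (add up the probabilities of all outcomes that are no more likely than the observed one); other conventions exist. The data (60 heads in 100 flips of a supposedly fair coin) and the function names are made up for this illustration.

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials); illustrative helper."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def binomial_test_two_sided(successes, n, p_null):
    """Two-sided exact p-value under the null hypothesis p = p_null."""
    observed = binomial_pmf(successes, n, p_null)
    # Small relative tolerance so that ties are not lost to rounding
    total = sum(q for q in (binomial_pmf(k, n, p_null) for k in range(n + 1))
                if q <= observed * (1 + 1e-9))
    return min(1.0, total)


# Hypothetical data: 60 heads observed in 100 flips, null hypothesis p = 0.5
p_value = binomial_test_two_sided(60, 100, 0.5)
print(p_value < 0.05)  # reject the null at the 5% level only if this is True
```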

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

The geometric distribution describes the number of failures before the first success in a sequence of independent Bernoulli trials. It is memoryless, meaning that the number of additional trials needed for a success doesn’t depend on how many failures have already occurred.

Models:

  • Flip a coin until the first head comes up
  • Roll a die until the first 3 comes up

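Probability mass function: P(X = k) = (1 - p)^k * p for k = 0, 1, 2, …, where p is the probability of success on each trial. If X instead counts the trials up to and including the first success, P(X = k) = (1 - p)^(k - 1) * p for k = 1, 2, 3, …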

Monopoly. Source: https://www.walmart.com/ip/Monopoly-Game-Classic-Family-Board-Game-for-2-to-6-Players-for-Kids-Ages-8-and-Up/55332132

Example #10. In the game of Monopoly, a player must stay in jail until doubles are rolled. What is the probability that the player gets out of jail after exactly 3 rolls (of two dice)?

To get out of jail in 3 rolls, this is what needs to happen:

  • 1st roll: no double → probability = 1 (any number on the first die) * 5/6 (any non-matching number on the second die) = 5/6
  • 2nd roll: no double → probability = 1 (any number on the first die) * 5/6 (any non-matching number on the second die) = 5/6
  • 3rd roll: a double → probability = 1 (any number on the first die) * 1/6 (a matching number on the second die) = 1/6

p(exactly 3 rolls) = 5/6 * 5/6 * 1/6 = 25/216
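The same product can be written as a tiny function. This sketch counts trials up to and including the first success; the name `first_success_on_trial` is hypothetical.

```python
from fractions import Fraction


def first_success_on_trial(k, p):
    """P(first success happens on trial k) = (1 - p)^(k - 1) * p."""
    return (1 - p) ** (k - 1) * p


p_doubles = Fraction(6, 36)  # 6 of the 36 equally likely rolls are doubles
print(first_success_on_trial(3, p_doubles))  # 25/216
```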

Monopoly. Source: https://www.walmart.com/ip/Monopoly-Game-Classic-Family-Board-Game-for-2-to-6-Players-for-Kids-Ages-8-and-Up/55332132

Example #11. In the game of Monopoly, a player must stay in jail until doubles are rolled. What is the expected number of turns for a player to escape from jail?

Flowchart of escaping from jail

Let X = number of more turns in jail. We will use recursion to solve the problem. At the current turn, there is a 1/6 chance to roll a double to get out of jail with no more turns in jail. There is a 5/6 chance to roll a non-double to stay in jail which would result in (1 + E[X]) more turns in jail. Solving E[X] = 1/6 * 0 + 5/6 * (1 + E[X]) gives E[X] = 5. The expected number of turns to break out of jail is 5 more turns in jail + 1 turn to break out of jail = 6 turns in total.

Monopoly. Source: https://www.walmart.com/ip/Monopoly-Game-Classic-Family-Board-Game-for-2-to-6-Players-for-Kids-Ages-8-and-Up/55332132

Example #12. In the game of Monopoly, a player must stay in jail until doubles are rolled. If the player has already been in jail for 4 turns, what is the expected number of additional turns needed before the player escapes from jail?

Since the geometric distribution is memoryless, being in jail for 4 turns is essentially the same as trying to break out of jail in the first turn. With the memoryless property, the past is irrelevant to the future. The expected number of additional turns to break out of jail is still 6 turns.
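Both jail results can be checked numerically. The sketch below relies on the tail-sum identity E[T] = P(T > 0) + P(T > 1) + … for a non-negative whole-number random variable T, which is a different route from the recursion used above; `survival` is a hypothetical helper name.

```python
from fractions import Fraction


def survival(t, p):
    """P(still in jail after t rolls) = (1 - p)^t; illustrative helper."""
    return (1 - p) ** t


# Expected number of turns, truncating the infinite tail sum at 1000 terms
expected_turns = sum(survival(t, 1 / 6) for t in range(1000))

# Memorylessness: P(T > 4 + t | T > 4) equals P(T > t), checked exactly
p = Fraction(1, 6)
memoryless = all(survival(4 + t, p) / survival(4, p) == survival(t, p)
                 for t in range(20))

print(round(expected_turns, 6), memoryless)  # 6.0 True
```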

A coin. Source: https://en.wikipedia.org/wiki/Lincoln_cent

Example #13. Alice and Bob take turns flipping a fair coin, and the winner is the first player to flip a head. If Alice flips the coin first, what is the probability that Alice wins?

Alice wins when any of the following situations happens:

  • The 1st flip is a head → probability = 1/2
  • The first 2 flips are tails and the 3rd flip is a head → probability = 1/2³
  • The first 4 flips are tails and the 5th flip is a head → probability = 1/2⁵

Probability that Alice wins = 1/2 + 1/2³ + 1/2⁵ + … = 1/2 * (1 + 1/2² + 1/2⁴ + …) = 1/2 * 1 / (1 - 1/4) = 2/3.
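To see the series converge, the partial sums can be computed exactly. This is a small sketch; `alice_wins` is a made-up function name, and `max_rounds` caps how many of Alice's turns are counted.

```python
from fractions import Fraction


def alice_wins(max_rounds):
    """P(Alice wins within her first `max_rounds` flips); illustrative helper."""
    half = Fraction(1, 2)
    # Alice's r-th turn is flip number 2r + 1, preceded by 2r tails
    return sum(half ** (2 * r + 1) for r in range(max_rounds))


for rounds in (1, 2, 5, 30):
    print(rounds, float(alice_wins(rounds)))  # approaches 2/3
```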

Carnival. Source: https://www.visitrichmondbc.com/wp-content/uploads/2021/10/carnival.jpg

Example #14. At a carnival in St. Petersburg, a game is played in which the pot starts at $2, and a fair coin is flipped repeatedly. If the coin flips heads, the player takes the pot and the game ends. Otherwise, the pot is doubled. What is the probability that the player wins exactly $32?

To win exactly $32, this is the sequence of flips: T ($2) → T ($4) → T ($8) → T ($16) → H ($32). The probability of getting this sequence is (1/2)⁴ * (1/2) = 1/2⁵ = 1/32.

A line. Source: https://www.cnn.com/style/article/design-of-waiting-lines/index.html

The Poisson distribution describes the number of events happening in a fixed time period when the events occur independently at a constant average rate.

Models:

  • The number of cars traveling on the Golden Gate Bridge between 7am and 9am
  • The number of customers at the checkout counter at Costco between 3pm and 4pm

Probability mass function: P(X = k) = λ^k * e^(-λ) / k! for k = 0, 1, 2, …

Where λ = the average number of events in a given time period, and λ is also the variance of the number of events in that time period.

Mailbox. Source: https://t4.ftcdn.net/jpg/02/85/92/97/360_F_285929722_GIeG98umjtDxjbDd82xyNHLtgy8PtSqz.jpg

Example #15. A person knows that on average, they receive 9 letters in the mail per week. What is the probability (rounded to the nearest hundredth) that, in a given week, the person receives exactly 9 letters?

Let X be the number of letters in a week. Plug in k = 9 and λ = 9 to the probability mass function to get: p(X = 9) = 9⁹e^(-9)/9! = 0.1318 ≈ 0.13.

Mailbox. Source: https://t4.ftcdn.net/jpg/02/85/92/97/360_F_285929722_GIeG98umjtDxjbDd82xyNHLtgy8PtSqz.jpg

Example #16. A person knows that they average 9 letters in the mail per week. What is the standard deviation in the number of letters the person receives in a given week?

The variance of a Poisson distribution equals its mean, so the variance of the number of letters in a week is 9. The standard deviation is the square root of the variance: sqrt(9) = 3.
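Both mailbox answers can be reproduced with a few lines of Python. The helper name `poisson_pmf` is chosen for this illustration.

```python
from math import exp, factorial, sqrt


def poisson_pmf(k, lam):
    """P(exactly k events) when the average count is lam; illustrative helper."""
    return lam ** k * exp(-lam) / factorial(k)


lam = 9  # average letters per week
print(round(poisson_pmf(9, lam), 4))  # 0.1318
print(sqrt(lam))                      # 3.0, since the variance equals lam
```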

Cookie. Source: https://t3.ftcdn.net/jpg/03/77/38/84/360_F_377388434_byWJFE7lFYOKYGVJKHr0H8ftKxFcv1nM.jpg

Example #17. A batch of cookie dough makes 100 cookies. What is the smallest number of chocolate chips that should be added to the dough so that, when thoroughly mixed and sliced into 100 cookies, the probability that a cookie contains zero chocolate chips is less than 1%?

Let X = the number of chocolate chips in a given cookie and λ = the average number of chocolate chips per cookie = n_chocolate_chips / n_cookies.

The probability that a cookie contains 0 chocolate chips = p(X = 0) = λ⁰ e^(-λ) / 0! = e^(-λ) < 0.01. Take the natural log of both sides of the inequality to get: ln(e^(-λ)) < ln(0.01) → -λ < ln(0.01) → λ > -ln(0.01) → λ > 4.6052.

Since λ = n_chocolate_chips / n_cookies = n_chocolate_chips / 100 > 4.6052, n_chocolate_chips > 100 * 4.6052 = 460.52. The number of chips must be a whole number, so the minimum number of chocolate chips required is 461.
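Because chips come in whole numbers, the bound has to be rounded up. A short sketch of that last step, under the same Poisson model as above (variable names are illustrative):

```python
from math import ceil, exp, log

cookies = 100
min_lambda = -log(0.01)             # need more than about 4.6052 chips per cookie
chips = ceil(cookies * min_lambda)  # round up to a whole number of chips

print(chips)  # 461

# One chip fewer is not enough: P(no chips) is still at least 1%
print(exp(-chips / cookies) < 0.01, exp(-(chips - 1) / cookies) < 0.01)  # True False
```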

Hospital. Source: https://static3.depositphotos.com/1006233/210/i/450/depositphotos_2107725-stock-photo-switzerland-button-with-flag.jpg

Example #18. An emergency room averages 3 patients per hour. Suppose each patient takes the full hour to treat. What is the fewest number of doctors that should be on call so there is at least a 90% chance of not having more patients than doctors during a given hour?

Since each patient takes a full hour to treat by a doctor, in a given hour, each patient needs a doctor. Let X be the number of patients in an hour, k be the number of doctors in an hour, λ = average patients in an hour = 3. We want the probability of having at least as many doctors as patients to be at least 90%: p(X ≤ k) ≥ 0.9.

  • P(X = 0) = 3⁰ * e^(-3) / 0! = 0.0498; P(X ≤ 0) = 0.0498
  • P(X = 1) = 3¹ * e^(-3) / 1! = 0.1494; P(X ≤ 1) = 0.1991
  • P(X = 2) = 3² * e^(-3) / 2! = 0.2240; P(X ≤ 2) = 0.4232
  • P(X = 3) = 3³ * e^(-3) / 3! = 0.2240; P(X ≤ 3) = 0.6472
  • P(X = 4) = 3⁴ * e^(-3) / 4! = 0.1680; P(X ≤ 4) = 0.8153
  • P(X = 5) = 3⁵ * e^(-3) / 5! = 0.1008; P(X ≤ 5) = 0.9161

P(X ≤ 4) = 0.8153 < 0.9 and P(X ≤ 5) = 0.9161 > 0.9, so 5 is the fewest number of doctors. If the emergency room has at least 5 doctors on call, there is at least a 90% chance of having at least as many doctors as patients.
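The table of cumulative probabilities can be generated with a small loop. This is a sketch, with `poisson_cdf` and `fewest_doctors` as hypothetical helper names.

```python
from math import exp, factorial


def poisson_cdf(k, lam):
    """P(X <= k) for a Poisson count with average lam; illustrative helper."""
    return sum(lam ** i * exp(-lam) / factorial(i) for i in range(k + 1))


def fewest_doctors(lam=3, target=0.9):
    """Smallest k such that P(patients <= k) is at least `target`."""
    k = 0
    while poisson_cdf(k, lam) < target:
        k += 1
    return k


print(fewest_doctors())  # 5
```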

Continuous distributions

Normal distribution. Source: https://en.wikipedia.org/wiki/File:Normal_Distribution_PDF.svg

The normal distribution is symmetric about the mean and is bell-shaped. It is described by two parameters: mean and standard deviation. It is a popular distribution partly because of the central limit theorem: as the sample size increases, the sampling distribution of the sample mean approaches a normal distribution, regardless of the shape of the underlying distribution (as long as its variance is finite).

Models:

  • Height of people
  • Blood pressure of people

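Probability density function: f(x) = 1 / (σ * sqrt(2π)) * e^(-(x - μ)² / (2σ²)), where μ is the mean and σ is the standard deviation.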

Age. Source: https://img.money.com/2017/06/170613-how-old-is-old.jpg?quality=60&w=800

Example #19. The ages of the members of an organization containing 10,000 people are normally distributed with mean 27 and standard deviation 7. Approximately how many members of the organization are teenagers (people older than 13 and not yet 20 years old)?

The age distribution

13 is 2 standard deviations below the mean and 20 is 1 standard deviation below the mean. Since approximately 68% of the data lies within 1 standard deviation of the mean, the probability that the data falls between 20 and 27 (the lower half of that range) is 68%/2 = 34%. Since approximately 95% of the data lies within 2 standard deviations of the mean, the probability that the data falls between 13 and 27 is 95%/2 = 47.5%. The probability that the data falls between 13 and 20 is 47.5% - 34% = 13.5%. So approximately 13.5% * 10,000 = 1,350 members are between 13 and 20 years old.
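The 68% and 95% figures are rules of thumb. For comparison, a minimal sketch using `NormalDist` from Python's standard `statistics` module evaluates the same probability without rounding (variable names are illustrative):

```python
from statistics import NormalDist

ages = NormalDist(mu=27, sigma=7)

# P(13 < age < 20) as a difference of two cumulative probabilities
share = ages.cdf(20) - ages.cdf(13)

print(round(share, 4))        # 0.1359
print(round(share * 10_000))  # 1359
```

The rule-of-thumb estimate of 1,350 is within ten people of this value.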

Stock. Source: https://en.wikipedia.org/wiki/Stock_market

Example #20. A stock portfolio averages a 15% return with a 30% standard deviation. What is the approximate probability that the portfolio loses money in a given year, assuming the returns are normally distributed?

Losing money means a return of 0 or less. The Z score of 0 = (0 - 15)/30 = -0.5. The problem becomes calculating the probability of a Z score smaller than or equal to -0.5. An online calculator returns p(Z ≤ -0.5) ≈ 0.3085, or about 31%.
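Instead of an online calculator, the lookup can be done with the standard library. A small sketch, with returns measured in percentage points:

```python
from statistics import NormalDist

returns = NormalDist(mu=15, sigma=30)  # mean 15%, standard deviation 30%

z_score = (0 - returns.mean) / returns.stdev
p_loss = returns.cdf(0)                # P(return <= 0)

print(z_score, round(p_loss, 4))  # -0.5 0.3085
```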

Tail risk. Source: https://research.macrosynergy.com/wp-content/uploads/2018/07/FatTails_03.png

Example #21. A tail risk is defined as an investment that moves more than three standard deviations from the mean of a normal distribution of investment returns. What is the probability a tail risk occurs?

Normal distribution. Source: https://upload.wikimedia.org/wikipedia/commons/thumb/8/8c/Standard_deviation_diagram.svg/1200px-Standard_deviation_diagram.svg.png

More than three standard deviations away from the mean can be

  • more than 3 standard deviations below the mean
  • more than 3 standard deviations above the mean.

We can sum up the probability of getting a return more than 3 standard deviations below the mean and the probability of getting a return more than 3 standard deviations above the mean. In terms of the Z score, the probability of a tail risk is p(Z ≤ -3) + p(Z ≥ 3) ≈ 0.135% + 0.135% = 0.27%.
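The two tail areas can be evaluated directly with the standard normal distribution. A brief sketch:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Lower tail plus upper tail beyond 3 standard deviations
p_tail = z.cdf(-3) + (1 - z.cdf(3))

print(round(p_tail * 100, 2))  # 0.27 (percent)
```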

Example #22. The SAT is designed to have approximately normally distributed scores, with a mean of 500 and a standard deviation of 100. What (approximate) score is necessary to be at the 70th percentile of scorers?

The 70th percentile means the score is greater than or equal to 70% of all scores. This turns the problem into finding the Z score z that satisfies p(Z ≤ z) = 0.7. By using an online calculator, we get z = 0.524. Solving (score - 500)/100 = 0.524 gives a raw score of 552.4, or approximately 552.
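The inverse lookup is available in the standard library as `NormalDist.inv_cdf`. A short sketch of the percentile calculation:

```python
from statistics import NormalDist

sat = NormalDist(mu=500, sigma=100)

z = NormalDist().inv_cdf(0.70)  # Z score at the 70th percentile
score = sat.inv_cdf(0.70)       # same percentile on the SAT scale

print(round(z, 3), round(score))  # 0.524 552
```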

Stock. Source: https://en.wikipedia.org/wiki/Stock_market

Example #23. A portfolio consists of 9 independent stocks, each of which is normally distributed with an average return of $0.15 and a standard deviation of $0.40. What is the average return and standard deviation of the entire portfolio?

Since the stocks are independent, both the means and the variances of the individual stocks add up. The mean of the portfolio = $0.15 * 9 = $1.35. The variance of the portfolio = 0.4² * 9 = 1.44. The standard deviation of the portfolio = sqrt(1.44) = $1.20.
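Python's `NormalDist` objects can be added together, and the addition treats the two operands as independent, which matches the assumption in this example. A minimal sketch:

```python
from statistics import NormalDist

stock = NormalDist(mu=0.15, sigma=0.40)

# Add nine independent copies one at a time
portfolio = stock
for _ in range(8):
    portfolio = portfolio + stock

print(round(portfolio.mean, 2), round(portfolio.stdev, 2))  # 1.35 1.2
```

Note that this differs from holding 9 shares of one stock, which would multiply the standard deviation by 9 rather than by 3.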

Exponential distribution. Source: https://en.wikipedia.org/wiki/Exponential_distribution#/media/File:Exponential_distribution_pdf_-_public_domain.svg

The exponential distribution describes the amount of time between consecutive events in a Poisson process, where events occur independently at a constant average rate.

Models:

  • The time between two buses at a bus stop
  • The time between two lightning bolts in a thunderstorm

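Probability density function: f(x) = λ * e^(-λx) for x ≥ 0, where λ is the average number of events per unit of time. The mean of the distribution is 1/λ.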

Parts. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/binomial-distribution/1/

Example #24. The lifetime of a part follows an exponential distribution, and each part lasts an average of 1 year. A manufacturer offers to replace any part that breaks within 3 months. What is the (approximate) probability the manufacturer will have to replace a given part?

Let X be the time it takes for a part to break down. The probability of replacing a given part = the probability of a given part breaking down between 0 months and 3 months (1/4 of a year) = p(0 ≤ X ≤ 1/4). Plug in λ = 1 to the probability density function and integrate it from time 0 to time 1/4: p(0 ≤ X ≤ 1/4) = ∫ e^(-x) dx from 0 to 1/4 = 1 - e^(-1/4) ≈ 0.2212.
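The integral has the closed form 1 - e^(-λx). The sketch below compares that closed form with a crude numerical integration of the density; the function names and the midpoint rule are choices made for this illustration.

```python
from math import exp


def exponential_cdf(x, lam):
    """Closed form of P(0 <= X <= x) for rate lam."""
    return 1 - exp(-lam * x)


def integrate_pdf(x, lam, steps=10_000):
    """Midpoint-rule integral of lam * e^(-lam * t) from 0 to x."""
    width = x / steps
    return sum(lam * exp(-lam * (i + 0.5) * width) for i in range(steps)) * width


print(round(exponential_cdf(0.25, 1), 4))  # 0.2212
print(round(integrate_pdf(0.25, 1), 4))    # 0.2212
```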

Call center. Source: https://logowik.com/content/uploads/images/call-center5116.jpg

Example #25. At a call center, calls come in every 20 minutes on average. What is the (approximate) probability that no calls will come in for a 30 minute period?

Let X be the time it takes for a call to come in. The probability of no calls in a 30-minute period = the probability that the next call comes in after at least 30 minutes (1/2 hour) = p(1/2 ≤ X < ∞). Plug in λ = 3 (20 minutes a call on average; 3 calls in an hour on average) to the probability density function and integrate it over the time interval [1/2, ∞): p(X ≥ 1/2) = ∫ 3e^(-3x) dx from 1/2 to ∞ = e^(-3/2) ≈ 0.2231.
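The same number falls out of the Poisson distribution: "no calls in 30 minutes" means a Poisson count with an average of 3 * 1/2 = 1.5 calls equals zero. A short sketch showing both views (variable names are illustrative):

```python
from math import exp, factorial

rate_per_hour = 3    # one call every 20 minutes on average
window_hours = 0.5

# Exponential view: the waiting time exceeds the window
p_exponential = exp(-rate_per_hour * window_hours)

# Poisson view: zero calls arrive during the window
mean_calls = rate_per_hour * window_hours
p_poisson_zero = mean_calls ** 0 * exp(-mean_calls) / factorial(0)

print(round(p_exponential, 4), round(p_poisson_zero, 4))  # 0.2231 0.2231
```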

Parts. Source: https://brilliant.org/courses/probability_ii/discrete-distributions-2/binomial-distribution/1/

Example #26. A machine takes an average of 10 minutes to produce a part. How long (approximately, in minutes) should the operator wait to be at least 95% sure that the machine has produced the part?

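Let X be the time it takes to produce a part, in hours. The probability of producing a part within n minutes = p(0 ≤ X ≤ n/60). Plug in λ = 6 (10 minutes a part on average; 6 parts in an hour on average) to the probability density function and integrate it over the time interval [0, n/60]: p(0 ≤ X ≤ n/60) = 1 - e^(-6 * n/60) = 1 - e^(-n/10) ≥ 0.95. Solving gives e^(-n/10) ≤ 0.05 → n ≥ 10 * ln(20) ≈ 29.96, so the operator should wait about 30 minutes.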
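Solving 1 - e^(-t / mean) = q for t gives t = -mean * ln(1 - q). A sketch of that inversion, working directly in minutes (the helper name `exponential_quantile` is hypothetical):

```python
from math import exp, log


def exponential_quantile(q, mean_time):
    """Waiting time t such that P(X <= t) = q, for an average of mean_time."""
    return -mean_time * log(1 - q)


wait = exponential_quantile(0.95, 10)  # average of 10 minutes per part
print(round(wait, 2))  # 29.96
```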

Lottery. Source: https://g9g6f4y6.stackpathcdn.com/wp-content/uploads/2023/03/jackpotfamily-20k.jpg?x10889

Example #27. A lottery is hit every 4 months on average. Given that 3 months have already passed since the last jackpot was awarded, what is the (approximate) probability that the jackpot will be awarded within the next 3 months?

Since the exponential distribution is memoryless, the fact that 3 months have passed is irrelevant to the future. Let X be the time it takes to hit a jackpot. The probability of hitting a jackpot within the next 3 months (1/4 of a year) = p(0 ≤ X ≤ 1/4). Plug in λ = 3 (4 months on average; 3 jackpots in a year on average) to the probability density function and integrate it over the time interval [0, 1/4]: p(0 ≤ X ≤ 1/4) = 1 - e^(-3/4) ≈ 0.5276.
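The memoryless claim can be verified numerically: the conditional probability of a jackpot in the next 3 months, given that 3 months have already passed, should match the unconditional probability for a fresh 3-month window. A minimal sketch (names are illustrative):

```python
from math import exp

rate = 3  # jackpots per year (one every 4 months on average)


def survival(t):
    """P(no jackpot during the first t years)."""
    return exp(-rate * t)


elapsed, window = 0.25, 0.25  # 3 months already passed, 3 months ahead

conditional = (survival(elapsed) - survival(elapsed + window)) / survival(elapsed)
unconditional = 1 - survival(window)

print(round(conditional, 4), round(unconditional, 4))  # 0.5276 0.5276
```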

Log-normal distribution. Source: https://en.wikipedia.org/wiki/Log-normal_distribution#/media/File:Log-normal-pdfs.png

The log-normal distribution describes a random variable whose logarithm is normally distributed.

Models:

  • The income of the US population
  • The changes in the Nasdaq index

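Probability density function: f(x) = 1 / (x * σ * sqrt(2π)) * e^(-(ln x - μ)² / (2σ²)) for x > 0, where μ and σ are the mean and standard deviation of ln X.

Relationship between a log-normal random variable and a normal random variable: X = e^(μ + σZ), where Z is a standard normal random variable.

Expected value and variance: E[X] = e^(μ + σ²/2) and Var[X] = (e^(σ²) - 1) * e^(2μ + σ²).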

Example #28. If X is a variable such that lnX is normally distributed with mean 1 and standard deviation 2, what is the mean of X?

If the natural log of X is normally distributed, X is log-normally distributed. Plug in μ = 1 and σ = 2 to the expected value formula: E[X] = e^(1 + 2²/2) = e³ ≈ 20.09.

Example #29. If X is a variable such that lnX is normally distributed with mean 1 and standard deviation 2, what is the variance of X?

If the natural log of X is normally distributed, X is log-normally distributed. Plug in μ = 1 and σ = 2 to the variance formula: Var[X] = (e^(2²) - 1) * e^(2 * 1 + 2²) = (e⁴ - 1) * e⁶ ≈ 21,623.
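The standard log-normal formulas E[X] = e^(μ + σ²/2) and Var[X] = (e^(σ²) - 1) * e^(2μ + σ²) are easy to evaluate in code. A sketch for Examples #28 and #29, with `lognormal_mean_var` as a made-up helper name:

```python
from math import exp


def lognormal_mean_var(mu, sigma):
    """Mean and variance of X when ln X is normal with mean mu and std sigma."""
    mean = exp(mu + sigma ** 2 / 2)
    variance = (exp(sigma ** 2) - 1) * exp(2 * mu + sigma ** 2)
    return mean, variance


mean, variance = lognormal_mean_var(1, 2)
print(round(mean, 2), round(variance))  # 20.09 21623
```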

Stock. Source: https://en.wikipedia.org/wiki/Stock_market

Example #30. A stock price is currently $50, and the factor by which the price is multiplied after a year follows a log-normal distribution with μ=0.1,σ=0.3. What is the (approximate) probability that the stock price will be below $50 a year later?

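Stock price after a year = $50 * factor. To make it smaller than $50, factor needs to be smaller than 1. Plug in μ=0.1,σ=0.3 in the log normal random variable formula and get factor = e^(0.1+0.3Z), where Z is a standard normal random variable. The factor is smaller than 1 when 0.1 + 0.3Z < 0, i.e. Z < -1/3, so the probability is p(Z < -1/3) ≈ 0.369.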
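Since the logarithm of the factor is normal with mean 0.1 and standard deviation 0.3, the question reduces to a normal lookup at ln(1) = 0. A brief sketch using the standard library:

```python
from math import log
from statistics import NormalDist

log_factor = NormalDist(mu=0.1, sigma=0.3)  # distribution of ln(factor)

# factor < 1 is the same event as ln(factor) < ln(1) = 0
p_below = log_factor.cdf(log(1))

print(round(p_below, 3))  # 0.369
```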

Stock. Source: https://en.wikipedia.org/wiki/Stock_market

Example #31. A stock price is currently $50, and the factor by which the price is multiplied after a year follows a log-normal distribution with μ=0.1,σ=0.3. What is the (approximate) expected value of the stock after one year?

The expected value of the stock in a year is equal to $50 * E[factor] = $50 * e^(0.1 + 0.3²/2) = $50 * e^0.145 ≈ $57.80.

Stock. Source: https://en.wikipedia.org/wiki/Stock_market

Example #32. A stock price is currently $50, and the factor by which the price is multiplied after a year follows a log-normal distribution with μ=0.1,σ=0.3. What is the (approximate) expected value of the stock after 25 years?

Assuming the yearly factors are independent, the expected value of the stock in 25 years is equal to $50 * E[factor]²⁵ = $50 * e^(25 * 0.145) = $50 * e^3.625 ≈ $1,876.

Example #33. A stock price is currently $50, and the factor by which the price is multiplied after a year follows a log-normal distribution with μ=0.1,σ=0.3. After 25 years, what is the (approximate) probability that the stock price is at least $610?

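Assuming the yearly factors are independent, the stock price in 25 years is equal to $50 multiplied by the product of 25 log-normal factors. The logarithm of that product is normally distributed with mean 25 * 0.1 = 2.5 and standard deviation sqrt(25) * 0.3 = 1.5, so the stock price is 50 * e^(2.5 + 1.5Z). The price is at least $610 when 2.5 + 1.5Z ≥ ln(610/50) ≈ 2.5014, i.e. Z ≥ 0.001, which has a probability of approximately 0.5.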
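The 25-year questions can be sketched in Python under one explicit assumption that the problem statement does not spell out: each year's factor is drawn independently. In that case the logs of the factors add up, so the log of the 25-year growth is normal with mean 25 * 0.1 and standard deviation sqrt(25) * 0.3. Variable names are illustrative.

```python
from math import exp, log, sqrt
from statistics import NormalDist

price, years, mu, sigma = 50, 25, 0.1, 0.3

# Assumption: yearly factors are independent, so their logs add up
log_growth = NormalDist(mu=years * mu, sigma=sqrt(years) * sigma)

# Log-normal mean: e^(mean + variance / 2)
expected_price = price * exp(log_growth.mean + log_growth.variance / 2)

# P(price >= 610) is the same as P(log growth >= ln(610 / 50))
p_at_least_610 = 1 - log_growth.cdf(log(610 / price))

print(round(expected_price), round(p_at_least_610, 2))  # 1876 0.5
```

The median price, 50 * e^2.5 ≈ $609, sits far below the mean because the log-normal distribution is skewed to the right.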

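The gamma distribution describes the amount of time before an event happens. For a whole-number shape parameter k, it is the waiting time until the k-th event in a Poisson process.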

Models:

  • Lifespan of a person, aka amount of time before death happens
  • The waiting time at a checkout counter at Costco

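Probability density function: f(x) = x^(k - 1) * e^(-x/θ) / (Γ(k) * θ^k) for x > 0, where k is the shape parameter and θ is the scale parameter.

Where Γ(k) = ∫ t^(k - 1) * e^(-t) dt from 0 to ∞ is the gamma function, which satisfies Γ(k + 1) = k * Γ(k).

Expected value: E[X] = kθ

Variance: Var[X] = kθ²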

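Example #34. Determine the value of Γ(8/3)/Γ(2/3).

Applying Γ(z + 1) = z * Γ(z) twice gives Γ(8/3) = 5/3 * Γ(5/3) = 5/3 * 2/3 * Γ(2/3), so Γ(8/3)/Γ(2/3) = 5/3 * 2/3 = 10/9.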
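The ratio can be checked numerically with `math.gamma` from the standard library. By the recurrence Γ(z + 1) = z * Γ(z), it should equal (5/3) * (2/3) = 10/9. A quick sketch:

```python
from math import gamma, isclose

ratio = gamma(8 / 3) / gamma(2 / 3)

# Recurrence applied twice: Γ(8/3) = (5/3) * (2/3) * Γ(2/3)
print(isclose(ratio, 10 / 9))  # True
```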

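Example #35. Suppose X follows a gamma distribution with shape k and scale θ.

If θ=k=1, what is P(X>0.5)?

With k = 1 and θ = 1 the density reduces to e^(-x), which is the exponential distribution with λ = 1, so P(X > 0.5) = e^(-0.5) ≈ 0.6065.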
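One way to approach this numerically is to integrate the shape-scale gamma density f(x) = x^(k - 1) * e^(-x/θ) / (Γ(k) * θ^k) from 0 to 0.5 and subtract the result from 1. The sketch below uses a simple midpoint rule; the function names and the step count are choices made for this illustration. With k = θ = 1 the density reduces to e^(-x), so the result should match e^(-0.5).

```python
from math import exp, gamma


def gamma_pdf(x, k, theta):
    """Gamma density with shape k and scale theta."""
    return x ** (k - 1) * exp(-x / theta) / (gamma(k) * theta ** k)


def gamma_survival(x, k, theta, steps=10_000):
    """P(X > x) = 1 - integral of the density from 0 to x (midpoint rule)."""
    width = x / steps
    area = sum(gamma_pdf((i + 0.5) * width, k, theta) for i in range(steps)) * width
    return 1 - area


print(round(gamma_survival(0.5, 1, 1), 4))  # 0.6065
```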
