All you Need to Know about Probability — Chapter 1.1: Randomness

Christoph Ostertag
Published in Analytics Vidhya
9 min read · May 9, 2020

What outcomes and events are, what independence and conditional probability mean and how we use Bayes Theorem to update our beliefs about the world.

Sheldon uses Bayes Theorem. Source: The Big Bang Theory

“Under Bayes’ theorem, no theory is perfect. Rather, it is a work in progress, always subject to further refinement and testing.” — Nate Silver (American statistician who analyzes baseball and elections)

Source: https://bit.ly/2yGARWx

This series roughly follows the book “Introduction to Probability” by Mark Daniel Ward and Ellen Gundlach, though we will also cover some additional topics.

We will cover the following topics:

  • Chapter I: Randomness
  • Chapter II: Discrete Random Variables
  • Chapter III: Named Discrete Random Variables
  • Chapter IV: Counting
  • Chapter V: Continuous Random Variables
  • Chapter VI: Named Continuous Random Variables
  • Chapter VII: Markov chain Monte Carlo
  • Chapter VIII: Naive Bayes

Chapter I: Randomness

How we define Randomness

What is truly random? Is the grade you receive from your professor random? Is the family you were born into random? Is the apparently random number a computer generates truly random or just pseudo-random? For our definition we will keep things simple.

Definition 1.1: Randomness is the lack of pattern or predictability in outcomes and events. When something happens at random, there are several outcomes that could potentially occur. A collection of outcomes forms an event.

Outcomes and Events

Source: https://bit.ly/346eSSN

Assume we flip a coin twice. Each flip has two outcomes, heads or tails. Two flips together have four outcomes: heads twice, tails twice, heads then tails, or tails then heads.

Outcomes

We write outcomes in set notation {}. Each inner pair is ordered: {H,T} means heads first, then tails.
The sample space S of possible outcomes: S = {{H,H},{T,T},{H,T},{T,H}}
# Every possible outcome

Events

An event is a subset of all possible outcomes, for example, all outcomes in which both flips land on the same side.

Event that we flip two coins and they land on the same side:
E(Two times the same side) = {{H,H},{T,T}}

Event that we flip two coins and they do not land on the same side:
E(Not two times the same side) = {{H,T},{T,H}}

Event that we flip two coins:
E(We flip two coins) = S = {{H,H},{T,T},{H,T},{T,H}}
# This obviously happens every time, since we flipped two coins

Event that we only flip one coin:
E(We flip one coin) = ∅ = {} # This is not possible, we flipped two coins

Thus we can see that an event is a collection of possible outcomes that satisfy some condition. This condition can be possible or not possible given our sample space.

Definition 1.2: The set of all possible outcomes is defined as the sample space S. Every possible event is a subset of the sample space S including the empty set ∅.
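
The coin example can also be written down in a few lines of Python. This is only a sketch: the names `sample_space`, `same_side`, `different_sides` and `one_flip_only` are made up for illustration, and each outcome is stored as a tuple so that the order of the two flips stays explicit.

```python
from itertools import product

# Each outcome is an ordered tuple: (first flip, second flip).
sample_space = set(product("HT", repeat=2))

# An event is the subset of outcomes that satisfies some condition.
same_side = {o for o in sample_space if o[0] == o[1]}
different_sides = sample_space - same_side
one_flip_only = {o for o in sample_space if len(o) == 1}  # impossible here

print(sorted(sample_space))        # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(sorted(same_side))           # [('H', 'H'), ('T', 'T')]
print(one_flip_only == set())      # True
print(same_side <= sample_space)   # True: every event is a subset of S
```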

Probability

Now how do we define probability? It is the likelihood that some event happens. But the definition is actually not that straightforward, because we do not know whether all the outcomes are equally likely. For our purposes we define it as follows:

Definition 1.3: Probability is the likelihood that an event occurs measured by the ratio of favorable outcomes over all outcomes, given that all the outcomes are equally likely.

Rolling a fair 6-sided dice

Assume we roll a dice. We could roll a 1, 2, 3, 4, 5 or 6. This means our sample space contains 6 different outcomes, which we denote as |S| = 6. Likewise, the event that we roll an even number contains three possible outcomes, 2, 4 and 6, which we denote as |E(even)| = 3. We get the probability of our event occurring by dividing the number of favorable outcomes |E(even)| = 3 by the number of possible outcomes |S| = 6.

S = {{1},{2},{3},{4},{5},{6}}, |S| = 6
E(even) = {{2},{4},{6}}, |E(even)| = 3
P(even) = |E(even)| / |S| = 3 / 6 = 1/2 = 50%
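
Definition 1.3 translates almost word for word into code. The helper `probability` below is a hypothetical name, not something from the book, and it only makes sense under the assumption that all outcomes are equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Favorable outcomes divided by all outcomes (equally likely outcomes assumed)."""
    return Fraction(len(event & sample_space), len(sample_space))

sample_space = {1, 2, 3, 4, 5, 6}
even = {o for o in sample_space if o % 2 == 0}

p_even = probability(even, sample_space)
print(p_even)         # 1/2
print(float(p_even))  # 0.5
```

Using `Fraction` instead of floats keeps the result exact, which makes it easier to compare with a calculation done by hand.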

Disjoint and Overlapping Events — Venn Diagrams

Disjoint Events

Let us assume we have two events: Event A and Event B. Event A contains the dice outcomes 1, 2 and 3, while Event B contains 5 and 6. The outcome 4 is neither in Event A nor in Event B. Clearly Event A and Event B do not overlap: they are disjoint.

Source: Myself

Now we should already be quite familiar with set notation. 😃

S = {{1},{2},{3},{4},{5},{6}}, |S| = 6

Event A = {{1},{2},{3}}, |Event A| = 3
Event B = {{5},{6}}, |Event B| = 2

Overlapping or Joint Events

Now let Event B contain all the even numbers, while Event A still contains the numbers 1 to 3. The two events now overlap. We call them joint events because they intersect on the number 2.

Source: Myself
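
Whether two events are disjoint or overlapping can be checked directly with Python's built-in sets. A small sketch with made-up variable names:

```python
event_a = {1, 2, 3}
event_b_disjoint = {5, 6}     # Event B from the disjoint example
event_b_even = {2, 4, 6}      # Event B from the overlapping example

print(event_a.isdisjoint(event_b_disjoint))  # True
print(event_a.isdisjoint(event_b_even))      # False
print(event_a & event_b_even)                # {2}
```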

Common Set Operations

Now that we have a solid understanding of what a set is we can introduce some common operations on sets.

Union

The union A ∪ B denotes all the outcomes that are in A or B or in both.

Example: Let A be the set of all even numbers when rolling a dice and B be the set of all numbers smaller than or equal to 3. Their union contains 1, 2, 3, 4 and 6.

A = {{2},{4},{6}}
B = {{1},{2},{3}}
Union: A ∪ B = {{1},{2},{3},{4},{6}}

Source: https://bit.ly/2K7sBkU
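
In Python, the union of two sets is written with the `|` operator. The sketch below reuses A and B from the example; the probability line additionally assumes a fair dice, so that all six outcomes are equally likely.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}   # even numbers
b = {1, 2, 3}   # numbers smaller than or equal to 3

union = a | b
print(sorted(union))                                # [1, 2, 3, 4, 6]
print(Fraction(len(union), len(sample_space)))      # 5/6
```

Note that 2 appears in both A and B but only once in the union, because a set never stores duplicates.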

Intersect

The intersect A ∩ B denotes all the outcomes that are in both A and B. Thus, for disjoint events their intersect is always the empty set ∅.

Example: Let A be the set of all even numbers when rolling a dice and B be the set of all numbers smaller than or equal to 3. Their intersect contains only 2.

A = {{2},{4},{6}}
B = {{1},{2},{3}}
Intersect: A ∩ B = {{2}}

Source: https://en.wikipedia.org/wiki/Jaccard_index
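
The same example in Python, where the intersect is written with the `&` operator. The second part illustrates the statement about disjoint events, using the disjoint Events A = {1, 2, 3} and B = {5, 6} from earlier.

```python
a = {2, 4, 6}   # even numbers
b = {1, 2, 3}   # numbers smaller than or equal to 3

print(a & b)                  # {2}

# Disjoint events have an empty intersect.
print({1, 2, 3} & {5, 6})     # set()
```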

Complement

The complement of a set A is denoted by A’ and contains all the outcomes in the sample space that are not in set A. So A and A’ are disjoint sets.

Example: The set of all odd numbers A’ when rolling a dice is the complement of the set of all even numbers A.

S = {{1},{2},{3},{4},{5},{6}}
A = {{2},{4},{6}}
A’ = {{1},{3},{5}}

Source: https://files.askiitians.com/cdn1/images/2017317-165034914-9949-6-complement-of-set.png

By definition, the union of a set and its complement is the sample space, and their intersect is the empty set.

S = {{1},{2},{3},{4},{5},{6}}
A = {{2},{4},{6}}
A’ = {{1},{3},{5}}

Union: A ∪ A’ = S
Intersect: A ∩ A’ = ∅ = {}
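
Python has no dedicated complement operator, because a set does not know its sample space. One way to sketch it is to subtract A from S explicitly; `a_complement` is just an illustrative name.

```python
s = {1, 2, 3, 4, 5, 6}
a = {2, 4, 6}

# The complement only makes sense relative to a sample space.
a_complement = s - a
print(sorted(a_complement))            # [1, 3, 5]

# A set and its complement together give S and share no outcome.
print((a | a_complement) == s)         # True
print((a & a_complement) == set())     # True
```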

Setminus

B setminus A, denoted B\A or B-A, contains all the outcomes that are in B, but not in A.

Example: Let A be the set of all even numbers when rolling a dice and B be the set of all numbers smaller than or equal to 3. B - A contains all the values in B (1, 2, 3) except 2, because 2 is in A. In other words, it contains all odd numbers smaller than or equal to 3.

A = {{2},{4},{6}}
B = {{1},{2},{3}}
Setminus: B - A = B\A = {{1},{3}}

Source: https://en.wiktionary.org/wiki/set-theoretic_difference
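
In Python, setminus is the `-` operator on sets. The short sketch below also shows that the order of the operands matters, which is easy to get wrong.

```python
a = {2, 4, 6}   # even numbers
b = {1, 2, 3}   # numbers smaller than or equal to 3

print(sorted(b - a))   # [1, 3]  outcomes in B but not in A
print(sorted(a - b))   # [4, 6]  outcomes in A but not in B
```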

Conditional Probability

Story-time! Imagine you want to know the probability that your spouse passes away and leaves behind a fortune. You look up how likely it is that your spouse passes away. You figure out that, in general, 2% of husbands pass away each year. You search the internet further and find out that 50% of 65-year-old women are widows, and by age 75 this number jumps up to 67%. So somehow your getting rich depends on the age of your husband. But your husband is a rock-star extreme sports enthusiast in his mid-twenties with no known health problems (DAMN IT!). So surely he is not average, and this 75-year-old Hugh Hefner double that you had to call your husband, and who didn’t even leave a single dime (DAMN IT AGAIN!), was surely not the typical case either. So just pursuing a rich guy (or girl 😉) does not work optimally. It seems like finally being able to afford all those Gucci belts and Mouawad bags that your husband is just too cheap to buy for you depends on other factors as well. (Remark: This guy really thought a Kate Spade would be a nice present. What a loser. 😒) It seems like the probability is somehow conditional. So you go to Medium and read an article about probability by this young and handsome fellow Christoph Ostertag to learn optimal decision-making. Maybe he would be a good guy to settle down with after all.

What are the chances?

Consider our dice example again. We are playing the famous German board game “Mensch ärgere Dich nicht”, which literally means “Human, don’t stress out” (quite a name for a game over which whole families break up), and need to roll a 6 to finally move our last piece into the field. We rolled a dice, but our childish friend hides it before we can see it and only tells us that we rolled an even number. So how likely is it that we rolled a six?
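
One way to work this out is to shrink the sample space to the outcomes that are still possible. The sketch below assumes a fair dice and uses the standard counting form of conditional probability, P(A | B) = |A ∩ B| / |B|, which this article has not formally introduced; `conditional_probability` is a hypothetical helper name.

```python
from fractions import Fraction

def conditional_probability(event, given):
    """P(event | given) for equally likely outcomes: |event ∩ given| / |given|."""
    return Fraction(len(event & given), len(given))

sample_space = {1, 2, 3, 4, 5, 6}
six = {6}
even = {2, 4, 6}

print(Fraction(len(six), len(sample_space)))   # 1/6  before the hint
print(conditional_probability(six, even))      # 1/3  after the hint
```

Knowing that the roll was even rules out 1, 3 and 5, so only three outcomes remain and the six is one of them.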

Bayes Theorem

Bayes Theorem tells you how to update your beliefs about the world. You have “probably” heard that before. (No smart joke intended.) I like to think about it in a slightly different way. Assume you want to know whether a person at the bar, who you know from friends is interested in you, is male or female. Without any evidence, our basic hypothesis may be that the person is female with 50% probability. Now we get some extra information, which we call evidence: this person is into you, and you are a guy. Additionally, we know that 90% of females are straight and 10% of men are gay. How likely is it that the stranger is the mother of your future babies? Or just a female, that is probably easier.
Watch this great video from 3blue1brown and figure it out for yourself and post your solution in the comments then! 😃
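
If you want to check your answer afterwards, here is a minimal sketch of a Bayes update for two competing hypotheses. It rests on assumptions that are not spelled out in the text: "90% of females are straight" is read as the probability that a female is into a guy, "10% of men are gay" as the probability that a male is into a guy, and nothing else influences who is interested in you. The function and parameter names are invented for illustration.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """P(H | E) from P(H), P(E | H) and P(E | not H) via Bayes Theorem."""
    evidence = prior * likelihood + (1 - prior) * likelihood_if_not
    return prior * likelihood / evidence

# H: the person is female. E: the person is into a guy.
p_female = posterior(prior=0.5, likelihood=0.9, likelihood_if_not=0.1)
print(round(p_female, 2))   # 0.9
```

With a 50/50 prior the result simply mirrors the two likelihoods. Try a different prior, for example a bar where only 30% of the guests are female, to see how the starting belief shifts the answer.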

In this video, 3blue1brown shows another example of Bayes Theorem, originally from Daniel Kahneman, winner of the Nobel Prize in Economics and author of the best-selling “Thinking, Fast and Slow”, a book about human irrationality and behavioral economics.

Christoph Ostertag
Analytics Vidhya

Co-founder of talentbase. We help data science students to land their first job. https://www.talentbase.tech