Probability — The Science of Uncertainty

Eram Khan
Published in Nerd For Tech
Jul 24, 2021

For the longest time I was not able to make heads or tails of probability concepts, particularly when it came to linking them to real-world scenarios. My recent coursework on the subject while pursuing the MIT MicroMasters has helped me a lot in overcoming this. I plan to create a series of articles to build an intuitive understanding of key probability concepts and of the interaction between the fields of statistics, probability, and data science.

Life is uncertain.

Now, how do we make decisions in this uncertainty?

Most of our decisions are made using previously acquired information, either directly or indirectly. The same logic applies to a machine: it needs some methodology in place to process the information that is already available, make predictions, and incorporate new information to iteratively improve the accuracy of those predictions.

Drawing inferences from real-world data requires intensive application of statistics, and probability is the foundation and language of most of statistics. In a way, data science is essentially statistics on steroids! To really master machine learning and influence decision making, it is imperative to start with a solid understanding of the key concepts of probability.

We make inferences from real-world data using statistics. Models are then built from these inferences. Based on these models, the likelihoods of all possible events are analysed and predictions are made. Finally, decisions are made in the real world based on the likelihoods of various events.

Follow this series to develop an intuitive understanding of probability concepts. By the end of this article alone you will be able to model a scenario with a probabilistic model and comment on the likelihood of its various outcomes. It will also act as a primer for the other articles in this series:

  • Probability models and axioms
  • Conditioning and Independence
  • Concept of Counting
  • Discrete random variables
  • Continuous random variables
  • Further topics on random variables
  • Bayesian inference
  • Limit theorems and classical statistics
  • Bernoulli and Poisson processes
  • Markov chains

In this article I will cover probability models and axioms, with examples involving both discrete and continuous random variables. Certain concepts are much easier to understand with discrete variables, so we will explore their continuous counterparts side by side to make sense of each concept in question.

  1. Sample Space
  2. Probability Law — Axioms and Properties
  3. Discrete and Continuous Examples
  4. Countable Additivity
  5. Mathematical Subtleties
  6. Interpretation of Probabilities

SAMPLE SPACE

A sample space is the set (a list) of all possible outcomes of an experiment. Its elements must be mutually exclusive, collectively exhaustive, and at the right granularity.

There are two steps to conclusively define a probabilistic model:

  1. Describe possible outcomes
  2. Describe beliefs about likelihood of outcomes

So, a possible sample space Ω for a coin toss is {Heads, Tails}.

But depending on which other factors we want to capture, the sample space can be refined, for example to:

{Heads and it rains, Heads and no rain, Tails and it rains, Tails and no rain}

Even though rain has no causal relationship with the coin toss, this is a perfectly acceptable sample space: it contains all possible scenarios and the outcomes are mutually exclusive. One can, however, experiment with the granularity.

{Heads and it rains, Heads and no rain, Tails} is also a perfectly acceptable sample space.

{Heads and it rains, Tails and no rain, Tails}

This, however, is not a valid sample space: it excludes the outcome “Heads and no rain”, and the outcomes “Tails and no rain” and “Tails” are not mutually exclusive.
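To make the granularity discussion concrete, here is a minimal Python sketch (my own illustration; the variable names are assumptions, not from the article) that builds the finer-grained coin-and-rain sample space as a Cartesian product:

```python
from itertools import product

coin = {"Heads", "Tails"}
weather = {"rain", "no rain"}

# Every outcome pairs one coin result with one weather state, so the outcomes
# are mutually exclusive and collectively exhaustive by construction.
sample_space = set(product(coin, weather))
print(sample_space)
# e.g. {('Heads', 'rain'), ('Heads', 'no rain'), ('Tails', 'rain'), ('Tails', 'no rain')}
```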

Now, let us talk about a continuous sample space.

Suppose one throws a dart at a unit square in the x-y plane, and the coordinates x and y are recorded with infinite precision. The sample space is the set of all (x, y) pairs with both coordinates between 0 and 1:

Ω = {(x, y) : 0 <= x <= 1, 0 <= y <= 1}

An event is therefore generally associated with a subset of the sample space, because the probability of any single point is 0.

PROBABILITY AXIOMS

Interestingly, there are only three axioms of probability, from which all other properties of probability can be derived.

  1. Non-negativity, P(A) >= 0: the probability of any event is greater than or equal to 0.
  2. Normalisation, P(Ω) = 1: the probability of the entire sample space is 1.
  3. Finite additivity: if events A and B are mutually exclusive (disjoint), the probability of their union equals the sum of their individual probabilities.

That is, if A ∩ B = ∅ then P(A ∪ B) = P(A) + P(B)
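To see the axioms in action, here is a small Python sketch (my own example, assuming a fair six-sided die as the probability law) that checks all three numerically:

```python
# Discrete probability law for a fair six-sided die: outcome -> probability.
law = {outcome: 1/6 for outcome in range(1, 7)}

def prob(event):
    """P(event), where an event is a set of outcomes."""
    return sum(law[outcome] for outcome in event)

# Axiom 1: non-negativity.
assert all(p >= 0 for p in law.values())

# Axiom 2: normalisation, P(Omega) = 1.
assert abs(prob(set(law)) - 1) < 1e-12

# Axiom 3: finite additivity for disjoint events A and B.
A, B = {1, 2}, {5, 6}
assert A & B == set()                                    # A and B are disjoint
assert abs(prob(A | B) - (prob(A) + prob(B))) < 1e-12
```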

PROPERTIES OF PROBABILITY

  • 0 <= P(A) <= 1: the axioms force every probability to also be at most 1.

Since 1 = P(Ω) = P(A ∪ Aᶜ) = P(A) + P(Aᶜ) and P(Aᶜ) >= 0, it follows that P(A) = 1 - P(Aᶜ) <= 1.

  • P(Ω) = 1 implies P(∅) = 0: since Ω and ∅ are disjoint and Ω ∪ ∅ = Ω, we have 1 = P(Ω) = P(Ω) + P(∅), and therefore P(∅) = 0.
  • For disjoint events P(A ∪ B) = P(A) + P(B). This property extends to any finite number of disjoint events. For three disjoint events A, B and C:

P(A ∪ B ∪ C) = P(A) + P(B) + P(C)

Since P(A ∪ B ∪ C) = P((A ∪ B) ∪ C) = P(A ∪ B) + P(C) = P(A) + P(B) + P(C)

Similarly, for an event made up of k distinct outcomes s1, s2, …, sk

P({s1, s2, …, sk}) = P({s1}) + P({s2}) + … + P({sk})

MORE PROPERTIES OF PROBABILITY

Now let us move on to some of the probability properties you probably memorised in school. Yes, all of these can be derived from the axioms we just discussed.

  1. If A is a subset of B, A ⊆ B , then P(A)<=P(B)

Since A ⊆ B, B can be split into two disjoint pieces, A and the part of B outside A: B = A ∪ (B ∩ Aᶜ). Hence

P(B) = P(A) + P(B ∩ Aᶜ) >= P(A)

2. Arguably one of the most used properties, the inclusion-exclusion formula:

P(A ∪ B) = P(A) + P(B) - P(A ∩ B)

Let a = P(A ∩ Bᶜ), b = P(A ∩ B), c = P(Aᶜ ∩ B). These three disjoint pieces together make up A ∪ B, so

P(A ∪ B) = a + b + c

P(A) + P(B) - P(A ∩ B) = (a + b) + (b + c) - b = a + b + c

3. P(A ∪ B) <= P(A) + P(B) (The union bound)

The union of A, B and C, which are not necessarily disjoint, can be partitioned into three disjoint pieces: A itself, the part of B outside A, and the part of C outside both A and B. Therefore,

P(A ∪ B ∪ C) = P(A) + P(Aᶜ ∩ B) + P(Aᶜ ∩ Bᶜ ∩ C) <= P(A) + P(B) + P(C)

where the inequality follows from the monotonicity property above, since each piece is a subset of the corresponding set.
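The properties above are easy to verify numerically. A short sketch (my own, reusing a fair-die law as an assumed example) that confirms inclusion-exclusion and the union bound for two overlapping events:

```python
# Fair die: prob(event) sums the probabilities of the outcomes in the event.
law = {outcome: 1/6 for outcome in range(1, 7)}
prob = lambda event: sum(law[o] for o in event)

A = {1, 2, 3, 4}   # "at most 4"
B = {2, 4, 6}      # "even"

lhs = prob(A | B)
rhs = prob(A) + prob(B) - prob(A & B)   # inclusion-exclusion
assert abs(lhs - rhs) < 1e-12

# Union bound: P(A ∪ B) <= P(A) + P(B); the gap is exactly P(A ∩ B).
assert lhs <= prob(A) + prob(B)
```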

DISCRETE UNIFORM LAW

Let Ω be a set of n equally likely outcomes, each having probability 1/n, and let event A consist of k of these outcomes. In that case, by additivity,

P(A) = 1/n + 1/n + … (k times), which implies

P(A) = k/n

This law is termed the discrete uniform law.
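Under the discrete uniform law, the whole calculation reduces to counting. A minimal sketch (my own example of a fair die and the event "even outcome"):

```python
omega = set(range(1, 7))                 # n = 6 equally likely outcomes of a fair die
A = {o for o in omega if o % 2 == 0}     # event "even outcome", k = 3 favorable outcomes

p_A = len(A) / len(omega)                # discrete uniform law: P(A) = k / n
print(p_A)                               # 0.5
```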

UNIFORM PROBABILITY LAW

In the case of a continuous sample space it is not possible to count the number of favorable outcomes; that is, k is not defined. Luckily there is an alternative: probabilities are assigned to regions of the sample space. Under the uniform probability law on the unit square, the probability of an event is simply the area of the corresponding region, that is, Probability = Area.

Let us consider the same sample space used for the continuous example earlier.

Let us calculate the probability that the sum of the two coordinates, x + y, is less than or equal to ½.

P({(x,y) | x+y <= ½}) = ½*½*½ (½ * base * height) = ⅛

When working with a continuous sample space, visualizing the area of interest graphically becomes the primary task for calculating probability.

It is very important to understand that probability in this case can only be defined in terms of area; the probability of any single point is always 0. For example, P({(0.5, 0.3)}) = 0.
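One way to sanity-check the area calculation is a quick Monte Carlo simulation. This is a rough sketch of mine, assuming the dart lands uniformly on the unit square:

```python
import random

random.seed(0)
trials = 1_000_000

# Sample (x, y) uniformly on the unit square and count how often x + y <= 1/2.
hits = sum(1 for _ in range(trials)
           if random.random() + random.random() <= 0.5)

print(hits / trials)   # should be close to the exact area, 1/8 = 0.125
```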

PROBABILITY CALCULATION STEPS

Now, straight to the point: how do you calculate the probability of an event? Keep the example above in mind and follow the steps below (a worked example in code follows the list):

  1. Define sample space: Try to actually visualize the sample space on a graph.
  2. Specify probability law: Identify which probability law relates to the scenario of your interest the most. In the previous example we followed the uniform probability law.
  3. Identify events of interest: Try to visually identify the subset of sample space you are interested in.
  4. Now it is just basic math. Calculate.
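As a worked example of these four steps (a scenario of my own choosing, not from the article): roll two fair dice and ask for the probability that the sum is 7.

```python
from itertools import product

# Step 1: define the sample space -- all ordered pairs of die faces (36 outcomes).
omega = list(product(range(1, 7), repeat=2))

# Step 2: specify the probability law -- discrete uniform, each outcome has P = 1/36.
# Step 3: identify the event of interest -- pairs whose sum is 7.
event = [pair for pair in omega if sum(pair) == 7]

# Step 4: calculate. Under the uniform law, P(event) = k / n.
print(len(event) / len(omega))   # 6/36 ≈ 0.1667
```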

COUNTABLE ADDITIVITY

When we talk about the additivity property, it is important to understand that the finite additivity axiom only covers a finite number of disjoint events.

Imagine a discrete but infinite sample space. Suppose we keep tossing a coin, and the outcome of the experiment is the number of tosses until we observe heads for the first time.

Now suppose the probability that the first heads appears on the n-th toss is P(n) = 1/(2^n), n = 1, 2, 3, …

In order for this to be a legitimate probability law, the probabilities over all possible values of n should sum to 1. Basic arithmetic shows that this is indeed the case.

The sum of this infinite geometric series is (1/2)/(1 - (1/2)) = 1, so this is a legitimate probability law.

Now consider a more general event, say that the outcome is even.

P(even outcome) = P({2,4,6,8….})

= P(2) + P(4)+ P(6).. [1] (Since the sets are disjoint, we are using the additivity property)

= 1/(2²) + 1/(2⁴) + 1/(2⁶) + … = ¼ * (1/(1 - ¼)) [2] (using the formula for the sum of an infinite geometric series)

= ⅓

But can [1] be used for an infinite number of events? The answer is yes, thanks to the countable additivity axiom:

If A1, A2, A3, … is an infinite sequence of disjoint events, then

P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …

The additivity property holds for an infinite collection of disjoint events as long as they form a countable sequence, that is, events that can be arranged in an order and indexed 1, 2, 3, …

One can argue that even a continuous sample space is essentially a collection of infinitely many single-point events. Should the additivity property be valid there as well?

The answer is no, because this scenario is missing the keyword “sequence”: the points of a continuous sample space are uncountable and cannot be arranged into a sequence. Therefore, the countable additivity axiom does not apply to them.
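A quick numerical check of both series (a sketch of my own): the partial sums of P(n) = 1/(2^n) approach 1, and the partial sums over even n approach 1/3.

```python
# Truncate the infinite sums at n = 50; the remaining tails are negligibly small.
total = sum(1 / 2**n for n in range(1, 51))          # all outcomes
even_total = sum(1 / 2**n for n in range(2, 51, 2))  # even outcomes only

print(total)        # ≈ 1.0, so the law is properly normalised
print(even_total)   # ≈ 0.3333..., i.e. 1/3, the probability of an even outcome
```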

INTERPRETATIONS

What does “probability” really mean?

  • Under a very narrow interpretation, probability is just a branch of mathematics. Under the frequentist interpretation, P(A) is the frequency of event A in an infinite number of repetitions of the experiment.
  • But are probabilities really frequencies?

A statement like P(heads in a coin toss) = ½ fits this view, but there are also situations that do not directly correspond to it, such as

P(president getting re-elected in the upcoming election) = ¾

  • In such contexts, probabilities are better interpreted as descriptions of beliefs, or of one's betting preferences.

I hope that by the end of this mammoth article you have developed your own contextual understanding of probability. However much it is driven by theory and proof, I strongly believe that probability is the science of uncertainty.
