Introduction to probability

Shivang Ganjoo
Published in ScienceforData
Feb 6, 2019

Probability is at the heart of predictive modelling.

Notations

  1. Sample space (Ω) → contains all possible outcomes of an experiment
  2. F → the set of all events, including the empty set ∅

Consider tossing a coin: then Ω = {heads, tails} and F = {{heads, tails}, {heads}, {tails}, ∅}. Probability tells us how likely we are to encounter each of the events in F. Note that F is the power set of Ω.
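For a finite Ω we can build F explicitly. Here is a minimal Python sketch (the helper name power_set is my own, not from any library) that enumerates the power set of the coin-toss sample space:

```python
from itertools import chain, combinations

# Sample space for a single coin toss
omega = ["heads", "tails"]

def power_set(outcomes):
    """All subsets of `outcomes`, from the empty set up to the full set."""
    return [set(c) for c in chain.from_iterable(
        combinations(outcomes, r) for r in range(len(outcomes) + 1))]

F = power_set(omega)
print(F)  # [set(), {'heads'}, {'tails'}, {'heads', 'tails'}] (set print order may vary)
```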

Properties

  1. P(S) ≥ 0 for any event S ∈ F
  2. P(Ω) = 1
  3. P(∪Si) = Σ P(Si) if the Si are disjoint
  4. P(S′) = 1 − P(S)

From set theory (inclusion-exclusion), P(A∪B) = P(A) + P(B) − P(A∩B).
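A quick numeric sanity check, using a fair six-sided die as my own example (not from the article): let A = "roll is even" and B = "roll is greater than 3", and compare both sides of the identity.

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}   # fair six-sided die
A = {2, 4, 6}                # roll is even
B = {4, 5, 6}                # roll is greater than 3

def P(event):
    """Probability of an event under the uniform distribution on omega."""
    return Fraction(len(event), len(omega))

lhs = P(A | B)                    # P(A ∪ B) directly
rhs = P(A) + P(B) - P(A & B)      # inclusion-exclusion
print(lhs, rhs)  # 2/3 2/3
```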

Conditional Probability

This is the probability of an event A given that B has already occurred.

P(A|B) = P(A∩B) / P(B). Since the sample space has effectively been reduced to B, we normalize the probability by dividing by P(B).

Suppose you’ve calculated the following from airport data:

P(late ∩ no rain) = 2/20, P(on time ∩ no rain) = 14/20, P(late ∩ rain) = 3/20, P(on time ∩ rain) = 1/20

You have to find P(late|rain), i.e. the probability that a plane is late given that it is raining.

P(late|rain) = P(late∩rain) / P(rain) = (3/20) / (3/20 + 1/20) = 0.75
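The same computation in Python, storing the joint probabilities in a dict keyed by (arrival, weather); the data layout is my own choice for illustration, and Fraction keeps the arithmetic exact.

```python
from fractions import Fraction

# Joint probabilities from the airport example
joint = {
    ("late",    "no rain"): Fraction(2, 20),
    ("on time", "no rain"): Fraction(14, 20),
    ("late",    "rain"):    Fraction(3, 20),
    ("on time", "rain"):    Fraction(1, 20),
}

# P(rain): marginalize the arrival status out of the joint table
p_rain = sum(p for (arrival, weather), p in joint.items() if weather == "rain")

# P(late | rain) = P(late ∩ rain) / P(rain)
p_late_given_rain = joint[("late", "rain")] / p_rain
print(p_late_given_rain)  # 3/4
```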

We can also find P(A∩B), which equals P(A) * P(B|A). Here P(A) is the prior probability of A, as it is already known to us, and P(B|A) is the conditional probability of B given A.

Chain Rule

The chain rule allows us to express the probability of the intersection of multiple events in terms of conditional probabilities.

P(S1∩S2∩…∩Sn) = P(S1) * P(S2|S1) * P(S3|S1∩S2) * … = Π P(Si | S1∩…∩Si-1) for i = 1 to n
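As an illustration (my own example, not from the article): the probability of drawing three aces in a row from a shuffled 52-card deck, where each factor is the conditional probability of an ace given that all previous draws were aces.

```python
from fractions import Fraction

aces, cards = 4, 52
p = Fraction(1)
for i in range(3):
    # P(ace on draw i+1 | aces on all previous draws):
    # i aces and i cards have already left the deck
    p *= Fraction(aces - i, cards - i)
print(p)  # 1/5525
```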

A collection of disjoint sets A1, A2, … such that Ω = ∪Ai is called a partition of Ω.

Law of Total Probability

P(S) = Σ P(S∩Ai) = Σ P(Ai) * P(S|Ai) for i = 1 to n. This means that to find the probability of an event, say an airplane arriving late, we sum its intersection probabilities over each element of the weather partition {rain, no rain}.

P(late) = P(late ∩ rain) + P(late ∩ no rain)
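In code, using the conditional form over the weather partition; the numbers come from the airport example, with P(late|rain) and P(late|no rain) derived from the joint table above.

```python
from fractions import Fraction

# Partition probabilities: P(rain) = 3/20 + 1/20, P(no rain) = 2/20 + 14/20
prior = {"rain": Fraction(4, 20), "no rain": Fraction(16, 20)}

# Conditionals from the joint table: P(late | Ai) = P(late ∩ Ai) / P(Ai)
p_late_given = {"rain": Fraction(3, 4), "no rain": Fraction(2, 16)}

# Law of total probability: P(late) = sum_i P(Ai) * P(late | Ai)
p_late = sum(prior[w] * p_late_given[w] for w in prior)
print(p_late)  # 1/4
```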

Bayes Rule

In general, P(A|B) ≠ P(B|A), since the prior probabilities are different. But we can invert conditional probabilities using the priors; this is known as Bayes’ rule.

P(A|B) = P(B|A)*P(A) / P(B)

Suppose you find P(rain) on the internet and calculate P(late) using the law of total probability. Then,

P(rain|late) = P(late|rain)*P(rain) / P(late)
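With the airport numbers this inversion is straightforward; the sketch below reuses the values computed in the earlier snippets.

```python
from fractions import Fraction

p_late_given_rain = Fraction(3, 4)   # from the conditional-probability step
p_rain            = Fraction(4, 20)  # marginal of rain from the joint table
p_late            = Fraction(5, 20)  # from the law of total probability

# Bayes' rule: P(rain | late) = P(late | rain) * P(rain) / P(late)
p_rain_given_late = p_late_given_rain * p_rain / p_late
print(p_rain_given_late)  # 3/5
```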

This was just an introduction to probability; the next topic covered will be independence.
