Basic Probability

Probability is something we all use in our daily lives, knowingly or unknowingly, but what does probability have to do with Machine Learning / Deep Learning? Well, every time we classify something or try to predict a value using neural nets or a simple SVM, we are actually using the concepts of probability to deduce a function and calculate the predicted value. So with that, let’s start with the concepts which are crucially required for ML/DL.

1. Some Basic Notations & their meaning

  • P(A ⋃ B) = Probability of event A or event B or both occurring.
  • P(A ⋂ B) = Probability of event A and B occurring.
  • P(A|B) = Probability of event A occurring given that event B has already occurred.

Now in this tutorial P(A ⋂ B) will be represented as P(AB).

  • P(A - B) = Probability of event A occurring but not B, which equals P(A) - P(AB).

2. Some Proofs

Let’s say there are n repetitions of an experiment in which A has occurred n₁ times, B has occurred n₂ times, and A and B together have occurred m times.

  • P(A ⋃ B) = P(A) + P(B) - P(AB)

P(A ⋃ B) can be represented as the sum of the probabilities of the three disjoint events P(A - B), P(B - A) and P(AB):

P(A ⋃ B) = P(A - B) + P(B - A) + P(AB)

P(A ⋃ B) = P(A) - P(AB) + P(B) - P(AB) + P(AB)

thus, P(A ⋃ B) = P(A) + P(B) - P(AB)
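
To see this rule in action, here is a minimal Python sketch (the fair die and the two events below are assumptions chosen just for illustration) that computes both sides of the identity by enumerating a small sample space:

```python
from fractions import Fraction

# Illustrative assumption: a fair six-sided die, so all 6 outcomes are equally likely.
omega = {1, 2, 3, 4, 5, 6}

# Hypothetical events: A = "roll is even", B = "roll is greater than 3".
A = {2, 4, 6}
B = {4, 5, 6}

def prob(event):
    """P(event) = (favourable outcomes) / (total outcomes)."""
    return Fraction(len(event & omega), len(omega))

p_union = prob(A | B)                         # P(A ⋃ B) counted directly
p_formula = prob(A) + prob(B) - prob(A & B)   # P(A) + P(B) - P(AB)

print(p_union, p_formula)   # both print 2/3
assert p_union == p_formula
```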

  • P(AB) = P(A|B).P(B)

P(A|B) is the probability of event A occurring given that event B has already occurred. Given that event B has occurred in n₂ of the n repetitions, event A is restricted to the event AB, which has occurred in m of those repetitions. So the probability of A given B can be expressed as:

P(A|B) = m / n₂

⇒ P(A|B) = (m/n) ÷ (n₂/n)

⇒ P(A|B) = P(AB) ÷ P(B)

or, P(AB) = P(A|B).P(B)

Similarly, P(AB) = P(B|A).P(A)
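
The counting argument above can also be checked numerically. In the sketch below, n, n₂ and m are hypothetical counts (made up for illustration); the ratio m/n₂ and the ratio P(AB)/P(B) come out identical:

```python
from fractions import Fraction

# Hypothetical counts from n repetitions of an experiment (illustrative numbers only).
n = 100   # total number of repetitions
n2 = 40   # repetitions in which B occurred
m = 10    # repetitions in which both A and B occurred

p_B = Fraction(n2, n)    # P(B)  = n2/n
p_AB = Fraction(m, n)    # P(AB) = m/n

p_A_given_B_counts = Fraction(m, n2)   # P(A|B) = m / n2
p_A_given_B_ratio = p_AB / p_B         # P(A|B) = P(AB) / P(B)

print(p_A_given_B_counts, p_A_given_B_ratio)   # both print 1/4
assert p_A_given_B_counts == p_A_given_B_ratio
assert p_AB == p_A_given_B_ratio * p_B         # P(AB) = P(A|B).P(B)
```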

3. Chain Rule of Probability

If A₁, A₂, A₃, …, Aₙ is a set of n events, then the joint probability of these events can be expressed as:

P(A₁A₂A₃…Aₙ) = P(A₁).P(A₂|A₁).P(A₃|A₁A₂)…P(Aₙ|A₁A₂…Aₙ₋₁)
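
As a quick illustration, here is a sketch (the tiny deck of 3 red and 2 black cards is an assumption made up for the example) that checks the chain rule for three events: drawing a red card on the 1st, 2nd and 3rd draw without replacement.

```python
import itertools
from fractions import Fraction

# Illustrative assumption: draw 3 cards without replacement from a deck of
# 3 red and 2 black cards; A1, A2, A3 = "the 1st/2nd/3rd card drawn is red".
deck = ["R1", "R2", "R3", "B1", "B2"]
outcomes = list(itertools.permutations(deck, 3))   # 60 equally likely ordered draws

def prob(pred):
    """Probability that a predicate on the ordered draw holds."""
    return Fraction(sum(pred(o) for o in outcomes), len(outcomes))

A1 = lambda o: o[0].startswith("R")
A2 = lambda o: o[1].startswith("R")
A3 = lambda o: o[2].startswith("R")

p_joint = prob(lambda o: A1(o) and A2(o) and A3(o))    # P(A1 A2 A3) counted directly
p_a1 = prob(A1)                                        # P(A1)
p_a12 = prob(lambda o: A1(o) and A2(o))                # P(A1 A2)
p_a2_given_a1 = p_a12 / p_a1                           # P(A2|A1)
p_a3_given_a12 = p_joint / p_a12                       # P(A3|A1 A2)

print(p_joint, p_a1 * p_a2_given_a1 * p_a3_given_a12)  # both print 1/10
assert p_joint == p_a1 * p_a2_given_a1 * p_a3_given_a12
```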

4. Mutually Exclusive Events

  • Two events A and B are said to be mutually exclusive if they cannot occur at the same time. In other words, if A and B are mutually exclusive then P(AB) = 0.
  • For mutually exclusive events P(A ⋃ B) = P(A) + P(B).
  • In general, the probability of the union of n mutually exclusive events can be written as the sum of their probabilities:

P(A₁ ⋃ A₂ ⋃ … ⋃ Aₙ) = P(A₁) + P(A₂) + … + P(Aₙ)
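
For instance (a made-up example), the events "roll is 1", "roll is 2" and "roll is 3" on a fair die are mutually exclusive, and the probability of their union is just the sum of the three individual probabilities:

```python
from fractions import Fraction

# Illustrative assumption: a fair six-sided die; the three events below are
# mutually exclusive since no two of them can happen on the same roll.
omega = {1, 2, 3, 4, 5, 6}
events = [{1}, {2}, {3}]

def prob(event):
    return Fraction(len(event & omega), len(omega))

p_union = prob(set().union(*events))
p_sum = sum(prob(e) for e in events)

print(p_union, p_sum)   # both print 1/2
assert p_union == p_sum
```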

5. Independence of Events

  • Two events are said to be independent if the probability of their intersection is equal to the product of their individual probabilities, i.e. P(AB) = P(A).P(B).
  • Equivalently, the conditional probability of A given B is the same as the probability of A, i.e. P(A|B) = P(A). Similarly, P(B|A) = P(B).

This means that A is just as likely to happen within the whole sample space as it is within the outcomes where B has occurred; knowing that B has happened does not change the probability of A. Similarly, B is just as likely whether or not we know that A has happened.

When two events are independent, neither of the events is influenced by the fact that the other event has happened.
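
A minimal sketch of this, assuming two flips of a fair coin (an assumption chosen only for illustration): the events "first flip is heads" and "second flip is heads" satisfy both P(AB) = P(A).P(B) and P(A|B) = P(A).

```python
import itertools
from fractions import Fraction

# Illustrative assumption: two flips of a fair coin, all 4 outcomes equally likely.
omega = list(itertools.product("HT", repeat=2))

def prob(pred):
    return Fraction(sum(pred(o) for o in omega), len(omega))

A = lambda o: o[0] == "H"   # first flip is heads
B = lambda o: o[1] == "H"   # second flip is heads

p_A, p_B = prob(A), prob(B)
p_AB = prob(lambda o: A(o) and B(o))

print(p_AB, p_A * p_B)      # both print 1/4
assert p_AB == p_A * p_B    # independence: P(AB) = P(A).P(B)
assert p_AB / p_B == p_A    # equivalently, P(A|B) = P(A)
```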

6. Conditional Independence of Events

  • Two events A and B are conditionally independent given a third event C if the probability of co-occurrence of A and B given C can be written as: P(AB|C) = P(A|C).P(B|C).
  • Now, by the chain rule (factorising the joint), P(AB|C) = P(A|C).P(B|AC).

Thus, combining the two equations, we see that P(B|AC) = P(B|C).

Do note that conditional independence of events A and B does not guarantee that A and B are independent too.
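
Here is a hypothetical two-coin sketch (entirely made up for illustration) of that last point: pick one of two biased coins at random (event C = "coin 1 was picked") and flip it twice. Given C the two flips are independent, yet marginally they are not, because both flips carry information about which coin was picked.

```python
from fractions import Fraction

# Illustrative assumption: coin 1 lands heads with probability 9/10, coin 2 with 1/10,
# and each coin is picked with probability 1/2. A = "1st flip is heads",
# B = "2nd flip is heads", C = "coin 1 was picked".
p_C = Fraction(1, 2)
p_heads = {True: Fraction(9, 10), False: Fraction(1, 10)}   # P(heads | which coin)

def joint(a, b):
    """P(A=a and B=b), summing over which coin was picked."""
    total = Fraction(0)
    for coin, p_coin in [(True, p_C), (False, 1 - p_C)]:
        p_a = p_heads[coin] if a else 1 - p_heads[coin]
        p_b = p_heads[coin] if b else 1 - p_heads[coin]
        total += p_coin * p_a * p_b   # conditional independence given the coin
    return total

p_A = joint(True, True) + joint(True, False)   # marginal P(A)
p_B = joint(True, True) + joint(False, True)   # marginal P(B)
p_AB = joint(True, True)

print(p_AB, p_A * p_B)      # 41/100 vs 1/4: not equal
assert p_AB != p_A * p_B    # A and B are NOT independent, despite P(AB|C) = P(A|C).P(B|C)
```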

7. Bayes Rule

This is the most important underlying concept in Machine Learning / Deep Learning and even in Reinforcement Learning. The whole structure of ML stands on this rule, and in almost every scenario of ML / DL / RL we apply Bayes’ Rule to find the solution, i.e. the parameter vectors.

So what is this Bayes Rule? To understand this let us take an example.

Let us take two events A and B:

P(A) → Probability of event A occurring

P(B) → Probability of event B occurring

P(AB) → Probability of events A and B occurring simultaneously

P(A|B) → Probability of event A occurring given that event B has already occurred

P(B|A) → Probability of event B occurring given that event A has already occurred

Now from section 2 we have the following :

P(AB) = P(A).P(B|A) ….(1)

P(AB) = P(B).P(A|B) ….(2)

Combining equations (1) and (2) we get,

P(A).P(B|A) = P(B).P(A|B)

⇒ P(A|B) = (P(A).P(B|A)) ÷ P(B)

Thus Bayes’ Rule describes the probability of an event based on prior knowledge of conditions that might be related to the event. For example, if cancer is related to age, then, using Bayes’ theorem, a person’s age can be used to assess the probability that they have cancer more accurately than an assessment made without knowledge of the person’s age.
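
A minimal sketch, assuming hypothetical numbers for a disease-screening test (the prevalence and the test accuracies below are made up for illustration), showing how the rule P(A|B) = (P(A).P(B|A)) ÷ P(B) is applied:

```python
# Hypothetical numbers (illustrative assumptions, not from the text):
# A = "person has the disease", B = "test comes back positive".
p_A = 0.01             # prior P(A): prevalence of the disease
p_B_given_A = 0.95     # P(B|A): probability of a positive test when the disease is present
p_B_given_notA = 0.05  # P(B|not A): false-positive rate

# P(B) = P(B|A).P(A) + P(B|not A).P(not A), since B splits into the
# disjoint events AB and (not A)B.
p_B = p_B_given_A * p_A + p_B_given_notA * (1 - p_A)

# Bayes' Rule: P(A|B) = P(A).P(B|A) / P(B)
p_A_given_B = p_A * p_B_given_A / p_B

print(f"P(A|B) = {p_A_given_B:.3f}")   # ≈ 0.161
```

Notice how the small prior P(A) keeps the posterior modest even though the test looks quite accurate; this is exactly the kind of update Bayes’ Rule formalises.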

Now, going through these concepts might feel like a tedious job since I haven’t yet shown you where they are applied in Machine Learning, but hold on: when I launch my further tutorials on Machine Learning you will see the various applications of these small theorems in deriving the parameters or features that are optimised by the algorithms.

So that’s the end of today’s lecture. See yaa!!!!

Thank You all for taking your time and going through this post.

Adityam Ghosh
