Probability Theory and its Huge Importance in Machine Learning

Published in Machine Learning Mindset, Jan 30, 2020

Probability theory is the branch of mathematics concerned with probability. The notion of probability is used to measure the level of uncertainty. Probability theory aims to represent uncertain phenomena in terms of a set of axioms. In short, when we cannot be exact about the possible outcomes of a system, we represent the situation using the likelihood of different outcomes and scenarios.

The actual science of logic is conversant at present only with things either certain, impossible, or entirely doubtful, none of which (fortunately) we have to reason on. Therefore the true logic for this world is the calculus of Probabilities, which takes account of the magnitude of the probability which is, or ought to be, in a reasonable man’s mind.
James Clerk Maxwell

In this post, you will learn:

What is probability theory?
Why is it important in Artificial Intelligence and Machine Learning?
The fundamental definitions in probability theory
Some mathematical background

Probability theory in Machine Learning

Probability theory is of great importance in many different branches of science. Let’s focus on Artificial Intelligence empowered by Machine Learning. The question is: how is knowing probability going to help us in Artificial Intelligence? In AI applications, we aim to design an intelligent machine to do a task. First, the model should get a sense of the environment via modeling.

As there is ambiguity regarding the possible outcomes, the model works based on estimation and approximation, which are done via probability. Second, as the machine tries to learn from the data (environment), it must reason about the process of learning and decision making. Such reasoning is not possible without considering all possible states, scenarios, and their likelihood. Third, to measure and assess the machine’s capabilities, we must utilize probability theory as well.

Probability Axioms

Let’s roll a die and ask the following informal question: What is the chance of getting a six as the outcome? It is equivalent to a more formal question: What is the probability of getting a six when rolling a die? Informal answer: most probably the same as getting any other number. Formal response: 1/6. How do we interpret the calculation of 1/6? When you roll a die, you get a number in the set {1,2,3,4,5,6}, and you do NOT get any other number. We can call {1,2,3,4,5,6} the outcome space: nothing outside of it may happen. If the die is fair, the six outcomes are equally likely, so each one has probability 1/6. To mathematically define those chances, some universal definitions and rules must be applied, so that we all agree on them.
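The 1/6 calculation can be reproduced by counting outcomes. The following is a minimal sketch, assuming a fair six-sided die (so every outcome is equally likely); the helper name `probability` is hypothetical and chosen only for this illustration.

```python
from fractions import Fraction

# Outcome space of one roll of a six-sided die (fairness is an assumption).
outcome_space = {1, 2, 3, 4, 5, 6}

def probability(event, space):
    """Probability of `event` when every outcome in `space` is equally likely."""
    favorable = event & space  # outcomes outside the space cannot happen
    return Fraction(len(favorable), len(space))

p_six = probability({6}, outcome_space)
p_seven = probability({7}, outcome_space)

print(p_six)    # 1/6
print(p_seven)  # 0
```

Using `Fraction` keeps the result exact instead of a rounded floating-point value.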

To this aim, it is crucial to know what governs probability theory. We start with axioms. An axiom is defined as follows: “a statement or proposition which is regarded as being established, accepted, or self-evidently true.” Before stepping into the axioms, we need some preliminary definitions.

Sample and Event Space

Probability theory is mainly associated with random experiments. For a random experiment, we cannot predict with certainty which outcome will occur. However, the set of all possible outcomes might be known. This set is called the sample space.

After defining the sample space, we should define an event. An event is a subset of the sample space, that is, a collection of outcomes.

Now, let’s discuss some operations on events.
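The operations meant here are presumably the usual set operations: union, intersection, and complement. Below is a minimal sketch, assuming a die roll as the experiment; the two events are hypothetical examples chosen for illustration.

```python
# Sample space for one roll of a six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}

# Two illustrative events (hypothetical, chosen for this sketch).
even = {2, 4, 6}        # "the outcome is even"
at_least_five = {5, 6}  # "the outcome is 5 or 6"

union = even | at_least_five           # at least one of the events occurs
intersection = even & at_least_five    # both events occur
complement_even = sample_space - even  # "even" does not occur

print(sorted(union))            # [2, 4, 5, 6]
print(sorted(intersection))     # [6]
print(sorted(complement_even))  # [1, 3, 5]
```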

Axioms
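A standard way to state the axioms is Kolmogorov's formulation. The sketch below assumes that formulation, with $\Omega$ denoting the sample space, $A$ and $A_i$ denoting events, and $P$ the probability measure (symbols chosen here for illustration):

```latex
\begin{align*}
&\text{(1) Non-negativity:} && P(A) \ge 0 \quad \text{for every event } A \subseteq \Omega \\
&\text{(2) Normalization:} && P(\Omega) = 1 \\
&\text{(3) Additivity:} && P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i)
\quad \text{for pairwise disjoint events } A_1, A_2, \dots
\end{align*}
```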

Outcomes

Using the axioms, we can conclude some fundamental characteristics as below:
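Some commonly derived properties, assuming the standard (Kolmogorov) axioms with sample space $\Omega$ and events $A$ and $B$ (notation chosen here for illustration), include:

```latex
\begin{align*}
P(\emptyset) &= 0 \\
P(A^{c}) &= 1 - P(A) \\
A \subseteq B &\implies P(A) \le P(B) \\
P(A \cup B) &= P(A) + P(B) - P(A \cap B)
\end{align*}
```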

Math Background

To solve a probability problem, we often need to count how many elements there are in the event and in the sample space. Here, we discuss some important counting principles and techniques.

Counting all possible outcomes

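The basic principle of counting states that if one experiment has m possible outcomes and another has n, then together they have m × n possible outcomes. It is easy to prove the principle in this special case of two experiments. All you need is to count all possible pairs of outcomes: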
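The two-experiment case can be checked by brute force. This is a minimal sketch, assuming a coin toss and a die roll as the two experiments (both chosen here for illustration):

```python
from itertools import product

# Two illustrative experiments (assumed for this sketch).
coin = ["H", "T"]         # m = 2 outcomes
die = [1, 2, 3, 4, 5, 6]  # n = 6 outcomes

# Every outcome of the first experiment pairs with every outcome of the second.
pairs = list(product(coin, die))

print(len(pairs))                          # 12
print(len(pairs) == len(coin) * len(die))  # True
```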

The generalized principle of counting can be expressed as below:
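In its usual form, assuming $r$ experiments where the $i$-th experiment has $n_i$ possible outcomes (symbols chosen here for illustration), the total number of outcomes $N$ is the product:

```latex
N = n_1 \times n_2 \times \cdots \times n_r = \prod_{i=1}^{r} n_i
```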

Permutation

What is a permutation? Suppose we have three persons called Michael, Bob, and Alice. Assume the three of them stand in a queue. How many possible arrangements do we have? Take a look at the arrangements as follows:
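One way to list the arrangements is to generate them programmatically. This is a minimal sketch using Python's standard `itertools.permutations`; the order in which the queues are printed is simply the order the library produces.

```python
from itertools import permutations

people = ["Michael", "Bob", "Alice"]

# Each permutation is one possible order of the queue.
arrangements = list(permutations(people))

for queue in arrangements:
    print(" -> ".join(queue))

print(len(arrangements))  # 6
```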

These three people can be arranged in six ways, and each arrangement is a permutation. But we cannot always write out all possible arrangements! We need some math. The intuition behind this problem is that we have three places to fill in a queue when we have three persons. For the first place, we have three choices. For the second place, there are two remaining choices. Finally, there is only one choice left for the last place, which gives 3 × 2 × 1 = 6 arrangements. We can extend this conclusion to an experiment with n choices: n options for the first place, n − 1 for the second, and so on. Hence, we get the following number of permutations:
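Written out, with $n$ denoting the number of objects to arrange as in the text, the count is the factorial of $n$:

```latex
n! = n \times (n-1) \times (n-2) \times \cdots \times 2 \times 1,
\qquad \text{e.g. } 3! = 3 \times 2 \times 1 = 6
```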

Combination

A combination is a selection of objects from a larger set in which the order does not matter. For example, assume we have a total of n objects. In how many ways can we select r objects from those n objects? Let’s get back to the above example. Assume we have three candidates named Michael, Bob, and Alice, and we want to select only two of them. How many different combinations of candidates exist?
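The selections can be enumerated directly. This is a minimal sketch using Python's standard `itertools.combinations`, which treats a selection as unordered:

```python
from itertools import combinations

candidates = ["Michael", "Bob", "Alice"]

# Order does not matter: (Michael, Bob) and (Bob, Michael) are the same selection.
selections = list(combinations(candidates, 2))

for selection in selections:
    print(selection)

print(len(selections))  # 3
```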

Let’s get back to the general question: How many selections can we have if we want to pick r objects from n objects?
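The standard answer is the binomial coefficient. As a brief derivation sketch: there are $n!/(n-r)!$ ordered ways to fill $r$ places from $n$ objects, and each unordered selection is counted $r!$ times among them, which gives (with $n$ and $r$ as in the text):

```latex
\binom{n}{r} = \frac{n!}{r!\,(n-r)!},
\qquad \text{e.g. } \binom{3}{2} = \frac{3!}{2!\,1!} = 3
```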

The above definition can be generalized.

Conclusion

In this article, you learned about probability theory, why it is important in Machine Learning, and what its fundamental concepts are. Probability theory is of great importance in Machine Learning since Machine Learning deals with uncertainty and predictions. The basics above should help you understand probability concepts and use them. Have any questions? Feel free to ask by commenting below.

Originally published at https://www.machinelearningmindset.com on January 30, 2020.