Published in Analytics Vidhya

Beginners Guide to Probability and Statistics for Data Science

Ever wondered why learning and understanding mathematical concepts like Probability and Statistics is important for a Data scientist?

Data science is often perceived as being all about programming, but almost everything data scientists do involves working with statistics and making predictions. Predictions are statements about likelihood, which is the domain of probability, and probability is the foundation of even the most advanced predictive models.

Most machine learning and deep learning models use probability concepts internally. If you understand probability theory, your understanding of machine learning and deep learning will rest on firmer ground. In this article I will help you understand some basic concepts in probability and statistics.

1. Probability

Probability is a numerical measure of how likely an event is to occur. It is a number between 0 and 1, inclusive. The closer the number is to 1, the more likely the event is to occur; the closer it is to 0, the less likely. An event with probability 0.9 is very likely to occur, while an event with probability 0.1 is unlikely.

An event is an outcome, or a set of outcomes, of a random experiment. A random experiment is a process whose result cannot be predicted with certainty in advance but whose possible outcomes are known, for example tossing a coin. A coin toss has two possible outcomes: you either get a Head or a Tail. Another example is rolling a die. A die has six possible outcomes (1, 2, 3, 4, 5, 6), so rolling it is a random experiment, and "getting an odd number" is an event made up of the outcomes 1, 3 and 5.
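As a rough illustration of a random experiment, the sketch below simulates coin tosses and die rolls with Python's standard `random` module. The seed, the number of repetitions, and the helper names `toss_coin` and `roll_die` are assumptions made for this example only. With a fair coin and a fair die, both printed shares should land close to 0.5.

```python
import random

# Fixed seed (an arbitrary choice) so the run is repeatable.
rng = random.Random(42)

def toss_coin():
    """One toss of a fair coin: each outcome is equally likely."""
    return rng.choice(["Head", "Tail"])

def roll_die():
    """One roll of a fair six-sided die."""
    return rng.randint(1, 6)

# Repeat each experiment many times and record the outcomes.
tosses = [toss_coin() for _ in range(10_000)]
rolls = [roll_die() for _ in range(10_000)]

# Share of repetitions in which each event occurred.
head_share = tosses.count("Head") / len(tosses)
odd_share = sum(1 for r in rolls if r % 2 == 1) / len(rolls)

print(f"share of heads: {head_share:.3f}")
print(f"share of odd rolls: {odd_share:.3f}")
```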

How is Probability Calculated?

When all outcomes are equally likely, the probability of an event is calculated as:

P(event) = n(event)/n(sample space)

Here n(event) is the number of outcomes in the event and n(sample space) is the total number of possible outcomes.

Back to our coin example: when we toss a coin and want the probability of getting a head, we have

Sample space = {Head, Tail}, so n(sample space) = 2

Event = {Head}, so n(event) = 1

P(Head) = n(event)/n(sample space) = 1/2

So the probability of getting a head is 1/2. For the die example, suppose we want the probability of getting an odd number when we roll a die. We have

Sample space = {1, 2, 3, 4, 5, 6}

Odd = {1, 3, 5}

P(Odd) = n(event)/n(sample space) = 3/6 = 1/2
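The counting rule above can be sketched in a few lines of Python. The function name `probability` is an assumption made for this example, and the calculation is only valid when every outcome in the sample space is equally likely.

```python
from fractions import Fraction

def probability(event, sample_space):
    """P(event) = n(event) / n(sample space), assuming equally likely outcomes."""
    # Count only the outcomes of the event that are actually in the sample space.
    favourable = [outcome for outcome in sample_space if outcome in event]
    return Fraction(len(favourable), len(sample_space))

coin = {"Head", "Tail"}
die = {1, 2, 3, 4, 5, 6}

p_head = probability({"Head"}, coin)
p_odd = probability({1, 3, 5}, die)

print(p_head)  # 1/2
print(p_odd)   # 1/2
```

Using `Fraction` keeps the results exact, so 3/6 is reduced to 1/2 instead of being shown as a rounded decimal.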

2. Odds Ratio (OR)

The odds of an event are defined as the probability in favor of the event divided by the probability against it:

Odds = P/(1 - P)

For example, if 0.5 is the probability of getting a head when we toss a coin, then the odds in favor of getting a head are

Odds = 0.5/(1 - 0.5) = 1

The odds ratio (OR) is the ratio of two such odds, for example the odds of event A when event B is present divided by the odds of A when B is absent. It quantifies the strength of the association between the two events A and B; an odds ratio of 1 means there is no association.

Odds and odds ratios are key to understanding the mathematical logic behind logistic regression, one of the most popular machine learning algorithms, which models the logarithm of the odds as a linear function of the inputs.
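A minimal sketch of the quantity P/(1 - P) described above, together with its natural logarithm, which is the value logistic regression models as a linear function of the inputs. The function names `odds` and `log_odds` are assumptions made for this example.

```python
import math

def odds(p):
    """Odds in favour of an event with probability p, i.e. p / (1 - p)."""
    if not 0 <= p < 1:
        raise ValueError("p must satisfy 0 <= p < 1")
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds (the logit); needs 0 < p < 1."""
    return math.log(odds(p))

print(odds(0.5))      # 1.0
print(odds(0.75))     # 3.0
print(log_odds(0.5))  # 0.0
```

Odds of 1 mean the event is as likely to happen as not, which is why the log of the odds is 0 at a probability of 0.5.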

3. Independent Events

These are events that occur independently of each other. In other words, the occurrence of one event does not change the probability of the other. For example, rolling a die and tossing a coin are two independent events: the result of one has no effect on the result of the other.

4. Dependent Events

A dependent event is one whose probability is affected by the outcome of another event. For example, getting a degree depends on completing college: you most likely won’t have a degree if you don’t complete college.

5. Mutually Exclusive Events

Mutually exclusive events are two events which cannot occur together. An example is watching a movie in a cinema and also shopping in a store at the same time. Because you can’t do both activities at the same time, they are mutually exclusive events.

6. Joint Probability

Joint probability is the probability that two events both occur. For example, when we roll a die and toss a coin, what is the probability that the number on the die is even and the coin shows a tail?

Even numbers = {2, 4, 6}, so we have 3 even numbers out of 6.

P(A) (getting an even number) = 3/6 = 1/2

P(B) (getting a tail) = 1/2

The joint probability of two independent events is given by

P(A and B) = P(A) x P(B) = 1/2 x 1/2 = 1/4
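One way to check the die-and-coin example is to list every combined outcome and count the ones where both events happen. The sketch below does this with `itertools.product`; the variable names are assumptions made for this example. Of the 12 equally likely (die, coin) pairs, 3 have an even number and a tail, so the enumeration gives 1/4, the same value as multiplying the two individual probabilities.

```python
from fractions import Fraction
from itertools import product

die = [1, 2, 3, 4, 5, 6]
coin = ["Head", "Tail"]

# Every (die, coin) pair is one equally likely outcome of the combined experiment.
outcomes = list(product(die, coin))

# Outcomes where the die is even AND the coin shows a tail.
both = [(d, c) for d, c in outcomes if d % 2 == 0 and c == "Tail"]
p_joint = Fraction(len(both), len(outcomes))

p_even = Fraction(3, 6)
p_tail = Fraction(1, 2)

print(p_joint)          # 1/4
print(p_even * p_tail)  # 1/4
```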

7. Conditional Probability

This is the probability of one event occurring, given that another event has occurred. For example, when rolling a die, let event A be getting an even number and event B be getting a number greater than 4. The conditional probability is written as

P(A|B), which is the probability of A happening given that B has already occurred.

Here A = {2, 4, 6} and B = {5, 6}, so A ∩ B = {6}, P(A ∩ B) = 1/6 and P(B) = 2/6. Therefore

P(A|B) = P(A ∩ B)/P(B) = (1/6)/(2/6) = 1/2
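The same die example can be worked through with Python sets, where the intersection of A and B is written `A & B`. The helper name `prob` is an assumption made for this example, and it relies on all six faces being equally likely.

```python
from fractions import Fraction

die = {1, 2, 3, 4, 5, 6}
A = {n for n in die if n % 2 == 0}  # even number: {2, 4, 6}
B = {n for n in die if n > 4}       # greater than 4: {5, 6}

def prob(event):
    """Probability of an event on a fair die (equally likely faces)."""
    return Fraction(len(event & die), len(die))

# P(A given B) = P(A and B) / P(B)
p_a_given_b = prob(A & B) / prob(B)

print(prob(A & B))   # 1/6
print(prob(B))       # 1/3
print(p_a_given_b)   # 1/2
```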

There is also a different way to calculate conditional probability: the conditional probability of an event can be calculated from the reverse conditional probability. For example:

  • P(A|B) = P(B|A) * P(A) / P(B)

The reverse also holds; for example:

  • P(B|A) = P(A|B) * P(B) / P(A)

This method is useful when the joint probability is challenging to calculate (which is often the case), or when the reverse conditional probability is available or easy to calculate. This alternative way of calculating conditional probability is known as Bayes' Theorem, a famous theorem in machine learning.
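As a small check of the formula P(A|B) = P(B|A) * P(A) / P(B), the sketch below reuses the die events from the conditional probability section: A is getting an even number and B is getting a number greater than 4. The function name `bayes` is an assumption made for this example. Among the even numbers {2, 4, 6}, only 6 is greater than 4, so P(B|A) = 1/3.

```python
from fractions import Fraction

def bayes(p_b_given_a, p_a, p_b):
    """P(A|B) computed from the reverse conditional P(B|A)."""
    return p_b_given_a * p_a / p_b

p_a = Fraction(3, 6)          # P(even number)
p_b = Fraction(2, 6)          # P(number greater than 4)
p_b_given_a = Fraction(1, 3)  # of {2, 4, 6}, only 6 is greater than 4

p_a_given_b = bayes(p_b_given_a, p_a, p_b)
print(p_a_given_b)  # 1/2
```

The result matches the value obtained directly from P(A ∩ B)/P(B), which is what the theorem promises.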

8. Statistics

According to Wikipedia, statistics is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. It covers every aspect of data, including planning how the data will be collected through surveys or experiments. Usually, when data for the whole population cannot be accessed, samples are used. These samples should be representative of the whole population.

There are two kinds of statistical methods used in analyzing data: descriptive statistics and inferential statistics. In descriptive statistics, sample data is summarized or described using indexes such as the mean or the variance, while in inferential statistics we draw conclusions about the population from the sample data. The mean characterizes the central tendency of the distribution (sample or population). The variance (a measure of dispersion) characterizes the extent to which the members of the distribution depart from its center and from each other.
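To make the two descriptive indexes concrete, here is a minimal sketch using Python's built-in `statistics` module. The numbers in `sample` are made up for this example. `pvariance` treats the data as the whole population (dividing by n), while `variance` treats it as a sample (dividing by n - 1).

```python
import statistics

# Made-up data, used only for illustration.
sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)            # central tendency
pop_var = statistics.pvariance(sample)    # population variance (divide by n)
sample_var = statistics.variance(sample)  # sample variance (divide by n - 1)

print(mean)                  # 5
print(pop_var)               # 4
print(round(sample_var, 2))  # 4.57
```

The squared distances from the mean add up to 32, so the population variance is 32/8 = 4 and the sample variance is 32/7, roughly 4.57.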

9. Mathematical statistics

This is a key subset of the discipline of statistics. In mathematical statistics, inferences are made within the framework of probability theory, which deals with the analysis of random phenomena.

With this article, I have attempted to explain basic but important mathematical concepts that will aid your knowledge of data science and machine learning. I hope you have seen how learning about probability helps you make informed decisions about the likelihood of events based on patterns in collected data, and how statistical inferences, which rely on probability, are used to analyze or predict trends from data.

Let me know if you learnt a thing or two in the comments below. Don’t forget to leave 10 claps 👏🏽👏🏽👏🏽 or more.


