Probability and Statistics for Computer Vision 101 — Part 2

Sho Nakagome
sho.jp
Sep 22, 2018 · 9 min read

Welcome to this series of “Towards understanding probability and statistics for computer vision”. This article is going to be Part 2 in the series. If you haven’t seen Part 1 yet, here’s the link:

My aim here is to give you an overview of the fundamental concepts in probability and statistics for computer vision. Think of this as a guide rather than complete material for learning everything about the topic. It’s very important that you actually study each of these fundamentals using different resources until you fully grasp the essence. This is a guide so that you don’t get lost along the way and can always come back to link the concepts together and figure out where to go next.

In Part 1, we started off with the very basics of Probability. Here are the topics covered in Part 1:

  • Probability
  • Random Variable (RV)
  • Probability Density Function (PDF)
  • Joint Probability
  • Marginalization
  • Conditional Probability
  • Bayes’ rule

That being said, in Part 2, we will step a little deeper into the basics of probability and statistics.

Materials covered in this article:

This article focuses on one of the most important concepts: statistical measurements. These are going to be the basis for upcoming topics such as the Gaussian distribution, so be sure to understand them completely!

  • Independence
  • Expectation
  • Standard deviation
  • Variance
  • Skewness
  • Kurtosis

The materials covered in Part 1 and 2 will be used throughout this series!

Introduction to independence

First, let’s talk about “independence”. Do you remember the joint probability we talked about last time? In short, a joint probability is the probability that two or more events occur at the same time. If the events are independent, we can treat the joint probability as simply the product of the individual event probabilities. Do you also remember conditional probability, where the probability of one event is determined by another event? If the events are independent, the conditional probability is no longer conditioned on the other event. It simply becomes the event’s own probability.
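In symbols, if A and B are independent events, this means:

P(A, B) = P(A) · P(B)
P(A | B) = P(A)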

To give you an example, let’s say that you want to know the probability of Bob winning a tennis match and Amanda winning a speech contest. You are betting with your friends on their wins and losses (don’t do this in real life!). Both competitions are held on the same day, at the same time. The chance of Bob winning, according to his stats, is 0.35 (= 35%). Amanda is so good at her speeches and has won 5 competitions in a row, so let’s say her chance is something like 0.95 (= 95%). These events are independent, since one event wouldn’t affect the other. So if you bet on both winning their competitions, the joint probability would be 0.35 * 0.95 = 0.3325 (= 33.25%). On the other hand, if you bet on Bob losing and Amanda winning, the joint probability will be (1 - 0.35) * 0.95 = 0.6175 (= 61.75%). So you decide to bet on this combination… no offense, Bob!
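Here is a tiny Python sketch (my own example, just to sanity-check the arithmetic above):

```python
# Independent events: the joint probability is the product of the individual probabilities.
p_bob_wins = 0.35
p_amanda_wins = 0.95

print(p_bob_wins * p_amanda_wins)        # ≈ 0.3325, both win
print((1 - p_bob_wins) * p_amanda_wins)  # ≈ 0.6175, Bob loses and Amanda wins
```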

You could also refer to the linear independence I talked about in “Towards understanding Linear Algebra — Part 3”.

As usual, I’m embedding external resources that may help you learn the concepts. I strongly believe it is important to go through multiple sources and try to understand each of them. Spending some time summarizing what was written, without looking at the content, is crucial when it comes to learning the material.

Introduction to expectation

Expected value, or expectation, is one of the most important fundamental concepts in probability and statistics, so try to spend some time and effort understanding this section!

Expectation, or expected value, is, in short, an average or mean, if you want to think of it in a simple way.

Just to give you a quick example, think of a random variable X and its corresponding probabilities P. The expected value can be represented as the following:
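E[X] = x₁·P(X = x₁) + x₂·P(X = x₂) + … = Σᵢ xᵢ·P(X = xᵢ)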

In general, the expected value is the sum over all values the random variable can take, each weighted by its probability. For a continuous random variable, the sum becomes an integral over the probability density function: E[X] = ∫ x·f(x) dx.

Just to make it more convenient, we often write the expected value using the Greek letter μ (mu).

OK, it might still not be clear, so let me give you a concrete example.
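Take a fair six-sided die (my own example): each face 1 through 6 comes up with probability 1/6, so the expected value is (1 + 2 + 3 + 4 + 5 + 6) / 6 = 3.5. A minimal Python sketch that computes it:

```python
import numpy as np

# A fair six-sided die: each face has probability 1/6.
values = np.arange(1, 7)
probs = np.full(6, 1 / 6)

# Expected value = sum of (value * probability).
print(np.sum(values * probs))  # -> 3.5 (up to floating point)
```

Notice that 3.5 is not a value the die can actually show; the expectation is a weighted average, not the most likely outcome.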

I hope you’ve got the idea of expectation by now. Let’s take a look at its properties from here. Properties are rules that a concept obeys, and it is often useful to understand them because they make it easier to understand more advanced concepts (like variance, covariance, and skewness later on).
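The key ones are the standard expectation rules below (c denotes a constant):

  • E[c] = c
  • E[c·X] = c·E[X]
  • E[X + Y] = E[X] + E[Y] (whether or not X and Y are independent)
  • E[X·Y] = E[X]·E[Y] if X and Y are independent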

These properties are useful when it comes to dealing with other statistical measurements, like the variance I’m going to explain just a few scrolls down. Using these properties, we can rewrite equations in different forms, which sometimes leads to easier calculations, just like what we did with Bayes’ rule in Part 1.

Introduction to variance and covariance

After studying expectation, now is the time to move on to the next exciting topic, variance! Sometimes this is abbreviated as “Var” or “var” when writing code. Variance is a measure that tells us how spread out the data is around the mean. The equation looks something like the following:
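Var(X) = E[(X - μ)²]
       = E[X² - 2μX + μ²]
       = E[X²] - 2μ·E[X] + μ²
       = E[X²] - μ²
       = E[X²] - (E[X])²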

I’m using the properties of expected value we just learned to rewrite the equation in a different form. The first form, E[(X - μ)²], tells us that the variance is obtained by subtracting the mean from the random variable, squaring the result so that negative and positive deviations don’t cancel out, and then taking the expected value.

If we rewrite that formula using the properties of expectation, what we get looks quite different. The last form shows us that the variance can also be expressed as the expected value of the squared random variable minus the square of its expectation.

Now the “covariance” (often abbreviated as “Cov” or “cov” when writing code). Covariance is another very important concept. It comes up now and then whenever you study more advanced topics. It basically tells you how two random variables move together. If one random variable increases and the other tends to increase as well, the covariance is positive and large. If one random variable increases but the other doesn’t respond, the covariance is close to zero. It might be difficult to understand from words alone, so let me show you the equations first and then visualize it.
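Cov(X, Y) = E[(X - μ_X)(Y - μ_Y)]
          = E[XY] - E[X]·E[Y]

where μ_X = E[X] and μ_Y = E[Y].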

The first form above is the actual definition of the covariance. One interesting thing is that (by using the expectation properties we discussed) the covariance can also be written in the second form, E[XY] - E[X]·E[Y], and if the random variables X and Y are independent, the covariance is 0.

Now let me visualize covariance so that you can get a better idea of what this is representing!
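Here is a minimal numpy sketch (my own example) that generates three kinds of relationships and computes their covariances:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=1000)

y_pos = x + 0.3 * rng.normal(size=1000)    # moves together with x -> positive covariance
y_neg = -x + 0.3 * rng.normal(size=1000)   # moves opposite to x -> negative covariance
y_ind = rng.normal(size=1000)              # unrelated to x -> covariance near zero

# np.cov returns the 2x2 covariance matrix; the off-diagonal entry is Cov(x, y).
print(np.cov(x, y_pos)[0, 1])
print(np.cov(x, y_neg)[0, 1])
print(np.cov(x, y_ind)[0, 1])
```

If you scatter-plot x against each y, the positive-covariance pair lines up along an upward slope, the negative pair along a downward slope, and the independent pair forms a shapeless cloud.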

Did you understand what covariance is by now? If not, try taking some time looking for other materials and writing down what you already know.

Introduction to standard deviation

Now let’s talk about another important statistical measurement called the “standard deviation”. Sometimes people abbreviate this as “Std” or “std” when writing code. Standard deviation is very similar to variance in that it also quantifies the amount of variation in the data. As an equation, it is written like this:
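σ = sqrt(Var(X)) = sqrt(E[(X - μ)²])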

One thing to keep in mind is the use of this measurement on distributions that are not normal. It can still be used, but you need to be careful, because certain properties that are very useful when dealing with normally distributed data no longer apply. For example, if your data is normally distributed, the range within 1 standard deviation of the mean accounts for approximately 68% of the data (look at the figure in the link below; the area within 1 standard deviation is indicated in dark blue), 2 standard deviations cover almost 95%, and 3 standard deviations cover up to 99.7%. This is often referred to as the 68–95–99.7 rule. This useful property is only true for the normal distribution. If you are not familiar with different types of distributions, don’t worry! I will cover this topic in the next story.
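If you want to check these numbers yourself, here is a quick sketch (my own example) using scipy:

```python
from scipy.stats import norm

# Fraction of a normally distributed variable within k standard deviations of the mean.
for k in (1, 2, 3):
    coverage = norm.cdf(k) - norm.cdf(-k)
    print(f"within {k} std: {coverage:.4f}")
# within 1 std: 0.6827, within 2 std: 0.9545, within 3 std: 0.9973
```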

Introduction to skewness and kurtosis

Compared to the concepts I explained above, you might not have heard of “skewness” or “kurtosis” as much. But if you think of these statistical values as standardized kth moments (the 3rd and 4th, respectively), it becomes clear that skewness and kurtosis are just more statistical measurements.
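In their standard form:

skewness(X) = E[(X - μ)³] / σ³
kurtosis(X) = E[(X - μ)⁴] / σ⁴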

Here, “μ” is the mean (expected value) and “σ” is the standard deviation.

First, let’s talk a bit about “skewness”. Skewness is, in short, a statistical measure that tells us whether, and how much, the distribution leans towards the right or the left. Let me give you an example visualizing this.

You can remember the meaning of skewness by which way the longer tail points. If the longer tail extends towards the negative side, the skewness is negative, and vice versa.

Now about “kurtosis”. Kurtosis is a measure of the sharpness of the distribution’s peak (and the heaviness of its tails). The higher the kurtosis, the sharper the peak.

I’m just showing some visualization here, but you could learn more in the link below too.
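Here is a minimal scipy sketch (my own example) that computes both measures on a symmetric sample and a right-skewed one:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(0)

symmetric = rng.normal(size=10_000)          # roughly zero skewness
right_skewed = rng.exponential(size=10_000)  # long right tail -> positive skewness

print(skew(symmetric), skew(right_skewed))

# Note: scipy's kurtosis() returns *excess* kurtosis by default (normal -> ~0).
# Pass fisher=False to get the raw fourth-moment definition used above (normal -> ~3).
print(kurtosis(symmetric, fisher=False), kurtosis(right_skewed, fisher=False))
```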

Summary

  • Independence

If two random variables are independent of each other, you can treat their joint probability as the product of their individual probabilities. Similarly, you can treat a conditional probability as the event’s own probability, regardless of the condition, if they are independent.

  • Expectation

The most important statistical measurement. You can think of it as taking an average of a random variable’s values, weighted by their corresponding probabilities.

  • Standard deviation

A measure that quantifies the variability of the underlying distribution of the data you are looking at.

  • Variance

Another measure to quantify variability of the underlying distribution of the data.

  • Skewness

How much the distribution leans towards one side (right or left).

  • Kurtosis

A measurement that tells how sharp the peak of the data distribution is.

I hope that helps! See you next time!


Sho Nakagome
sho.jp

A Neuroengineer and Ph.D. candidate researching Brain Computer Interface (BCI). I want to build a cyberbrain system in the future. Nice meeting you!