Sum of two random variables or the rocky path to understanding convolutions of probability distributions

Marat Kopytjuk
Published in Analytics Vidhya
6 min read · Oct 3, 2019

During my master studies, while working through homework in probability theory, I spent many hours drinking coffee and struggling with an assignment that seemed quite trivial at first glance. Here it is:

Given two independent uniformly distributed random variables X and Y, determine the probability density function p_Z(z) of Z = X + Y. Mathematically speaking: X, Y ~ U([0, 1]) independent, Z = X + Y, p_Z(z) = ?

We can already say that the resulting density will have the support [0, 2], since it must be possible to sample both x=1 and y=1. But simply adding the two uniform densities won't give us the right solution to the problem.

After a quick web search I found the theoretical tool required to solve the task: the convolution of probability distributions. It states:

The general formula for the distribution of the sum Z = X + Y of two independent integer-valued (and hence discrete) random variables is

P(Z = z) = Σₖ P(X = k) · P(Y = z − k)

(the probability distribution of Z = X + Y given the (discrete) probability distributions of X and Y, summing k over all integers)

The counterpart for independent continuously distributed random variables with density functions 𝑓, 𝑔 is

h(z) = ∫ f(x) · g(z − x) dx

(the probability density of Z = X + Y given the (continuous) probability densities of X and Y, integrating x from −∞ to +∞)
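To see the discrete formula in action before we tackle the continuous one, here is a small sketch (my own example, not from the assignment): the distribution of the sum of two fair dice, computed with NumPy's `np.convolve`, which implements exactly the sum over P(X=k)·P(Y=z−k):

```python
import numpy as np

# PMF of a fair six-sided die: P(X = k) = 1/6 for k = 1, ..., 6
die = np.full(6, 1 / 6)

# Discrete convolution: for every z it sums P(X = k) * P(Y = z - k) over k.
sum_pmf = np.convolve(die, die)  # 11 entries, covering z = 2, ..., 12

print(round(sum_pmf[7 - 2], 4))  # P(Z = 7) = 6/36 ≈ 0.1667
```

The index `7 - 2` maps the outcome z = 7 to its position in the array, since the first entry corresponds to z = 2.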

My first thought was "oh, I saw those equations back in control theory lectures years ago (Laplace transformations)". Unfortunately, years went by and I had no idea anymore how the integral

∫ f(x) · g(z − x) dx

is calculated. Sure, there are some fancy visualizations on pages like Wikipedia, such as the following one:

but for me it was hard to decipher the meaning of the different variables within the equation. For example, it was not clear to me why the red function shifts to the left while the argument t is positive.

The goal of this blogpost is to dig through the convolution equation step by step — in the end you should feel comfortable working with convolutions and not be afraid of complex integrals.

Enjoy!

Basics

Altering function graphs

You probably had this chapter in school; teachers (at least in German schools) usually introduce it with quadratic functions. If you know what happens to a function 𝑓 after changing 𝑓(𝑥) to 𝑓(−𝑥+3), feel free to skip this section.

The motivation of this section is to understand the term 𝑔(𝑧−𝑥), which is obtained by transforming 𝑔(𝑥).

Note that I replaced the function argument 𝑡 with 𝑥 in order not to confuse the reader with 𝑡 as “time”. We will continue to use x throughout this blog article.

Let’s define two simple functions to work through this tutorial. The first will be a simple quadratic function

As I want to accomplish my homework assignment too, we will use the density of a uniform distribution U([0, 1]) as our second function

Let’s visualize both in one picture:
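A sketch of how the two example functions could be defined and plotted (the exact quadratic is not reproduced in the text, so I assume f(x) = x² here; matplotlib is used as in the article):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the script also runs headless
import matplotlib.pyplot as plt

def f(x):
    # the quadratic example function (assumed here to be f(x) = x^2)
    return x ** 2

def g(x):
    # density of U([0, 1]): 1 inside the interval, 0 outside
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

x = np.linspace(-2, 2, 401)
plt.plot(x, f(x), label="f(x)")
plt.plot(x, g(x), label="g(x)")
plt.legend()
plt.savefig("both_functions.png")
```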

Now what happens when we add a constant 𝑧 to the function argument (e.g. 𝑓(𝑥+3) for z = 3)? Let's see what happens for different values of 𝑧. We will do a single plot for each example function; here for the quadratic:

Adding a positive number z shifts the graph to the left; a negative z shifts it to the right.

What happens if we multiply the function argument x by −1? We show this effect on the uniform function, since the quadratic graph is symmetric and would show no change.

Apparently, adding a negative sign to x “mirrors” the function graph relative to the y-axis.

Now we can understand the transformation from g(x) to g(z−x). It simply describes adding a constant z (shifting the graph to the left) and multiplying x by −1 (mirroring it relative to the y-axis).

See the effect on the dotted function graphs below:
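The two transformations can also be checked numerically. A minimal sketch, assuming g is the U([0, 1]) density: for a fixed z, the term g(z − x) is nonzero exactly where x lies in [z − 1, z], i.e. the original support mirrored and shifted:

```python
import numpy as np

def g(x):
    # density of U([0, 1])
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)

z = 0.5
# g(z - x) is nonzero exactly where z - x lies in [0, 1],
# i.e. for x in [z - 1, z] = [-0.5, 0.5]: the support mirrored and shifted.
print(g(z - (-0.25)), g(z - 0.75))  # 1.0 inside the new support, 0.0 outside
```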

Multiplying two functions

Let's continue working through the convolution formula. Our next task is to understand the product of the two functions (in our case, probability densities):

What happens if we multiply them?

Let’s define a proxy function a(x) as a product of both example functions and visualize all three:

The black dotted graph is the product function we will integrate in the next step. Since the two rectangular functions do not overlap completely, the resulting graph is narrower than either source function.

Integrating over the product function a(x)

An integral is the area between the function graph and the x-axis. This is the last building block of our convolution procedure:

The integral over the product of two uniform probability densities f and g (one of them shifted to the right by 0.5) is 0.5.
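We can verify this number with a simple Riemann sum. A sketch (the article used scipy.integrate.trapz; plain NumPy is enough here):

```python
import numpy as np

x = np.linspace(-1, 3, 100001)                    # fine grid, spacing 4e-5
f = np.where((x >= 0.0) & (x <= 1.0), 1.0, 0.0)   # U([0, 1]) density
g = np.where((x >= 0.5) & (x <= 1.5), 1.0, 0.0)   # same density, shifted right by 0.5

a = f * g                                         # product: 1 only on the overlap [0.5, 1]
area = a.sum() * (x[1] - x[0])                    # Riemann-sum approximation of the integral
print(round(area, 3))                             # ≈ 0.5
```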

Wow that was a lot… Let’s summarize everything in one picture!

Collecting the building blocks

Finally we are ready to understand the convolution equation

In order to calculate h(z) for a particular z, we have to do the following steps:

  1. Shift the function g(x) to the left by z, i.e. g(x+z).
  2. Mirror the result relative to the y-axis, i.e. g(−x+z) = g(z−x).
  3. Calculate the product a(x) = f(x)·g(z−x).
  4. Integrate a(x) over the whole real line; call the result i.
  5. The result of the convolution at the particular position z is i, i.e. we calculated h(z) = ∫ f(x)g(z−x)dx.
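The five steps translate almost one-to-one into code. A minimal sketch with both densities uniform on [0, 1] (the function name convolve_at is mine, not from the article):

```python
import numpy as np

def f(x):
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)  # density of X ~ U([0, 1])

def g(x):
    return np.where((x >= 0) & (x <= 1), 1.0, 0.0)  # density of Y ~ U([0, 1])

def convolve_at(z, x):
    """Approximate h(z) = integral of f(x) * g(z - x) dx on the grid x."""
    a = f(x) * g(z - x)               # steps 1-3: shift, mirror, multiply
    return a.sum() * (x[1] - x[0])    # steps 4-5: integrate (Riemann sum)

x = np.linspace(-2, 4, 60001)
print(round(convolve_at(1.0, x), 2))  # peak of the resulting density, ≈ 1.0
```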

We apply this procedure for each z we are interested in; usually this is a whole range of values, e.g. for calculating a particular probability.

Now having the procedure to calculate convolutions, we can apply it to the initial question (my study assignment):

We calculate and visualize the result for z in range from -1 to 3:
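Sketched in code, the full sweep over z looks like this (again plain NumPy with a Riemann sum; the grid sizes are chosen arbitrarily):

```python
import numpy as np

def uniform(t):
    # density of U([0, 1])
    return np.where((t >= 0) & (t <= 1), 1.0, 0.0)

x = np.linspace(-2, 4, 60001)   # integration grid
dx = x[1] - x[0]
zs = np.linspace(-1, 3, 401)    # the z values we are interested in

# h(z) = integral of f(x) * g(z - x) dx, evaluated for every z
h = np.array([(uniform(x) * uniform(z - x)).sum() * dx for z in zs])

print(round(h.max(), 2))        # the triangle peaks at h(1) ≈ 1.0
```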

Nice! As expected, the final probability distribution is not uniform. We see that the sum of two identically uniformly distributed random variables leads to a "triangular" probability density!

Its shape also looks plausible and intuitive: since the expected values of X and Y are both 0.5 (remember, both were uniformly distributed on [0, 1]), the expected value of Z has to be 1, which is exactly where the density peaks.

Thank you for your attention and drop your comments below!

For creating the plots I used Python and the matplotlib package. The integrals were calculated with the scipy.integrate.trapz function.

Special thanks to https://github.com/herzog-ch for his review!
