A Measure Theoretic Approach to Probability (Part 3) — Expectation and Lebesgue Integrals

Published in Quantaphy · 8 min read · Dec 14, 2023

Welcome back to the last article in this trilogy. In the first part, we laid the foundation for measure-theoretic probability. We explored the notions of measures and measurable spaces and, using these concepts, we defined a probability space. In the second part, we used fundamental concepts in measure theory to define measurable functions and saw the unification of discrete and continuous random variables through a common measure theoretic representation that made our treatment of probability more rigorous. In this article, we will cover the final fundamental concept of probability: mathematical expectation.

To do this, let’s first understand what ‘expected value’ even means. The expected value of a random variable, intuitively, is the ‘average’ value you would expect to see if you repeatedly observed the value of the random variable. For instance, if we roll a fair six-sided die a million or a trillion times, what would the average value be? Well, theoretically, it should be 3.5, and as the number of rolls tends to infinity, our average tends closer and closer to 3.5. We denote the expected value of a random variable, X, by E(X). Introductory probability courses teach one formula for E(X) when X is discrete (a sum over its values) and another when X is continuous (an integral against its density).
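For reference, the two standard formulas can be written in LaTeX as follows. The notation is assumed: p_X is the probability mass function of a discrete X, and f_X is the probability density function of a continuous X.

```latex
% Discrete case: sum over the values x that X can take
E(X) = \sum_{x} x \, p_X(x)

% Continuous case: integrate against the density of X
E(X) = \int_{-\infty}^{\infty} x \, f_X(x) \, dx
```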

If we understand what each term in these formulae means, we see that the intuition behind both cases is the same, i.e., the expectation is simply a weighted average of all the values of the random variable, with weights given by the probability of each value.
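As a quick check of the weighted-average reading, here is a minimal Python sketch for a six-sided die. The fairness of the die, the helper name `expected_value`, the seed, and the number of rolls are all assumptions made for illustration.

```python
import random
from fractions import Fraction

def expected_value(pmf):
    """Weighted average of the values, weighted by their probabilities."""
    return sum(value * prob for value, prob in pmf.items())

# Assumed: a fair six-sided die, so every face has probability 1/6.
die_pmf = {face: Fraction(1, 6) for face in range(1, 7)}
exact = expected_value(die_pmf)

# Empirical average over many simulated rolls (seeded for repeatability).
rng = random.Random(0)
n_rolls = 100_000
sample_mean = sum(rng.randint(1, 6) for _ in range(n_rolls)) / n_rolls

print(exact)        # 7/2
print(sample_mean)  # close to 3.5
```

The exact value uses `Fraction` so that the weighted average comes out as 7/2 with no rounding, while the simulated average only gets close to it.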

But how can we unify these two definitions? Well, that’s where measure theory comes in. If you recall, measure theory defines a random variable as a measurable function from a probability space to a measurable space. Then, the expectation of a random variable, E(X), is the Lebesgue integral of X with respect to the probability measure.
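In LaTeX notation, and assuming the probability space is written (Ω, ℱ, P) as is conventional (these symbols are an assumption here), the integral in question reads:

```latex
% X is a random variable on the probability space (\Omega, \mathcal{F}, P)
E(X) = \int_{\Omega} X \, dP
```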

But what is a Lebesgue integral? This integral looks peculiar and nothing like the Riemann integral. Well, this is where things get tricky and require some careful consideration.

The Lebesgue integral, but intuitively

An integral, for a given function with a numeric domain and codomain, measures the ‘area under the curve’. And if this function is smooth, i.e., continuous and differentiable, then the integral can be approximated as a sum of rectangles. If we keep shrinking the rectangles then, in the limit, this becomes the Riemann integral, which is what we’re familiar with.

A visual representation of Riemann sums. Credit: Matthew N. Bernstein
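The shrinking-rectangles idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the function x² on [0, 1], the left-endpoint rule, and the name `riemann_sum` are choices made here, not taken from the article.

```python
def riemann_sum(f, a, b, n):
    """Left Riemann sum of f on [a, b] using n equal-width rectangles."""
    width = (b - a) / n
    return sum(f(a + i * width) * width for i in range(n))

def square(x):
    return x * x

# The exact area under x^2 on [0, 1] is 1/3; since x^2 is increasing there,
# the left-endpoint sums approach it from below as the rectangles shrink.
for n in (10, 100, 1000):
    print(n, riemann_sum(square, 0.0, 1.0, n))
```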

Similarly, the Lebesgue integral also forms rectangles, but it does so in a different way. Matthew Bernstein does a great job at explaining this: “A rectangle is formed for each value in the function’s codomain (i.e., each unique height that the function ever reaches). For each value, a rectangle is formed with a height equal to this value and a width equal to the length of all intervals along ℝ where the function reaches this height”. A figure explaining this process is shown:

A visual representation of Lebesgue integrals. Credit: Matthew N. Bernstein.

More formally, Folland (1984) summarizes the difference between the Riemann and Lebesgue approaches as follows: “to compute the Riemann integral of f, one partitions the domain [a, b] into subintervals”, while in the Lebesgue integral, “one is in effect partitioning the range of f.” This is a great way to intuitively understand the difference between the two integrals.
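To make Folland’s contrast concrete, here is a rough Python sketch that integrates f(x) = x² on [0, 1] by slicing the range instead of the domain. The choice of function and the name `range_partition_sum` are assumptions. The sketch also relies on f being increasing on [0, 1], so that the set of x where f(x) lies in a band [y, y') is simply the interval [√y, √y').

```python
import math

def range_partition_sum(n_levels):
    """Approximate the integral of f(x) = x**2 on [0, 1] by slicing the range.

    The range [0, 1] is cut into n_levels bands [y_k, y_next). Because f is
    increasing on [0, 1], the set of x with f(x) in a band is the interval
    [sqrt(y_k), sqrt(y_next)), whose length is its Lebesgue measure.
    """
    total = 0.0
    for k in range(n_levels):
        y_k = k / n_levels
        y_next = (k + 1) / n_levels
        preimage_length = math.sqrt(y_next) - math.sqrt(y_k)
        total += y_k * preimage_length  # height times measure of the preimage
    return total

for n in (10, 100, 1000):
    print(n, range_partition_sum(n))  # approaches 1/3 from below
```

Each term is a height multiplied by the measure of the set where the function sits at roughly that height, which is the rectangle described in the quotation above.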

An important point to note is that the Lebesgue integral also works for functions whose domain is non-numeric. This will become evident once we define the integral rigorously.

The Lebesgue Integral, but rigorously this time

Now while the full rigorous definition of the Lebesgue integral is quite complex and demands heavy machinery, it is possible for us to define it in a slightly less rigorous but easier-to-understand way. And for this, we break the definition into four parts:

Simple functions
Lebesgue integral of a measurable simple function
Lebesgue integral of a measurable positive function
Lebesgue integral

Simple functions

We define a simple function as follows:
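Written out in LaTeX, a standard form of this definition is the following. The notation is assumed: (F, ℱ) is a measurable space, A_1, …, A_n are measurable sets in ℱ, a_1, …, a_n are real numbers, and 1_{A_i} is the indicator function of A_i.

```latex
% Simple function: a finite weighted sum of indicator functions
s(x) = \sum_{i=1}^{n} a_i \, \mathbf{1}_{A_i}(x),
\qquad
\mathbf{1}_{A_i}(x) =
\begin{cases}
  1 & \text{if } x \in A_i, \\
  0 & \text{otherwise.}
\end{cases}
```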

From this definition, it is fairly apparent that when the domain of the simple function is ℝ and the sets in the definition are intervals, it is just a step function:

Graph of a simple function. Image credits: Matthew Bernstein.

Simple functions generalize step functions because the domain need not be numerical; it must simply have a defined σ-algebra.
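Here is a small Python sketch of a simple function on a non-numeric domain, built as a weighted sum of indicator functions. The weather labels, the weights, and the names `indicator` and `simple_function` are all made up for illustration.

```python
def indicator(subset):
    """Return the indicator function of `subset`: 1 inside, 0 outside."""
    return lambda x: 1 if x in subset else 0

def simple_function(weights_and_sets):
    """Build s(x) = sum of a_i * 1_{A_i}(x) from (a_i, A_i) pairs."""
    terms = [(a, indicator(A)) for a, A in weights_and_sets]
    return lambda x: sum(a * ind(x) for a, ind in terms)

# Hypothetical non-numeric domain: weather labels.
s = simple_function([
    (2.0, {"sunny", "cloudy"}),   # a_1 = 2.0 on A_1
    (-1.5, {"rainy"}),            # a_2 = -1.5 on A_2
])

print(s("sunny"))  # 2.0
print(s("rainy"))  # -1.5
print(s("snowy"))  # 0.0 (outside every A_i)
```

Nothing in the construction needs the domain to be numbers; it only needs to be possible to say which set each point belongs to.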

This forms the basis of understanding the Lebesgue integral.

Lebesgue integral of a measurable simple function

Now, we are ready to define the Lebesgue integral. However, this is a narrow definition, as we’re defining it only for measurable simple functions, that is, for one special class of functions. But, as will be evident soon, this is important in defining the Lebesgue integral in general, as it helps in calculating the areas of the rectangles that we segment the area under the function into. What follows is the definition of the Lebesgue integral of a measurable simple function:
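In LaTeX, a standard statement of this definition looks as follows. Some of the notation is assumed: (F, ℱ, µ) is the measure space that f is defined on, and the codomain H = {a_1, …, a_n} is a finite set of real numbers carrying a σ-algebra ℋ.

```latex
% f : F \to H is measurable, with H = \{a_1, \dots, a_n\} \subset \mathbb{R} finite
% A_i is the preimage of the singleton \{a_i\}
A_i = f^{-1}(\{a_i\}),
\qquad
\int_{F} f \, d\mu = \sum_{i=1}^{n} a_i \, \mu(A_i)
```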

Like most mathematical definitions, there’s a lot going on here. So, let’s break it down. First, we see that the measurable function f is a simple function. Why? Because its codomain, H, is finite, so f takes only finitely many values. The σ-algebra defined on H, ℋ, is actually quite simple: since H is finite, each element of H has its own singleton set in ℋ. And the preimage of each singleton set, Aᵢ, is a measurable set with measure µ(Aᵢ).

And if the domain of f is the real line ℝ equipped with the Lebesgue measure, and the preimages Aᵢ are intervals, then the measures of these preimages are simply the lengths of the respective intervals. Then, the Lebesgue integral of the simple function can be interpreted as a sum of rectangles:

Lebesgue integral of a simple function. Image credits: Matthew Bernstein
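The sum of value times measure of preimage can be sketched in Python on a finite measure space. The card-suit domain, the masses assigned to each suit, and the names `integrate_simple` and `payoff` are hypothetical, chosen only to show the computation on a non-numeric domain.

```python
from collections import defaultdict
from fractions import Fraction

def integrate_simple(f, measure):
    """Integral of a simple function f over a finite measure space.

    `measure` maps each point of the domain to the measure of that point;
    the measure of a set is the sum over its points. The integral is the
    sum over values a_i of a_i * mu(A_i), where A_i is the preimage of a_i.
    """
    preimage_measure = defaultdict(Fraction)
    for point, mass in measure.items():
        preimage_measure[f(point)] += mass       # accumulates mu(A_i)
    return sum(a * mu_A for a, mu_A in preimage_measure.items())

# Hypothetical measure on a non-numeric domain (card suits).
mu = {"hearts": Fraction(1, 2), "spades": Fraction(1, 4),
      "clubs": Fraction(1, 8), "diamonds": Fraction(1, 8)}

def payoff(suit):
    """A simple function taking two values: 10 on red suits, 4 on black."""
    return 10 if suit in {"hearts", "diamonds"} else 4

print(integrate_simple(payoff, mu))  # 31/4
```

Here the red suits have total measure 5/8 and the black suits 3/8, so the integral is 10 · 5/8 + 4 · 3/8 = 31/4.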

Lebesgue integral of a measurable positive function

Now, we are in a position to begin to define the Lebesgue integral in a more general case. However, we do it only for positive-valued (more precisely, non-negative) functions. While this isn’t the final, complete definition, it is the last step in understanding it.

We define the Lebesgue integral of a positive-valued function in the following way:
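In LaTeX, the standard form of this definition is the following. The notation is assumed: (F, ℱ, µ) is a measure space and f is a non-negative measurable function on F.

```latex
% The supremum runs over measurable simple functions g lying between 0 and f
\int_{F} f \, d\mu
  = \sup \left\{ \int_{F} g \, d\mu \;:\;
      g \text{ is a measurable simple function},\ 0 \le g \le f \right\}
```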

This definition is very convoluted, so let’s break it down and understand what’s going on. We first notice the use of a supremum in the definition. Before we get into the meaning of this, let’s consider the set that the supremum is taken over.

This is the set of integrals of simple functions g that are bounded from above by the integrand f. We can represent this graphically to understand it better:

Graphical representation of a simple function bounded from above by the function of the integral. Image credits: Matthew Bernstein

Now, the next part of this involves an intuition very similar to that of Riemann integrals. Since g is always bounded from above by f, we can refine g, letting it take more values on smaller and smaller sets, so that it follows f more and more closely. This is very similar to decreasing the width of the rectangles in the treatment of Riemann integrals. As g gets more and more precise, our approximation of the area under f gets better. The supremum in the definition is the least upper bound of all these approximations, that is, the value they approach as g is refined without limit.
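The refinement can be sketched numerically. The example below is an illustration under assumptions, not the general construction: it takes f(x) = x on [0, 1] with the Lebesgue measure and uses the simple functions gₙ = floor(2ⁿ f) / 2ⁿ, a common textbook choice, whose level sets are intervals of length 1/2ⁿ.

```python
from fractions import Fraction

def integral_of_gn(n):
    """Integral of g_n = floor(2**n * f) / 2**n for f(x) = x on [0, 1].

    g_n equals k / 2**n on the interval [k / 2**n, (k + 1) / 2**n), which
    has Lebesgue measure 1 / 2**n, so the integral is a finite sum of
    value * measure terms.
    """
    pieces = 2 ** n
    width = Fraction(1, pieces)
    return sum(Fraction(k, pieces) * width for k in range(pieces))

for n in range(1, 6):
    print(n, integral_of_gn(n))
# 1 1/4
# 2 3/8
# 3 7/16
# 4 15/32
# 5 31/64
```

The integrals increase with n and never exceed 1/2, which is their supremum and the value of the integral of f.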

It is important to note that this only works if f is non-negative. The simple functions g are themselves non-negative, so wherever f dips below zero, no such g can sit below it, and the approximation from below breaks down.

The Lebesgue integral, completely

Read with no context, the general definition of the Lebesgue integral looks very complex and demands a good grasp of mathematics. However, we’ve broken the definition down to make it much easier to understand. If you followed all the way along, then this final bit will not be much different. So, we define the general Lebesgue integral in the following way:
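In LaTeX, the standard form of the general definition is the following. The notation is assumed: f is a measurable function on a measure space (F, ℱ, µ), and f⁺ and f⁻ denote its positive and negative parts.

```latex
% Positive and negative parts of f (both are non-negative functions)
f^{+}(x) = \max\bigl(f(x), 0\bigr),
\qquad
f^{-}(x) = \max\bigl(-f(x), 0\bigr)

% General Lebesgue integral, defined when at least one term is finite
\int_{F} f \, d\mu = \int_{F} f^{+} \, d\mu - \int_{F} f^{-} \, d\mu
```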

What’s going on here is quite simple. We are only splitting the function into its positive and negative parts. Then, we can treat the negative part as a positive-valued function itself, and we already know how to define the Lebesgue integral of such a function. Then, we only have to subtract this area to make it ‘negative area’. With this, the LHS in the definition is an exact representation of the signed area between the function and the axis.

An image assisting the understanding of the general definition of the Lebesgue integral. Image credits: Matthew Bernstein
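The split into positive and negative parts can be checked on a tiny example. This Python sketch uses a hypothetical three-point measure space and made-up values. The helper names are also assumptions, and `integrate` simply sums value times mass, which is enough on a finite space.

```python
from fractions import Fraction

def integrate(f, measure):
    """Integral of f over a finite measure space given as {point: mass}."""
    return sum(f(point) * mass for point, mass in measure.items())

def positive_part(f):
    """f+ = max(f, 0), a non-negative function."""
    return lambda x: max(f(x), 0)

def negative_part(f):
    """f- = max(-f, 0), also a non-negative function."""
    return lambda x: max(-f(x), 0)

# Hypothetical signed function on a three-point space.
mu = {"a": Fraction(1, 2), "b": Fraction(1, 3), "c": Fraction(1, 6)}
values = {"a": 4, "b": -3, "c": -12}
f = values.get

area_above = integrate(positive_part(f), mu)   # integral of f+
area_below = integrate(negative_part(f), mu)   # integral of f-
print(area_above, area_below, area_above - area_below)  # 2 3 -1
```

Both pieces are integrals of non-negative functions, and their difference matches the direct signed sum 4 · 1/2 − 3 · 1/3 − 12 · 1/6 = −1.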

Mathematical expectation

How do we tie all this back to expectation? Well, we can see that the expectation of a random variable X is simply the Lebesgue integral of X with respect to its probability measure P over the probability space Ω, that is, E(X) = ∫ X dP, with the integral taken over Ω.

Using a change of variables to the distribution of X, it is possible to show that this reduces to the familiar definitions of expectation for discrete and continuous random variables.
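For the discrete case, the reduction can be checked directly on a small example. The Python sketch below assumes a probability space of two fair dice and takes X to be the sum of the faces. It computes the expectation once as an integral over the sample space and once with the familiar formula over the values of X.

```python
from fractions import Fraction
from itertools import product

# Assumed probability space: two fair dice, every outcome equally likely.
omega = list(product(range(1, 7), repeat=2))
P = {outcome: Fraction(1, 36) for outcome in omega}

def X(outcome):
    """Random variable: the sum of the two faces."""
    return outcome[0] + outcome[1]

# Expectation as an integral over the probability space: sum of X(w) * P({w}).
expectation_over_omega = sum(X(w) * P[w] for w in omega)

# Familiar discrete formula: sum of x * P(X = x) over the values of X.
pmf = {}
for w in omega:
    pmf[X(w)] = pmf.get(X(w), 0) + P[w]
expectation_from_pmf = sum(x * p for x, p in pmf.items())

print(expectation_over_omega, expectation_from_pmf)  # 7 7
```

Grouping the outcomes by the value X takes on them is exactly the step that turns the sum over Ω into the sum over values of X.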

With this, we have reached the conclusion. In this article, we defined the Lebesgue integral and used it to understand the rigorous definition for mathematical expectation. Our definition of the Lebesgue integral involved a breaking-down of the definition into simpler and more ‘understandable’ bits. Nonetheless, it is just as legitimate as the purist definition. Anyway, with this, we have completed the trilogy and our treatment of probability through a measure theoretic lens has reached an end.

Thank you for reading and I hope you have a great day! Stay tuned for more mathematics :)