A Measure Theoretic Approach to Probability (Part 2)

Ansh Pincha · Published in Quantaphy · Dec 4, 2023

Welcome back to the second part of this trilogy! In the first part, we laid the foundation for measure-theoretic probability. We explored the notions of measures and measurable spaces and, using these concepts, we defined a probability space. In this article, we use measure theory to understand random variables.

As a small recap, in part one we saw that a probability space can be defined using measure theory as a triple (Ω, E, P), where Ω is the sample space, E is a σ-algebra of subsets of Ω whose members we call events, and P is a measure on E satisfying P(Ω) = 1.

Now, we extend our considerations to random variables. In school, a random variable is usually introduced as a variable whose value is random. For instance, the outcome of the roll of a die can be modeled by a random variable X whose value is randomly one of 1, 2, 3, 4, 5, or 6. While this definition works for basic applications of probability, it is not at all rigorous and misses out on some very satisfying intuition.

Measurable Functions

So, we now turn to measure theory to define a random variable. And in order to do this, we must first define a measurable function: given two measurable spaces (F, ℱ) and (M, ℳ), a function f : F → M is measurable if, for every set A ∈ ℳ, the preimage f⁻¹(A) := {x ∈ F : f(x) ∈ A} is a member of ℱ.

Let’s break this definition down. Firstly, like any other function, a measurable function maps elements from one set to another. But there is more to it than that. Both the domain and the codomain of f are measurable spaces, equipped with σ-algebras ℱ and ℳ respectively. And, most importantly, a measurable function can ‘transport’ a measure from the domain’s measurable space to the codomain’s measurable space. What does this mean? Suppose the measurable space (F, ℱ) has a measure µ. Then we can use f to obtain a measure on the measurable space (M, ℳ). How? Well, for every set A ∈ ℳ, we simply define the pushforward measure µ_f by µ_f(A) := µ(f⁻¹(A)).

And, by the way we have defined a measurable function, f⁻¹(A) certainly belongs to the σ-algebra ℱ, and can therefore be assigned a value by the measure µ.
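To make this concrete, here is a minimal Python sketch of the idea, assuming finite sets so that every subset is measurable and a measure is just a table of point weights. The function names and the toy sets are my own illustration, not from the article:

```python
def preimage(f, domain, A):
    """Return f^{-1}(A): every point of the domain that f sends into A."""
    return {x for x in domain if f(x) in A}

def pushforward(mu, f, domain):
    """Transport a measure mu (a dict of point weights on a finite domain)
    through f: the new measure assigns A the value mu(f^{-1}(A))."""
    def mu_f(A):
        return sum(mu[x] for x in preimage(f, domain, A))
    return mu_f

# Toy example: F = {1,...,6} with the uniform measure, and f maps each
# point to its parity, so the codomain is M = {0, 1}.
F = {1, 2, 3, 4, 5, 6}
mu = {x: 1 / 6 for x in F}
f = lambda x: x % 2
mu_f = pushforward(mu, f, F)
print(mu_f({1}))        # measure of the odd outcomes, approx. 0.5
print(mu_f({0, 1}))     # total measure, approx. 1.0
```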

Image credits: Matthew N. Bernstein

Part A of this figure depicts two measurable spaces (F, ℱ) and (H, ℋ). The σ-algebras are generated by the sets outlined in black. Part B depicts a valid measurable function f mapping F to H; that is, the left set is the domain and the right set is the codomain. Colors illustrate the image relations between subsets of F and H under f. For example, the image of the blue set in F is the blue set in H. We see that each member of ℋ has a measurable preimage. Part C depicts a non-measurable function. This function is non-measurable because the blue set in H has a preimage that is not a member of ℱ.

Random Variables

Now that we’ve defined measurable functions, we can begin treating random variables. Using measure theory, we define a random variable X as a measurable function X : Ω → F from a probability space (Ω, E, P) to a measurable space (F, ℱ).

What does this say? Well, simply put, it says that a random variable is a measurable function that maps elements from a probability space to a measurable space. If you recall, the set Ω, called the sample space, represents all conceivable futures. A random variable X simply maps each conceivable future to an element of some set F, the set of all possible values that X can take on. And because X is a measurable function, it allows us to ‘transport’ the probability measure from the probability space to the set of outcomes that we are considering for X.
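Written out, this transported measure is exactly what is usually called the distribution of X. Applying the pushforward construction from the previous section to P (the symbol P_X below is just a convenient label for the transported measure):

```latex
P_X(A) := P\left(X^{-1}(A)\right) = P\left(\{\omega \in \Omega : X(\omega) \in A\}\right), \qquad A \in \mathcal{F}.
```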

Discrete Random Variables

To illustrate this, we consider coin tosses. Let X be a random variable that represents the outcome of a toss of a fair coin. Then the set Ω represents all possible futures: the infinite ways of the coin spinning through the air, landing, bouncing, and so on. The random variable maps each of these futures onto a measurable space (H, ℋ), where H := {0, 1}. Here, we encode tails as 0 and heads as 1. For example, there may be two ways, a and b, in which the coin flips through the air and lands as heads. Then X(a) = 1 and X(b) = 1.

The σ-algebra over H denotes all groups of outcomes that we wish to assign a probability to: ℋ := {∅, {0}, {1}, {0, 1}}.

The important thing to note here is that each element of ℋ has a preimage under X in the original probability space, i.e., the preimage is a member of E. So, we are able to assign each set in ℋ a probability according to the measure of its preimage under P: for instance, P({1}) := P(X⁻¹({1})) = P({ω ∈ Ω : X(ω) = 1}).

And this is simply, in familiar notation, P(X=1).
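As a toy model in Python: we can stand in for Ω with a small finite set of ‘futures’, give each future a probability, and compute P(X = 1) by measuring the preimage X⁻¹({1}). The four futures and their weights below are invented for illustration; a real Ω would be far richer:

```python
# A toy stand-in for the sample space: four "futures", each with probability 1/4.
P = {"spin-land-heads": 0.25, "wobble-heads": 0.25,
     "spin-land-tails": 0.25, "bounce-tails": 0.25}

# The random variable X maps each future to 1 (heads) or 0 (tails).
X = {"spin-land-heads": 1, "wobble-heads": 1,
     "spin-land-tails": 0, "bounce-tails": 0}

def prob(A):
    """P(X in A): the measure P of the preimage X^{-1}(A)."""
    return sum(p for omega, p in P.items() if X[omega] in A)

print(prob({1}))      # P(X = 1) = 0.25 + 0.25 = 0.5
print(prob({0, 1}))   # P(X in H) = 1.0, the whole sample space
```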

Continuous Random Variables

Now, we turn to continuous random variables. These require a slightly different approach because, as will become apparent, if we take the same approach as we did with discrete random variables, we reach a mathematical contradiction.

A continuous random variable also maps elements from a set Ω to a set H. But in this case, H is the set of all reals. That is, X : Ω → ℝ.

The problem now is that we cannot construct the σ-algebra in the same sense as we did with discrete random variables. From the definition of a measurable function, we need to construct the σ-algebra ℋ over ℝ such that the preimage of every element of ℋ is an event in E. However, we cannot assign a non-zero probability to every singleton {x} with x ∈ ℝ, since ℝ is uncountably infinite. Any attempt to assign a positive probability to every real number would force the total probability to be infinite, and this is a contradiction since the probability of the whole sample space must equal 1.
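To see the contradiction explicitly (a standard argument, spelled out here for completeness): if every x ∈ ℝ had P({x}) > 0, then for some n there would be infinitely many points x₁, x₂, … with P({xₖ}) > 1/n, and countable additivity would give

```latex
P\left(\bigcup_{k=1}^{\infty} \{x_k\}\right) = \sum_{k=1}^{\infty} P(\{x_k\}) \;\geq\; \sum_{k=1}^{\infty} \frac{1}{n} = \infty,
```

which contradicts the requirement that the total probability be 1. So individual points cannot all carry positive probability; we need a different way of assigning probability on ℝ.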

To avoid this issue, we turn to the Borel σ-algebra. This is in itself an extensively deep topic that requires a fair amount of topological knowledge, so we will not delve into it in this article. But intuitively, the Borel σ-algebra is generated by the intervals of the real line rather than by its individual points. That is, an interval (x, y) on the real line is an element of ℋ and therefore has a measurable preimage under X. And we assign all intervals of length zero, i.e., singleton sets that contain only one real number, a probability of 0. That is, the probability assigned to any specific real number is zero, while the probability assigned to an interval of real numbers may be non-zero.
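For a random variable described by a density (a notion we define next), this convention is forced on us rather than chosen: a single point is an interval of zero width, so integrating any density over it gives

```latex
P(X = x) = \int_{x}^{x} p(t)\, dt = 0.
```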

Now, how do we compute the measure of the preimage of an interval in ℋ? Most often, this is achieved through the use of a probability density function, a familiar concept in probability: a function p such that, for any interval (a, b), P(X⁻¹((a, b))) = ∫ₐᵇ p(x) dx.

Generally, the LHS is denoted as P(a < X < b).
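As a quick numerical sketch, using the standard normal distribution as an example (my choice, not specific to the article), we can compute P(a < X < b) by integrating the density and cross-check it against the closed form in terms of the error function:

```python
import math

def p(x):
    """Density of the standard normal distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def prob_interval(a, b, n=100_000):
    """P(a < X < b), computed by integrating p over (a, b) (midpoint rule)."""
    h = (b - a) / n
    return sum(p(a + (k + 0.5) * h) for k in range(n)) * h

def Phi(x):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))

a, b = -1.0, 1.0
print(prob_interval(a, b))       # approx. 0.6827, numerically
print(Phi(b) - Phi(a))           # approx. 0.6827, closed form
# A single point carries no probability: the interval (x, x) has width zero.
print(prob_interval(0.5, 0.5))   # 0.0
```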

With this, we have now unified the concepts of discrete and continuous random variables. Hopefully this provided some satisfying intuition for the counter-intuitive monster that is probability theory. And, I should say, measure theory is not used only to unify these concepts. In fact, by defining random variables in this way, we have equipped ourselves with the machinery necessary to work with random variables that take non-numeric values, i.e., vectors, sets, and functions.

The final article in this trilogy will explore how measure theory can be used to understand mathematical expectation.

Thank you for reading and I hope you have a great day!
