Easy explanation of a random variable and fundamentals of a distribution

Pdf, Cdf, Expected value, Variance

SangGyu An
CodeX
7 min read · Aug 13, 2022


Imagine you are surveying how satisfied customers are with your service. Suppose you sample 10 people, and each of them can answer either yes or no. Counting every possible combination of answers, the total number of outcomes you get is 2¹⁰. But isn’t that number too big considering the size of the sample? And if 10 people already produce 2¹⁰ outcomes, how can you process a larger data set, where the number of outcomes grows exponentially?

One thing you should remember about this kind of problem is that each individual is not the main focus. In situations like this, you care about how many people respond yes or no rather than who responds what. So instead of working with all 2¹⁰ outcomes, you can simplify them.

Photo by Brett Jordan on Unsplash

Simplifying sample space: Random variable

In the example above, what people generally care about is how many yeses the survey produces. Thus, we can redefine the sample space in terms of the quantity we are actually interested in.

How a random variable can simplify a sample space

This mapping is called a random variable, and as you can see, its biggest benefit is that it turns a sample space into a much simpler version. In mathematics, a random variable is written with a capital letter, as shown below.

A random variable X

So the notation above means that when an outcome x is given to the random variable X, it returns a value k. Using the survey example, for instance, when k = 1, x would be any combination of the 10 responses that includes exactly 1 yes.

A closer look at X(x) = 1
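To make this concrete, here is a minimal sketch in Python (the names raw_outcomes and X are just for illustration) that enumerates all 2¹⁰ yes/no outcomes and maps each one to its number of yeses, collapsing 1,024 raw outcomes into just 11 values of X.

```python
from itertools import product
from collections import Counter

# Every raw outcome: a tuple of 10 yes/no answers, so 2**10 = 1024 outcomes in total
raw_outcomes = list(product(["yes", "no"], repeat=10))
print(len(raw_outcomes))  # 1024

# The random variable X maps each raw outcome to the number of yeses it contains
def X(outcome):
    return sum(answer == "yes" for answer in outcome)

# The simplified sample space of X has only 11 points: 0, 1, ..., 10
values = Counter(X(outcome) for outcome in raw_outcomes)
print(sorted(values))  # [0, 1, 2, ..., 10]
print(values[1])       # 10 raw outcomes contain exactly 1 yes (the k = 1 case)
```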

Because of this benefit, random variables are used with every distribution in statistics. And we can divide them into two types: discrete and continuous.

Discrete random variable

As you can tell from the name, when the points in a sample space S are either finite or countably infinite, we call it a discrete random variable. So the survey example above involves a discrete random variable because its sample space consists of the integers 0 to 10. Besides this, the random variables in the binomial, geometric, hypergeometric, negative binomial, and Poisson distributions are considered discrete random variables.

Continuous random variable

Unlike a discrete random variable’s sample space, a continuous random variable’s sample space contains an uncountably infinite number of outcomes. In other words, the random variable’s outcomes are characterized by real numbers. So, for instance, when you ask a person to choose any number between 0.0 and 1.0, a continuous random variable is involved. The exponential, gamma, chi-square, normal, and uniform distributions fall into this category. What each distribution is and what it looks like will be the topic of upcoming posts. But before we move on to each distribution, let’s look at the general concepts first.
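As a rough sketch of the difference (the uniform choices below are just for illustration, not the distributions of the earlier examples), a discrete random variable can only land on one of a countable set of values, while a continuous one can land anywhere in an interval of real numbers:

```python
import random

# Discrete: the survey count of yeses can only be one of the 11 integers 0, 1, ..., 10
discrete_sample = random.randint(0, 10)  # always an integer

# Continuous: a number chosen between 0.0 and 1.0 can be any real value in that range
continuous_sample = random.random()      # a float in [0.0, 1.0)

print(discrete_sample, continuous_sample)
```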

Fundamentals of distribution

Probability density function

With only those descriptions of a sample space, however, it’s hard to work with the various statistical procedures you will encounter in the future. So you generally work with a random variable’s probability density function, or pdf, which describes how probabilities are distributed and follows the requirements below. (Some people call it a probability mass function, or pmf, for a discrete random variable.) The requirements are the same for both discrete and continuous random variables except for their notation.

p() and f() stand for the probability functions of the discrete and continuous cases, respectively

As shown above, every value in a sample space should have a corresponding probability greater than or equal to 0, and the probabilities must sum (or integrate) to 1. p(s) is used for the discrete case, whereas the continuous case is notated with f(t). But you will see some people also using f(s) for the discrete case, so it’s best to decide which type of random variable it is by understanding the question rather than only looking at the symbols. When we connect this notation with a random variable, it becomes like the notation below.
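In standard notation (a sketch, writing X for a discrete and Y for a continuous random variable), these requirements and the connection to a random variable look like this:

```latex
% Requirements: non-negativity and total probability equal to 1
p(s) \ge 0 \ \text{for all } s \in S, \qquad \sum_{s \in S} p(s) = 1

f(t) \ge 0 \ \text{for all } t, \qquad \int_{-\infty}^{\infty} f(t)\, dt = 1

% Connecting the pdf to a random variable
p_X(k) = P(X = k), \qquad P(a \le Y \le b) = \int_{a}^{b} f_Y(y)\, dy
```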

So, for instance, let’s say pX(k) refers to the pdf of the number of heads you get from two coin tosses. Then pX(1) = P(X=1) becomes the probability of getting 1 head from flipping two coins. Since there are 4 equally likely outcomes, {HH, HT, TH, TT}, and exactly 2 of them (HT and TH) contain one head, the pdf’s value is 2/4 = 1/2.
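A quick sanity check of that coin example, sketched in Python, enumerates the four outcomes, builds the pmf of the number of heads, and confirms the two requirements above:

```python
from itertools import product
from collections import Counter

# Sample space of two coin tosses: HH, HT, TH, TT, all equally likely
outcomes = list(product("HT", repeat=2))

# X = number of heads; p_X(k) = (# of outcomes with k heads) / (total # of outcomes)
heads_counts = Counter(outcome.count("H") for outcome in outcomes)
pmf = {k: heads_counts[k] / len(outcomes) for k in sorted(heads_counts)}

print(pmf)                                # {0: 0.25, 1: 0.5, 2: 0.25}
print(all(p >= 0 for p in pmf.values()))  # True: every probability is >= 0
print(sum(pmf.values()))                  # 1.0: the probabilities sum to 1
```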

Cumulative distribution function

For individual values, you can use the procedure above. But what if you want to calculate the probability of multiple values, such as 1 or 2 heads in the coin example, or of a range of values, say 0 to 0.8, in a continuous case? In situations like this, you use a cumulative distribution function, or cdf.

By definition, a cumulative distribution function of a random variable X is a function that satisfies the following.

cdf of any random variable X

The notation above means the probability that the value of the random variable X is smaller than or equal to some real number t. The general definition and the mathematical notation, FX(t), are the same for both discrete and continuous random variables. Also, since 0 ≤ p(s) and 0 ≤ f(t), the cdf in both cases is a monotonically nondecreasing function. However, the general shapes of the functions are different.

In the discrete case, because the variable is based on a finite or countably infinite sample space, the cdf looks like a step function, with a jump occurring at each value whose probability is positive.

An example of a discrete cdf. It does not increase at 3 because the pdf at x = 3 is 0.
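Continuing the two-coin example as a minimal sketch, the discrete cdf simply accumulates the pmf: it jumps at each value with positive probability and stays flat everywhere else.

```python
# pmf of the number of heads in two coin tosses (from the example above)
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def cdf(t):
    """F_X(t) = P(X <= t): add up the pmf over every value k <= t."""
    return sum(p for k, p in pmf.items() if k <= t)

# A step function: it only jumps at 0, 1, and 2
for t in [-1, 0, 0.5, 1, 1.5, 2, 3]:
    print(t, cdf(t))
# -1 -> 0, 0 -> 0.25, 0.5 -> 0.25, 1 -> 0.75, 1.5 -> 0.75, 2 -> 1.0, 3 -> 1.0
```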

On the other hand, the cdf of a continuous random variable looks like a continuous curve.

An example of a continuous cdf.

Another difference is how you calculate the cdf. In the discrete case, you use a summation (sigma) to add up the probabilities of the values up to a point. In the continuous case, you use an integral, so from a calculus perspective, it’s like calculating the area under the curve.

The lower limit of the sum is left unspecified, but it starts from the lowest value whose probability isn’t 0

As you can see, the cdf of a continuous random variable can be obtained by taking the integral of the pdf. Even though the formulas above calculate the area to the left of a point, you can calculate the area to the right or the area between two points with a minor adjustment to the formulas.

How to calculate the area to the right and area between two points
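Here is a sketch of those calculations in Python, assuming SciPy is available and using a made-up pdf f(y) = 2y on [0, 1] purely for illustration: the cdf gives the area to the left of a point, and the other two areas follow with the minor adjustments described above.

```python
from scipy.integrate import quad  # numerical integration; assumes SciPy is installed

# A valid continuous pdf on [0, 1]: f(y) = 2y is non-negative and integrates to 1
def f(y):
    return 2 * y if 0 <= y <= 1 else 0.0

def F(t):
    """cdf F_Y(t): the area under f to the left of t."""
    area, _ = quad(f, 0, t)
    return area

print(F(0.8))           # ~0.64: P(Y <= 0.8), area to the left of 0.8
print(1 - F(0.8))       # ~0.36: P(Y > 0.8), area to the right of 0.8
print(F(0.8) - F(0.2))  # ~0.60: P(0.2 <= Y <= 0.8), area between the two points
```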

The next two concepts are ones you will encounter constantly from now on.

Expected value

The first is the expected value, which is also known as a measure of central tendency, the average, or the center of gravity. Later, this value will frequently be used to compare the locations of two different probability distributions. The fundamental structure of the discrete and continuous formulas is the same except for the notation. How it plays out for different distributions will be explained in the following posts, so at this moment you just need to know the basic structure of the formula.

Expected value formulas for discrete and continuous
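Written out (a sketch of what these formulas typically look like, using the pmf pX for the discrete case and the pdf fY for the continuous case), together with a worked example based on the two-coin pmf from earlier:

```latex
% Expected value of a discrete random variable X and a continuous random variable Y
E(X) = \sum_{k} k \, p_X(k), \qquad E(Y) = \int_{-\infty}^{\infty} y \, f_Y(y)\, dy

% Worked example: number of heads in two coin tosses
E(X) = 0 \cdot \tfrac{1}{4} + 1 \cdot \tfrac{1}{2} + 2 \cdot \tfrac{1}{4} = 1
```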

One key assumption we always make about an expected value is that it converges absolutely. This is critical because if the assumption is violated, the expected value would depend on the order in which you add the terms. That should not happen, because the expected value is one of the indicators that “represent” a probability distribution.

Variance

The second concept that is crucial to know is variance. As some of you may already be familiar with, it represents how spread out values are from the mean. So we calculate the variance by taking the average of the squared differences between the mean and the individual points.

Variance formula

The same logic applies to the variance of a random variable, with the expected value playing the role of the average.

Variance formulas for random variables

But depending on your situation, it could be easier to use the following equation, where W represents any random variable, discrete or continuous.

Easier variance formula for any random variable
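Written out (a sketch of the standard forms these captions refer to), with μ = E(W) for any random variable W, discrete or continuous:

```latex
% Definition: the average squared distance from the mean \mu = E(W)
\operatorname{Var}(W) = E\!\left[(W - \mu)^2\right]

% Equivalent shortcut that is often easier to compute
\operatorname{Var}(W) = E(W^2) - \left(E(W)\right)^2
```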

One thing to take into consideration is that we square the random variable when calculating a variance. Thus, the units are squared as well, and it’s hard to interpret results when the units don’t match the original data. So it’s common to take the square root of the variance, which gives the standard deviation.
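As a small sketch continuing the two-coin example, both variance formulas give the same answer, and taking the square root brings the result back to the original units:

```python
from math import sqrt

# pmf of the number of heads in two coin tosses
pmf = {0: 0.25, 1: 0.5, 2: 0.25}

def expectation(g):
    """E[g(X)] for the discrete pmf above."""
    return sum(g(k) * p for k, p in pmf.items())

mu = expectation(lambda k: k)                           # E(X) = 1.0
var_definition = expectation(lambda k: (k - mu) ** 2)   # E[(X - mu)^2] = 0.5
var_shortcut = expectation(lambda k: k ** 2) - mu ** 2  # E(X^2) - E(X)^2 = 0.5
std_dev = sqrt(var_definition)                          # back to the original units

print(mu, var_definition, var_shortcut, std_dev)  # 1.0 0.5 0.5 0.7071067811865476
```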

Next story: 5 basic discrete distributions you need to know for EDA in data science

