Maximum Likelihood (Simplified)

Dr. Roi Yehoshua
5 min read · Feb 24, 2023

Maximum likelihood is an important principle in statistics that can be used to estimate the unknown parameters of a probability distribution from a data set. This principle is also used to derive the objective functions of various machine learning models.

Many of my students have a hard time understanding this concept when they first encounter it, so I decided to write a short article that explains it in simple terms and provides a few examples.

Assume that we have a set of n data points denoted by X = {x₁, x₂, …, xₙ}, which are generated from some probability distribution P(X; θ) with unknown parameters θ. For example, the data points might be drawn from a Gaussian (normal) distribution, where the parameters θ are the mean μ and the standard deviation σ of the distribution.

We also assume that the points are independently and identically distributed (iid for short), which means that all the points are mutually independent (independent), and they are all sampled from the same distribution (identically distributed).
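
To make this concrete, here is a minimal sketch of how such a data set might be generated, assuming NumPy and an illustrative choice of μ = 5 and σ = 2 (in a real problem we would observe X but not the parameters):

```python
import numpy as np

# Illustrative "true" parameters of the generating distribution;
# in practice these are exactly what we are trying to estimate
mu_true, sigma_true = 5.0, 2.0

rng = np.random.default_rng(seed=42)

# Draw n = 1,000 iid samples from N(mu_true, sigma_true²)
X = rng.normal(loc=mu_true, scale=sigma_true, size=1000)
```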

Our goal is to find a model (represented by θ) that makes the observed data most probable, or, in other words, a model that maximizes the likelihood of obtaining the data points X if we were to sample them from the distribution P. This process is often referred to as maximum likelihood estimation (MLE).
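
As a toy illustration, the following sketch (assuming NumPy and SciPy, with σ treated as known so the search is one-dimensional) evaluates the likelihood of a grid of candidate means and keeps the one that makes the observed data most probable:

```python
import numpy as np
from scipy.stats import norm

# Observed data: 50 iid samples from a Gaussian whose mean we pretend not to know
rng = np.random.default_rng(seed=0)
X = rng.normal(loc=5.0, scale=2.0, size=50)

# Candidate values for the unknown mean (sigma = 2 is assumed known here)
candidate_mus = np.linspace(0.0, 10.0, 1001)

# Likelihood of each candidate model: the product of the densities it
# assigns to the observed points (a product because the points are iid)
likelihoods = np.array(
    [np.prod(norm.pdf(X, loc=mu, scale=2.0)) for mu in candidate_mus]
)

mu_hat = candidate_mus[np.argmax(likelihoods)]
print(f"Maximum likelihood estimate of the mean: {mu_hat:.2f}")  # close to 5
```

In practice one usually maximizes the logarithm of the likelihood rather than the likelihood itself, since the product of many small densities quickly underflows floating-point precision.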

Formally, the likelihood of the model (represented by θ) is defined as the probability of obtaining the observed data under that model: L(θ; X) = P(X; θ). Because the points are iid, this probability factors into a product over the individual points: L(θ; X) = P(x₁; θ) · P(x₂; θ) · … · P(xₙ; θ). Maximum likelihood estimation searches for the parameters θ that maximize this quantity.
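
For the Gaussian example, this maximization even has a closed-form solution: the likelihood is maximized by the sample mean and the (biased) sample standard deviation. Here is a short sketch verifying this numerically, assuming SciPy (whose norm.fit performs a maximum likelihood fit):

```python
import numpy as np
from scipy.stats import norm

# Illustrative data set: 1,000 iid samples from N(5, 2²)
rng = np.random.default_rng(seed=0)
X = rng.normal(loc=5.0, scale=2.0, size=1000)

# Closed-form Gaussian MLE: sample mean and biased sample standard deviation
mu_hat = X.mean()
sigma_hat = X.std()  # ddof=0 by default, which matches the MLE

# SciPy's norm.fit estimates the same parameters by maximum likelihood
mu_fit, sigma_fit = norm.fit(X)

print(f"closed form: mu = {mu_hat:.3f}, sigma = {sigma_hat:.3f}")
print(f"norm.fit:    mu = {mu_fit:.3f}, sigma = {sigma_fit:.3f}")
```

Both estimates coincide, and they approach the true parameters as the number of samples grows.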
