Maximum Likelihood Estimation: The Exponential Distribution

A guide on how to derive the estimator for the Exponential model from first principles.

Rob Taylor, PhD
5 min read · Apr 19, 2023
Photo by Anna Vaschenko on Unsplash

Introduction

My previous articles on maximum likelihood estimation (MLE) have dealt only with discrete probability distributions. In those, I covered how to derive the estimators for the Binomial and Poisson models. Today, we're going to examine the exponential distribution, which is the first continuous distribution we'll consider.

The exponential is a particularly nice starting point because it has only a single rate parameter, λ, which will be familiar if you've read the post on Poisson MLE. And indeed, the exponential distribution is intimately related to the Poisson distribution: the exponential models the time between events in a homogeneous Poisson process. So, while the Poisson distribution describes the number of events that occur in a fixed interval, the exponential concerns the inter-event times. Note, though, that this only holds if the rate of events is constant over time.
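If you want to see this relationship in action, here's a quick simulation sketch (my own illustration using numpy, not something from the original post): we draw exponential inter-event times, count how many events land in each unit-length window, and check that the counts behave like Poisson draws.

```python
import numpy as np

rng = np.random.default_rng(42)
rate = 3.0           # λ: expected number of events per unit time
n_windows = 20_000   # number of unit-length windows to simulate

# For each window, accumulate exponential inter-event times and count
# how many events fall inside [0, 1).
counts = np.empty(n_windows, dtype=int)
for i in range(n_windows):
    t, k = 0.0, 0
    while True:
        t += rng.exponential(1 / rate)  # numpy parameterises by scale = 1/λ
        if t >= 1.0:
            break
        k += 1
    counts[i] = k

# For a Poisson(λ) count, the mean and variance both equal λ.
print(counts.mean(), counts.var())  # both ≈ 3
```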

The exponential distribution does crop up in other applied areas too, but because of its link to Poisson processes it plays a central role in queuing theory. Here, it is used to model service times; for example, the time it takes a service counter to finish working with a customer. As customers arrive, the time they must wait depends on the time it takes to serve those already in the queue.

A fundamental property of exponential random variables is that they are memoryless. I won’t be delving into that here, but what this implies is that the current state of the process does not depend upon previous, or historical, states. In the case of waiting times, the time you’ll wait to get served does not depend on how much time you’ve already spent waiting.
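To make the memoryless property concrete, here's a small Monte Carlo check that P(X > s + t | X > s) = P(X > t); the choices of λ, s, and t here are arbitrary, purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5
x = rng.exponential(scale=1 / lam, size=1_000_000)

s, t = 1.5, 1.0
p_unconditional = (x > t).mean()                   # P(X > t)
p_conditional = (x > s + t).sum() / (x > s).sum()  # P(X > s + t | X > s)

# Memorylessness: the two probabilities agree up to sampling noise.
print(p_unconditional, p_conditional)  # both ≈ exp(−0.5) ≈ 0.607
```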

As I’ve mentioned in my earlier posts, these posts are a little more technical and I run through things fairly quickly. And if you haven’t read my earlier post, then check out the links under the Related Articles section.

Let’s get started.

The Exponential Model

To motivate these derivations, let's suppose we have a set of waiting times X that are exponentially distributed. These could be waiting times observed at a bank, or perhaps a fast food restaurant; regardless of where the data are collected, we assume that the waiting times are independent and identically distributed (iid).

Under these assumptions, we expect that X has a probability density function (PDF):

f(x; λ) = λe^(−λx)

where x ∈ [0, ∞).
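As a quick sanity check on the density, we can evaluate it by hand and compare against scipy's built-in exponential distribution. One thing to keep in mind: scipy parameterises the exponential by the scale 1/λ, not the rate λ.

```python
import numpy as np
from scipy import stats

lam = 2.0
x = np.linspace(0, 5, 50)

pdf_by_hand = lam * np.exp(-lam * x)           # λe^(−λx)
pdf_scipy = stats.expon.pdf(x, scale=1 / lam)  # scipy uses scale = 1/λ

print(np.allclose(pdf_by_hand, pdf_scipy))  # True
```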

The Exponential Likelihood

Formally, let X be a collection of exponential observations, such that X = {x₁, x₂, x₃, …, xₙ}. Having data in hand, the likelihood function can then be written like so:

L(λ; X) = ∏ᵢ₌₁ⁿ λe^(−λxᵢ) = λⁿ e^(−λ Σᵢ₌₁ⁿ xᵢ)

where λ is the unknown parameter we wish to estimate. This is fine, but products like this aren't easy to deal with, and it's more convenient to work with logs. The next step, then, is to derive the log-likelihood function by log-transforming each side of the equation. If we do that and simplify things, we get the following expression:

ℓ(λ) = ln ∏ᵢ₌₁ⁿ λe^(−λxᵢ)
     = Σᵢ₌₁ⁿ ln(λe^(−λxᵢ))
     = Σᵢ₌₁ⁿ ln λ + Σᵢ₌₁ⁿ (−λxᵢ)
     = n ln λ − λ Σᵢ₌₁ⁿ xᵢ

This derivation is actually quite straightforward, but if you’re scratching your head a little, here are a few small things you should know. The first is what happens when you take the logarithm of an exponential. Because the logarithm is the inverse of the exponential, it essentially reverses exponentiation and you just get back your exponent:

ln(eᵃ) = a

This is what is happening in the third line with the second summation term.

The second thing to know concerns sums involving constants, and there are two points to note here. First, when a constant c is summed over n iterations, the result is equivalent to multiplying the constant by n:

Σᵢ₌₁ⁿ c = nc

This is what is happening in the last line where ln λ is multiplied by n.

Second, if a constant multiplies an indexed variable inside the summation, then the constant can be pulled out of the summation like so:

Σᵢ₌₁ⁿ cxᵢ = c Σᵢ₌₁ⁿ xᵢ

You can see this in action with the second term in the last line.

This fact does crop up when deriving the log-likelihood for the Poisson model, too — I just forgot to mention it. I also cover some details on working with constants and summations here.
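If you'd like to verify the simplification numerically, a few lines of numpy will do it: summing the log of the density term by term should agree with n ln λ − λ Σᵢxᵢ to floating-point precision. (This is just an illustrative check, not part of the derivation.)

```python
import numpy as np

rng = np.random.default_rng(1)
lam = 0.8
x = rng.exponential(scale=1 / lam, size=50)
n = x.size

# Term-by-term: sum the log of each density evaluation.
loglik_long = np.sum(np.log(lam * np.exp(-lam * x)))

# Simplified form from the derivation: n·ln(λ) − λ·Σxᵢ.
loglik_short = n * np.log(lam) - lam * x.sum()

print(np.isclose(loglik_long, loglik_short))  # True
```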

The First Derivative & Solving for λ

Okay, it’s differentiation time. Now that we have an expression for the log-likelihood, we require the first derivative in order to solve for λ. The derivative can be obtained by working through the following steps:

dℓ/dλ = d/dλ [n ln λ − λ Σᵢ₌₁ⁿ xᵢ] = n/λ − Σᵢ₌₁ⁿ xᵢ

As derivatives go, this is certainly one of the easier ones to work with. Anyway, now that we have an expression for the first derivative, we can set it to zero and solve for λ. If we do that, we arrive at the following solution:

n/λ − Σᵢ₌₁ⁿ xᵢ = 0  ⟹  n/λ = Σᵢ₌₁ⁿ xᵢ

and so

λ̂ = n / Σᵢ₌₁ⁿ xᵢ = 1/x̄

You might recall that the Poisson estimate for λ was just the sample mean, and what we’re seeing here is that the exponential rate parameter is the reciprocal of the sample mean.
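To close the loop, here's a short sketch that computes λ̂ = 1/x̄ on simulated data, cross-checks it against scipy's own maximum likelihood fit (with the location fixed at zero so the models match), and confirms that the first derivative vanishes at the estimate. The seed and sample size are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
true_lam = 1.5
x = rng.exponential(scale=1 / true_lam, size=10_000)

# The closed-form MLE derived above: the reciprocal of the sample mean.
lam_hat = 1 / x.mean()

# Cross-check against scipy's MLE; fixing loc at 0 matches our model,
# and scipy reports the fitted scale (= 1/λ).
loc, scale = stats.expon.fit(x, floc=0)
print(lam_hat, 1 / scale)  # identical

# The first derivative n/λ − Σxᵢ vanishes at λ̂.
print(x.size / lam_hat - x.sum())  # ≈ 0
```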

And we’re done!

Related Articles

Thanks for reading!

If you enjoyed this post and would like to stay up to date then please consider following me on Medium. This will ensure you don’t miss out on any new content.

To get unlimited access to all content consider signing up for a Medium subscription.

You can also follow me on Twitter, LinkedIn, or check out my GitHub if that’s more your thing 😉
