Maximum Likelihood Estimation: The Poisson Distribution

A guide on how to derive the estimator for the Poisson model.

Rob Taylor, PhD
Apr 11, 2023

Introduction

In my previous posts, I introduced the idea behind maximum likelihood estimation (MLE) and how to derive the estimator for the Binomial model. This post adds to those earlier discussions and will step you through how to derive the estimator for the Poisson model.

The Poisson model is fundamentally a counting process that enumerates events occurring within time or space. It is applied to a wide variety of phenomena and is typically used when modeling rare events. That said, coming from a psychology and neuroscience background, the Poisson model is most familiar to me because it is used to characterize the spiking behavior of cortical neurons, thereby providing a mathematical framework for their analysis.

In working through the derivations, I’ll follow a similar format to my earlier articles by first introducing the likelihood function — i.e., the data-generating model — and then stepping through how to derive the maximum likelihood estimator from first principles. Fair warning: this post will be a little technical and does require you to be familiar with some calculus and algebra. If you haven’t perused my earlier posts, then I suggest you check out the links under the Related Articles section.

Let’s get started.

The Poisson Model

Let K be a Poisson random variable corresponding to a count of events observed during a fixed interval of time. The model also works for spatial data, but I’ll focus on events in time here. For example, this could be the number of goals scored per match during a football season.

We do need to make some assumptions, however. First, we’ll assume that the data consist of independent counts; that is, the counts made during one interval of time are not affected by counts made during a separate, non-overlapping time interval. Second, we’ll assume that the rate at which events occur — denoted by the parameter λ — is constant in time. This is called a homogeneous Poisson process.

With these assumptions in place, the Poisson distribution is defined using the following probability mass function (PMF):

The Poisson probability mass function (image by author).
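For readers who can’t see the image, this is the standard Poisson form:

```latex
P(K = k \mid \lambda) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \qquad k = 0, 1, 2, \ldots
```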

where k = {0, 1, 2, 3,…}. A defining characteristic of the Poisson model is that its expected value and variance are both equal to the rate parameter λ. The ratio of these values is called the Fano Factor and is therefore equal to 1 for Poisson processes.
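We can verify this property numerically. The sketch below (a minimal check I've added, not from the original article) sums the PMF over a truncated support and confirms that the mean, the variance, and hence the Fano factor come out as expected; the value λ = 4 is an arbitrary choice.

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Poisson PMF, computed on the log scale for numerical stability."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

lam = 4.0
support = range(200)  # truncated support; the tail mass beyond 200 is negligible for lam = 4

mean = sum(k * poisson_pmf(k, lam) for k in support)
variance = sum((k - mean) ** 2 * poisson_pmf(k, lam) for k in support)

print(mean, variance)          # both ≈ 4.0
print(variance / mean)         # Fano factor ≈ 1.0
```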

The Poisson Likelihood

Suppose we have collected some data — this could be anything, such as arrivals at a restaurant, the number of calls coming into a call center, or perhaps even the number of Prussian soldiers killed by horse kicks. In any case, we have a series of counts that comprise our data. With this random sample K = {k₁, k₂, k₃, …, kₙ} in hand, we can now define the Poisson likelihood function:

Likelihood function for the Poisson model (image by author).

where the rate parameter λ is what we want to estimate.
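In text form, the likelihood is just the product of the PMF evaluated at each of the n observed counts:

```latex
\mathcal{L}(\lambda \mid K) = \prod_{i=1}^{n} \frac{\lambda^{k_i} e^{-\lambda}}{k_i!}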

Now, if you’ve read through my earlier posts, you’ll be familiar with what comes next. If this is your first time, strap yourself in. We next need to derive the log-likelihood function by taking the log of both sides (and simplifying the expression). We can step through the derivation as follows:

Deriving the Poisson log-likelihood function (image by author).

In the last step, I’ve just rearranged the terms slightly. To me, this makes things neater. Note also that we don’t actually need the last term because it doesn’t include λ. As we’ll see shortly, this term is eliminated when we differentiate the log-likelihood function and can be removed from the expression altogether — if you wish.
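For reference, applying the log rules below and rearranging gives the standard result:

```latex
\ell(\lambda \mid K)
= \log \prod_{i=1}^{n} \frac{\lambda^{k_i} e^{-\lambda}}{k_i!}
= \log(\lambda) \sum_{i=1}^{n} k_i \;-\; n\lambda \;-\; \sum_{i=1}^{n} \log(k_i!)
```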

If all of this seems like gibberish to you, there are a few key concepts you should probably get familiar with, some of which I cover here. But quickly, here’s what you need to know. First, the log of a product of terms is equal to the sum of their logs:

The first thing you need to know (image by author).

Second, the log of a quotient is equal to the difference between their logs:

The second thing you need to know (image by author).

And third, the log of an exponential is equal to the log of the base multiplied by the exponent:

The third thing you need to know (image by author).
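In symbols, the three identities are:

```latex
\log(ab) = \log a + \log b, \qquad
\log\!\left(\frac{a}{b}\right) = \log a - \log b, \qquad
\log\!\left(a^{b}\right) = b \log a
```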

Crystal?

The First Derivative & Solving for λ

The next step is to take the first derivative of the log-likelihood function with respect to λ. Given that the Poisson distribution is a single-parameter model, we don’t have to faff around with partial derivatives, which certainly makes things more straightforward.

Now, I’m not going to bang on about inflection points and concavity — though these are important — but the point at which the derivative goes to zero is the thing we’re most interested in because this is where our estimate is likely to be. So, let’s differentiate the log-likelihood function:

The first derivative of the Poisson log-likelihood function (image by author).

See how the third term in the log-likelihood function reduces to zero in the third line — I told you that would happen. I’ll just note here that this derivation simply applies a whole bunch of derivative rules, which I again cover here.
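Written out, the derivative is the standard result (the final term of the log-likelihood is constant in λ, so it vanishes):

```latex
\frac{d\ell}{d\lambda} = \frac{1}{\lambda} \sum_{i=1}^{n} k_i \;-\; n
```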

Now that we have an expression for the first derivative, the next step is to set this to zero so we can solve for λ. So let's do that — and with a little bit of algebra, we find that:

Deriving the maximum likelihood estimator (image by author).

which yields our maximum likelihood estimator for λ:

The final product (image by author).
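Putting the last two steps in text form: setting the derivative to zero and solving gives

```latex
\frac{1}{\lambda} \sum_{i=1}^{n} k_i - n = 0
\;\;\Longrightarrow\;\;
\hat{\lambda} = \frac{1}{n} \sum_{i=1}^{n} k_i
```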

And that’s it!

Here we can see that the estimator that maximizes the likelihood of the data is just the arithmetic mean of the observed counts.
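Because the estimator is just the sample mean, the computation is a one-liner. The sketch below (the counts are hypothetical, e.g. goals per match) also sanity-checks that the log-likelihood at the estimate beats nearby values of λ:

```python
import math

def log_likelihood(lam: float, counts: list[int]) -> float:
    # l(lam) = sum(k_i) * log(lam) - n * lam - sum(log(k_i!))
    return (sum(counts) * math.log(lam)
            - len(counts) * lam
            - sum(math.lgamma(k + 1) for k in counts))

counts = [2, 0, 1, 3, 1, 4, 2, 0, 1, 2]  # hypothetical observed counts

lambda_hat = sum(counts) / len(counts)  # the Poisson MLE is the sample mean
print(lambda_hat)  # 1.6

# The log-likelihood at lambda_hat exceeds that at nearby values.
assert log_likelihood(lambda_hat, counts) > log_likelihood(1.4, counts)
assert log_likelihood(lambda_hat, counts) > log_likelihood(1.8, counts)
```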

I won’t go through the process of assessing the second derivative to determine that a global maximum has been obtained, but this is generally good practice. (Here the second derivative is −Σkᵢ/λ², which is negative for all λ > 0, so the log-likelihood is concave and our solution is indeed a maximum.) For simple models like these, you can often get away with not checking.

Thanks for reading!

If you enjoyed this post and would like to stay up to date then please consider following me on Medium. This will ensure you don’t miss out on any new content.


You can also follow me on Twitter, LinkedIn, or check out my GitHub if that’s more your thing 😉

