# Intro to Poisson Regression

### Poisson Distribution

The Poisson distribution is a discrete probability distribution for the number of events that occur randomly in a given interval of time. It is a limiting form of the binomial distribution in which n becomes very large and p becomes very small (that is, the number of trials is very large while the probability of the outcome under observation in any single trial is small).

**Definition of Poisson Distribution**

*X* = the number of events in a given interval

*λ* = the mean number of events per interval

The probability of observing *x* events in a given interval is given by

$$P(X = x) = \frac{e^{-\lambda} \lambda^{x}}{x!}, \qquad x = 0, 1, 2, \dots$$

**Poisson Distribution problem 1**

Births in a hospital occur randomly at an average rate of 1.8 births per hour. What is the probability of observing 4 births in a given hour at the hospital?

Let *X* = Number of births in a given hour

(i) Events occur randomly

(ii) Mean rate *λ* = 1.8
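Problem 1 can be checked numerically with the Poisson probability mass function, e^(-λ) λ^x / x!, evaluated at x = 4 and λ = 1.8. The following is a minimal sketch using only the standard library; the helper name `poisson_pmf` is made up for this illustration.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

# Problem 1: exactly 4 births in an hour when the mean rate is 1.8 per hour
p_four_births = poisson_pmf(4, 1.8)
print(round(p_four_births, 4))  # 0.0723
```

So observing 4 births in a given hour has a probability of roughly 7%.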

**Poisson Distribution problem 2**

What is the probability of observing more than or equal to 2 births in a given hour at the hospital?
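For Problem 2 it is easier to work with the complement: P(X ≥ 2) = 1 − P(X = 0) − P(X = 1). A small sketch of that calculation, again with a hypothetical `poisson_pmf` helper written for this illustration:

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

lam = 1.8  # mean births per hour

# Problem 2: two or more births = 1 minus the chance of zero or one birth
p_at_least_two = 1 - poisson_pmf(0, lam) - poisson_pmf(1, lam)
print(round(p_at_least_two, 4))  # 0.5372
```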

Solutions can be found on pages 11 and 13 here: http://www.stats.ox.ac.uk/~marchini/teaching/L5/L5.slides.pdf

### Poisson Regression Model

Poisson regression is a form of regression analysis used to model count data. It is appropriate when the conditional distribution of the response Y (the counts) given the predictors is expected to be a Poisson distribution.

A Poisson regression model is written in terms of the mean response. We assume there is a function that relates the mean of the response to a linear predictor.

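First we map our *k*-dimensional input (x1, x2, …, xk) to a real number through a linear predictor, writing β0, …, βk for the coefficients:

$$\theta = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

Then we apply a second transformation, the exponential function, so that only positive values are possible (the mean of a count has to be positive):

$$\lambda = e^{\theta}$$

The Poisson regression model:

$$Y \mid x_1, \dots, x_k \sim \text{Poisson}(\lambda), \qquad \log \lambda = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$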
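The two transformations can be sketched in a few lines of plain Python. This assumes the usual log link (a linear predictor followed by an exponential). The function names, coefficient values, and toy data below are made up for illustration and are not taken from the original example.

```python
import math

def poisson_mean(beta, x):
    """Mean count: exp of the linear predictor beta[0] + beta[1]*x[0] + ..."""
    linear_predictor = beta[0] + sum(b * xi for b, xi in zip(beta[1:], x))
    return math.exp(linear_predictor)  # always positive

def log_likelihood(beta, rows, counts):
    """Poisson log-likelihood of the observed counts under coefficients beta."""
    total = 0.0
    for x, y in zip(rows, counts):
        mu = poisson_mean(beta, x)
        # log of e^(-mu) * mu^y / y!, using lgamma(y + 1) = log(y!)
        total += y * math.log(mu) - mu - math.lgamma(y + 1)
    return total

# Made-up toy data: one predictor, four observations
rows = [[0.0], [1.0], [2.0], [3.0]]
counts = [1, 1, 3, 6]

print(poisson_mean([0.0, 0.5], [2.0]))           # exp(1.0), about 2.718
print(log_likelihood([0.0, 0.5], rows, counts))  # higher is a better fit
```

Fitting the model means choosing the coefficients that maximize this log-likelihood, which is the job libraries such as Statsmodels do for us.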

**Example in Python using Statsmodels**

Credit data and code from: https://github.com/mahat/PoissonRegression

Here we would like to predict the number of awards received by students using the data with the structure below:

Here’s the summary of the data

After fitting a Poisson regression to the data, we get the following results and graph:

**Limitations of Poisson Regression Model**

- Heterogeneity in the data: more than one process is generating the data. For example, the data might unknowingly have been collected from more than one group of people
- Overdispersion: the variance of the observed counts is larger than the model assumes (a Poisson model assumes the mean and the variance are equal)
- In Statsmodels, the ratio Pearson chi2 / Df Residuals is approximately 1 if the data is drawn from a Poisson distribution with sufficient samples. For observed data, a ratio above 1 implies overdispersion, while a ratio below 1 implies underdispersion. If the data is overdispersed, an alternative such as negative binomial regression is usually a better fit, or a zero-inflated model when the extra variance comes from an excess of zeros
- For our Python example above, Pearson chi2 / Df Residuals is 1.08, which is close to 1
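To make the dispersion ratio concrete, here is a minimal sketch that computes it by hand from observed counts and fitted means. The helper `pearson_dispersion` and the toy numbers are made up for illustration. With a fitted Statsmodels GLM result, the same quantities should be available as the `pearson_chi2` and `df_resid` attributes.

```python
def pearson_dispersion(observed, fitted_means, n_params):
    """Pearson chi2 divided by residual degrees of freedom."""
    # Pearson chi2 for a Poisson model: squared residuals scaled by the
    # variance, which the model assumes is equal to the mean
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted_means))
    df_resid = len(observed) - n_params
    return chi2 / df_resid

# Toy example: intercept-only model, so every fitted mean is the sample mean
observed = [0, 1, 1, 2, 3, 5]
fitted_means = [2.0] * len(observed)  # sample mean of the counts is 2.0

ratio = pearson_dispersion(observed, fitted_means, n_params=1)
print(ratio)  # about 1.6 for this toy data, i.e. above 1
```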