Predicting the probability of mass shootings with extreme value theory

Jem Bishop
5 min read · Feb 20, 2023


I recently found myself addicted to Manifold Markets, a prediction market platform which allows you to create simple yes/no propositional bets (with monopoly money) on basically any topic. It seems to be intellectually co-located with the Slate Star Codex / Effective Altruist / Longtermism crowd, and the propositions reflect this. However, if you scroll past the questions about whether AI will take over humanity in 2100, or the paradoxical self-referencing markets, there are some interesting topics. One such question was "Will a single shooting incident kill at least 20 people in the US during 2023?".

It's a macabre market to be sure, but one which highlights the US's unique problem with mass shootings. To address this quantitatively, I downloaded the mass shootings data from Wikipedia and cobbled together a Python script to clean it up. To be more predictive of 2023, I think it's best to only look at the data from 2010 onwards.

[Table: mass shooting statistics taken from Wikipedia]

The most straightforward way to estimate the probability would be to fit a sensible distribution to the per-incident death counts and take the probability mass above 20 (let's call this P). Then take the average number of incidents in a year (let's call this I): the probability that none of those incidents reaches 20 deaths is (1-P)^I, so the probability of the outcome is 1 - (1-P)^I. Fitting a lognormal distribution gives a final result of 25% for the outcome.
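Here is a minimal sketch of that calculation with scipy. The file name, and the choice to pin the lognormal's location at zero, are my own assumptions rather than details of the original script:

```python
import numpy as np
from scipy import stats

# Hypothetical file name: a single column of per-incident death counts,
# cleaned from the Wikipedia table, 2010 onwards.
deaths = np.loadtxt("mass_shooting_deaths_2010_onwards.csv")

n_years = 12                                  # the 2010-2022 span used later in the post
incidents_per_year = len(deaths) / n_years    # this is I

# Fit a lognormal to the death counts (location pinned at zero) and take
# the tail mass above 20 deaths -- this is P.
shape, loc, scale = stats.lognorm.fit(deaths, floc=0)
p_over_20 = stats.lognorm.sf(20, shape, loc=loc, scale=scale)

# Probability that at least one of the year's incidents kills 20+ people.
p_year = 1 - (1 - p_over_20) ** incidents_per_year
print(f"P = {p_over_20:.3f}, I = {incidents_per_year:.1f}, yearly probability = {p_year:.0%}")
```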

However the above approach leaves something to be desired:

  • The result is very sensitive to the distribution chosen. For example, choosing a gamma distribution gives a tiny probability of the outcome occurring.
  • The distribution is fit to the most common values, but we care about the tail, where the values are extreme.

Luckily, this is where extreme value theory comes in. If we only care about extreme values, we can fit a specialised distribution to just those values. But how do we define an extreme value? There are two approaches.

  • Define the extreme value to be the maximum over some time interval (often chosen to be a year). This is called the block maxima method.
  • Define an extreme value to be any value above a certain threshold. This is called the peaks-over-threshold (POT) method.

The magical thing about these two definitions of extreme values is that, whatever the underlying distribution of the variable, the extremes converge to a simple three-parameter distribution. Specifically, block maxima follow the generalized extreme value (GEV) distribution, and exceedances above a high threshold follow the generalized Pareto (GP) distribution.
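For reference, and with the caveat that parameterisation conventions vary between texts, the two limiting distributions look like this for shape parameter ξ ≠ 0:

```latex
% Block maxima: generalized extreme value (GEV) distribution,
% with location \mu, scale \sigma and shape \xi
F(x;\mu,\sigma,\xi) = \exp\!\left[-\left(1 + \xi\,\frac{x-\mu}{\sigma}\right)^{-1/\xi}\right]

% Peaks over threshold: generalized Pareto (GP) distribution for the
% exceedance y = x - u above a threshold u, with scale \sigma and shape \xi
H(y;\sigma,\xi) = 1 - \left(1 + \xi\,\frac{y}{\sigma}\right)^{-1/\xi}
```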

There are indeed some very fancy mathematical theorems behind this duality: the Fisher-Tippett-Gnedenko theorem for block maxima, and the Pickands-Balkema-de Haan theorem for threshold exceedances.

Anyway, the big question with both of these methods is how many data points you count as "extreme". The more data points you include, the better constrained your GEV / GP fit will be, but the less likely it is that the limiting conditions these distributions require will hold.

This is the most hand-wavy part, but I have seen people use around 10 data points as a compromise between sample size and distribution accuracy.

Considering the block maxima method then, if we look at the yearly maximum of deaths arising from a mass shooting incident we get:

And fitting the GEV distribution to this gives:

This gives the outcome probability for > 20 in one year as 30%.
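A minimal sketch of that fit with scipy might look like the following. The file of yearly maxima is a stand-in for the Wikipedia data, not the author's actual script:

```python
import numpy as np
from scipy import stats

# Hypothetical file: one value per year, the largest death toll of any
# single incident in that year (2010 onwards).
yearly_max = np.loadtxt("yearly_max_deaths.csv")

# Fit the GEV distribution to the block (yearly) maxima. Note that scipy's
# shape parameter c is the negative of the xi used in most EVT textbooks.
c, loc, scale = stats.genextreme.fit(yearly_max)

# Probability that next year's maximum exceeds 20 deaths.
p_year = stats.genextreme.sf(20, c, loc=loc, scale=scale)
print(f"P(yearly maximum > 20) = {p_year:.0%}")
```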

In considering the peaks-over-threshold method, we have to think about how to choose our threshold. The obvious choice is 20, since that is what the question asks about. However, that only gives us 6 data points to work with! The downside of a low number of data points is that it increases the uncertainty in the actual interval between peaks. We can do a little better with a threshold of 12, which gives us 11 data points to work with.

Fitting the generalized Pareto distribution to these exceedances gives:

And a probability of 44% that a given peak exceeds 20. However, to convert this into a probability that we exceed 20 at some point during a year, we need to work out the probability of a peak happening in a year. To do this we assume peaks are Poisson distributed; since we have 11 events in a 12-year span (from 2010 to 2022), that gives a mean of about 0.9 peaks per year. Crunching through the maths gives a probability of roughly 60% that at least one peak happens in a year. So the overall probability is more like 44% x 60% = 26%. This is slightly inaccurate: if two peaks happen in a year you get two chances to exceed 20, so the true probability should be a little higher. But you get the gist.
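A rough sketch of that whole calculation, again with a hypothetical input file and scipy standing in for whatever the original script used:

```python
import numpy as np
from scipy import stats

threshold = 12
# Hypothetical file: death tolls of the 11 incidents that exceeded the threshold.
peaks = np.loadtxt("peaks_over_12.csv")
excesses = peaks - threshold

# Fit the generalized Pareto distribution to the excesses over the threshold.
shape, loc, scale = stats.genpareto.fit(excesses, floc=0)

# Probability that a given peak exceeds 20 deaths, i.e. an excess above 20 - 12 = 8.
p_exceed = stats.genpareto.sf(20 - threshold, shape, loc=loc, scale=scale)

# Peak arrivals assumed Poisson: 11 peaks over the 12-year span quoted above.
lam = 11 / 12
p_at_least_one_peak = 1 - np.exp(-lam)          # roughly 60%

# The approximation used above: P(at least one peak) * P(a peak exceeds 20).
p_approx = p_at_least_one_peak * p_exceed

# Under the same assumptions, exceedances of 20 form a thinned Poisson process
# with rate lam * p_exceed, so P(at least one in a year) follows directly.
p_exact = 1 - np.exp(-lam * p_exceed)

print(f"P(peak > 20) = {p_exceed:.0%}")
print(f"Yearly probability: approx {p_approx:.0%}, thinned-Poisson {p_exact:.0%}")
```

The last line is just the thinned-Poisson version of the same calculation, which accounts for the "two chances" effect mentioned above and comes out a little higher than the simple product.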

These two methods are already used in hydrology to calculate risks of "100-year" floods, and in many other fields where you care about freak occurrences. Given how poor humans are at judging risk, especially in the tail of the distribution, it's nice to have a simple, statistically grounded technique that can be applied in such diverse situations.

