Extreme Value Theory and the Ultimate Olympic Record

Zaria Rankine
Nov 4 · 5 min read

As its name suggests, Extreme Value Theory (EVT) provides a class of methods for predicting how extreme events could behave. It is used in structural engineering, earth sciences and city planning, and with new research constantly emerging it has become an essential tool for analysing rare events.

In short, EVT can be summarised as a response to the blind spots of Value at Risk (VaR), particularly when VaR is calculated with the variance-covariance method.

Value at Risk

VaR is a statistical risk management technique that estimates the largest loss a variable is likely to face within a specified time frame, at a given confidence level. It is most commonly used in financial risk analysis, but it applies to other kinds of risk analysis as well. In its variance-covariance form, VaR assumes a normal distribution. The result is first expressed as a percentage loss at the chosen confidence level (for example, 95%), and this percentage can then be converted into a value in units (such as currency).
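
To make the conversion from a confidence level to a value in currency concrete, here is a minimal sketch of the variance-covariance calculation. The function name `parametric_var`, the 2% daily volatility and the 10,000 position are all made-up values for illustration, and returns are assumed to be normally distributed.

```python
from statistics import NormalDist


def parametric_var(mean_return, std_return, confidence=0.95, position=1.0):
    """Loss that is not exceeded with probability `confidence`,
    assuming normally distributed returns (illustrative sketch)."""
    # Return at the (1 - confidence) quantile of the assumed normal distribution.
    worst_return = NormalDist(mean_return, std_return).inv_cdf(1.0 - confidence)
    # Report the result as a positive loss in the units of `position`.
    return -worst_return * position


# Hypothetical inputs: zero mean daily return, 2% daily volatility, 10,000 invested.
var_95 = parametric_var(mean_return=0.0, std_return=0.02, confidence=0.95, position=10_000)
print(f"1-day 95% VaR: {var_95:.2f}")
```

With these inputs the 95% VaR comes out at roughly 329 currency units. Everything beyond that quantile is summarised by a single number, which is exactly the part of the distribution EVT looks at more closely.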

A normal distribution describes values close to the mean well, but it often underrepresents the possibility of extreme values. In risk assessment, these values are known as Low-Probability High-Impact events. They could be catastrophic for your model, and EVT seeks to address this. At its core, EVT is a way to predict how extreme your extremes could be.

I found it helpful to liken the principles of EVT to performing a risk assessment on an investment:

If I use a normal distribution to model my outcomes, it will be useful in predicting how values fall around my mean value, the average amount I’ll make back on my investment. I will need to prepare for a potential loss on this investment, and VaR is how I measure how likely it is that I’ll lose money, how much I could lose, and how quickly I could lose it. However, this kind of model isn’t always accurate at predicting the worst-case scenario, only losses that fall within the ‘normal’ range. Being the Nervous Nelly I am, I need to know the absolute minimum I could get back on this investment, should everything go wrong.

I found two modelling methods under the umbrella of Extreme Value Theory: the Peaks Over Threshold (POT) method and the Block Maxima method.

Peaks over Threshold

The statistician chooses a threshold, and all values above it (or below it, when the interest is in minima) are considered extreme. These are the values selected to be modelled.
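
The selection step can be sketched in a few lines of Python. The function name `peaks_over_threshold`, the readings and the threshold of 30 are invented for illustration. In a full analysis, the excesses returned here would then be fitted with a tail model.

```python
def peaks_over_threshold(values, threshold, upper=True):
    """Return how far each extreme value lies beyond the threshold."""
    if upper:
        # Extremes are the values above the threshold.
        return [v - threshold for v in values if v > threshold]
    # Extremes are the values below the threshold.
    return [threshold - v for v in values if v < threshold]


# Made-up readings; the threshold is a judgement call by the statistician.
readings = [4, 29, 11, 52, 7, 38, 2, 61]
excesses = peaks_over_threshold(readings, threshold=30)
print(excesses)
```

Only three of the eight readings survive the cut here, which already hints at how little data is left to model once a threshold is applied.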

Block Maxima

The data is split into intervals, whose size is decided by the statistician. The most extreme value from each interval (or ‘block’, hence the name) is taken. This will be either the smallest or the largest value in the block, depending on the statistician’s goal.
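
As a minimal sketch of this step, assuming equally sized blocks and dropping any incomplete block at the end (both are my own simplifications), the function below keeps one value per block. The name `block_extremes` and the readings are made up for illustration.

```python
def block_extremes(values, block_size, largest=True):
    """Split `values` into consecutive blocks and keep one extreme per block."""
    pick = max if largest else min
    # Ignore a trailing partial block so that every block has the same size.
    n_full = len(values) - len(values) % block_size
    return [pick(values[i:i + block_size]) for i in range(0, n_full, block_size)]


# Made-up readings, split into blocks of three.
readings = [4, 29, 11, 52, 7, 38, 2, 61, 15]
print(block_extremes(readings, block_size=3))                 # one maximum per block
print(block_extremes(readings, block_size=3, largest=False))  # one minimum per block
```

Changing `block_size` changes both how many extremes are kept and how extreme they are, which is why the choice matters so much.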

These methods have proven useful in many fields, although they come with their own drawbacks. There is no standardised way of deciding the size of the blocks in the Block Maxima method, just as there is no standard threshold for the POT method. This means the statistician has to use their best judgement to decide where the line between ‘normal’ and ‘extreme’ lies. The choice is a trade-off: using too few of the largest order statistics results in a large variance, while using too many may cause a large bias, because values that are not really extreme enter the model.
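
One way to see this trade-off is to compute a tail estimate for several choices of k, the number of largest order statistics used. The sketch below uses the Hill estimator, a standard estimator for heavy tails that I have picked purely for illustration. The data is synthetic (a shifted Pareto sample with a true tail index of 0.5), and the seed and the values of k are arbitrary.

```python
import math
import random


def hill_estimator(values, k):
    """Hill estimate of the tail index from the k largest of the positive `values`."""
    ordered = sorted(values)
    if not 0 < k < len(ordered):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = ordered[-k - 1]  # the (k + 1)-th largest value acts as the threshold
    # Average log-excess of the k largest values over the threshold.
    return sum(math.log(x / threshold) for x in ordered[-k:]) / k


random.seed(0)
# Synthetic heavy-tailed data: Pareto(alpha=2) shifted by 1, so the true tail index is 1/2.
sample = [random.paretovariate(2.0) + 1.0 for _ in range(2000)]

for k in (10, 50, 200, 1000):
    print(k, round(hill_estimator(sample, k), 3))
```

With small k the estimate rests on a handful of points and tends to jump around between samples. With large k it is more stable, but it drifts away from 0.5 as values from the body of the distribution are included.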

One of the main challenges in extreme value analysis is the lack of available data. Modelling only a small sample has its drawbacks: it can lead to over-generalisation, or to models that only work well in specific instances. Given that EVT focuses only on the most extreme values, we want models that apply specifically to rare and extreme cases. At the same time, because we are trying to estimate extremes beyond the observed data, we have to generalise as far as possible while still giving an accurate picture of the data.

The Ultimate 100m Sprint

A research paper published in 2011 aimed to establish the fastest possible time in which an athlete could complete the sprint. The authors, Einmahl and Smeets, acknowledge previous research in this area, noting that most research concerning ultimate world records considers the development of the world record over time and extrapolates the trend into the future. Their own approach is different: the estimated ultimate world record tells us what could be achieved ‘tomorrow’ under current conditions, not what could happen 500 years from now. Einmahl and Smeets arrive at this estimate using EVT.

The paper explores how quickly a 100m sprint could be run by both men and women, using the fastest personal best times set between January 1991 and June 2008.

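Suppose that $X_1, X_2, \dots, X_n$ (the speeds) are $n$ i.i.d. observations with continuous distribution function $F$. Let $X_{1:n} \le X_{2:n} \le \dots \le X_{n:n}$ denote the corresponding order statistics. Suppose that there exist a sequence $a_n > 0$ and a sequence $b_n$ such that the maximum $X_{n:n}$, centered by $b_n$ and scaled by $a_n$, converges in distribution to a non-degenerate distribution $G$: for all continuity points $x$ of $G$,

$$\lim_{n \to \infty} P\left(\frac{X_{n:n} - b_n}{a_n} \le x\right) = G(x).$$

The limit $G$ is then an extreme value distribution, characterised by the extreme value index $\gamma$. When $\gamma < 0$ the speeds have a finite right endpoint, and the time corresponding to that endpoint is the ultimate record.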

Einmahl and Smeets found that the ultimate 100m record would be 9.51 seconds for men and 10.33 seconds for women.
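
To give a feel for how a finite ‘ultimate’ value can come out of a sample, here is a rough sketch based on the moment estimator of the extreme value index, a common choice in the EVT literature, together with the matching endpoint estimator. This is not a reproduction of the paper’s analysis: the function name `moment_endpoint`, the choice of k, and the data (synthetic ‘speeds’ drawn uniformly between 9 and 10 m/s, so the true endpoint is 10 m/s) are all assumptions made for illustration.

```python
import math
import random


def moment_endpoint(values, k):
    """Estimate the extreme value index and the right endpoint of the
    distribution from the k largest of the positive `values`.

    Assumes 0 < k < len(values) and that the k largest values are not all equal.
    """
    ordered = sorted(values)
    threshold = ordered[-k - 1]  # the (k + 1)-th largest value
    log_excess = [math.log(x / threshold) for x in ordered[-k:]]
    m1 = sum(log_excess) / k                 # first empirical moment
    m2 = sum(e * e for e in log_excess) / k  # second empirical moment
    gamma_minus = 1.0 - 0.5 / (1.0 - m1 * m1 / m2)
    gamma = m1 + gamma_minus                 # moment estimate of the index
    if gamma >= 0:
        # A non-negative index does not give a finite endpoint estimate.
        return gamma, math.inf
    scale = threshold * m1 * (1.0 - gamma_minus)
    return gamma, threshold - scale / gamma


random.seed(1)
# Synthetic speeds in m/s with a known upper limit of 10 m/s (not real athletics data).
synthetic_speeds = [random.uniform(9.0, 10.0) for _ in range(5000)]
gamma_hat, endpoint_hat = moment_endpoint(synthetic_speeds, k=500)
ultimate_time = 100.0 / endpoint_hat  # convert the limiting speed to a 100m time
print(gamma_hat, endpoint_hat, ultimate_time)
```

Working with speeds rather than times turns the fastest possible run into the right endpoint of the distribution. A negative estimated index is what makes that endpoint finite, and the result depends on the choice of k in the same way as the threshold discussion earlier.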

I thought it would be interesting to repeat Einmahl and Smeets’ methods at a later date, given the new data available. The men’s world record was broken again in 2009, with a time of 9.58 seconds. Now that we are closer to the most extreme possible value, I would like to see how recent times fit into their distribution, and how quickly our definition of ‘extreme’ can change.

References:

Extreme Value Theory (EVT) Application on Estimating the Distribution of Maxima — F. A. Ramadhani, S. Nurrohmaha, and M. Novita (2017)

The Modelling of Extreme Events — D. E. A. Sanders (2005)

Evaluation of Peaks-Over-Threshold Method — Soheil Saeed Far and Ahmad Khairi Abd. Wahab (2016)

Ultimate 100m World Records Through Extreme-Value Theory — J. Einmahl and S. G. W. R. Smeets (2011)
