Conformal Prediction for Forecast Prediction Intervals

Robertorusso · BIP xTech · Apr 16, 2024

Conformal Prediction has been endorsed by many Machine Learning researchers as a state-of-the-art framework for estimating prediction intervals, and despite its relatively recent arrival on the scene, it has already generated many valuable contributions in the form of published papers across several fields.

In this article, I am going to introduce the reasons for its diffusion, briefly describe the theory behind it, and showcase a Time Series Forecasting example that highlights the advantages of using Conformal Prediction.

The power of Conformal Prediction, and the reason for its widespread adoption, lie in three factors:

Its simplicity: Conformal Prediction relies on a single hypothesis⁵: the data points must be exchangeable. Its theoretical guarantee on the correctness of the estimated intervals is therefore very likely to hold, unlike other methods that require conditions such as Gaussianity of the data.

Its effectiveness: Conformal Prediction has been shown empirically to outperform other methods for prediction interval estimation.

Its versatility: Conformal Prediction can be applied to unsupervised, supervised, and reinforcement learning, and it can be used to handle distribution shifts automatically in production. It has been successfully applied to Computer Vision, Time Series Forecasting, NLP, and many other fields involving classification or regression tasks.

Although a comprehensive survey of Conformal Prediction is outside the scope of this article, I want to provide you with the very basic theory behind it. Specifically, I am going to discuss the framework of Split Conformal Prediction, the most widely used variant thanks to its computational feasibility.

The core of Conformal Prediction can be described by this schema¹

The framework takes any heuristically (hence subjectively) computed notion of uncertainty around a prediction and turns it into a formal, theoretically valid probability that reflects the objective uncertainty around that prediction.
Formally, Conformal Prediction guarantees that the following inequality holds for any user-defined error rate α (e.g. α = 0.2 for 80% intervals)¹:

P(Y_test ∈ C(X_test)) ≥ 1 - α

Where C(X_test) is the prediction interval associated with the prediction on X_test and Y_test is the true value of the target variable associated with X_test.

To obtain this guarantee, the steps of Split Conformal Prediction are the following¹:

1. Choose a score function s(x, y) that measures how poorly the model's prediction for x agrees with the true value y (larger score = worse agreement).
2. Compute the scores on a calibration set that was not used to train the model.
3. Compute q̂, the ⌈(n+1)(1-α)⌉/n empirical quantile of the n calibration scores.
4. For a new point X_test, build the prediction set C(X_test) containing all candidate values whose score is at most q̂.

From here we can see how versatile this framework is; the only real constraint is that the data points are exchangeable (i.i.d. data is a special case of this).

The key to adapting Conformal Prediction to any problem lies in the choice of the score function. In all cases, the score must be inversely related to the correctness of the prediction. For example, in a classification task the score function could be 1 - f(x), where f(x) is the softmax probability assigned by the model to the correct label, while for regression the score can be the absolute error of the prediction.
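
As a minimal sketch of these steps applied to a regression task (the data and the plain linear model below are purely illustrative, not part of the tutorial that follows):

import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic data, purely for illustration
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.3, size=1000)

# Steps 1-2: train on one split, compute scores on a separate calibration split
X_train, y_train = X[:600], y[:600]
X_cal, y_cal = X[600:900], y[600:900]
X_test = X[900:]
model = LinearRegression().fit(X_train, y_train)

alpha = 0.2                                    # 1 - alpha = 80% target coverage
scores = np.abs(y_cal - model.predict(X_cal))  # score = absolute error

# Step 3: finite-sample-corrected quantile of the calibration scores
n = len(scores)
q_hat = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# Step 4: prediction interval for new points
preds = model.predict(X_test)
lower, upper = preds - q_hat, preds + q_hat

Under exchangeability, intervals built from this single quantile contain the true value with probability at least 1 - α, which is exactly the guarantee stated above.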

One important aspect when evaluating the quality of prediction intervals is that they provide the exact coverage we asked for: if we want an 80% prediction interval, the intervals must contain roughly 80% of the true values. To clarify with an example, if we produce 80% prediction intervals for 100 test points, we want the interval to contain the true value for about 80 of them. If the observed percentage is higher than the nominal level, the intervals are wider than they need to be and the model is under-confident; if it is lower, the intervals are too narrow and the model is over-confident. In both cases the model is uncalibrated: it may be accurate, but it is not reliable in the use cases where we care about the uncertainty around each prediction. The number one metric for this property is coverage, which is simply the fraction of test points whose interval contains the true value.

Another property we wish to have is the “adaptability” of the prediction intervals: tight intervals for easy-to-predict points and wide intervals for hard-to-predict points. This makes sense intuitively, and even though the notions of easy and hard are subjective, Conformal Prediction aims to capture the true difficulty, in other words the uncertainty, associated with each prediction. A metric that can be used to evaluate this aspect is the average width of the prediction intervals: given equal coverage between two models, the better one is the model with the narrower intervals on average.
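
As a quick sketch, both metrics can be computed in a few lines given arrays of true values and interval bounds (the function name and arguments are placeholders of my own):

import numpy as np

def interval_metrics(y_true, lower, upper):
    """Empirical coverage and average width of a batch of prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    coverage = np.mean((lower <= y_true) & (y_true <= upper))  # fraction of covered points
    avg_width = np.mean(upper - lower)                         # mean interval width
    return coverage, avg_width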

Note that the evaluation of prediction intervals, as well as the choice of the score function, is more complicated than what I have described so far; I have presented only the basic concepts needed to grasp the essence of Conformal Prediction. If you want to dive deeper and gain a more comprehensive understanding of this framework, I encourage you to read the paper I have already cited a few times¹.

Now you have all the tools you need to follow the practical example that I will explain step by step. It is an adaptation of this tutorial², which I have modified in order to show the advantage of Conformal Prediction over traditional methods.

It’s time to get our hands dirty with this simple yet illustrative code example of Conformal Prediction applied to forecasting.

Let’s start by importing the necessary libraries.

import pandas as pd
from statsforecast import StatsForecast
from statsforecast.models import SeasonalNaive
from statsforecast.utils import ConformalIntervals

Next, we load 8 hourly time series from the M4 competition dataset.

train = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly.csv')
test = pd.read_csv('https://auto-arima-results.s3.amazonaws.com/M4-Hourly-test.csv').rename(columns={'y': 'y_test'})
n_series = 8
uids = train['unique_id'].unique()[:n_series] # select first n_series of the dataset
train = train.query('unique_id in @uids')
test = test.query('unique_id in @uids')

StatsForecast.plot(train, test, plot_random = False) # plot the time series
8 time series from the M4 competition dataset

Now let’s instantiate the model; a good choice for this simple example is the Seasonal Naive, with seasonality 25.

models = [SeasonalNaive(25)]
sf = StatsForecast(
    models=models,
    freq=1,
    n_jobs=-1)

We will forecast 48 steps ahead with this model, producing 80% prediction intervals with the traditional method for Naive models³, scaled to obtain the desired probability. Note that this method relies on the residuals being Gaussian with mean 0, which may be very far from the truth.
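
To make the traditional approach concrete, here is a rough sketch of how such intervals are typically built for a seasonal naive forecast³: estimate the residual standard deviation, assume zero-mean Gaussian residuals, and scale the standard deviation with the horizon. The helper below is an illustration of the idea, not the exact code StatsForecast runs.

import numpy as np
from scipy import stats

def seasonal_naive_interval(y, season_length, h, level=80):
    """Illustrative Gaussian interval for a seasonal naive forecast at horizon h."""
    y = np.asarray(y, dtype=float)
    resid = y[season_length:] - y[:-season_length]   # in-sample seasonal naive errors
    sigma = resid.std(ddof=1)                         # residual standard deviation
    k = (h - 1) // season_length                      # completed seasonal cycles at horizon h
    sigma_h = sigma * np.sqrt(k + 1)                  # sd grows with each completed cycle
    z = stats.norm.ppf(0.5 + level / 200)             # ~1.28 for an 80% interval
    point = y[len(y) - season_length + (h - 1) % season_length]  # seasonal naive point forecast
    return point - z * sigma_h, point + z * sigma_h

In the example, we simply let StatsForecast compute these intervals for us: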

levels = [80] # confidence levels of the prediction intervals
forecasts = sf.forecast(df=train, h=48, level=levels)

Then, we visually inspect the forecast and the prediction intervals.

plot_test = test.merge(forecasts, how='left', on=['unique_id', 'ds'])
train_test = pd.concat([train, test.rename(columns={'y_test':'y'})])
sf.plot(train_test, plot_test, plot_random = False, models=['SeasonalNaive'], level=levels, engine='plotly', max_insample_length=100)
Time series forecasts and prediction intervals

As we can see, the forecast is satisfyingly accurate, but the prediction intervals do not reflect this accuracy, giving a false sense of uncertainty, and adaptability is thrown out the window entirely. Let’s take a look at the coverage and the average width:

# empirical coverage: fraction of test points inside the 80% interval
((plot_test['SeasonalNaive-lo-80'] <= plot_test['y_test']) & (plot_test['y_test'] <= plot_test['SeasonalNaive-hi-80'])).mean()

# average interval width per time series
(plot_test['SeasonalNaive-hi-80'] - plot_test['SeasonalNaive-lo-80']).groupby(plot_test['unique_id']).mean()
Coverage — 94.80%
Average width of the prediction intervals for each time series

We can see that the intervals over-cover, i.e. the model is under-confident, but to properly judge these metrics we need to compare them with the ones we will obtain using Conformal Prediction.

Let’s repeat the same experiment using Conformal Prediction. By now you know the drill, and the only modification to the code is made when instantiating the model.

models = [
    SeasonalNaive(25,
                  prediction_intervals=ConformalIntervals(h=48, n_windows=30))
]
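
The rest of the pipeline is unchanged: we rebuild the StatsForecast object with the new model list and repeat the forecast and plot calls exactly as before (the _cp variable names below are mine):

sf = StatsForecast(
    models=models,
    freq=1,
    n_jobs=-1)

forecasts_cp = sf.forecast(df=train, h=48, level=levels)  # same call as before
plot_test_cp = test.merge(forecasts_cp, how='left', on=['unique_id', 'ds'])
sf.plot(train_test, plot_test_cp, plot_random=False, models=['SeasonalNaive'],
        level=levels, engine='plotly', max_insample_length=100)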

Conformal Prediction for Time Series is a slightly more involved process⁴ that relies on cross-validation, with the calibration forecast horizon equal to the number of steps ahead we wish to forecast; the resulting errors are then used to calibrate the interval for each forecast step. Conceptually, it works roughly like the sketch below.
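
This is only a conceptual sketch, assuming absolute errors as scores and non-overlapping calibration windows; the actual ConformalIntervals implementation in statsforecast may differ in its details.

import numpy as np

def conformal_intervals_cv(y, model_fn, h, n_windows, alpha=0.2):
    """Conceptual sketch: per-step conformal intervals from rolling-window CV errors."""
    y = np.asarray(y, dtype=float)
    scores = np.empty((n_windows, h))
    for w in range(n_windows):
        cutoff = len(y) - (n_windows - w) * h             # training data ends here for this window
        fcst = model_fn(y[:cutoff], h)                    # h-step forecast from the cutoff
        scores[w] = np.abs(y[cutoff:cutoff + h] - fcst)   # one error per forecast step
    # quantile per forecast step (finite-sample correction omitted for brevity)
    q_hat = np.quantile(scores, 1 - alpha, axis=0)
    point = model_fn(y, h)                                # final forecast on the full history
    return point - q_hat, point + q_hat

Applied to our example, the output is the following: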

Time series forecasts and prediction intervals with Conformal Prediction

We can now see that the intervals adapt much better to the accuracy of the model, staying tight where the forecast closely overlaps the true values and widening where it does not. The metrics are:

Coverage — 87.76%
Average width of the prediction intervals for each time series (Conformal Prediction)
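
These numbers come from the same two lines used earlier, applied to the conformal forecasts (assuming they were merged into plot_test_cp as above, and that the interval columns keep the same SeasonalNaive-lo-80 / SeasonalNaive-hi-80 names):

# empirical coverage of the conformal 80% intervals
((plot_test_cp['SeasonalNaive-lo-80'] <= plot_test_cp['y_test']) & (plot_test_cp['y_test'] <= plot_test_cp['SeasonalNaive-hi-80'])).mean()

# average interval width per time series
(plot_test_cp['SeasonalNaive-hi-80'] - plot_test_cp['SeasonalNaive-lo-80']).groupby(plot_test_cp['unique_id']).mean()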

We can see that the over-coverage has been greatly reduced, as has the average width of the intervals; hence both the calibration and the overall quality of the intervals have improved.

If you are accustomed to forecasting projects, you may have noticed that forecasting models usually provide prediction intervals that become wider the further we go into the future. This may not reflect the true uncertainty around a prediction, especially when your time series has strong seasonality. Note how the conformal intervals do not necessarily follow this pattern; instead, they reflect the real performance of the model over the forecasting horizon.

Congratulations on making it this far! I hope you enjoyed the ride as much as the destination. I would appreciate any feedback or questions, even if you are not a Data Scientist but simply someone who is curious. Conformal Prediction is a hot, promising topic, and here I have shown barely the tip of the iceberg. With this article I wanted to give readers a starting point for their own journey through Conformal Prediction. If you already know about Conformal Prediction, please feel free to reach out with any kind of comment, because we are all still learning. And if you would like more articles on this topic, let me know.

I wish you good learning.

Roberto

References:

[1]: Anastasios N. Angelopoulos and Stephen Bates, "A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification", December 8, 2022.

[2]: "Probabilistic Forecasting" tutorial with StatsForecast, by Nixtla.

[3]: Rob J Hyndman and George Athanasopoulos, "Prediction Intervals", Section 3.5 of Forecasting: Principles and Practice, 2nd ed.

[4]: Kamile Stankeviciute, Ahmed M. Alaa, and Mihaela van der Schaar, "Conformal Time-series Forecasting".

[5]: "Awesome Conformal Prediction" by Valery Manokhin.
