Predicting CO2 Emissions from Vehicles with Conformal Prediction on a Regression Task

Claudio Giorgio Giancaterino
10 min read · Aug 27, 2024


Reducing CO2 emissions is one important factor in ensuring the longevity of our planet and all living beings, and Conformal Prediction can help us make better, more trustworthy predictions of those emissions.

CO2 Emission by Vehicles (kaggle.com)

Understanding Uncertainty Quantification

Now, imagine you want to sell your old car, and you use an online tool that predicts how much money you can get for it.

Image generated with https://deepai.org/machine-learning-model/text2img (second-hand car to sell)

Why is Uncertainty Quantification Important?
Let’s say the tool predicts that you can sell your car for $5,000 but also tells you there is an uncertainty of ±$1,000. This means the actual price could be anywhere from $4,000 to $6,000.
Knowing this range helps you set realistic expectations and perhaps negotiate better when selling your car.
If you’re using the prediction to make a financial decision, like buying a new car or planning your budget, understanding the uncertainty can help you manage risk. If the predicted price has a high uncertainty, you might wait for a better market before selling. On the other hand, a low uncertainty (e.g., ±$200) means you can be fairly sure about the price, making your financial planning easier and less risky.
In both cases, uncertainty quantification isn’t just about getting a number; it’s about understanding how reliable that number is and making better decisions based on that understanding.

Machine learning models are increasingly being used to make predictions and inform decisions across various domains. However, to fully trust these predictions, it is crucial to understand how certain or uncertain they are. This is where uncertainty quantification (UQ) comes into play. UQ is essential for several reasons, including decision-making, creating robust systems, automating tasks, and communicating with stakeholders.

To make UQ actionable, it is important to create prediction regions that are efficient, adaptive, and valid. Efficient prediction regions are those that are as small as possible while still containing the true value with high probability. Adaptive prediction regions can adjust based on the context or the input data, providing more precise uncertainty estimates where needed. Valid prediction regions are those that maintain a specified coverage probability, ensuring that the true value lies within the region a certain percentage of the time.
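To make these properties concrete: on a held-out test set, validity can be checked as the fraction of true values falling inside their intervals, and efficiency as the average interval width. A minimal sketch, where the three arrays are hypothetical stand-ins for real predictions:

# empirical coverage (validity) and average width (efficiency) of intervals
import numpy as np

y_test = np.array([4.8, 5.1, 5.6])  # hypothetical true values
lower = np.array([4.0, 4.5, 5.0])   # hypothetical lower interval bounds
upper = np.array([6.0, 5.5, 6.2])   # hypothetical upper interval bounds

coverage = np.mean((y_test >= lower) & (y_test <= upper))  # should be >= 1 - alpha
width = np.mean(upper - lower)                             # smaller is better
print(f"coverage={coverage:.2f}, width={width:.2f}")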

Uncertainty in machine learning can be broadly categorized into two types: aleatoric and epistemic.

  1. Aleatoric Uncertainty: This type of uncertainty arises from inherent randomness in the data. It is also known as statistical or irreducible uncertainty. In the context of machine learning, aleatoric uncertainty can be due to noise in the data or measurement errors.
  2. Epistemic Uncertainty: This type of uncertainty arises from a lack of knowledge or information. It is also known as systematic or reducible uncertainty. Epistemic uncertainty can be reduced by gathering more data or improving the model.

Conformal Prediction is the solution

Conformal prediction is a powerful machine learning framework used to evaluate the uncertainty of predictions. It turns point predictions into prediction regions, providing prediction sets for classification tasks and prediction intervals for regression tasks. When you make a prediction, the output has probabilistic guarantees that it covers the true outcome.

Some advantages of Conformal Prediction:

  1. Guaranteed Coverage: CP ensures that the resulting prediction sets come with guarantees of covering the true outcome with a certain probability.
  2. Model-Agnostic: CP can be applied to any underlying model, making it versatile and easy to integrate with existing models without the need for retraining.
  3. Distribution-Free: CP does not require knowledge of prior probabilities or assumptions about the data distribution, making it broadly applicable. The only assumption is that the data points are exchangeable.
  4. Easy Implementation: The steps involved in CP are straightforward to understand, making it accessible to practitioners. The intuitive nature of CP allows it to be easily wrapped around existing models to enhance their reliability.

How does Conformal Prediction work in a nutshell?

  1. Choose the nonconformity score related to the task.
  2. Select the appropriate significance level alpha based on the desired probability coverage.
  3. Split the dataset into a training set and a calibration set.
  4. Train the Machine Learning model on the training set.
  5. Apply the trained model to the calibration set to obtain predictions.
  6. Compute the nonconformity scores for the calibration set.
  7. Sort these scores and determine the threshold q based on the chosen significance level.
  8. For new data points, make a prediction using the trained model.
  9. Build the prediction interval from all candidate outcomes whose nonconformity score falls below the threshold q derived from the calibration set (see the sketch below).
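Put together, the split (inductive) recipe for regression takes only a few lines. Below is a minimal sketch with scikit-learn and synthetic data, assuming absolute residuals as the nonconformity score; all variable names are illustrative.

# a minimal sketch of split conformal regression (steps 1-9 above)
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.3, size=1000)

# step 3: split the data into training and calibration sets
X_train, X_cal, y_train, y_cal = train_test_split(X, y, test_size=0.3, random_state=0)

# steps 4-6: train, predict on the calibration set, compute nonconformity scores
model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
scores = np.abs(y_cal - model.predict(X_cal))  # absolute residuals (step 1)

# step 7: threshold q at the finite-sample-corrected (1 - alpha) quantile
alpha = 0.1
n = len(scores)
q = np.quantile(scores, np.ceil((n + 1) * (1 - alpha)) / n)

# steps 8-9: the interval for a new point is the prediction +/- q
x_new = rng.normal(size=(1, 3))
y_hat = model.predict(x_new)[0]
print(f"90% prediction interval: [{y_hat - q:.2f}, {y_hat + q:.2f}]")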

What kinds of predictors exist for Conformal Prediction?

There are two main types of conformal predictors: transductive conformal predictors (TCP) and inductive conformal predictors (ICP).

Transductive Conformal Predictors (TCP): TCP leverages the entire dataset for training and requires model retraining for each new prediction. This approach ensures that each prediction is made with the most up-to-date information. TCP can provide highly accurate and reliable prediction intervals because it uses all available data for each prediction. The main drawback of TCP is its computational inefficiency. Retraining the model for each new prediction can be time-consuming and resource-intensive, making it impractical for real-time applications or large datasets.

Inductive Conformal Predictors (ICP): ICP splits the data into two sets: a training set and a calibration set. The model is trained once on the training set, and the calibration set is used to adjust the prediction intervals. ICP offers a significant computational speed-up compared to TCP because the model is trained only once. This makes ICP more suitable for real-time applications and large datasets. The trade-off for the computational efficiency is a potential loss in the accuracy of the prediction intervals, as the calibration set may not fully capture the variability in the data.

Are there any libraries available to compute Conformal Prediction?

The answer is yes!!! MAPIE (Model Agnostic Prediction Interval Estimator) is an open-source Python library for quantifying uncertainty and controlling the risks of machine learning models. It lets you easily compute conformal prediction intervals for regression, classification, and time series, control risks in more complex tasks such as multi-label classification, and wrap any model, if needed, with a scikit-learn-compatible wrapper.

MAPIE — Model Agnostic Prediction Interval Estimator — MAPIE 0.8.6 documentation

MAPIE uses two families of predictors: split conformal prediction and cross-conformal prediction.

Split conformal prediction, aka inductive conformal prediction, involves a two-step process: a training step, in which the model is trained on a portion of the data, and a calibration step, in which the nonconformity scores are computed on a separate calibration set that the model never sees during training. This calibration set is used to determine how well new predictions conform to the established model. The key advantage of this method is that it maintains strong theoretical guarantees for marginal coverage, meaning that the prediction intervals will contain the true outcomes with a specified probability, typically defined by a significance level.

Cross-conformal prediction enhances the split method by employing a cross-validation approach. In this technique, the dataset is divided into multiple folds. For each fold, the model is trained on the remaining data while the current fold serves as the calibration set. The nonconformity scores are computed across all folds, allowing the model to leverage the entire dataset for calibration. This method combines the benefits of multiple training iterations, which can lead to more stable and reliable predictions. It effectively balances the use of data for training and calibration, potentially improving the efficiency of the prediction sets compared to the split method alone.

While split conformal prediction is straightforward and efficient, cross-conformal prediction provides a more robust framework by utilizing multiple training and calibration iterations, enhancing the reliability of the uncertainty estimates produced by the MAPIE library.
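In practice, the two families differ only in MAPIE's cv argument: cv="split" requests split conformal prediction, while an integer requests cross-conformal prediction with that many folds. A minimal sketch, assuming the mapie and lightgbm packages are installed (the fit and predict calls are commented out because the data splits come from the notebook):

# split vs cross-conformal prediction in MAPIE
from lightgbm import LGBMRegressor
from mapie.regression import MapieRegressor

LGBM = LGBMRegressor(random_state=0)

# split conformal: a single training/calibration split
split_cp = MapieRegressor(estimator=LGBM, method="base", cv="split")

# cross-conformal: 5-fold cross-validation for calibration
cross_cp = MapieRegressor(estimator=LGBM, method="base", cv=5)

# both expose the same API afterwards:
# split_cp.fit(X_train, y_train)
# y_pred, y_intervals = split_cp.predict(X_test, alpha=0.1)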

I’ve used MAPIE to estimate conformal prediction intervals for CO2 emissions from vehicles, using a dataset provided by the Canadian government’s official open data website and retrieved from Kaggle.

For this job, I’ve compared the performance of a LightGBM model and a Quantile Regression model. For each model, I’ve applied four methods: conformalized quantile regression, naive, jackknife, and jackknife-plus. I chose alpha=0.1, which corresponds to a target coverage of 90%.

You can follow the Notebook.

Conformalized quantile regression extends traditional quantile regression with conformal prediction principles. We first fit a quantile regression model to the training data. Then, we use the calibration residuals relative to the predicted quantiles to build a nonconformity score. Finally, we create prediction intervals based on these scores, ensuring that the intervals achieve the guaranteed coverage level.
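For intuition, here is a conceptual sketch of what CQR computes under the hood, with hypothetical arrays standing in for the predicted quantiles and calibration targets; the MAPIE calls below handle all of this internally.

# conceptual sketch of the CQR nonconformity score (Romano et al., 2019)
import numpy as np

alpha = 0.1
q_lo = np.array([3.8, 4.2, 4.9, 5.3])   # hypothetical predicted alpha/2 quantiles
q_hi = np.array([5.0, 5.4, 6.1, 6.6])   # hypothetical predicted 1 - alpha/2 quantiles
y_cal = np.array([4.1, 5.6, 5.0, 6.8])  # hypothetical calibration targets

# the score is positive when y falls outside the predicted band
scores = np.maximum(q_lo - y_cal, y_cal - q_hi)
n = len(scores)
# finite-sample-corrected quantile, clamped at 1 for tiny calibration sets
q_hat = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

# interval for a new point: widen its predicted band by q_hat on both sides,
# i.e. [q_lo_new - q_hat, q_hi_new + q_hat]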

# imports (MAPIE provides both regressors; LGBM, QR, alpha, the data splits,
# and the calculate_predictions_and_scores helper are defined earlier in the notebook)
import numpy as np
from mapie.regression import MapieRegressor, MapieQuantileRegressor

# fit MAPIE conformalized quantile regressor using the LightGBM estimator
np.random.seed(0)
LGBM_cqr = MapieQuantileRegressor(estimator=LGBM, cv="split",
                                  alpha=alpha, method="quantile")
LGBM_cqr.fit(X_train, y_train, X_calib=X_cal, y_calib=y_cal, random_state=0)
# predictions and scores
LGBM_cqr_results, LGBM_cqr_predictions_df = \
    calculate_predictions_and_scores(LGBM_cqr, X_test, "QRegressor", alpha)

# fit MAPIE conformalized quantile regressor using the QuantileRegressor estimator
np.random.seed(0)
QR_cqr = MapieQuantileRegressor(estimator=QR, cv="split",
                                alpha=alpha, method="quantile")
QR_cqr.fit(X_train, y_train, X_calib=X_cal, y_calib=y_cal, random_state=0)
# predictions and scores
QR_cqr_results, QR_cqr_predictions_df = \
    calculate_predictions_and_scores(QR_cqr, X_test, "QRegressor", alpha)

The naive solution is the simplest conformal prediction method. First, a predictive model is trained on a dataset, and for each observation in the training set, we calculate a nonconformity score based on the model’s predictions and the actual outcomes. The prediction interval is then given by the model’s prediction on a new point, adjusted by the quantiles of the nonconformity scores from that same training set. Because the same data are used for both training and calibration, the naive method tends to produce intervals that are too narrow and carries no validity guarantee.

# fit MAPIE naive regressor using the LightGBM estimator
np.random.seed(0)
LGBM_naive = MapieRegressor(estimator=LGBM, method="naive")
LGBM_naive.fit(X_train, y_train)
# predictions and scores
LGBM_naive_results, LGBM_naive_predictions_df = \
    calculate_predictions_and_scores(LGBM_naive, X_test, "Regressor", alpha)

# fit MAPIE naive regressor using the QuantileRegressor estimator
np.random.seed(0)
QR_naive = MapieRegressor(estimator=QR, method="naive")
QR_naive.fit(X_train, y_train)
# predictions and scores
QR_naive_results, QR_naive_predictions_df = \
    calculate_predictions_and_scores(QR_naive, X_test, "Regressor", alpha)

The jackknife method is a resampling technique that enhances the naive method with a leave-one-out approach. For each observation in the training set, we train the model on all the other observations, leaving that one out. For the left-out point, we calculate its nonconformity score based on the prediction from the model trained on the remaining data. As with naive conformal prediction, we then build prediction intervals for new observations from these nonconformity scores, ensuring coverage of the true outcomes with high probability. (Note that the code below uses cv=5, a 5-fold cross-validation approximation of the full leave-one-out procedure.)

# fit MAPIE jackknife regressor using the LightGBM estimator
# (cv=5 approximates the leave-one-out jackknife with 5-fold cross-validation)
np.random.seed(0)
LGBM_jacknife = MapieRegressor(estimator=LGBM, method="base", cv=5)
LGBM_jacknife.fit(X_train, y_train)
# predictions and scores
LGBM_jacknife_results, LGBM_jacknife_predictions_df = \
    calculate_predictions_and_scores(LGBM_jacknife, X_test, "Regressor", alpha)

# fit MAPIE jackknife regressor using the QuantileRegressor estimator
np.random.seed(0)
QR_jacknife = MapieRegressor(estimator=QR, method="base", cv=5)
QR_jacknife.fit(X_train, y_train)
# predictions and scores
QR_jacknife_results, QR_jacknife_predictions_df = \
    calculate_predictions_and_scores(QR_jacknife, X_test, "Regressor", alpha)

The jackknife+ method builds on the jackknife method by incorporating additional information from the training data. As with jackknife conformal prediction, it uses a leave-one-out approach. However, we not only calculate the nonconformity scores for the left-out observations but also keep the corresponding leave-one-out predictions, and we build the interval from the distribution of these predictions shifted by their scores. By leveraging this distribution, jackknife+ can adjust the prediction intervals to improve their accuracy and reliability, as sketched below.
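Concretely, jackknife+ derives the interval from the distribution of the leave-one-out predictions shifted by their residuals, rather than from a single point prediction plus a global margin. A conceptual sketch with hypothetical arrays (the exact method uses finite-sample order statistics, approximated here with empirical quantiles):

# conceptual sketch of the jackknife+ interval (Barber et al., 2021)
import numpy as np

alpha = 0.1
rng = np.random.default_rng(0)
loo_preds_new = 5.0 + rng.normal(scale=0.1, size=100)   # hypothetical leave-one-out predictions at a new point
loo_residuals = np.abs(rng.normal(scale=0.5, size=100)) # hypothetical leave-one-out absolute residuals

# the interval comes from the distribution of shifted leave-one-out predictions
lower = np.quantile(loo_preds_new - loo_residuals, alpha)
upper = np.quantile(loo_preds_new + loo_residuals, 1 - alpha)
print(f"jackknife+ interval: [{lower:.2f}, {upper:.2f}]")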

# fit MAPIE jackknife+ regressor using the LightGBM estimator
np.random.seed(0)
LGBM_jacknife_plus = MapieRegressor(estimator=LGBM, method="plus", cv=5)
LGBM_jacknife_plus.fit(X_train, y_train)
# predictions and scores
LGBM_jacknife_plus_results, LGBM_jacknife_plus_predictions_df = \
    calculate_predictions_and_scores(LGBM_jacknife_plus, X_test, "Regressor", alpha)

# fit MAPIE jackknife+ regressor using the QuantileRegressor estimator
np.random.seed(0)
QR_jacknife_plus = MapieRegressor(estimator=QR, method="plus", cv=5)
QR_jacknife_plus.fit(X_train, y_train)
# predictions and scores
QR_jacknife_plus_results, QR_jacknife_plus_predictions_df = \
    calculate_predictions_and_scores(QR_jacknife_plus, X_test, "Regressor", alpha)

You can follow the App for the results.

Here are the visualization results from the LightGBM model.

LightGBM model errors:

LightGBM model predictions:

LightGBM coverage:

LightGBM width:

Looking at the charts for the LightGBM model, conformalized quantile regression satisfies all the requirements previously mentioned: efficient, adaptive, and valid.

QR model errors:

QR model predictions:

QR coverage:

QR width:

Looking at the charts for the Quantile Regression model, and comparing them with those above, we can say that the LightGBM model is the better choice.

Conclusions:

MAPIE is the right tool for conformal prediction:

-MAPIE is designed to provide prediction intervals that can be applied to any predictive model without requiring modifications to the model itself.

-MAPIE ensures that the prediction intervals meet the desired coverage probability empirically by using a portion of the data to calibrate these intervals, providing a reliable measure of uncertainty for predictions.

-MAPIE is straightforward to implement.

References:

- https://mapie.readthedocs.io/en/latest/index.html

- CO2 Emission by Vehicles (kaggle.com)

- OpenDataCanada

- https://co2emissions.streamlit.app

- Notebook

- https://github.com/PacktPublishing/Practical-Guide-to-Applied-Conformal-Prediction

- https://christophmolnar.com/books/conformal-prediction/
