DATA SCIENCE THEORY | EXPLAINABLE AI | MACHINE LEARNING

Conformal prediction for regression

A hands-on codeless example with KNIME

Artem Ryasik
Low Code for Data Science


A fancy image to draw attention, produced by Midjourney in an attempt to visualize conformal prediction.

After writing a blog post on how to use conformal prediction for classification in KNIME, I could not help but continue and describe the regression case. This use case is a bit more complicated, but I did my best to describe it well. As in the previous post, two approaches will be described — “advanced” and “simple”. In this article, we are going to tackle a regression problem to predict the price of used cars, focusing on two main parameters: the error rate and the normalization sensitivity parameter beta (both are described below). As usual, everything will be wrapped between the nodes of the KNIME Integrated Deployment extension, so the workflows are ready for production immediately.

Read Conformal prediction for classification.

Read Conformal prediction theory explained.

The data

Today we are going to work on predicting the price of used cars (Kaggle) using attributes such as producer, miles per gallon rate, year of production, mileage, type of transmission, and so on. One can think of this as an application for people who would like to sell their car and estimate how much money they might expect to earn, or as a tool to help companies operating in the used-car market adjust their prices.

Let’s first take a look at the data: the producer distribution, and the car model per producer distribution (see figure 1).

Figure 1. The distribution of producers (top) and models per producer (bottom).

Here, we can see that in general we have plenty of representatives for most of the producers, except for Toyota and Škoda, whose low counts are going to be a good test for conformal prediction.

The workflow

As in the previous post, I am going to focus on describing the “advanced” case, which includes training multiple models and getting calibration tables for them, as well as optimizing the error rate and the normalization sensitivity parameter beta.

Those who are impatient can directly jump to the KNIME Community Hub and download the Conformal prediction regression “advanced” (link to workflow) and Conformal prediction regression “simple” (link to workflow) workflows. The implementation of both cases will be described below.

Data processing

First of all, it is always good to check whether there are any correlated features. Computing correlation is one of the easiest methods to find redundant features, but it is not the only one. As expected, there is a correlation between year and mileage (-0.74): the older the car, the more miles it has covered (see figure 2). This way, we can eliminate the feature “year”, since “mileage” seems to be more granular and accurate.

Figure 2. Correlation matrix of features.

Another thing that is generally good to do is to normalize the target variable and the features. This also applies to our use case, where the features have different scales, which might be relevant for some predictive algorithms.

The data set is actually quite big, bigger than we really need, so readers are free to control the size of the data sets for training, calibration, and testing using the Row Sampling node. For this example, I decided to retain only 20% of the original data set, applying stratified sampling over the target variable, “price”. Finally, the data set is split into two parts: a training + calibration set and a test set.
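For readers who prefer to follow along outside of KNIME, here is a minimal Python sketch of the same preprocessing steps (correlation check, normalization, stratified sampling, split). It is only an illustration under my own assumptions: the column names, the file name, the min-max normalization, the price binning used for stratification, and the 70/30 split ratio are not taken from the workflow.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical file and column names; adjust to the actual Kaggle data set
cars = pd.read_csv("used_cars.csv")

# 1. Correlation check: "year" vs "mileage" are strongly (negatively) correlated,
#    so we drop "year" and keep the more granular "mileage"
print(cars[["year", "mileage"]].corr())
cars = cars.drop(columns=["year"])

# 2. Min-max normalization of the numeric columns (including the target "price")
num_cols = cars.select_dtypes("number").columns
cars[num_cols] = (cars[num_cols] - cars[num_cols].min()) / (cars[num_cols].max() - cars[num_cols].min())

# 3. Keep 20% of the rows, stratified over the (binned) target variable "price"
price_bins = pd.qcut(cars["price"], q=10, labels=False)
sample, _ = train_test_split(cars, train_size=0.2, stratify=price_bins, random_state=42)

# 4. Split into a training + calibration set and a test set (the 70/30 ratio is an assumption)
sample_bins = pd.qcut(sample["price"], q=10, labels=False)
train_calib, test = train_test_split(sample, test_size=0.3, stratify=sample_bins, random_state=42)
```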

The final part here is the component called “Conformal prediction configuration”, where users can define the main parameters for conformal prediction regression: the desired error rate; whether the conformal predictions will be normalized; and, if so, the normalization sensitivity parameter beta, which has range (0; 1]. If normalization is performed, it needs to be applied to both the calibration and prediction steps. Hence, it is better to define the parameters at the very beginning, as it is more convenient to propagate these settings as flow variables. The part of the workflow dedicated to the processing steps is shown in figure 3.

Note. Since there are plenty of combinations for the training process, I encourage the readers to spend some time playing around with multiple combinations of the initial settings: normalize vs. not normalize the data, and normalize vs. not normalize the conformal predictions. The experiments for optimizing beta and the error rate will be described later in this article.

Figure 3. Initial processing and configuration part of the workflow.

Training and calibration

Here, we repeat the same procedure that we applied for conformal prediction classification (see figure 4): we split the data set into training and calibration sets. The training set is used for training regression models, and the calibration set is used for creating calibration tables. In today’s example, I used the Random Forest algorithm for regression to fit the “price” column using all features except “year”. The loop block does the following: the training data set is used for training the model; then, the trained model is used to predict values on the calibration data set; finally, if normalization is used (it is an optional step), we introduce a measure of prediction difficulty (Sigma).

The most straightforward candidate for Sigma is the absolute error, since it is very easy to calculate — in this example, we are going to use it too. Sigma can be easily calculated with the Math Formula node by taking the absolute value of the difference between the predicted price and the real price.

Figure 4. Overview of the training and calibration step.

Once the predictions and Sigma values are obtained, they can be fed into the Conformal Calibrator (Regression) node (see figure 5). After that, the pairs of models and calibration tables are collected and synchronized by the Conformal Calibration Loop End node.

Figure 5. The dialog of the Conformal Calibrator (Regression) node. Here the user should provide the columns for the target variable and the predicted values, so the node can estimate the conformity of the predicted values to the real ones and create a calibration table. There is an optional normalization feature; in that case, the user should also provide a prediction difficulty column and the normalization sensitivity parameter beta.
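To make the calibration step less of a black box, here is a rough Python sketch of what it boils down to for a single model of the loop. This is the textbook inductive conformal regression recipe rather than the node's exact implementation, and the toy data, the variable names, and beta = 0.25 are my own choices.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Toy stand-ins for the (already normalized) training and calibration sets
X_train, y_train = rng.random((800, 5)), rng.random(800)
X_calib, y_calib = rng.random((200, 5)), rng.random(200)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)

beta = 0.25
y_calib_pred = model.predict(X_calib)

# Sigma: the prediction difficulty measure, here simply the absolute error
sigma_calib = np.abs(y_calib - y_calib_pred)

# Nonconformity scores: absolute error, optionally normalized by (Sigma + beta)
alpha_calib = np.abs(y_calib - y_calib_pred) / (sigma_calib + beta)

# The sorted scores play the role of the calibration table
calib_table = np.sort(alpha_calib)
```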

Conformal prediction

This step starts with the Capture Workflow Start node, since this is the part that we are going to use for deployment. It has four inputs: three tables and one flow variable:

  • Input for the models obtained in the previous step;
  • Input for the calibration tables obtained in the previous step;
  • The new data for prediction — unlabeled or test data set;
  • The settings that were used during training. It is important to use the same normalization settings that were used during training, since the Conformal Predictor and Classifier (Regression) node expects the same parameters that were used in the Conformal Calibrator (Regression) node. Providing mixed settings will lead to invalid predictions, so please pay attention to this!

In this block, the models and calibration tables are synchronized using the Conformal Prediction Loop Start node, so the loop iterates over each pair: the model is used to predict values in the test set, and the calibration table is used for obtaining the conformal prediction in the Conformal Predictor and Classifier (Regression) node. Please note that, as in the previous step, there is a Math Formula node to calculate Sigma (absolute error), which might be used for the predictions depending on the settings provided via flow variables. In this workflow, the settings of the Conformal Predictor and Classifier (Regression) node are overwritten by flow variables for the user’s convenience.

Finally, all the predictions are aggregated using the median function in the Conformal Prediction Loop End node. After that, the Capture Workflow End node is used to wrap the part of the workflow that we would like to deploy. Lastly, the workflow can be deployed with the Workflow Writer node or propagated to the Workflow Executor node.
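Conceptually, each iteration of this loop reads one score from the calibration table and turns it into an interval around the model's prediction; the loop end then takes the median over the models. The sketch below illustrates that logic with synthetic numbers; the quantile rule and the (Sigma + beta) scaling follow the standard normalized inductive conformal prediction recipe, not necessarily the node's exact internals.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, error_rate = 0.25, 0.1

# Synthetic stand-ins: sorted calibration scores plus test predictions and
# difficulty estimates (Sigma) for three models
n_models, n_calib, n_test = 3, 200, 50
calib_tables = [np.sort(rng.random(n_calib)) for _ in range(n_models)]
test_preds = [rng.random(n_test) for _ in range(n_models)]
test_sigmas = [rng.random(n_test) for _ in range(n_models)]

lowers, uppers = [], []
for scores, y_hat, sigma in zip(calib_tables, test_preds, test_sigmas):
    # Score at the (1 - error_rate) level of the calibration table
    k = int(np.ceil((1 - error_rate) * (len(scores) + 1))) - 1
    alpha_s = scores[min(k, len(scores) - 1)]
    half_width = alpha_s * (sigma + beta)   # unnormalized case: half_width = alpha_s
    lowers.append(y_hat - half_width)
    uppers.append(y_hat + half_width)

# The loop end aggregates the per-model bounds with the median
lower_bound = np.median(lowers, axis=0)
upper_bound = np.median(uppers, axis=0)
```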

Predictions quality estimation

Once the second loop block is over, it is possible to compare the predictions with the actual values using the Conformal Scorer (Regression) node. In this case, the main metrics are:

  • Error rate — the experimental (real) error rate that we observe on the predictions;
  • The mean interval size, which describes the width of the band within which the true values are expected to fall.

In general, the experimental error rate corresponds well to the expected one that we set up in the Conformal Predictor and Classifier (Regression) node. The interval size, as expected, increases as the error rate decreases — so here comes a trade-off between the interval size and the error rate. This may lead to two polar situations: a) ridiculously large intervals between the lower and upper bound, which will almost certainly contain the real and predicted values; b) a high percentage of errors that the user is ready to tolerate in exchange for a smaller and more reasonable interval size. This will be discussed further in the error rate optimization section.
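For completeness, both metrics are easy to compute by hand. The helper below is a minimal sketch of that calculation (the function name and the toy numbers are mine; the Conformal Scorer (Regression) node reports additional interval statistics as well).

```python
import numpy as np

def conformal_regression_metrics(y_true, lower, upper):
    """Empirical error rate and mean interval size for a set of prediction intervals."""
    y_true, lower, upper = map(np.asarray, (y_true, lower, upper))
    outside = (y_true < lower) | (y_true > upper)
    return {
        "error rate": float(outside.mean()),              # share of true values outside the band
        "mean interval size": float((upper - lower).mean())
    }

# Toy usage: one of the three true values falls outside its interval
print(conformal_regression_metrics(
    y_true=[10.0, 12.5, 9.0],
    lower=[8.0, 11.0, 9.5],
    upper=[11.0, 14.0, 10.5],
))  # -> {'error rate': 0.333..., 'mean interval size': 2.333...}
```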

Optimizing normalization sensitivity parameter beta

One thing that might help with the trade-off described above is the normalization of the conformal prediction. This technique allows us to reduce the interval size for the easy examples and keep it larger for the difficult ones. As previously stated, to calculate normalized values, a measure of difficulty has to be introduced. The most straightforward and easiest measure to obtain is the absolute error. In that case, beta is the sensitivity parameter that determines how strongly the normalization influences the intervals (you can learn more about it in these papers: PDF, PDF).
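A tiny numeric example can make beta's role more tangible. In the normalized case the interval half-width is proportional to (Sigma + beta), so a small beta lets the difficulty estimate dominate (strongly adaptive intervals), while a large beta flattens the differences between easy and hard examples. The numbers below are purely hypothetical, and for simplicity the calibration score alpha_s is kept fixed, even though in the real workflow it changes together with beta.

```python
alpha_s = 0.8                      # hypothetical score read from a calibration table
sigma_easy, sigma_hard = 0.1, 2.0  # difficulty estimates for an "easy" and a "hard" test row

for beta in (0.25, 1.0):
    easy = alpha_s * (sigma_easy + beta)
    hard = alpha_s * (sigma_hard + beta)
    print(f"beta={beta:.2f}  easy half-width={easy:.2f}  hard half-width={hard:.2f}  ratio={hard / easy:.1f}")

# beta=0.25: 0.28 vs 1.80 (ratio 6.4) -> intervals adapt strongly to difficulty
# beta=1.00: 0.88 vs 2.40 (ratio 2.7) -> normalization has less influence
```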

In order to find the best value for beta, there is a separate branch in the workflow called “Beta optimization”. Please note that this branch calls the workflow that was described in the previous sections; therefore, in order to run this branch you can either run the main part first, or deploy the workflow and then read it with the Workflow Reader node. The goal of the optimization branch is to try out a range of values for beta, conduct the training and calculate the calibration tables for each value, estimate the predictions with a scorer node, and manually analyze the results with a dashboard. In this example, I am going to use the interval [0.25; 1] with a step of 0.25 for the beta values.

Figure 6. A branch of the workflow for beta optimization.
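Outside of KNIME, the same optimization branch can be approximated with a simple loop: for each beta, recompute the calibration table, build the intervals, and record the scorer-style statistics. The sketch below does this on synthetic data with a fixed error rate of 0.1; it mirrors the post's choice of absolute error as the difficulty measure, but the data, split sizes, and model settings are my own.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(2)
X = rng.random((1000, 5))
y = X @ rng.random(5) + 0.1 * rng.standard_normal(1000)   # toy regression data

X_train, y_train = X[:600], y[:600]
X_calib, y_calib = X[600:800], y[600:800]
X_test, y_test = X[800:], y[800:]

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
error_rate = 0.1

for beta in [0.25, 0.5, 0.75, 1.0]:                        # the grid used in this section
    # Calibration: normalized scores with Sigma = absolute error
    calib_err = np.abs(y_calib - model.predict(X_calib))
    scores = np.sort(calib_err / (calib_err + beta))
    k = int(np.ceil((1 - error_rate) * (len(scores) + 1))) - 1
    alpha_s = scores[min(k, len(scores) - 1)]

    # Prediction: half-width = alpha_s * (Sigma + beta), with Sigma again the absolute error
    y_hat = model.predict(X_test)
    sigma = np.abs(y_test - y_hat)
    width = 2 * alpha_s * (sigma + beta)
    covered = np.abs(y_test - y_hat) <= alpha_s * (sigma + beta)

    print(f"beta={beta:.2f}  error rate={1 - covered.mean():.3f}  "
          f"min/mean/max width={width.min():.3f}/{width.mean():.3f}/{width.max():.3f}")
```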

Once the optimization loop is over and the predictions for all beta values are obtained, they should be properly stacked so that we can plot and compare them. To do this, the Group Loop Start node is used to iterate over the groups of producer and beta values. In the loop body, dynamic column names are created and appended using the Loop End (Column Append) node (this part can be seen in figure 6). Finally, in the component called “Select producer”, the user can select which producer’s prices, predictions, and bounds to plot. For visualization, all values are de-normalized so that meaningful price ranges can be seen.

Figure 7. Examples of conformal predictions for different beta values for the Ford Fiesta.

In figure 7, one can see an example of Ford Fiesta car prices, predictions, and 4 pairs of upper and lower prediction bounds for all the samples. In general, this plot is not that useful for selecting the beta coefficient, but rather for investigating the predictions and their ranges — it is quite handy to click on the legend to include or exclude some lines. The good thing is that we can also refer to the output table returned by the Conformal Scorer (Regression) node (figure 8).

Figure 8. The results of beta optimization with fixed error rate = 0.1.

Here, again, one needs to define the best criterion for selecting the beta value. Let’s say you prefer to have the smallest minimum interval; then it is better to set beta = 0.25, and the same holds for the median and mean interval values. On the other hand, if one would like to have the smallest maximum interval, then it is better to take beta = 1. In the next section, I am going to use beta = 0.25, but again I encourage the readers to do their own investigations.

Optimizing error rate

Another parameter that is interesting to optimize is, of course, the error rate. In this post, we are going to do pretty much the same as in the classification case. The only difference is that now it is possible to proceed with or without normalization, and define beta as another parameter. Since beta has been optimized in the previous section, in this experiment I am going to use normalization with beta = 0.25.

The procedure for optimizing the error rate is pretty much the same as the procedure for optimizing beta. The range of values is [0; 0.25] with a step of 0.05. The deployed workflow is executed using the error rate values from this interval. At the output of the loop end node, we get the estimations from the scorer and the conformal predictions. In the same way as in the previous section, the predictions are aligned by producer and error rate value. The producer can be selected in the “Select producer” component, and the values are plotted with a Line Plot (Plotly) node; the results are shown in figure 9.
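Before looking at the plots, it helps to see what the sweep changes internally: a different error rate only moves the position of the score that is read from the calibration table, which in turn scales every interval. The sketch below isolates that effect with synthetic calibration scores and one hypothetical test row; note how error rate = 0 falls back to the largest calibration score and produces the widest (and, after de-normalization, potentially negative) bounds.

```python
import numpy as np

rng = np.random.default_rng(3)
calib_scores = np.sort(rng.exponential(size=200))         # toy calibration table
beta, sigma = 0.25, 0.4                                    # hypothetical settings for one test row

for error_rate in [0.0, 0.05, 0.10, 0.15, 0.20, 0.25]:    # the grid used in this section
    k = int(np.ceil((1 - error_rate) * (len(calib_scores) + 1))) - 1
    alpha_s = calib_scores[min(k, len(calib_scores) - 1)]  # error rate 0 -> the largest score
    half_width = alpha_s * (sigma + beta)
    print(f"error rate={error_rate:.2f}  interval half-width={half_width:.2f}")
```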

Let’s take a look at Toyota, as it was one of the producers with the smallest number of records. There, one can see that the predictions are grouped by model, so there are different levels of prices. By clicking on the values in the legend, it is possible to remove or add different lines, which makes it more convenient to compare the prediction bounds. It turns out that the pattern we find is quite similar to what we had in the classification case:

  • Decreasing the tolerable error rate widens the gap between the upper and lower bounds, but at the same time the user can be more confident that the true values fall within this band;
  • Vice versa — loosening the error rate leads to narrower prediction bounds, causing more true values to fall outside of these bounds.

One can also notice a fairly ridiculous case when the error rate = 0. This causes the lower bound for the majority of samples to become negative, which in terms of price does not make any sense at all. One more note is that including normalization affects the bounds: for easier cases the band becomes narrower, whereas for harder ones it becomes wider. It is probably hard to see that on the plot, but it is possible to calculate the bound length with the Math Formula node.

Figure 9. The plot of real prices, predicted prices and multiple prediction bounds for Toyota.

A better alternative to conducting a manual analysis of the predictions at the producer or even car model level is to take a look at the aggregated Conformal Scorer (Regression) outputs (see figure 10), which are plotted with the “Error rate optimization analysis” component. In the top plot, one can see that the theoretical error rate (the one we set up in the node settings) corresponds quite well to the real error rate (the one we estimate with the scorer).

Another useful insight comes from the second plot, where one can see how the different aggregated intervals change depending on the error rate values used in the optimization loop. Hence, this plot might be useful for defining the desired error rate depending on the user’s requirements for the prediction — “what should be the maximum upper or lower bounds?” or maybe “is it better to find the optimal mean interval range?”. Unfortunately, conformal prediction does not provide an answer to all these questions. Its power lies in providing a scale of the certainty or uncertainty of the predictions for a particular data set. As usual, the final decision is up to the user.

Figure 10. Visualization by the “Error rate optimization analysis” component showing the comparison between experimental vs theoretical error rate (top) and error rate vs aggregated intervals (bottom).

“Simple” approach

I also prepared a “simple” implementation of conformal prediction for regression. As in the previous post, simplicity entails avoiding the loops for training multiple models and obtaining multiple calibration tables. Additionally, I did not include any parameter optimization, but the workflow still deploys the part with the post-processing of the results, so it is as SIMPLE as possible. This approach is good for a quick and dirty implementation, for a first try, or for study purposes. The potential drawback is that the experimental error rate might be a bit higher than the theoretical one for multiple reasons related to the data set itself, the complexity of the target variable, the model and its parameter selection, and the many other issues that we encounter in data science. Nevertheless, I encourage the readers to give it a try and compare the two approaches.

Conclusion

In this blog post, I described how to use conformal prediction for a regression case using KNIME, how to prepare the workflow for deployment with the help of the KNIME Integrated Deployment nodes, and how to use the deployed workflow to optimize the two main parameters: the normalization sensitivity parameter beta and the error rate. We went through a use case employing a fairly simple data set that is quite easy to interpret, and we extracted some insights from it.

The purpose of this blog post, as it was for classification, is a gentle introduction (free of math and formulas) to conformal prediction for regression using a practical example that can be easily tailored to a new data set.

I really hope that the readers will find this series of blog posts useful. I also hope that these examples will encourage you to use conformal prediction in your projects. I understand that there might be more exciting and complicated use cases for both classification and regression, such as anomaly detection or time-series analysis, and maybe I will describe them later on in a new blog post. Stay tuned!

References

  1. Advanced workflow
  2. Simple workflow
  3. First post explaining the theory
  4. Previous post for classification case

This blog post was written as a private initiative and is not in any way related to my current employer.
