DATA SCIENCE THEORY | EXPLAINABLE AI | KNIME ANALYTICS PLATFORM

Conformal predictive systems

A hands-on codeless example with KNIME

Artem Ryasik
Low Code for Data Science

--

A MidJourney fantasy on conformal predictive systems.

Today we are going to go through an extension of conformal regression called conformal predictive systems. The main advantage of this method is that it produces a conformal predictive distribution: a cumulative distribution function (CDF) that is used to calculate the predictive interval individually for every sample. Moreover, one can control the size of these intervals with simple parameters. As before, this extension is implemented as native nodes for KNIME.
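To make the idea concrete before touching the nodes, here is a minimal Python sketch (not part of the KNIME workflow, and using purely synthetic numbers) of how a split conformal predictive distribution can be built from calibration residuals and queried for a prediction interval:

    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic stand-ins for the outputs of a fitted regression model.
    y_calib = rng.normal(10_000, 2_000, size=500)           # true calibration targets
    y_calib_pred = y_calib + rng.normal(0, 500, size=500)   # model predictions on the calibration set
    y_new_pred = 12_500.0                                    # model prediction for one new sample

    # Conformal predictive distribution for the new sample: the new prediction
    # shifted by the sorted calibration residuals. Its empirical CDF is the
    # predictive distribution.
    cpd = y_new_pred + np.sort(y_calib - y_calib_pred)

    # A prediction interval is just a pair of percentiles of that distribution,
    # e.g. the 5th and 95th percentiles for a 90% interval (10% error rate).
    lower, upper = np.percentile(cpd, [5, 95])
    print(f"90% interval: [{lower:.0f}, {upper:.0f}]")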

In this blog post, I am going to replicate most of what I did in the previous example for conformal regression, with some differences, since other nodes and concepts will be used.

The purpose of this post is to show how conformal predictive systems work on a real use case in KNIME. If you’re interested in a deeper dive into the theory behind conformal predictive systems, I recommend looking at the references section for extra materials.

The workflows are available for free on KNIME Community Hub — advanced and simple.

Check my other data stories about conformal prediction theory and conformal prediction for classification.

The use case

For this blog post, I am going to use the data set from the previous conformal regression example: the used cars data set (Kaggle). The reason is that this makes it possible to compare the outputs and see the benefits of the Conformal Predictive Systems extension. Moreover, the workflow is going to have a very similar structure using the Integrated Deployment extension, where the workflows are automatically deployed for production. Furthermore, the workflow includes sections for parameter optimization of beta (the normalization weight) and the significance level, to compare the prediction intervals.

The workflow

Again here we have the same structure: an “advanced” and a “simple” workflow. The key difference is that for the “simple” case only one model is trained. Here, I am going to focus on the “advanced” case, since it can be easily adapted to become “simple”.

Data processing

This part is the same as in the previous example for regression (a rough code sketch of these steps follows the list):

  • Choosing between the highly correlated features of “year” and “mileage”, we are going to select “mileage”.
  • The numerical data is going to be normalized with the Min-Max method.
  • The data set is large, so its size is reduced with sampling by the “producer” attribute.
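For readers who prefer to see these steps in code, a rough pandas/scikit-learn equivalent might look as follows; the file name, column names and sampling fraction are assumptions for illustration, not the actual configuration of the KNIME nodes:

    import pandas as pd
    from sklearn.preprocessing import MinMaxScaler

    df = pd.read_csv("used_cars.csv")  # hypothetical file name

    # Min-Max normalization of the numerical columns (illustrative column names).
    num_cols = ["mileage", "tax", "mpg", "engineSize"]
    df[num_cols] = MinMaxScaler().fit_transform(df[num_cols])

    # Reduce the data set size by sampling within each producer group.
    df_small = df.groupby("producer", group_keys=False).sample(frac=0.3, random_state=42)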

The difference starts with the “Conformal prediction configuration” component. Instead of the confidence level, the user is supposed to enter the upper and lower percentiles for the CDF (they can be derived from the confidence level). The remaining settings for normalization are the same as in the previous example (see Fig. 1).

Figure 1. The dialog of the “Conformal prediction configuration” component allows the user to decide whether or not to use normalization, provide the normalization weight (beta), and enter the upper and lower percentiles for CDF.

Training and calibration

The data set is split multiple times to produce three different parts (a rough splitting sketch follows the list):

  • A training data set — used for training the model.
  • A calibration data set — used for calculating calibration tables that will be used for CDF computation.
  • A test data set — used for evaluating both model predictions and conformal predictions.
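In code, such a three-way split could be sketched like this (reusing df_small from the preprocessing sketch above; the 60/20/20 ratios are an assumption for illustration only):

    from sklearn.model_selection import train_test_split

    # Carve off the training set first, then split the remainder into
    # calibration and test sets (illustrative 60/20/20 ratios).
    train_df, rest_df = train_test_split(df_small, train_size=0.6, random_state=42)
    calib_df, test_df = train_test_split(rest_df, train_size=0.5, random_state=42)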

The procedure for calculating the calibration tables is similar to the regression case, except that here the Predictive Systems Calibrator (Regression) node is used. The loop nodes are compatible with all types of conformal prediction nodes (see Fig. 2).

Figure 2. The procedure for calculating multiple calibration tables.

Conformal predictive systems

Once the models and calibration tables are obtained, we can start preparing the workflow for production, where the inputs are going to be:

  • The predictive models (of any kind supported by KNIME).
  • Calibration tables synchronized with these models.
  • Test or new unlabeled data.
  • Conformal predictive system parameters: in our case, the normalization weight beta and the upper/lower percentiles for the CDF, which play a role similar to the confidence level (these parameters can be set with the “Conformal prediction configuration” component).

That is why we wrap up this part of the workflow with the Capture Workflow nodes (see Fig. 3).

Figure 3. The prediction part of the workflow that is going to be deployed and also used for parameter tuning.

Here the models are applied to the test data set, and the calibration tables are used to calculate the CDF for every sample with the Predictive Systems Predictor (Regression) node. The aggregated CDFs are then calculated with the Conformal Prediction Loop End node. The final step is to calculate the individual prediction bounds for every sample with the Predictive Systems Classifier (Regression) node (see Fig. 4).

Figure 4. The dialog of the Predictive Systems Classifier (Regression) node where the user can select multiple upper and lower prediction bounds as percentiles of CDF.

Basically, that is it: everything we need for deployment is in place. Now we need to see how to optimize the parameters of conformal prediction and what difference their values make.

There are two more options in the dialog of the Predictive Systems Classifier (Regression) node that we are not going to use in the current example (a small sketch of the idea follows the list):

  • Target value: a fixed value to compare the prediction with. The output is the calculated probability that the predicted value is lower than this fixed value.
  • Target column: the column that contains the values to compare with (e.g. the test set). The output is the calculated probability that the predicted values are lower than the corresponding values from the selected column.
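To illustrate what the target value option computes, here is a small sketch that continues the numpy example from earlier, where cpd holds the conformal predictive distribution values for one sample; the function name is hypothetical:

    import numpy as np

    def prob_below(cpd, target):
        """Empirical CDF evaluated at a fixed target: the estimated
        probability that the value is at most `target`."""
        return float(np.mean(cpd <= target))

    # e.g. the probability that the price of this car is below 15,000
    p = prob_below(cpd, 15_000.0)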

Optimizing beta

First, let’s optimize beta, the weight that controls normalization and determines the relative importance of the prediction difficulty estimate, called sigma. The benefits of normalization were discussed in the previous blog post, but the main idea is to increase informativeness and potentially minimize the prediction regions. Indeed, it is possible to obtain individual bounds for each sample, which is achieved using a normalized nonconformity function.
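For intuition, here is a hedged sketch of a normalized conformal predictive distribution; sigma stands for a per-sample difficulty estimate (e.g. the output of a second model trained to predict the absolute error), and all names are illustrative rather than the internals of the KNIME nodes:

    import numpy as np

    def normalized_cpd(y_calib, y_calib_pred, sigma_calib,
                       y_new_pred, sigma_new, beta):
        """Conformal predictive distribution with difficulty normalization.

        Calibration residuals are scaled by (sigma + beta) before sorting and
        scaled back by the new sample's (sigma_new + beta), so 'difficult'
        samples get wider distributions. A larger beta dampens the influence
        of sigma; in the limit it recovers the unnormalized case.
        """
        scores = np.sort((y_calib - y_calib_pred) / (sigma_calib + beta))
        return y_new_pred + (sigma_new + beta) * scores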

Figure 5. The beta optimization part. Here different beta values are used for calculating calibration tables; then the parameters, models and calibration tables are inserted into the deployed workflow in order to get the predictions.

As Fig. 6 shows, we are going to take the range [0.05; 0.25] with a step of 0.05 for beta. Since beta must be set at the calibration step, we need to train models and obtain calibration tables at every step. Once that is done, we can run inference on the test data set and gather the results for each beta value in order to compare them. In the “Select producer” component in this branch of the workflow, the user can select the car producer to see its prices and predictions. The user also gets another plot that shows the statistics of the prediction intervals. Although beta=0.05 gives the smallest interval, it also gives the largest one. Hence, in order to get rid of the extreme values, I suggest taking beta=0.25 to proceed further. However, I encourage you to do your own experiments and test different beta ranges.

Figure 6. The plot of the statistics of prediction intervals as a function of beta.

Error rate optimization

The next section is dedicated to analyzing the difference between the upper and lower bounds of the CDF that define the prediction interval. It is not really possible to optimize these parameters, since everything depends on the tradeoff between prediction error tolerance and the desired width of the prediction interval. So the purpose of this section is to show how to interpret different prediction bounds and to demonstrate how the Predictive Systems Classifier (Regression) node works (see Fig. 7). Similar to the previous section, we are going to use the trained workflow with beta=0.25 and sweep the error rate over the values 2%, 10% and 50%. These values are automatically converted to the following bounds, respectively:

  • The 1st and 99th percentiles.
  • The 5th and 95th percentiles.
  • The 25th and 75th percentiles.

The error rate notation is used here to follow the same logic as the regression case in the previous blog post, and, for simplicity, to use a single parameter instead of a pair.
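Assuming the error rate is split evenly between the two tails, the conversion is simply:

    def error_rate_to_percentiles(error_rate):
        """Convert a symmetric error rate into lower/upper CDF percentiles,
        e.g. 0.02 -> (1, 99), 0.10 -> (5, 95), 0.50 -> (25, 75)."""
        lower = 100 * error_rate / 2
        return lower, 100 - lower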

Figure 7. The overview of the error optimization branch.

These values were picked to cover two extreme cases, too strict and too loose, plus one realistic value. First, let’s check that the expected and real error rates agree, and how the prediction intervals depend on the error rate. This is what the “Error rate optimization analysis” component calculates. As can be seen in Fig. 8, the concordance between expected and real error rates is quite good. Furthermore, as expected, the prediction interval statistics decrease as the tolerable error rate increases, with the decrease in “Max interval size” being the most remarkable.

Figure 8. Comparison of expected vs real error rate values (top); the statistics of the prediction interval size as a function of the error rate (bottom).
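The “real” error rate shown in Fig. 8 is simply the fraction of test samples whose true value falls outside its individual interval; a minimal sketch:

    import numpy as np

    def empirical_error_rate(y_true, lower_bounds, upper_bounds):
        """Fraction of true values outside their prediction intervals; for a
        well-calibrated system it should be close to the configured error rate."""
        outside = (y_true < lower_bounds) | (y_true > upper_bounds)
        return float(np.mean(outside))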

Finally, the user can select the predictions for a certain car producer separately, considering different confidence levels that are converted to upper and lower bounds. I would recommend comparing the bounds in pairs (e.g. 2% vs 50%, 10% vs 50%, etc.). In Fig. 9, we can see that 50% gives a very narrow interval, almost matching the prediction curve. In this case we can have up to 50% errors, since we allow 50% of the real values to fall outside of this interval. For the strict case of 2%, the interval becomes extremely wide, and for some samples the lower bound even becomes negative, which makes no sense for car price prediction. At the same time, the user can be confident that no more than 2% of the real values will fall outside of the prediction bounds.

Figure 9. The comparison of the prediction intervals for 2%, 50%, prediction and sorted real price in ascending order for Volkswagen (top); the same plot but for predicted values vs true values (bottom).

To see the contrast with a more realistic setting, let’s consider the 10% error rate (see Fig. 10). It looks quite reasonable: the interval bounds are neither extremely large nor small, and they contain no nonsense values.

Figure 10. The comparison of the prediction intervals for 10%, prediction and sorted real price in ascending order for Volkswagen.

A final note: since we have applied normalization, every prediction gets individual bounds indicating how difficult that particular sample is to predict. This is actually useful for prediction explanation; however, this interesting topic is not part of today’s experiment.

“Simple” workflow

The Conformal Prediction extension has pairs of nodes for “advanced” and “simple” cases, so I have prepared the “simple” workflow as well. The main difference is that it uses nodes that make the predictions immediately, without training multiple models, calculating calibration tables and dealing with their aggregation. Another simplification is that there are no branches for beta and error rate optimization. However, this workflow also utilizes the Integrated Deployment extension, so the core part of the workflow is automatically deployed for use in production. This workflow can also be found on the KNIME Community Hub.

CDF and interval explanation

Another note about the CDF. Both workflows have a component called “Explanation example” that randomly picks a sample and plots its CDF (see Fig. 11). It also shows the location of the true value and the prediction for this sample. There are three (hard-coded) percentile intervals that correspond to:

  • 2% error rate — percentiles 1 and 99.
  • 10% error rate — percentiles 5 and 95.
  • 20% error rate — percentiles 10 and 90.

The Y-axis values show the probability of the real value being below the corresponding value on the X-axis; the X-axis covers the range of the target values.

Figure 11. A conformal predictive distribution with different intervals. The real and predicted values are shown with solid lines, the different percentiles for lower and upper bounds are shown with dashed lines.

Conclusion

This is the last blog post in the series describing the currently implemented nodes for conformal prediction in KNIME. It showed how to use (inductive!) conformal predictive systems, which generalize conformal regression. Here the user gains more power: not only can the range of the prediction be controlled, but convenient probability bounds on the CDF can also be set individually for each and every sample.

Having the CDF also brings additional value: for example, it can be helpful for understanding the uncertainty of individual predictions and for applying statistical tests, such as the Kolmogorov-Smirnov test, to estimate the difference between observations. However, this is out of scope for this blog post, and I would be happy to hear about your experience with conformal predictive systems.

References

  1. Lofstrom, T., Bondaletov, A., Ryasik, A., Bostrom, H. and Johansson, U., 2023, August. Tutorial on using Conformal Predictive Systems in KNIME. In Conformal and Probabilistic Prediction with Applications (pp. 602–620). PMLR. — PDF
  2. Paolo Toccaceli: Conformal Predictive Distributions — YouTube

This blog post was written as my private initiative and is not in any way related to my current employer.

Artem Ryasik
Low Code for Data Science

PhD in Biophysics, data scientist. Interested in graphs, NLP, data anonymization. KNIME enthusiast. www.linkedin.com/in/artem-ryasik