**A Conformal Predictions Workshop**

## Distribution-Free Uncertainty Quantification #ICML2022

**Distribution-Free Uncertainty Quantification Workshop**

Presented at the International Conference on Machine Learning held in Baltimore, MD on July 23, 2022.

## Workshop Home Page

Organizers: Anastasios Angelopoulos (UC Berkeley) · Stephen Bates (UC Berkeley) · Sharon Yixuan Li (UW Madison) · Ryan Tibshirani (CMU) · Aaditya Ramdas (CMU)

The following notes are an unofficial set of meeting minutes, of sorts, recording the discussions presented at the ICML (International Conference on Machine Learning) workshop on **Distribution-Free Uncertainty Quantification**. Any presenter who is unsatisfied with how their work is represented here is welcome to contact the author.

## Agenda:

- Opening Remarks
- Michael I. Jordan: Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control (Live Talk)
- Poster Session
- Zhimei Ren: Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach (Live Talk)
- Yao Xie: Conformal prediction intervals and sets for time-series (Live Talk)
- Panel Discussion
- Spotlight Presentations (Recorded Spotlight Talks)
- Insup Lee: PAC Prediction Sets: Theory and Applications (Live Talk)
- Rina Barber: Conformal prediction beyond exchangeability (Live Talk)

## Opening Remarks

Conformal prediction refers to a family of methods for quantifying the uncertainty surrounding an inference operation, independent of assumptions on the model configuration and, in some cases, even on the data distribution. The methods provide a statistical guarantee around predictions that, in mainstream machine learning practice, are otherwise commonly received as point estimates without any surrounding uncertainty bands. They are applicable to various forms of inference, like classification, regression, segmentation, or generation, and to models derived from various learning paradigms, like deep learning, gradient boosting, or contrastive learning.
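To make the idea concrete, here is a minimal sketch of split conformal classification. It assumes a classifier that has already been trained and outputs softmax probabilities, and it uses one common score, 1 minus the probability assigned to the true label. The function names and toy numbers are illustrative rather than taken from any talk, and a real calibration set would hold hundreds of examples rather than four.

```python
import math

def conformal_quantile(scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score."""
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points: every label must be included
    return sorted(scores)[k - 1]

def calibrate(cal_probs, cal_labels, alpha):
    """Score each calibration example as 1 minus the probability of its true label."""
    scores = [1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels)]
    return conformal_quantile(scores, alpha)

def prediction_set(probs, qhat):
    """Keep every label whose score (1 - probability) is within the threshold."""
    return [label for label, p in enumerate(probs) if 1.0 - p <= qhat]

# Toy calibration data (hypothetical softmax outputs and true labels).
cal_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.3, 0.3, 0.4], [0.6, 0.3, 0.1]]
cal_labels = [0, 1, 2, 1]
qhat = calibrate(cal_probs, cal_labels, alpha=0.25)
print(qhat, prediction_set([0.5, 0.4, 0.1], qhat))
```

The returned set grows when the model is unsure and shrinks when it is confident, which is exactly the uncertainty band a bare point prediction lacks.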

The methods explored in this workshop may be of particular interest to those basing high-stakes decisions on model predictions, whether those decisions have consequences in healthcare, finance, risk assessment, or other important applications of machine learning. Compared to model-based uncertainty quantification, the benefits of conformal prediction include a low-latency implementation, with the primary tradeoff being the need for a calibration data set partitioned off from training, or otherwise additional assumptions about data properties.

This workshop assumed some familiarity with the fundamentals of conformal prediction, as most speakers and poster presenters were addressing the frontiers of research, with extensions into new domains or new algorithms. Please note that the author of these meeting minutes was not a domain expert at the time of attendance, so much of the recorded content herein focuses on fundamentals, with the more advanced aspects of the research often abbreviated or omitted. These minutes may thus be of most benefit to those looking for a high-level introduction to the state of the art. My apologies to the speakers if I was unable to capture the most important contributions from your material. For readers who would like to learn more, the workshop website currently includes a dedicated page of tutorial papers and videos meant to serve as a formal and more thorough introduction to the practice of conformal prediction. I expect that ICML will also release a video recording of the workshop, which will obviously be a suitable resource for a deeper dive into this content. (Postscript: the write-up *A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification* by Angelopoulos and Bates has since become my favorite introductory resource.)

Please note that the workshop included two poster sessions. The author only recorded notes on a small selection of accepted posters, primarily from the first session, and encourages those interested in a more comprehensive survey to take a look at the full range shared on the workshop website. (There is probably a small selection bias toward posters adjacent to this author's interests.)

Yeah so fine print complete, presented here are the unofficial meeting minutes. Enjoy.

## Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control

- Michael I. Jordan (Live Talk)

The mainstream practice of machine learning may, in a general way, be thought of as a kind of pattern recognition. A training operation extracts and encodes the patterns found in a training corpus into the weights of a network, and inference then maps a previously unseen sample to a predicted label consistent with those patterns. Most machine learning can be approximated as a kind of complex Bayesian model, although one whose underlying assumptions are often difficult to justify, or in many cases even to articulate. The implications of these practices for high-stakes decisions have so far been somewhat neglected by the research community but, we argue, are just as important. Consider ML applications that may have outsized consequences when miscalibrated, like market mechanisms, patient diagnostics, or other high-stakes decisions. Calibrated uncertainty can indicate when it is appropriate to give weight to counterfactuals. It may be applicable to static assessments, streaming applications, or asynchronous decisions, and may especially benefit decision making under scarcity constraints or under competition. All of these scenarios should benefit from well-calibrated uncertainty quantification.

Conformal prediction methods can be applied independently of the learning paradigm, without assumptions on model characteristics, by treating the predictive model as a black box with an appended calibration layer that augments inference with statistically rigorous uncertainty quantification, resulting in a kind of modularization of the information flow. Consider classification applications with human subjects: we will often care about the prevalence of false positives and other risks to the humans in the loop. By expanding on the point-wise output of inference, we gain awareness of the context surrounding inspected instances.

Practitioners of conformal prediction refer to such uncertainty quantification as “distribution-free statistics”. This framing differs from the classical statistics used to evaluate models by replacing assumptions on the predictive model (like parametric distributions or the complexity class of the predictive function) with a different kind of assumption called “exchangeability”, which here means that the data passed to inference is exchangeable with the data used for training and calibration, i.e. their joint distribution is unchanged by reordering the samples. Under classical framings, researchers may have attempted to modify a model's architecture or training algorithm to gain additional statistical validity, whereas distribution-free statistics may be applied even to previously trained models without retraining. Classical statistics may rely on asymptotics, like infinite sampling or the central limit theorem, while distribution-free statistics offers finite-sample guarantees, which under the exchangeability assumption can be derived from tools like rank arguments and concentration inequalities.
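For reference, the finite-sample statement usually quoted for split conformal prediction is below, written in our own notation rather than the speaker's: $s_i = s(X_i, Y_i)$ are the $n$ calibration scores, $s_{(k)}$ is the $k$-th smallest of them, $\hat{C}$ is the prediction set, and $\alpha$ is the chosen error level. The lower bound needs only exchangeability of the calibration and test points (and $\alpha \ge 1/(n+1)$ so the index is at most $n$); the upper bound additionally assumes the scores have no ties.

$$
\hat{q} = s_{(\lceil (n+1)(1-\alpha) \rceil)}, \qquad
\hat{C}(x) = \{\, y : s(x, y) \le \hat{q} \,\}, \qquad
1 - \alpha \;\le\; \mathbb{P}\big(Y_{n+1} \in \hat{C}(X_{n+1})\big) \;\le\; 1 - \alpha + \frac{1}{n+1}
$$

Note that the probability is marginal, taken over the draw of both the calibration data and the test point, not conditional on any particular input.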

To pick an arbitrary example, consider a computer vision application inspecting medical imaging output to detect sets of pixels that could be classified as an active tumor. With traditional machine learning conventions it may be difficult to make informed decisions among treatment options, each of which could carry its own downstream risks. Under those conventions, we could have tried to achieve calibration by varying the training loss or the metric being optimized (AUC ROC, F1 score, etc.). Uncertainty quantification from conformal prediction instead allows us to post-process inference output so as to control the false negative rate at a level of our choosing, again independent of the original loss function. Such risk-controlling predictions let us enforce an inequality that keeps the probability of excessive risk below a desired threshold: for example, requiring that the probability that the risk resulting from inference exceeds threshold A is less than threshold B, or more concretely: `P(risk > A) < B`.

Diving a little deeper into the weeds, adherence to such threshold inequalities is established by a hypothesis testing procedure applied over a grid of candidate parameter values, which we'll refer to as lambdas, where the null hypothesis for a given lambda is that risk is not controlled. As a hand-wavy explanation of the two-step procedure: in step one we derive a p value for each null hypothesis (the p value is a common classical statistical measure), and in step two we use those p values to select the subset of lambdas that bound the error. For step one we have options (forgive the undefined keywords here): a naive Hoeffding method or, as the speaker noted, perhaps better, martingale methods. Step two performs multiple testing over those p values, again with a naive Bonferroni correction or, as the speaker noted, perhaps better, sequential graphical testing. (Yeah, the blogger doesn't know what those terms mean either; might be worth a quick search engine break if you have the time.) I believe the speaker implied that the sequential graphical testing step may rely on a partitioned calibration set, noting that a classification operation may also benefit from testing for out-of-distribution inputs alongside inspecting such calibration data.
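A minimal sketch of the naive variant of this two-step procedure (Hoeffding p values, then a Bonferroni correction) follows. It assumes losses bounded in [0, 1], i.i.d. calibration data, and a finite grid of candidate lambdas; the grid and the loss values are hypothetical. Every lambda in the returned list controls risk at level `alpha`, simultaneously, with probability at least 1 - `delta`.

```python
import math

def hoeffding_pvalue(empirical_risk, alpha, n):
    """p value for the null 'risk(lambda) > alpha', for losses bounded in [0, 1]."""
    gap = max(alpha - empirical_risk, 0.0)
    return math.exp(-2.0 * n * gap ** 2)

def learn_then_test(lambdas, losses_per_lambda, alpha, delta):
    """Return every lambda whose null is rejected at family-wise level delta.

    losses_per_lambda[j] holds the calibration losses (each in [0, 1]) observed
    when the predictor is run with parameter lambdas[j].
    """
    m = len(lambdas)
    accepted = []
    for lam, losses in zip(lambdas, losses_per_lambda):
        n = len(losses)
        risk_hat = sum(losses) / n
        # Bonferroni: compare each p value against delta / m.
        if hoeffding_pvalue(risk_hat, alpha, n) <= delta / m:
            accepted.append(lam)
    return accepted

# Hypothetical calibration losses at three candidate lambdas (n = 200 each).
losses = [[1] * 100 + [0] * 100, [1] * 30 + [0] * 170, [0] * 200]
print(learn_then_test([0.1, 0.5, 0.9], losses, alpha=0.2, delta=0.1))  # -> [0.9]
```

The middle lambda has empirical risk 0.15, below the 0.2 target, yet is not certified: with 200 samples the Hoeffding p value (about 0.37) is far from the Bonferroni cutoff, which illustrates why the speaker pointed to sharper alternatives.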

The speaker devoted the second part of his talk to applications of conformal prediction in design problems (e.g. protein functional fitness tests). The blogger's notes were a little sparser on this content, so I defer the reader to the recorded video for an authoritative treatment. As an aside, I would like to offer a hat tip to the speaker (Michael Jordan) for having a role to play in so many diverse, high-value applications at the frontiers of research; for example, the last time I saw one of his presentations he was offering an introduction to the Ray library for parallelizing Python workflows.

## Poster Session

Just to restate what was noted above, this author's survey of the poster session was far from comprehensive. There was a diverse selection of research, all of which can be explored through the recorded spotlight presentations and papers linked on the workshop website. We offer here a few recorded notes intended as highlights.

**Confident Sinkhorn Allocation for Pseudo-Labeling** *by Vu Nguyen, Sachin Farad, Anton van den Hengel.* Noted that semi-supervised learning is commonly applied to modalities like spatial or image data. This work sought to make pseudo-labeling achievable for tabular data, potentially opening the door to semi-supervised learning on partially labeled tabular data sets.

**MAPIE: an open-source library for distribution-free uncertainty quantification** *by Vianney Taquet, Vincent Blot, Thomas Morzadec, Louis Lacombe and Nicolas Brunel.* Offered a packaged Python library implementing distribution-free uncertainty quantification, potentially of use for applications like classification, regression, and time series data.

**An empirical Bayes approach to class-conditional conformal inference** *by Tiffany Ding, Anastasios Angelopoulos and Stephen Bates.* Considered class-conditional conformal prediction using empirical Bayes, which may benefit problems with a large number of classes.

**Confident Adaptive Language Modeling** *by Tal Schuster and Adam Fisch.* Noted that large language models may have even up to hundreds of layers, but some generated tokens may not need all of those layers at inference, and running every layer for every token is collectively computationally expensive. They suggest using a confidence estimate to decide layer exit points during inference, reducing the average per-token computational cost of text generation by large language models.

**JAW: Predictive Inference under Covariate Shift** *by Andrew Prinster, Anqi Liu and Suchi Saria.* Considered predictive inference under covariate shift. They extended the “jackknife” method for deriving calibration holdouts, with the result of helping to tighten the intervals around an inference output under covariate shift. (Traditionally, conformal prediction methods use one of split conformal, full conformal, or jackknife methods for deriving the calibration holdouts from the training data set.) Tightening such bands could help, e.g., select a medication dosage within a narrow safety range.
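For readers unfamiliar with the jackknife approach, here is a minimal sketch of the plain jackknife+ interval of Barber et al., using a hypothetical one-feature least-squares model. This is not the JAW method from the poster, which adds importance weights for covariate shift. Under exchangeability, jackknife+ guarantees coverage of at least 1 - 2α, and in practice it is often close to 1 - α.

```python
import math

def fit_line(xs, ys):
    """Ordinary least squares for a single feature; returns a prediction function."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

def jackknife_plus(xs, ys, x_new, alpha):
    """Jackknife+ interval: refit without each point, then combine LOO residuals."""
    n = len(xs)
    lows, highs = [], []
    for i in range(n):
        model = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        residual = abs(ys[i] - model(xs[i]))  # leave-one-out residual
        pred = model(x_new)
        lows.append(pred - residual)
        highs.append(pred + residual)
    k_hi = math.ceil((1 - alpha) * (n + 1))
    k_lo = math.floor(alpha * (n + 1))
    if k_hi > n or k_lo < 1:
        return -math.inf, math.inf  # not enough points for this alpha
    return sorted(lows)[k_lo - 1], sorted(highs)[k_hi - 1]

xs = [float(i) for i in range(10)]
ys = [2.0 * x + 1.0 for x in xs]
# Noise-free data: both ends of the interval should be very close to 11.
print(jackknife_plus(xs, ys, 5.0, alpha=0.2))
```

Unlike split conformal, no data is set aside: every point serves for both fitting and calibration, at the cost of n refits.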

**Inference for Interpretable Machine Learning: Fast, Model-Agnostic Confidence Intervals for Feature Importance** *by Luqin Gan, Lili Zheng and Genevera I. Allen.* My understanding was that this work represented a new way to derive feature importance, with the added benefit of confidence intervals around importance estimates, which existing methods do not provide.

**Sample-dependent Temperature Scaling for Improved Calibration** *by Tom Joy, Francesco Pinto, Ser-Nam Lim, Philip Torr and Puneet Dokania.* Offered a means of applying a pre-trained VAE for sample-dependent temperature scaling to improve calibration. (My notes weren't very good on this one, partly because the poster literally kept peeling off the wall while we were talking. Lesson learned: heavier paper stock is not always better paper stock.) :)

## Sensitivity Analysis of Individual Treatment Effects: A Robust Conformal Inference Approach

- Zhimei Ren (Live Talk)

Examples of common lines of inquiry in medical research that may draw on traditional machine learning or statistics include questions like: “Is a new drug effective for a condition?” and “Will a vaccine be effective against a particular variant?” We can often translate such inquiries into a causal framing, such as: given n subjects with observed confounders and covariates, if we apply a given treatment assignment, what outcomes may result? Such causal framings typically rely on a strong assumption that all confounders are observed (often called unconfoundedness). This work sought to relax that assumption by building on the tools of conformal prediction.

An example application of conformal prediction here could include evaluating both an average treatment effect and a conditional estimate of whether an individual would benefit from a treatment. The conformal approach considers the individual effect across a sequence of hypothesized confounding levels, which allows modeling a confounding strength, noting that we still need to account for the distribution shift between training and target distributions. The resulting robust prediction method allows us to construct a prediction interval for the individual treatment effect.

Note that numerical evaluation of conformal methods, by this speaker as well as elsewhere in this workshop, often drew on a chart of actual coverage versus nominal (theoretical) coverage, where proximity of observed points to the ideal diagonal (y = x) demonstrates the effectiveness of a method. (Points on or slightly above the line indicate valid, if somewhat conservative, coverage; points below it indicate under-coverage, which is generally worse.)
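A small simulation can show what such a chart plots. This sketch assumes exchangeable scores (here i.i.d. uniform draws standing in for nonconformity scores) and the split conformal threshold; the data are synthetic, not from any talk.

```python
import math
import random

def empirical_coverage(cal_scores, test_scores, alpha):
    """Fraction of test scores falling within the split conformal threshold."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    qhat = math.inf if k > n else sorted(cal_scores)[k - 1]
    return sum(s <= qhat for s in test_scores) / len(test_scores)

def coverage_curve(cal_scores, test_scores, nominal_levels):
    """(nominal, empirical) pairs: the points one would plot against the y = x line."""
    return [(level, empirical_coverage(cal_scores, test_scores, 1 - level))
            for level in nominal_levels]

# Simulated exchangeable scores (an assumption): both sets are i.i.d. uniform draws.
rng = random.Random(0)
cal = [rng.random() for _ in range(5000)]
test = [rng.random() for _ in range(20000)]
for nominal, actual in coverage_curve(cal, test, [0.5, 0.8, 0.9, 0.95]):
    print(f"nominal {nominal:.2f} -> empirical {actual:.3f}")
```

Under exchangeability the printed pairs should sit close to the diagonal; replacing `test` with a shifted distribution is a quick way to see points fall below it.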

## Conformal prediction intervals and sets for time-series

- Yao Xie (Live Talk)

This talk considered sequential conformal prediction applied to time series data, extending point-wise prediction intervals to curves of intervals along an inferred path over time. Examples of applications include projections of energy production from intermittent renewable resources, or patient statistics in an intensive care unit. The methods could be applied to historical records or even in a live streaming environment. Basically, conformal prediction allows us to attach prediction intervals to the curves derived from inference.

In general, conformal prediction may offer conditional or marginal guarantees. Holdout data from training may be derived by traditional conventions like split conformal, full conformal, or the jackknife method. There have been several recent works in the field of potential applicability to the time series setting, and the speaker listed a few papers that might be worth further reading. A key distinction of the time series setting is that the traditional conformal prediction assumption of exchangeability needs to be lifted. Instead, we rely on feedback: after each time step the true value is observed, and the resulting residual can inform later intervals, which can take advantage of the temporal correlation found in natural data streams.

The speaker proposed a new algorithm for this time series use case called SPCI (Sequential Predictive Conformal Inference), which enables general conformal prediction for time series data that does not satisfy the exchangeability assumption, assuming instead that the data may be non-stationary but temporally dependent, and exploiting feedback from past residuals in the process. (For more, see arXiv:2010.09107.)
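The residual feedback loop can be illustrated with a much simplified sketch: a hypothetical one-step-ahead forecaster (here a naive last-value forecast) whose interval width is a conformal-style quantile of the most recent absolute residuals. This is in the spirit of residual-feedback methods but is not SPCI itself, which, as we understand it, models the residuals more carefully; this toy version carries no formal coverage guarantee.

```python
import math
from collections import deque

def rolling_residual_intervals(series, alpha, window):
    """Return (t, lower, upper) for each step once `window` past residuals exist."""
    residuals = deque(maxlen=window)
    intervals = []
    for t in range(1, len(series)):
        forecast = series[t - 1]  # hypothetical naive forecast: repeat the last value
        if len(residuals) == window:
            k = math.ceil((window + 1) * (1 - alpha))
            width = math.inf if k > window else sorted(residuals)[k - 1]
            intervals.append((t, forecast - width, forecast + width))
        # Feedback step: once series[t] is observed, record the new residual.
        residuals.append(abs(series[t] - forecast))
    return intervals

series = [float(t) for t in range(50)]  # toy upward trend
intervals = rolling_residual_intervals(series, alpha=0.2, window=20)
print(intervals[0])  # -> (21, 19.0, 21.0)
```

Because only the last `window` residuals are kept, the interval width can track slow changes in the noise level, which is the practical payoff of lifting exchangeability in favor of feedback.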

## Panel Discussion

- Victor Chernozhukov (MIT), Pedro Perona (Caltech), Larry Wasserman (CMU)

Listed here are a few of the talking points addressed during the panel. There were good discussions, and the blogger recommends viewing the recorded talk when it becomes available.

- Examples of conformal predictions applicability could include detecting bias in facial recognition.
- There is potential overlap between conformal predictions and the field of causal inference.
- Are we focussing too much on marginal coverage?
- Conformal prediction basically amounts to a form of quantile estimation.
- Do conformal methods impact any downstream use cases?
- What can asymptotic analysis, which applies under more general data conditions, teach us about conformal prediction?
- In causal inference when predicting a counterfactual without applying a treatment, an underlying model can be conformalized, e.g. by allowing for data dependence they were able to abandon requirements for an i.i.d. assumption.
- What options are there when the data isn’t expected to adhere to the exchangeability assumption? (Other than using a match structure, can we make use of ergodicity assumptions?)
- One particular type of departure from exchangeability is known as the random effects model, however as soon as you leave simple structures (like i.i.d.), you start to need to rely on asymptotic assumptions.
- Regarding the origins of conformal predictions, Vovk originally considered the online setting, subsequently a lot of work has transitioned to i.i.d. and related settings because they are more amenable to statistical analysis. Does the field need to return focus to the online setting? (Deserves more attention.)
- Note that the i.i.d. setting, despite limitations, has some subtle aspects of importance. It can be used to prove theorems. It is a good testing ground for what works. It is just easier to do things in cases of i.i.d.
- Conformal prediction relies on a validation set being representative of future conditions. How can we make it more robust to cases of distribution shift?
- There would be benefit in translating the methods of conformal prediction to a parametric model, however such an implementation would be quite different from what is being considered by this workshop.
- Note that conformal prediction and Bayesian ideas are on opposite ends of a spectrum. Conformal methods are pure frequentist statistics. These are different tools that work better in different types of settings.

## Spotlight Talks

**Probabilistic Conformal Prediction Using Conditional Random Samples** *by Zhendong Wang, Ruijiang Gao, Mingzhang Yin, Mingyuan Zhou and David Blei*

This paper offered a new form of sample based conformal prediction, with benefits including guaranteed marginal coverage, improved sharpness, as well as being compatible with both implicit and explicit conditional generative models. Probabilistic Conformal Prediction (PCP) methods (from prior work) are relevant to various stages of learning like the data stage, modeling stage, calibration stage, and the prediction stage. Their extension is a form of high density PCP which filters out low frequency samples to identify high density regions of the distribution.

**Adaptive Conformal Predictions for Time Series** *by Margaux Zaffran, Olivier Féron, Yannig Goude, Julie Josse and Aymeric Dieuleveut*

Modern techniques for conformal prediction may rely on the split conformal method. Recent work has sought to extend these methods to the time series setting, in particular Gibbs and Candès, whose Adaptive Conformal Inference (ACI) handles distribution shift with asymptotic validity for any distribution. Theoretical analysis has considered the length of ACI intervals, and this paper offered an extension called AgACI, a type of adaptive wrapper around ACI. Benchmarking found the method to be robust.
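As a rough illustration of the building block that AgACI wraps, here is a minimal sketch of the ACI update of Gibbs and Candès: the working level `alpha_t` rises slightly after each covered point and drops after each miss. The window size, step size `gamma`, and the synthetic shifted stream are assumptions, and AgACI itself (which, as we understand it, aggregates several ACI runs with different step sizes) is not shown. For any sequence of scores, this update keeps the long-run miss rate within roughly (max(α, 1 - α) + γ)/(γT) of the target α over T steps.

```python
import math
import random

def adaptive_conformal(scores, target_alpha=0.1, gamma=0.01, window=100):
    """Adaptive Conformal Inference over a stream of nonconformity scores.

    At each step the threshold is a conformal quantile of the last `window`
    scores at the working level alpha_t, which is then nudged up after a hit
    and down after a miss. Returns the list of miss indicators.
    """
    alpha_t = target_alpha
    misses = []
    for t in range(window, len(scores)):
        recent = sorted(scores[t - window:t])
        if alpha_t >= 1:
            threshold = -math.inf  # empty set: always a miss
        elif alpha_t <= 0:
            threshold = math.inf  # trivial set: never a miss
        else:
            k = math.ceil((window + 1) * (1 - alpha_t))
            threshold = math.inf if k > window else recent[k - 1]
        miss = 0 if scores[t] <= threshold else 1
        misses.append(miss)
        alpha_t += gamma * (target_alpha - miss)  # the ACI update
    return misses

# Hypothetical stream whose noise level triples halfway through (a distribution shift).
rng = random.Random(1)
scores = [abs(rng.gauss(0, 1)) for _ in range(2000)] + [abs(rng.gauss(0, 3)) for _ in range(2000)]
misses = adaptive_conformal(scores)
print(sum(misses) / len(misses))  # long-run miss rate, close to the 0.1 target
```

The step size trades off responsiveness against stability: a larger `gamma` reacts faster to the shift but makes interval lengths more erratic, which is the kind of tuning problem an adaptive wrapper aims to remove.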

**Practical Adversarial Multivalid Conformal Prediction** *by Osbert Bastani, Varun Gupta, Christopher Jung, Georgy Noarov, Ramya Ramalingam and Aaron Roth*

Multivalid conformal prediction (MVP) offers stronger guarantees than marginal coverage, including conditional guarantees such as group-conditional or threshold-calibrated coverage. The methods may be desirable for prediction tasks with adversarial data. MVP can threshold score functions to get prediction sets with threshold-calibrated coverage (e.g. to guard against trivial solutions that achieve 0.9 marginal coverage). The method relies on randomizing between candidate thresholds in each round of assessment. The speaker noted that additional follow-up work is coming soon.

**Approximate Conditional Coverage via Neural Model Approximations** *by Allen Schmaltz and Danielle Rasooly*

Restating the benefits of conformal predictions, deep networks may produce high accuracy point predictions but this may not be sufficient for interpretable results or where we need local updatability and uncertainty quantification. Speaker noted the differences between conditional and marginal coverage. Their approach to conditional coverage applies a weighting over the train set. At a high level, they derive an approximation based on an n-ball, relate the test point to calibration point distribution, construct a band where points outside of the band are ignored, match the approximation to a constraint feature, and optionally batch resample the calibration set. This allows constructing a prediction set for each label scenario relative to a test data point in inference.

**Confident Adaptive Language Modeling** *by Tal Schuster and Adam Fisch*

This paper considered using uncertainty quantification to modify the behavior of a deep network while controlling the quality of inference output. In general, autoregressive generation, as performed by large language models with many layers, can be computationally expensive. The goal of this work was to dynamically adjust the number of layers used for each generated token while controlling the quality of the full generated sequence. Consider, for example, that different token sequences may produce statements with the same meaning, like the order of terms around an “and” (“cats and dogs” versus “dogs and cats”). One confidence measure thresholds the softmax output; alternatives inspect the hidden states. The result is a quality versus latency tradeoff, and the speaker demonstrated an example with a 3x speedup for autoregressive inference from a large language model.
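A toy sketch of a softmax-threshold exit rule follows. The per-layer logits and the threshold are hypothetical; the talk described choosing the exit rule so as to control the quality of the full generated sequence, whereas here the threshold is just a fixed number. In a real model, the next layer would be computed only after deciding not to exit. `per_layer_logits` is assumed non-empty.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def early_exit_token(per_layer_logits, threshold):
    """Return (predicted token id, number of layers used).

    Exits at the first layer whose top softmax probability reaches `threshold`;
    otherwise falls back to the prediction of the final layer.
    """
    for depth, logits in enumerate(per_layer_logits, start=1):
        probs = softmax(logits)
        top = max(range(len(probs)), key=probs.__getitem__)
        if probs[top] >= threshold:
            return top, depth
    return top, len(per_layer_logits)

layers = [[0.1, 0.2, 0.0], [0.0, 3.0, 0.0], [0.0, 6.0, 0.0]]  # hypothetical logits per layer
print(early_exit_token(layers, threshold=0.9))  # -> (1, 2)
```

Lowering the threshold saves layers but risks committing to a token before the model has settled, which is why the threshold itself needs calibrating.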

**VaR-Control: Bounding the Probability of High-Loss Predictions** *by Jake Snell, Thomas Zollo and Richard Zemel*

When choosing a predictor, we want to be able to issue a guarantee about its loss distribution. One way to think about it is that conformal prediction seeks to control a quantile of the loss distribution rather than the expected loss, as expected loss isn't robust to difficult examples. The speaker suggested that, instead of focusing on difficult examples, it may be better to upper bound the risk associated with the vast majority of examples. Their approach uses the empirical cumulative distribution function (CDF) of the loss on the validation set to bound its quantiles.
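As a generic illustration of bounding a quantile from the empirical CDF, here is a sketch based on the Dvoretzky-Kiefer-Wolfowitz (DKW) inequality. It is not necessarily the bound the authors use, which may be tighter. It assumes i.i.d. validation losses; with probability at least 1 - `delta`, the returned value upper-bounds the `beta`-quantile (the value at risk) of the loss distribution.

```python
import math

def var_upper_bound(losses, beta, delta):
    """Upper confidence bound on the beta-quantile (value at risk) of the loss.

    DKW: with probability >= 1 - delta, the true CDF F satisfies F >= F_n - eps
    everywhere, so the empirical (beta + eps)-quantile upper-bounds the true
    beta-quantile.
    """
    n = len(losses)
    eps = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    k = math.ceil(n * (beta + eps))
    if k > n:
        return math.inf  # too little validation data for this beta and delta
    return sorted(losses)[k - 1]

losses = [i / 1000 for i in range(1000)]  # hypothetical validation losses
print(var_upper_bound(losses, beta=0.9, delta=0.05))  # -> 0.942
```

The bound sits above the plain empirical 0.9-quantile (about 0.9 here) by a margin that shrinks like 1/sqrt(n), the price of a guarantee that holds with high probability rather than on average.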

## PAC Prediction Sets: Theory and Applications

- Insup Lee (Live Talk)

The tools of conformal prediction are expected to benefit safety-critical systems. The resulting statistical guarantees may add a layer of trust to autonomous systems, ranging from robotic applications like drones and unmanned ships to automated aspects of healthcare practice. Consider a self-driving car deployed among other vehicles and pedestrians, which can be abstracted as an obstacle avoidance problem in navigation. Neural networks on their own may not be verifiable in their assessment of safety risk; conformal prediction can make perception, forecasting, and semantic context that much more interpretable.

The subject of this talk, PAC prediction sets, is intended as a tool for rigorous statistical verification of model performance, replacing prior empirical methods. The most common prior form of model verification basically amounted to sampling: one would sample a diverse set of inputs and infer safety guarantees from the collective performance. This kind of verification may sometimes fail, especially when the decision process needs to be modified as a result of the assessment. The essence of the PAC prediction set problem is to construct the smallest sets that satisfy a desired property, a kind of extension building on the quantile foundations of conformal prediction.

As an approximate description, the PAC framing relies on a binary classification learning algorithm that distinguishes examples whose inferred risk lies above or below some threshold. Once derived, such a PAC model can be expected, with high probability, to work even on future, as yet unseen examples, presuming assumptions similar to those of the conformal prediction framing. This form of set aggregation based on a risk threshold serves as a kind of generalization bound, and by paring the sets down to the minimal number of candidates that satisfy the threshold, it makes it easier to tell where the ground truth resides.
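One way to obtain such a high-probability (PAC) guarantee is a binomial tail rule, along the lines of the PAC prediction set literature as we understand it; the sketch below is ours, not the speaker's code. It assumes i.i.d. calibration data and nonconformity scores, with a label kept in the set when its score is at most the threshold. With probability at least 1 - `delta` over the calibration draw, the resulting sets miss the true label with probability at most `epsilon`.

```python
import math

def binom_pmf(i, n, p):
    """Binomial(n, p) probability mass at i, computed in log space (0 < p < 1)."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
               + i * math.log(p) + (n - i) * math.log1p(-p))
    return math.exp(log_pmf)

def max_allowed_errors(n, epsilon, delta):
    """Largest k with P(Binomial(n, epsilon) <= k) <= delta, or -1 if none exists."""
    cdf, k = 0.0, -1
    for i in range(n + 1):
        cdf += binom_pmf(i, n, epsilon)
        if cdf > delta:
            break
        k = i
    return k

def pac_threshold(cal_scores, epsilon, delta):
    """Smallest threshold leaving at most k calibration points outside the set."""
    n = len(cal_scores)
    k = max_allowed_errors(n, epsilon, delta)
    if k < 0:
        return math.inf  # too little data: the only safe set contains every label
    return sorted(cal_scores)[n - k - 1]

scores = [i / 100 for i in range(100)]  # hypothetical nonconformity scores
print(max_allowed_errors(100, 0.1, 0.05), pac_threshold(scores, 0.1, 0.05))
```

Compared with the marginal conformal guarantee, the extra `delta` buys a statement about the specific calibration set in hand, at the cost of slightly larger sets.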

Of course as with any learning algorithm, there are obstacles in the context of covariate shifts, which refers to cases where the test time data distribution doesn’t align with those properties found in training data (like a self driving car being deployed in a new city). By comparison to the conformal calibration set, an importance weight associated with a distribution drift can be approximately known. There is existing prior work on the conformal prediction setting in cases of covariate shift which rely on such added assumptions like known importance weights. The PAC approach appears to relax some assumptions on importance weight intervals by relying on a probabilistic discriminator algorithm to choose a maximally conservative importance weight, with the PAC set then constructed by sampling.

With respect to applications, consider that in deep neural network inference there can often be a kind of tradeoff between variations that either speed up inference or make the model more accurate. By relying on the PAC method, a conditional can be applied to balance between such variations in a manner that adheres to a desired uncertainty threshold. Another way to think about such balancing could be associated with tradeoffs between safety and latency.

As with most conformal prediction methods, the applications are fairly broad with respect to the various conventions and applications of neural networks. In other words, this isn’t just applicable to classification, regression, or object detection. Think reinforcement learning, meta learning, anomaly detection, all kinds of stuff. The speaker expects the PAC method may especially be of benefit for safety critical applications in fields like robotics or healthcare where there is some need for software verification.

## Conformal prediction beyond exchangeability

- Rina Barber (Live Talk)

This closing talk rounded out some of the earlier high-level discussions by digging much deeper into the minutiae of the operations and assumptions in the many variations of conformal prediction under development, both in software and in theory. As a result, a disclaimer is probably appropriate: this blogger was being introduced to much of this material for the first time, so there may be subtle inaccuracies herein. For example, I was a little unclear on the use of the term “residuals” here; I got the impression it might refer to the variance of inference output under perturbations around a data point, but am not positive. (In standard regression usage, a residual is the gap between an observed label and the model's prediction, |y - ŷ|.) After the introductory paragraphs my notes were less coherent, so as a compromise I capture below a few snippets of talking points; if any of these topics interest the reader, an obvious next step is to check out the recorded talks :).

One of the core assumptions common to work in the conformal prediction setting, noted earlier in this writeup, is exchangeability between the test-time data and the data found in training. The use of a partitioned calibration set is closely tied to this assumption: with a holdout calibration set, the guarantee only needs the calibration points and the test point to be exchangeable, which simplifies the derivations, as we can rely on e.g. comparisons of model performance on the holdout set.

When we talk about deriving prediction sets (like those noted in the PAC writeup above), there are a few methods available. A naive approach is to use the training residuals. More advanced settings may apply a parametric model, rely on smoothness assumptions, or take advantage of cross-validation. Using a holdout set, one workflow is to fit a model, compute the holdout residuals, and calculate a prediction interval from a quantile of those residuals. If we want to try these things without a holdout set, exchangeability assumptions come into play.
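The holdout workflow just described (fit, compute holdout residuals, take a quantile) is split conformal regression. A minimal sketch follows, assuming absolute residuals as the score and a hypothetical one-feature least-squares model standing in for any regressor; the synthetic data are for illustration only.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for a single feature (a stand-in for any model)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return lambda x: mean_y + slope * (x - mean_x)

def split_conformal_interval(train, holdout, x_new, alpha):
    """Fit on `train`, compute absolute residuals on `holdout`, widen the prediction."""
    xs, ys = zip(*train)
    model = fit_line(xs, ys)
    residuals = sorted(abs(y - model(x)) for x, y in holdout)
    n = len(residuals)
    k = math.ceil((n + 1) * (1 - alpha))
    width = math.inf if k > n else residuals[k - 1]
    prediction = model(x_new)
    return prediction - width, prediction + width

# Synthetic data: y = 2x + 1 plus standard normal noise (an assumption).
rng = random.Random(0)
data = [(x, 2 * x + 1 + rng.gauss(0, 1)) for x in (rng.uniform(0, 10) for _ in range(400))]
train, holdout = data[:200], data[200:]
print(split_conformal_interval(train, holdout, 5.0, alpha=0.1))
```

Because the residuals come from points the model never saw, they do not suffer the overfitting shrinkage that makes in-sample residuals too optimistic.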

A few talking points in closing where the blogger’s notes did not capture the full narrative:

- In “split conformal prediction”, residuals are exchangeable with each other.
- Split conformal can be viewed as a special case (assumes both i.i.d. and exchangeability).
- Full conformal is a variation that doesn’t require a holdout set.
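- When a model overfits, its residuals on the training points get smaller than its residuals on new test points, which is why naive training residuals tend to under-cover.
- For full conformal regression we need an added assumption beyond exchangeability, namely that the fitting algorithm treats the data points symmetrically, for reasons that are more subtle.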
- Remember that y=x curve? When the demonstrated curve is not flat that suggests a violation of exchangeability.
- Using weights appears to violate exchangeability from quantile calculation, so by trying to fix the problem we introduce more problems.
- The non-exchangeable conformal prediction (nexCP) method, in the case of a symmetric algorithm, addresses distribution drift through weighting that puts more weight on recent data and less on past data.
- For the case where we have both distribution drift and a nonsymmetric algorithm such as an autoregressive model, we can apply a random swap of the test point with one of the training points before fitting. (The guarantee then depends on the swapped data having nearly the same distribution as the original data.)
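The weighted quantile at the heart of nexCP can be sketched as follows. As I understand the method from Barber et al. (see references), each calibration residual $R_i$ receives a fixed weight $w_i \in [0, 1]$ chosen in advance rather than from the data, the test point receives weight 1, and the interval half-width is the $(1-\alpha)$ quantile of the weighted residual distribution with the test point's mass placed at $+\infty$. The geometric decay $w_i = \rho^{\,n+1-i}$, the parameter `rho`, and the function names are hypothetical choices for illustration. With all weights equal to 1, this reduces to the split conformal quantile.

```python
import math
from typing import List, Tuple


def weighted_conformal_quantile(residuals: List[float], weights: List[float],
                                alpha: float = 0.1) -> float:
    """(1 - alpha) quantile of sum_i w~_i * delta(R_i) + w~_{n+1} * delta(+inf).

    Normalized weights are w~_i = w_i / (sum_j w_j + 1); the extra 1 is the
    test point's weight, whose mass is placed at +infinity.
    """
    if len(residuals) != len(weights):
        raise ValueError("need one weight per residual")
    total = sum(weights) + 1.0
    cumulative = 0.0
    # Walk up the residuals in increasing order, accumulating normalized weight.
    for r, w in sorted(zip(residuals, weights)):
        cumulative += w / total
        if cumulative >= 1 - alpha:
            return r
    # The remaining mass (at least the test point's share) sits at +infinity.
    return math.inf


def geometric_weights(n: int, rho: float = 0.99) -> List[float]:
    """Hypothetical recency weights w_i = rho ** (n + 1 - i), i = 1..n (newest last)."""
    return [rho ** (n + 1 - i) for i in range(1, n + 1)]


def nexcp_interval(prediction: float, residuals: List[float],
                   alpha: float = 0.1, rho: float = 0.99) -> Tuple[float, float]:
    """Interval around `prediction`, assuming `residuals` are in time order."""
    weights = geometric_weights(len(residuals), rho)
    q = weighted_conformal_quantile(residuals, weights, alpha)
    return prediction - q, prediction + q


if __name__ == "__main__":
    # Simulated drift: older residuals are large, recent ones are small.
    drift = [10.0] * 50 + [1.0] * 50
    print(nexcp_interval(0.0, drift, alpha=0.2, rho=0.9))  # (-1.0, 1.0)
    # Equal weights ignore the drift and return the ordinary split conformal quantile.
    print(weighted_conformal_quantile(drift, [1.0] * 100, alpha=0.2))  # 10.0
```

Heavily down-weighting old residuals shrinks the effective sample size, so an aggressive decay (small `rho`) can leave too little weight below the target level and return an infinite width; choosing the decay trades robustness to drift against interval stability.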

*Acknowledgements: A thank you owed to the workshop organizers for stimulating discussions and content. Thank you also for tolerating this blogger’s attention.*

## References

Angelopoulos, A., Bates, S. **A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification**, 2022. URL https://arxiv.org/abs/2107.07511.

Angelopoulos, A., Bates, S., Li, S. Y., Ramdas, A., and Tibshirani, R. **Workshop on distribution-free uncertainty quantification**, 2022. URL https://sites.google.com/berkeley.edu/dfuq-22/home?authuser=0.

Angelopoulos, A. N., Bates, S., Candès, E. J., Jordan, M. I., and Lei, L. **Learn then Test: Calibrating Predictive Algorithms to Achieve Risk Control**, 2021. URL https://arxiv.org/abs/2110.01052.

Barber, R. F., Candès, E. J., Ramdas, A., and Tibshirani, R. J. **Conformal prediction beyond exchangeability**, 2022. URL https://arxiv.org/abs/2202.13415.

Bastani, O., Gupta, V., Jung, C., Noarov, G., Ramalingam, R., and Roth, A. **Practical adversarial multivalid conformal prediction**, 2022. URL https://arxiv.org/abs/2206.01067.

Ding, T., Angelopoulos, A., and Bates, S. **An empirical Bayes approach to class-conditional conformal inference**, 2022.

Gan, L., Zheng, L., and Allen, G. I. **Inference for interpretable machine learning: Fast, model-agnostic confidence intervals for feature importance**, 2022. URL https://arxiv.org/abs/2206.02088.

Jin, Y., Ren, Z., and Candès, E. J. **Sensitivity analysis of individual treatment effects: A robust conformal inference approach**, 2021. URL https://arxiv.org/abs/2111.12161.

Joy, T., Pinto, F., Lim, S.-N., Torr, P. H. S., and Dokania, P. K. **Sample-dependent adaptive temperature scaling for improved calibration**, 2022. URL https://arxiv.org/abs/2207.06211.

Li, S., Park, S., Ji, X., Lee, I., and Bastani, O. **Towards PAC multi-object detection and tracking**, 2022. URL https://arxiv.org/abs/2204.07482.

Nguyen, V., Farfade, S., and Hengel, A. v. d. **Confident sinkhorn allocation for pseudo-labeling**, 2022. URL https://arxiv.org/abs/2206.05880.

Prinster, D., Liu, A., and Saria, S. **JAWS: Predictive inference under covariate shift**, 2022. URL https://arxiv.org/abs/2207.10716.

Shafer, G. and Vovk, V. **A tutorial on conformal prediction**. *J. Mach. Learn. Res.*, 9:371–421, June 2008. ISSN 1532-4435.

Schmaltz, A. and Rasooly, D. **Approximate conditional coverage via neural model approximations**, 2022. URL https://arxiv.org/abs/2205.14310.

Schuster, T., Fisch, A., Gupta, J., Dehghani, M., Bahri, D., Tran, V. Q., Tay, Y., and Metzler, D. **Confident adaptive language modeling**, 2022. URL https://arxiv.org/abs/2207.07061.

Snell, J., Zollo, T., and Zemel, R. **Var-control: Bounding the probability of high-loss predictions**, 2022.

Taquet, V., Blot, V., Morzadec, T., Lacombe, L., and Brunel, N. **MAPIE: an open-source library for distribution-free uncertainty quantification**, 2022. URL https://arxiv.org/abs/2207.12274.

Vovk, V., Gammerman, A., and Shafer, G. **Algorithmic Learning in a Random World**. Springer-Verlag, Berlin, Heidelberg, 2005. ISBN 0387001522.

Wang, Z., Gao, R., Yin, M., Zhou, M., and Blei, D. M. **Probabilistic Conformal Prediction Using Conditional Random Samples**, 2022. URL https://arxiv.org/abs/2206.06584.

Xu, C. and Xie, Y. **Conformal prediction for time series**, 2020. URL https://arxiv.org/abs/2010.09107.

Zaffran, M., Dieuleveut, A., Féron, O., Goude, Y., and Josse, J. **Adaptive conformal predictions for time series**, 2022. URL https://arxiv.org/abs/2202.07282.

For further reading, please check out the Table of Contents, Book Recommendations, and Music Recommendations. For more on Automunge: automunge.com