Risk Controlled Conformal Prediction of Outcomes for Customers, Patients, or Other Entities


TL;DR?  Here's the BLUF:

Risk-controlled conformal prediction methods subsume conformal prediction methods as a special case.  They allow controlling the risk of specified types of prediction errors.  Single or multiple kinds of risk can be accommodated.  Like conformal prediction methods, risk-controlling methods are model agnostic, and they can provide finite sample statistical certainty guarantees.

A simple, mostly conceptual description of risk-controlled conformal prediction follows.


As noted elsewhere and by many, conformal prediction methods can provide statistically valid, finite sample certainty guarantees about whether predictions for individual new cases contain, or cover, the "true" quantity or value to be predicted, the "ground truth."  The methods are model agnostic: they can be applied with all sorts of predictive procedures (e.g., Vovk et al. 2022; Angelopoulos & Bates 2022).

In addition to conformal prediction coverage guarantees, it's possible to control the risk of other kinds of prediction errors.  In the case of classifiers, for example, you can control the risk of false positives, false negatives, other types of prediction errors, or even combinations of different types of prediction errors, so long as they can be expressed as a loss function.  You can also control the risk of regression prediction errors (Angelopoulos et al. 2023a, 2023b).  This notion of managing prediction risk can be useful in many prediction use cases.

Angelopoulos et al. (Ibid.) describe use cases involving controlling single risks and controlling multiple risks.  To keep things as simple as possible, what follows is a brief, non-comprehensive exploration of risk-controlled conformal prediction involving a single kind of risk and predictive classification.  As with conformal prediction, risk-controlled conformal prediction is possible not just for classification problems, but also for regression and other sorts of prediction problems.

Let's suppose you have trained a binary classifier to predict the responses of individual new cases.  Your binary classifier model may be any kind that can be used to make predictions using new data: random forest, neural network, logistic regression, whatever.  Maybe you trained your classifier for predicting individual consumers' responses to a promotional marketing offer, for predicting future fraudulent warranty claims, or for predicting patient outcomes.   

Suppose also that your marketing promotion stakeholder wants to control the risk of false positive predictions, because making the offer is costly.  Or, they may want to limit false positives while also only making predictions for new customers who aren't too different from the customers whose data you used to train your classifier.  Or, maybe your warranty issuing stakeholders want to limit the risk of false negative predictions of fraudulent claims.  

Before you trained and validated your classifier, you randomly set aside a data set for "calibration" purposes.  Your calibration data consisted of X's (features) and Y's (target values, typically 0 or 1), like the data you used for training and validating.
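
Here is a minimal sketch of that setup, using a hypothetical synthetic data set in place of real customer or patient data (a separate validation split is omitted for brevity):

```python
# A minimal sketch of holding out a calibration split before training,
# using a hypothetical synthetic data set in place of real customer/patient data.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=20, random_state=0)

# Hold out 20% of the data for calibration; the rest is used for training.
X_train, X_calib, y_train, y_calib = train_test_split(X, y, test_size=0.2, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)  # any classifier would do
```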

Angelopoulos et al. (Ibid.) proposed extending the notion of conformal prediction (see here for my very simple description of conformal prediction) to limit different kinds of risk, depending on the type of prediction error to be controlled.  Their basic idea is to identify a function for post-processing predictions that guarantees that the risk of prediction error is at or below an acceptability threshold.

The approach they propose is to use calibration data to examine a family of parameterized prediction post-processing functions that provide varying degrees of risk control with respect to a specific definition of loss.  The value of the parameter determines how conservative the predictions are, i.e., how likely they are to include "ground truth."  From these functions, one that satisfies the desired certainty of risk control is used for predicting each new case.
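
As one purely illustrative example of such a family, adapted to the false positive use case below, a binary classifier's score can be post-processed with a threshold \lambda: predict the positive class only when the score is at least \lambda, so larger \lambda values yield fewer, more conservative positive predictions.  A sketch, assuming the hypothetical fitted classifier clf from above:

```python
import numpy as np

def gamma_lambda(clf, X, lam):
    """Illustrative post-processing rule: predict the positive class only when
    the classifier's positive-class score is at least lam. Larger lam values
    yield fewer, and therefore more conservative, positive predictions."""
    scores = clf.predict_proba(X)[:, 1]
    return (scores >= lam).astype(int)
```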

Suppose your hypothetical marketer wants to limit the risk of a false positive when predicting a new customer's response to the promotion.  The loss here is an indicator of a false positive prediction, so the risk, its expectation, is a false positive rate; it has a finite upper bound of 1.0, and it is monotone in the post-processing parameter.  You and this marketer might want an 80% certainty guarantee that the false positive risk when making a prediction for a new customer will not exceed some desired level, like 0.01.

Following how the authors (Ibid.) express the problem to be solved1, you need to find a post-processing function \Gamma_{\lambda}, where \lambda is the risk controlling parameter, such that the false positive risk R(\Gamma_{\lambda}) is at most \gamma = 0.01 with statistical certainty of at least 1 - \delta = 0.80; that is, the probability that R(\Gamma_{\lambda}) \le \gamma must be at least 0.80.
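
Continuing the sketch above, the empirical risk can be evaluated on the calibration set over a grid of candidate \lambda values.  For simplicity, the illustrative loss counts a false positive against any calibration example, so the risk is the overall probability of a false positive prediction (all names here are hypothetical):

```python
def empirical_fp_risk(clf, X, y, lam):
    """Average of the illustrative loss on a data set: 1 when the rule predicts
    positive but the true label is 0, else 0. The average is the empirical
    probability of making a false positive prediction."""
    preds = gamma_lambda(clf, X, lam)
    return float(np.mean((preds == 1) & (y == 0)))

lambda_grid = np.linspace(0.0, 1.0, 101)
empirical_risks = np.array(
    [empirical_fp_risk(clf, X_calib, y_calib, lam) for lam in lambda_grid]
)
```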

The authors describe different ways to find \Gamma_{\lambda}'s that satisfy the desired risk and certainty requirements.  Their Learn Then Test (LTT) procedure (Angelopoulos et al. 2023b) is a straightforward method for selecting values of \lambda.  It tests, across a range of candidate \lambda values, the null hypothesis that R(\Gamma_\lambda) \gt \gamma (i.e., that risk is not controlled), while taking into account the experiment-wise, or family-wise, error rate from making multiple comparisons.  The \lambda values for which that null hypothesis is rejected indicate the \Gamma_\lambda post-processing functions that provide the desired guarantees of risk control.
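
A minimal LTT-style sketch under the assumptions above uses a Hoeffding-based p-value, valid for losses bounded in [0, 1], with a Bonferroni correction for the family-wise error rate (other valid p-values and corrections are possible):

```python
n = len(y_calib)
gamma, delta = 0.01, 0.20  # risk tolerance; 1 - delta = 0.80 certainty

# Hoeffding-based p-values for the null hypotheses R(Gamma_lambda) > gamma:
# small when the empirical risk is well below gamma, equal to 1 otherwise.
p_values = np.exp(-2.0 * n * np.clip(gamma - empirical_risks, 0.0, None) ** 2)

# A Bonferroni correction over the grid controls the family-wise error rate at delta.
rejected = p_values <= delta / len(lambda_grid)
risk_controlling_lambdas = lambda_grid[rejected]
```

Note that with a risk tolerance as small as 0.01, the loose Hoeffding bound may reject nothing unless the calibration set is quite large; tighter p-values, touched on below, help.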

To wrap up this simple introduction to risk-controlled prediction, here are a few last things to note.  First, for any particular application, there may be none, one, or more than one risk-controlling \Gamma_\lambda function, depending on the quality of the classifier and on the values of \gamma and \delta chosen.

Secondly, different concentration inequalities (and the p-values derived from them) can be used for the aforementioned hypothesis testing, and they yield different degrees of prediction conservativeness.  The same is true of different methods for controlling the experiment-wise/family-wise error rate.
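
For a 0/1 loss like the one in this example, an exact binomial tail probability is a valid p-value that is typically much less conservative than the Hoeffding bound above (again, just a sketch under the same assumptions):

```python
from scipy.stats import binom

# Exact binomial tail p-values for the null R(Gamma_lambda) > gamma, valid for 0/1
# losses: the probability of seeing this few or fewer losses if the risk were gamma.
loss_counts = np.rint(empirical_risks * n).astype(int)
p_values_binom = binom.cdf(loss_counts, n, gamma)

rejected_binom = p_values_binom <= delta / len(lambda_grid)
```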

\lambda can be multidimensional if there is more than one type of risk to be controlled.  Risks to be controlled can be conditional on other risks to be controlled.  An example is where class predictions are to be made only for a new datum that is not too different from the data used to train a classifier.

As a general approach, risk control can be applied when making other kinds of predictions, for example when the trained model is a regression model.

For more interesting and less basic examples of risk-controlled prediction, see Anastasios Angelopoulos's blog post about Distribution-Free, Risk-Controlling Prediction Sets.

Last, but certainly not least, for an extensive collection of resources about conformal prediction, and a handy book about it, see Valeriy Manokhin's GitHub page, Awesome-Conformal-Prediction.

Resources

Angelopoulos, A. & Bates, S. (2022) “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” https://arxiv.org/abs/2107.07511

Angelopoulos, A., Bates, S., Fisch, A., Lei, L. & Schuster, T. (2023a) "Conformal Risk Control." https://arxiv.org/abs/2208.02814v3

Angelopoulos, A., Bates, S., Candès, E., Jordan, M. & Lei, L. (2023b) "Learn Then Test: Calibrating Predictive Algorithms to Achieve Risk Control." https://arxiv.org/abs/2110.01052v5

Vovk, V., Gammerman, A. & Shafer, G. (2022) “Algorithmic Learning in a Random World.” Springer Nature Switzerland AG.


1 I'm mostly using the authors' notation in what follows.
