Conformal Prediction of Customer Segment Memberships by Customer Type; Uncertainty, Quantified and Unquantified

TL;DR? Here’s the BLUF:

Conformal prediction methods can provide statistically valid certainty guarantees when predicting class memberships (e.g., customer segments) or other outcomes that machine learning models are commonly used to predict. Depending on how these methods are used, the quality of the results can depend on characteristics of the objects (e.g., customers) being predicted for. When such characteristics consist of a priori known types or groups, investigating the quality of conformal prediction results for each group is very straightforward.

Conformal prediction is for quantifying uncertainty. Some kinds of uncertainty that decision-makers face may not be quantifiable.

 


This conformal prediction niblet is a follow-up to a previous post, “Conformal Prediction: Simple Model-Agnostic Prediction with Statistically Guaranteed Error Rates”:

https://lomabuena.blogspot.com/2023/12/conformal-prediction-simple-model.html

That post described calculating prediction sets using estimates from a gradient boosting classifier trained to predict customer memberships in four segments; see it for more details.

Segment Membership Conformal Prediction for Customer Types

This post highlights a very simple point that might be obvious to many: conformal predictions that guarantee marginal (average) coverage of “ground truth” categories (in the present context, “true” segment labels) may not provide the same coverage for all objects, e.g., for all customers or for different types of customers. There are easy ways to examine, and to guarantee, coverage for specific a priori known types of customers.

In the last post, I used a certainty criterion of 67%, or two to one odds of coverage, when predicting customer segment memberships. The empirical estimate of coverage using 627 test customers was approximately 67.3%. It’s possible that coverage may not be 67% for all customers: the statistical guarantee is with respect to marginal, or average, coverage. Some customers may have lower coverage, and others, higher.

One of the customer features not used when training the classifier was “Work_Experience,” which apparently measures years of work experience, from zero years upward; it has missing values. For the purposes of the present example, I recoded this feature into a Boolean indicator of “one or more known years of experience,” which I refer to as “known work history” in what follows. We can then examine coverage conditional on this feature.
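Here is a minimal sketch of that recoding, assuming the data sit in a pandas DataFrame; the DataFrame contents below are illustrative stand-ins, not the actual segmentation data:

import numpy as np
import pandas as pd

# Illustrative stand-in for the test-customer data; in practice this would be
# loaded from the segmentation dataset used in the previous post.
customers = pd.DataFrame({
    "Work_Experience": [0.0, 3.0, np.nan, 1.0, np.nan, 8.0],
})

# True when one or more years of work experience are recorded; False when the
# recorded value is zero or missing -- the "known work history" indicator.
customers["known_work_history"] = customers["Work_Experience"].fillna(0) >= 1
print(customers)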

Going back to the results in the previous post, we observed the following distribution of prediction set sizes for the 627 test customers. These customers were not used for model training and validation, or for calibration:

Distribution of Prediction Set Sizes for 627 Test Customers

Set size    Customers
   3           353
   2           205
   1            57
   0            12

Set size is the number of segments included in a customer's prediction set. Note that a size of 0 indicates customers for whom the classifier is uninformative regarding segment membership.

We observed that the empirical coverage of the "true" labels obtained for these 627 customers was approximately 67.3%, close to the 67% certainty requirement we wanted.
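For concreteness, here is a rough sketch of how such prediction sets and the coverage check can be computed with the usual split-conformal recipe. The variable names, the toy stand-in data, and the "1 minus the estimated probability of the true segment" conformity score are assumptions for illustration, not a transcript of the code behind the previous post:

import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the real classifier outputs (purely illustrative):
cal_probs = rng.dirichlet(np.ones(4), size=500)    # calibration-set softmax estimates
cal_labels = rng.integers(0, 4, size=500)          # calibration "true" segment labels
test_probs = rng.dirichlet(np.ones(4), size=627)   # estimates for the 627 test customers
test_labels = rng.integers(0, 4, size=627)         # their "true" segment labels

alpha = 1 - 0.67  # target 67% marginal coverage

# Conformity score: 1 minus the estimated probability of the true segment.
cal_scores = 1 - cal_probs[np.arange(len(cal_labels)), cal_labels]

# Finite-sample-adjusted quantile of the calibration scores.
n = len(cal_scores)
q_level = min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0)
q_hat = np.quantile(cal_scores, q_level, method="higher")

# A segment enters a customer's prediction set when its score is within the threshold.
pred_sets = (1 - test_probs) <= q_hat              # Boolean array, shape (627, 4)

set_sizes = pred_sets.sum(axis=1)                  # number of segments per customer
covered = pred_sets[np.arange(len(test_labels)), test_labels]

print(np.bincount(set_sizes, minlength=5))         # distribution of prediction set sizes
print(covered.mean())                              # empirical marginal coverage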

 If segment-specific advertising and promotional tactics had been developed for some or all of the four segments, how might, or how should, these results affect decision-making about which new customers to actually target with those tactics? Targeting errors could result in opportunity costs.  For that matter, how might decision-making be affected if calibrated probability estimates for segment membership were available instead of prediction sets?

In any case, it is possible that the results for customers with known work history, and those for customers without it, might differ.   Here's how set sizes are distributed across the test customers depending on whether work history is known or not.

             Known work history
Set size      False      True
   3           130        223
   2            80        125
   1            22         35
   0             6          6

 

The distributions look pretty similar, don't they? A χ² statistic computed on this table is approximately 1.039 with 3 df.

How about empirical coverage conditional on known work history? The empirical coverage for customers with known work history is about 68%, and for those with no known work history, approximately 66%. 
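Continuing the sketch above (and reusing its set_sizes and covered arrays, with a toy stand-in for the known-work-history indicator), the cross-tab, the χ² statistic, and the group-conditional coverage might be computed like this:

import pandas as pd
from scipy.stats import chi2_contingency

# Toy stand-in for the recoded indicator; in practice it comes from the
# "known_work_history" column constructed earlier.
known_work_history = rng.random(len(set_sizes)) < 0.62

results = pd.DataFrame({
    "set_size": set_sizes,
    "covered": covered,
    "known_work_history": known_work_history,
})

# Cross-tab of prediction set size by known work history.
xtab = pd.crosstab(results["set_size"], results["known_work_history"])
print(xtab)

# Pearson chi-square test of independence on the set-size-by-group table.
chi2, p_value, dof, _ = chi2_contingency(xtab)
print(round(chi2, 3), dof, round(p_value, 3))

# Empirical coverage within each group.
print(results.groupby("known_work_history")["covered"].mean())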

If larger differences in empirical coverage are observed, a simple remedy is to select a conformity score quantile criterion, or threshold, for each a priori known group of customers that provides the required degree of certainty for that group. You need enough data for each group to do this, of course. How much is enough? Angelopoulos & Bates (2022) suggest that about 1,000 calibration points should be enough, based on the fact that the coverage achieved with a finite calibration set follows a beta distribution. But you might try fewer.
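A minimal sketch of that group-conditional idea, assuming conformity scores and group labels are available for the calibration customers (the function name and arguments are assumptions):

import numpy as np

def group_quantile_thresholds(cal_scores, cal_groups, alpha=0.33):
    """Return a conformity-score threshold for each a priori known group.

    cal_scores: (n_cal,) conformity scores for the calibration customers
    cal_groups: (n_cal,) group label for each calibration customer
    alpha:      1 - desired coverage (0.33 for roughly 67% coverage)
    """
    thresholds = {}
    for g in np.unique(cal_groups):
        scores_g = cal_scores[cal_groups == g]
        n_g = len(scores_g)
        # Finite-sample-adjusted quantile level, computed within the group.
        q_level = min(np.ceil((n_g + 1) * (1 - alpha)) / n_g, 1.0)
        thresholds[g] = np.quantile(scores_g, q_level, method="higher")
    return thresholds

# For a new customer in group g, segment k enters the prediction set when
# 1 - estimated_probability[k] <= thresholds[g].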

What if coverage varies based on segment membership? It's possible. You can't check this directly for truly new customers, because segment membership is exactly what needs to be predicted; it isn't known. This is a conformal "horse of a different color." One way to take the possibility into account and still achieve a coverage guarantee is to calculate a quantile threshold for each segment membership label at the desired coverage certainty, e.g., 67%, and then use the most conservative of those thresholds when generating prediction sets for new customers. With this approach, the coverage guarantee for each segment membership label is at least 67%, i.e., at least two to one odds. Variations on this procedure have been called "class conditional" conformal prediction by Shi et al. (2013) and others; Shi et al. (ibid., p. 237) used a weighted average of scores obtained from predicting each class label. The choice of coverage certainty level (67%, 90%, whatever) for any particular application is, of course, yours and your stakeholders'. And you need enough data to calibrate predictions.
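And here is a rough sketch of the "most conservative class-wise threshold" approach described above, under the same assumptions about scores and labels; it is one simple variant, not Shi et al.'s weighted-average procedure:

import numpy as np

def conservative_classwise_threshold(cal_probs, cal_labels, alpha=0.33):
    """Compute a threshold per class at the desired certainty; return the largest.

    cal_probs:  (n_cal, n_classes) estimated class membership probabilities
    cal_labels: (n_cal,) true class labels, coded 0..n_classes-1
    alpha:      1 - desired coverage for each class
    """
    scores = 1 - cal_probs[np.arange(len(cal_labels)), cal_labels]
    thresholds = []
    for k in np.unique(cal_labels):
        scores_k = scores[cal_labels == k]
        n_k = len(scores_k)
        q_level = min(np.ceil((n_k + 1) * (1 - alpha)) / n_k, 1.0)
        thresholds.append(np.quantile(scores_k, q_level, method="higher"))
    # The largest (most conservative) class-wise threshold gives every class
    # at least the requested coverage.
    return max(thresholds)

# q_hat = conservative_classwise_threshold(cal_probs, cal_labels)
# pred_sets = (1 - test_probs) <= q_hat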

Uncertainty: Quantifiable, and Unquantifiable 

Conformal prediction has to do with quantifying uncertainty. There are different ways to quantify uncertainty, and decision-makers likely vary in their preferences and in their abilities to make good use of uncertainty quantifications. It seems likely that no particular method is best across all possible contexts and use cases. Decision-maker preferences and understanding are worth taking into account when designing decision support methods, products, and workflows.

An economist might opine that uncertainty needs to be quantifiable in order to risk-adjust for it, to "price" it. Some decisions involve uncertainty that defies quantification, uncertainty that may derive from things like expectations, or corporate or personal values.

 Some Resources

Angelopoulos, A. & Bates, S. (2022) “A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification.” https://arxiv.org/abs/2107.07511

Shi, R., Ong, C. S., & Leckie, C. (2013) "Applications of Class-Conditional Conformal Predictor in Multi-Class Classification." 12th International Conference on Machine Learning and Applications, 235-239, IEEE Computer Society.

And, here's a worthwhile Python-based conformal prediction book:

Manokhin, V.  "Practical Guide to Applied Conformal Prediction in Python." Birmingham UK: Packt Publishing, 2023.  (Also available on oreilly.com.)

