What is more important: Prediction Probabilities or Prediction Accuracy?

Avinash Negi
CARS24 Data Science Blog
3 min read · Jan 20, 2023

Most of us would say prediction accuracy, and that is largely true: in the end, a model is judged by how accurately it assigns each observation to the right class. But those classifications can also be tuned by thresholding the prediction probabilities.

Let me show you a plot and then ask you a question: do you think this is a good binary classification model?

Don’t these look like the ideal bimodal probabilities for separating observations into classes 0 and 1?

We are sure the answer is YES!! For us, however, it was not: these probabilities drive the flow of cars into our inventory, and with such narrow probability ranges we cannot control anything.

This problem could have had two causes: the model itself, or the number of predictors we were using. In fact it was both. We had deliberately kept the number of predictors small for this classification problem, and the model itself has an inherent bias in how it spreads its predicted probabilities.

Note: Maximum margin methods such as boosted trees push probability mass away from 0 and 1, yielding a characteristic sigmoid-shaped distortion in the predicted probabilities. Models such as Naive Bayes, which make unrealistic independence assumptions, push probabilities toward 0 and 1. Other models, such as neural nets and bagged trees, do not have these biases and predict well-calibrated probabilities.
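To see these distortions concretely, here is a minimal sketch using scikit-learn’s calibration_curve to plot reliability curves for a boosted-tree model and Naive Bayes. The synthetic dataset and GradientBoostingClassifier are illustrative assumptions, not our inventory model:

```python
# Sketch: reliability curves for a boosted-tree model vs. Naive Bayes
# on synthetic data (illustrative only, not our inventory model).
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=10_000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

for name, clf in [("Boosted trees", GradientBoostingClassifier()),
                  ("Naive Bayes", GaussianNB())]:
    clf.fit(X_train, y_train)
    prob = clf.predict_proba(X_test)[:, 1]
    # Fraction of positives vs. mean predicted probability per bin;
    # a perfectly calibrated model lies on the diagonal.
    frac_pos, mean_pred = calibration_curve(y_test, prob, n_bins=10)
    plt.plot(mean_pred, frac_pos, marker="o", label=name)

plt.plot([0, 1], [0, 1], linestyle="--", label="Perfectly calibrated")
plt.xlabel("Mean predicted probability")
plt.ylabel("Fraction of positives")
plt.legend()
plt.show()
```

A curve above the diagonal means the model under-predicts (the boosted-tree pattern near 1), while an S-shape hugging the corners is the Naive Bayes pattern.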

This problem has been studied by many researchers and is addressed directly in scikit-learn: a calibrated classifier is the answer. In scikit-learn’s words:

Calibrating a classifier consists of fitting a regressor (called a calibrator) that maps the output of the classifier (as given by decision_function or predict_proba) to a calibrated probability in [0, 1]. Denoting the output of the classifier for a given sample by f_i, the calibrator tries to predict p(y_i = 1 | f_i).

The samples that are used to fit the calibrator should not be the same samples used to fit the classifier, as this would introduce bias. This is because performance of the classifier on its training data would be better than for novel data. Using the classifier output of training data to fit the calibrator would thus result in a biased calibrator that maps to probabilities closer to 0 and 1 than it should.
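In code, this is scikit-learn’s CalibratedClassifierCV, where cv=5 trains the base classifier on four folds and fits the calibrator on the held-out fifth, so the two never see the same samples. A minimal sketch, again assuming a boosted-tree base model on synthetic data rather than our production setup:

```python
# Sketch: wrapping a classifier in CalibratedClassifierCV so the
# calibrator is fit on held-out folds, never on the classifier's
# own training data (synthetic data for illustration).
import numpy as np
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

raw = GradientBoostingClassifier().fit(X_train, y_train)

# method="sigmoid" (Platt scaling) or "isotonic"; with cv=5 the base
# classifier is trained on 4 folds and the calibrator on the 5th.
calibrated = CalibratedClassifierCV(GradientBoostingClassifier(),
                                    method="sigmoid", cv=5)
calibrated.fit(X_train, y_train)

p_raw = raw.predict_proba(X_test)[:, 1]
p_cal = calibrated.predict_proba(X_test)[:, 1]

# Compare the spread of the two distributions: calibration typically
# widens an overly narrow probability range.
print("raw 5/50/95 percentiles:", np.percentile(p_raw, [5, 50, 95]))
print("calibrated 5/50/95 percentiles:", np.percentile(p_cal, [5, 50, 95]))
```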

Using this calibration method, we were able to transform the raw scores into a well-spread probability distribution. The distribution below shows the calibrated model’s probabilities for the same observations.

Conclusion:

Every problem requires its own analysis. Prediction accuracy may be sufficient when the spread of the estimated likelihoods does not matter, but when it does, prediction probabilities are the way to go. Likelihood estimates from models such as boosted trees and Naive Bayes are prone to characteristic monotonic distortions: boosted trees push probabilities away from 0 and 1, while Naive Bayes pushes them toward the extremes. Using calibrated models (sigmoid or isotonic) helps reduce this bias and produces well-calibrated probabilities.

Authors:

Avinash Negi, Data Scientist @ CARS24 · Jitesh Khurana, Lead Data Scientist @ CARS24 · Abhay Kansal, Staff Data Scientist @ CARS24

References:

  1. Scikit-learn User Guide: Probability Calibration
  2. Niculescu-Mizil, A. and Caruana, R., “Predicting Good Probabilities with Supervised Learning”, ICML 2005
