How to Create an ROC (Receiver Operating Characteristic) Curve?

Michael C.H. Wang · GLInB · Jun 14, 2022

Refreshing the memory

Recall that in the previous blog, "Acceptance sampling plan and Classifier," we drew an analogy between a sampling plan and a classifier: both carry a risk of misclassification. By solving the equation set below simultaneously, the operating characteristic curve (OC curve) can be drawn, relating the probability of acceptance to the lot defect level through n, the required sample size, and c, the allowable number of failures:
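The original equations were embedded as an image that did not survive; the following is a sketch of the standard single-sampling formulation, where AQL, LTPD, α (producer's risk), and β (consumer's risk) are the usual acceptance-sampling symbols rather than notation taken from the original post:

```latex
% Probability of accepting a lot with defect rate p
% under a single sampling plan (n, c):
P_a(p) = \sum_{d=0}^{c} \binom{n}{d}\, p^{d} (1-p)^{n-d}

% The plan (n, c) is chosen to satisfy both risk conditions:
P_a(\mathrm{AQL}) = 1 - \alpha, \qquad P_a(\mathrm{LTPD}) = \beta
```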

However, we have not yet addressed in detail how the Receiver Operating Characteristic curve (ROC curve) is formulated mathematically, nor how to construct one hands-on. We will explore this curve here along several dimensions.

The ROC curve was initially developed during WW2 alongside radar technology, to detect enemy objects on the battlefield, and was soon adopted in many fields related to decision science. Recall that in "Soli and Radarnet" we introduced radar technology: by adjusting a threshold on the power of the received wave, the system decides whether a scattering object appears on the radar display.

Example: ROC curve of three predictors of peptide cleaving in the proteasome. Source: Wikipedia

Unlike the OC curve, the ROC curve does not express a direct functional relationship between the true positive rate and the false positive rate. Instead, it traces a "path" plot and, more importantly, yields the Area Under the Curve (AUC), a metric of the classifier across all decision thresholds.

The two axes of this plot are defined as follows:

Y (True Positive Rate, TPR) = TP / (TP + FN)

X (False Positive Rate, FPR) = FP / (TN + FP), as shown in Fig. 1 below.

Fig. 1

We can also transform the X axis into a form similar to the Y axis via X = 1 − TN/(TN + FP), as illustrated in Fig. 2 below:

Fig. 2
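To make the two definitions concrete, here is a minimal Python sketch; the confusion-matrix counts are invented purely for illustration:

```python
# Minimal sketch: TPR and FPR from raw confusion-matrix counts.
# The counts below are made-up numbers for illustration only.
tp, fn = 7, 2   # positives: correctly / incorrectly classified
tn, fp = 4, 1   # negatives: correctly / incorrectly classified

tpr = tp / (tp + fn)   # Y axis: true positive rate
fpr = fp / (tn + fp)   # X axis: false positive rate
tnr = tn / (tn + fp)   # specificity

# The Fig. 2 transform: X = 1 - TN/(TN+FP)
assert abs(fpr - (1 - tnr)) < 1e-12

print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```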

Now assume a simple case like the one below:

A test set with 14 samples for binary classification: 9 positive and 5 negative. After training a model on the features, we obtain a prediction probability for each sample. The perfect case occurs when, sorting the samples by probability in descending order, all positive samples fall above the threshold.

So we tally the results by drawing on the TP/TN chart above, following these rules:

  1. Start from Y = 0 and scan the samples from the top down. Whenever there is a true positive, move upward by one unit; in the perfect case there is no misclassification of positive samples, so the path eventually reaches 100% as 9/9.
  2. Split the X axis by the number of negative samples. There would be 5 steps, and in the perfect case the complete path traces out a square, giving an AUC of 1. See Fig. 3 below.
  3. If there is a misclassification, move horizontally instead of upward.
  4. Since the path is cumulative, no "reverse path" is allowed, and eventually a complete envelope forms, as in Fig. 4 below. The AUC there is 0.756 (see the sketch after Fig. 4).
Fig. 3
Fig. 4
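Here is a minimal Python sketch of this walk. The post does not give the underlying scores for Fig. 4, so the scores below are invented and the label ordering is one hypothetical arrangement chosen so the result matches the quoted AUC of 0.756:

```python
# Sketch of the step-by-step ROC path described above, using a made-up
# 14-sample case (9 positives, 5 negatives), sorted by score descending.
y_true  = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0]   # hypothetical labels
y_score = [.95, .90, .85, .80, .75, .70, .65, .60,
           .55, .50, .45, .40, .35, .30]                # hypothetical scores

P = sum(y_true)          # 9 positives -> step height 1/9
N = len(y_true) - P      # 5 negatives -> step width 1/5

x, y, auc = 0.0, 0.0, 0.0
path = [(x, y)]
for label in y_true:     # walk samples from highest score to lowest
    if label == 1:
        y += 1 / P       # true positive: move up one unit
    else:
        x += 1 / N       # false positive: move right one unit
        auc += y * (1 / N)   # area of the vertical strip just passed
    path.append((x, y))

print(f"AUC = {auc:.3f}")    # 0.756 for this arrangement
```

Each negative sample contributes a strip of width 1/5 whose height is the fraction of true positives accumulated so far, so for this arrangement the AUC is (4 + 6 + 7 + 8 + 9)/45 ≈ 0.756.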

The Python scikit-learn library also provides code and a data example for drawing an ROC curve, linked here. However, it is not very comprehensive, and you can refer to another blog where a simple piece of code is available for download.
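As a cross-check of the hand-drawn path, a minimal sketch using scikit-learn's roc_curve and auc, reusing the same hypothetical data as above:

```python
# Cross-check of the manual walk with scikit-learn's roc_curve/auc.
from sklearn.metrics import roc_curve, auc
import matplotlib.pyplot as plt

y_true  = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0, 1, 0, 1, 0]   # hypothetical labels
y_score = [.95, .90, .85, .80, .75, .70, .65, .60,
           .55, .50, .45, .40, .35, .30]                # hypothetical scores

fpr, tpr, _ = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
print(f"AUC = {roc_auc:.3f}")            # matches the manual result, 0.756

plt.plot(fpr, tpr, drawstyle="steps-post", label=f"AUC = {roc_auc:.3f}")
plt.plot([0, 1], [0, 1], "--", label="chance")   # diagonal reference line
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
```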

Originally published at http://glinb.com on June 14, 2022.

The mission of GLInB is to bring the most value by virtualizing your supply chain quality function to meet the challenges of today's business environment.

Please visit us at http://glinb.com for more information.


❤️‍🔥 Passionate about blending QA and ML. Enjoys problem solving. 🔍🔧 Co-founder of GLInB. 📝 Bio at Michael Chi Hung Wang | LinkedIn