Trueface Tutorials: Understanding ROC Curves

What are they? How do we interpret them? How do you choose an optimal threshold for your use case?

Chinmay Jog
Trueface
5 min read · Nov 22, 2019


Trueface Model TFV3 ROC for the CFP dataset.

The Receiver Operating Characteristic (ROC) curve is an evaluation metric for a binary classifier that helps us visualize the performance of a facial recognition model as its discrimination threshold changes. In binary classification, we make a binary decision by thresholding the continuous probability that a given sample belongs to one of the two classes, and we want to find the optimal threshold on this probability. Throughout this post, this is the value we mean when we say threshold.

We need to keep in mind that, for any non-trivial problem, no machine learning algorithm is perfect. For this reason, we first want to evaluate the algorithm’s performance, and then tune it for an application.

In this blog post, we aim to accomplish two things:

  1. Understand the data that makes up an ROC curve.
  2. Gain an appreciation for how to interpret and use an ROC curve in order to select a threshold that is appropriate for the real-world deployment of a face recognition model.

For the sake of this blog post, let’s introduce a fictional person named John. John is employed by a company that has deployed facial recognition for access control as part of the office security system. As part of the deployment, John is tasked with setting the optimal threshold of the machine learning model that the security system will use to allow or deny access to employees. How will John know which threshold is best for his use case?

To answer that question, let us first understand ROC curves by defining two common technical terms: the True Positive Rate (TPR), also known as Recall, and the False Positive Rate (FPR).

True Positive Rate (TPR) — the probability of detection

False Positive Rate (FPR) — the probability of false alarm

Let us consider a model with positive class ‘John’ and negative class ‘Not John’. The diagram below (also called a confusion matrix) helps us understand the definitions of True Positives, True Negatives, False Positives, and False Negatives.

Example of classification terms used in context of face recognition.

In the above scenario, the prediction and the ground truth (or actual outcome) can combine into one of four outcomes:

True Positive (TP) — the model identifies John as John.
False Positive (FP) — the model identifies another person as John.
False Negative (FN) — the model identifies John as someone else.
True Negative (TN) — the model identifies another person as ‘Not John’.
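These four outcomes can be tallied directly from lists of predicted and actual labels. Below is a minimal Python sketch; the label lists and the `confusion_counts` helper are invented purely for illustration:

```python
# Tally the four confusion-matrix outcomes for a binary "John" / "Not John" task.
# The label lists below are made up for illustration.
def confusion_counts(actual, predicted, positive="John"):
    tp = sum(a == positive and p == positive for a, p in zip(actual, predicted))
    fp = sum(a != positive and p == positive for a, p in zip(actual, predicted))
    fn = sum(a == positive and p != positive for a, p in zip(actual, predicted))
    tn = sum(a != positive and p != positive for a, p in zip(actual, predicted))
    return tp, fp, fn, tn

actual    = ["John", "John", "Not John", "Not John", "John"]
predicted = ["John", "Not John", "John", "Not John", "John"]
print(confusion_counts(actual, predicted))  # (2, 1, 1, 1)
```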

With these definitions, we can define TPR and FPR as:

TPR = TP/(TP+FN)

FPR = FP/(FP+TN)
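As a quick sanity check, both rates can be computed directly from the four counts. The counts in this sketch are made up for illustration:

```python
# Compute TPR and FPR from the four confusion-matrix counts.
# The counts passed in below are invented for illustration.
def true_positive_rate(tp, fn):
    return tp / (tp + fn)   # fraction of actual positives correctly detected

def false_positive_rate(fp, tn):
    return fp / (fp + tn)   # fraction of actual negatives falsely flagged

print(true_positive_rate(90, 10))   # 0.9
print(false_positive_rate(5, 95))   # 0.05
```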

With these terms defined, we can now understand an ROC curve like the one shown below. This curve describes the performance of Trueface’s face recognition model on the LFW dataset. For reference, you can view the ROC curves for all of Trueface’s models here.

Performance of Trueface.ai’s face recognition model

How to choose a good classification threshold for your model.

Typically, we want a model that has a high TPR and a low FPR, but there is a trade-off between these metrics, as can be seen in the above plot: as we vary the threshold to increase the TPR, the FPR increases along with it. Different businesses will have different requirements for their use cases.
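This trade-off comes from sweeping the threshold: each threshold value produces one (FPR, TPR) point, and the collection of points traces out the ROC curve. A sketch with invented similarity scores and labels (1 = same person, 0 = different person):

```python
# Sweep a decision threshold over match scores and record (FPR, TPR) at each step.
# The scores and labels below are invented for illustration.
def roc_points(scores, labels, thresholds):
    points = []
    for t in thresholds:
        tp = sum(s >= t and y == 1 for s, y in zip(scores, labels))
        fp = sum(s >= t and y == 0 for s, y in zip(scores, labels))
        fn = sum(s < t and y == 1 for s, y in zip(scores, labels))
        tn = sum(s < t and y == 0 for s, y in zip(scores, labels))
        points.append((fp / (fp + tn), tp / (tp + fn)))
    return points

scores = [0.9, 0.8, 0.7, 0.55, 0.4, 0.3]  # invented similarity scores
labels = [1, 1, 1, 0, 1, 0]               # 1 = same person, 0 = different person
print(roc_points(scores, labels, [0.2, 0.5, 0.6, 0.95]))
```

Note how the lowest threshold gives TPR = 1.0 but also FPR = 1.0, while the highest threshold gives FPR = 0.0 at the cost of missing every positive.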

Let's suppose that there has been a break-in at John’s office, and John has to identify the robber by matching his face, captured in CCTV footage, against the face embeddings in a dataset. In this case, the system must not miss a single face that comes close to the robber’s (this means a TPR of 1.0). It is okay if there are some false positives, which John can rule out upon closer inspection. Below we can see the optimal threshold for this case:

Threshold for a TPR of 1.0 with lowest FPR

As we see, John should set a threshold of 0.7 to keep TPR = 1.0 while keeping the FPR as low as possible.
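John's rule can be written as a one-liner: among all operating points with TPR = 1.0, pick the one with the lowest FPR. The (threshold, TPR, FPR) tuples below are hypothetical values loosely echoing the figure, not data read from the real curve:

```python
# Hypothetical (threshold, TPR, FPR) operating points; not real curve data.
curve = [(0.5, 1.0, 0.80), (0.6, 1.0, 0.65), (0.7, 1.0, 0.55), (0.8, 0.999, 0.35)]

# Keep TPR = 1.0, then minimize FPR.
best = min((p for p in curve if p[1] == 1.0), key=lambda p: p[2])
print(best)  # (0.7, 1.0, 0.55)
```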

At the same time, look at the point on the graph where TPR = 0.999: the corresponding FPR is 0.35. This means that by tolerating a miss of 1 in 1,000 positive samples, you reduce the false positives by ~40%.

In a second scenario, let’s say John is ready to deploy face recognition for access control across the entire company. In this use case, we need to limit the chance of the model identifying an unauthorized person as John or another employee and granting them access. This means it is crucial for the system to have an FPR of 0.0. It is acceptable if John occasionally has to present his face to the system two or three times to gain access to his office. What should our optimal threshold be now?

Threshold for an FPR of 0.0 with highest TPR

As we see, 0.955 is a good choice of threshold: it gives us an FPR of 0.0 while keeping the TPR very close to 1.0.
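The selection rule here is the mirror image of the first scenario: among all operating points with FPR = 0.0, pick the one with the highest TPR. Again, the curve points are hypothetical, not real data:

```python
# Hypothetical (threshold, TPR, FPR) operating points; not real curve data.
curve = [(0.90, 0.995, 0.01), (0.93, 0.99, 0.002), (0.955, 0.98, 0.0), (0.97, 0.90, 0.0)]

# Require FPR = 0.0, then maximize TPR.
best = max((p for p in curve if p[2] == 0.0), key=lambda p: p[1])
print(best)  # (0.955, 0.98, 0.0)
```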

Just as in our two examples, businesses will have different use cases and these use cases will have differing optimum thresholds.
To summarize the post:

  1. Decide on the relative cost of one false positive versus one true positive (the cost of identifying someone else as John vs. the benefit of correctly identifying John as John).
  2. Look at the ROC curve and pick the point that maximizes the benefit and minimizes the cost (correctly identifying John as John while not identifying someone else as John).
  3. Take the threshold value at that point and use it to convert the probability output of the ML algorithm into a binary decision.
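The last step is a single comparison at inference time. A sketch, assuming the model outputs a similarity score in [0, 1] and using the strict access-control threshold from the second scenario:

```python
# Convert a continuous match score into a binary access decision.
def decide(similarity, threshold):
    return "grant" if similarity >= threshold else "deny"

THRESHOLD = 0.955  # strict access-control threshold from the example above

print(decide(0.97, THRESHOLD))  # grant
print(decide(0.80, THRESHOLD))  # deny
```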

We hope that by reading this, you have a better understanding of how to interpret ROC curves, the tradeoffs of different thresholds, and how to select the best threshold based on your use case.
