Accuracy & Related Metrics Explained via “This Is Fine” Meme

Robert Schell
5 min read · May 31, 2022

If you’re like me, you may have been confused at first (or even still) by accuracy and its related metrics, including precision, recall, and F1 score, when they’re used to assess classification models. These metrics are used ubiquitously across classification problems and are extremely important to understand in order to evaluate your model accurately. In a simplified view, it all boils down to how 4 variables are placed (or not placed) into the equation for each metric. Those 4 variables are:

  1. True Positive (TP): If the model predicts the positive class and the actual target is in the positive class, then you have a TP.
  2. True Negative (TN): If the model predicts the negative class and the actual target is in the negative class, then you have a TN.
  3. False Positive (FP): If the model predicts the positive class and the actual target is in the negative class, then you have a FP.
  4. False Negative (FN): If the model predicts the negative class and the actual target is in the positive class, then you have a FN.

A quick note about positive class vs. negative class before moving on. These are subjective terms and don’t mean anything about good vs. evil. They are really based on what you want your model to predict. In the example I will be using, we’ll consider “fire” to be the positive class and “no fire” (i.e. “fine”) to be the negative class.
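To make this concrete, here is a small Python sketch (the labels and counts below are made up purely for illustration, not output from a real model) that tallies the 4 variables from lists of actual vs. predicted labels, using “fire” as the positive class and “fine” as the negative class:

```
# Toy example: "fire" is the positive class, "fine" (no fire) is the negative class.
actual    = ["fire", "fine", "fire", "fine", "fine", "fire"]
predicted = ["fire", "fine", "fine", "fire", "fine", "fire"]

# Count each of the 4 outcomes by comparing actual vs. predicted labels.
TP = sum(1 for a, p in zip(actual, predicted) if a == "fire" and p == "fire")
TN = sum(1 for a, p in zip(actual, predicted) if a == "fine" and p == "fine")
FP = sum(1 for a, p in zip(actual, predicted) if a == "fine" and p == "fire")
FN = sum(1 for a, p in zip(actual, predicted) if a == "fire" and p == "fine")

print(TP, TN, FP, FN)  # 2 2 1 1
```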

Now that we have the 4 variables defined, we can look at one of the panels from the “This Is Fine” dog meme (full comic: http://gunshowcomic.com/648, thanks KC Green!). I will start with the original panel to discuss the False Negative (FN) variable. The remaining panels have been modified by me (poorly) using the open-source graphics editor GIMP to illustrate the other 3 variables.

False Negative

An example of a “False Negative” outcome by our model (meme dog).

This is the iconic meme that went viral several years ago. I remember seeing this image and immediately thinking of a False Negative, FN. Our meme dog is saying, “This is fine,” when it clearly is not fine and flames are beginning to engulf him.

False Positive

An example of a “False Positive” outcome by our model (meme dog).

This is another example of a false prediction. However, now our meme dog is saying, “This is fiRe,” when there is none. This is a False Positive, FP — also please disregard my lack of graphics editing experience :)

True Negative

An example of a “True Negative” outcome by our model (meme dog).

This is an example of a true prediction where our model has correctly predicted that there is no fire, i.e. “This is Fine” is appropriate for the situation. Since we are considering “no fire” (i.e. “fine”) to be the negative class and this was a true prediction, it is counted as a True Negative, TN.

True Positive

An example of a “True Positive” outcome by our model (meme dog).

This is another example of a true prediction where our model has correctly predicted that there is a fire, i.e. “This is FiRe” is appropriate for the situation. Since we are considering “fire” to be the positive class and this was a true prediction, it is counted as a True Positive, TP.

We went in reverse order from the 4 variables listed above because I wanted to cover the “false” outcomes first, as they can be the most detrimental to your model. Oftentimes, the false predictions require further investigation into why the model made the wrong prediction for the given situation.

Now that we have those 4 variables defined, we can get to the metrics and we will start with accuracy:

Accuracy

Accuracy = (TP + TN) / (TP + TN + FP + FN)

You can see that the Accuracy metric is all of the true predictions divided by the total of all predictions from the model (true and false). This metric uses all 4 variables and is helpful for understanding how often your model gets things right; however, it has some drawbacks. Specifically, for imbalanced datasets where one class greatly outnumbers another, the Accuracy metric may not tell the entire story about the “false” outcomes. In these cases, a model may still appear very accurate simply because it mostly predicts the majority class. Let’s look at the other metrics to see how they may perform better in different situations.
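Using the toy counts from the sketch above, a quick accuracy calculation might look like this:

```
# Accuracy: correct predictions (TP + TN) over all predictions.
# TP, TN, FP, FN continue from the counting sketch above (2, 2, 1, 1).
accuracy = (TP + TN) / (TP + TN + FP + FN)
print(accuracy)  # 0.67 (4 correct out of 6)
```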

Precision

Precision = TP / (TP + FP)

As you can see, the Precision metric only uses the 2 variables associated with predicted positives, TP and FP, so the denominator is just the total number of positive predictions. Precision is widely used for assessing classification models, especially when there is a high cost associated with False Positives (FP), e.g. when our meme dog says there is fire when there is not (second meme dog image above).
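Continuing with the same toy counts, precision works out as follows:

```
# Precision: of everything predicted "fire", how much actually was fire?
precision = TP / (TP + FP)
print(precision)  # 0.67 (2 of the 3 "fire" predictions were real fires)
```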

Recall

Recall = TP / (TP + FN)

As you can see, the Recall metric also only uses 2 variables, TP and FN, so the denominator is just the total number of actual positives (note that an FN is a positive case that the model falsely predicted as negative). Recall is widely used for assessing classification models, especially when there is a high cost associated with False Negatives (FN), e.g. when our meme dog says there is no fire when there clearly is (first meme dog image above)!
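And recall, again with the toy counts from above:

```
# Recall: of all the actual fires, how many did the model catch?
recall = TP / (TP + FN)
print(recall)  # 0.67 (2 of the 3 actual fires were caught)
```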

F1 Score

F1 = 2 * (Precision * Recall) / (Precision + Recall)

As you can see, the F1 Score combines Precision and Recall into a single metric. As you may have guessed, it is used as a more balanced measure when both False Positive (FP) and False Negative (FN) predictions are costly.
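Carrying over the precision and recall values from the sketches above:

```
# F1: harmonic mean of precision and recall.
f1 = 2 * (precision * recall) / (precision + recall)
print(f1)  # 0.67 here, since precision and recall happen to be equal
```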

Please note that Accuracy is the only metric of those discussed that uses True Negatives (TN) in its calculation. Accuracy is still a very valuable metric, and these metrics are all complementary to one another.

Thanks for reading, and I hope this has helped you begin to understand these classification metrics better. We’ve briefly discussed 4 metrics, when to use them, and the 4 variables that constitute them. In practice, it is probably best to use many metrics, including and beyond those discussed here, in order to truly understand your models and their predictions.
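If you use scikit-learn, you can let it do the bookkeeping and sanity-check the hand calculations. A minimal sketch with the same made-up labels might look like this (pos_label tells scikit-learn which class to treat as the positive class):

```
# Cross-check the hand-computed metrics with scikit-learn.
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

actual    = ["fire", "fine", "fire", "fine", "fine", "fire"]
predicted = ["fire", "fine", "fine", "fire", "fine", "fire"]

print(accuracy_score(actual, predicted))                     # 0.67
print(precision_score(actual, predicted, pos_label="fire"))  # 0.67
print(recall_score(actual, predicted, pos_label="fire"))     # 0.67
print(f1_score(actual, predicted, pos_label="fire"))         # 0.67
```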
