“Sensitivity and Specificity in ML: A Practical Guide”

Pasquale Di Lorenzo
3 min read · Dec 29, 2022

This article is part of the series:

Getting Started with Machine Learning: A Step-by-Step Guide

Sensitivity and specificity are important metrics in machine learning that help to evaluate the performance of a model. Sensitivity, also known as the true positive rate, measures the proportion of positive cases that are correctly identified by the model. Specificity, also known as the true negative rate, measures the proportion of negative cases that are correctly identified by the model.

Both sensitivity and specificity can be calculated from a confusion matrix, a table that compares the predicted outcomes of a model to the known truth. For a binary problem, where there are only two categories to choose from, the confusion matrix has two rows and two columns. Here the rows represent the predicted outcomes and the columns represent the known truth. Conventions vary, and some tools (scikit-learn, for example) put the known truth in the rows instead.

The top left-hand corner of the confusion matrix contains the true positives, which are cases that were correctly predicted to be positive. The bottom right-hand corner contains the true negatives, which are cases that were correctly predicted to be negative. The bottom left-hand corner contains the false negatives, which are cases that were incorrectly predicted to be negative. And the top right-hand corner contains the false positives, which are cases that were incorrectly predicted to be positive.
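
The layout described above can be made concrete with a short Python sketch. It assumes labels are encoded as 1 (positive) and 0 (negative), and `confusion_counts` is a hypothetical helper name, not part of any library. It follows this article's convention of predicted outcomes in the rows and the known truth in the columns.

```python
from collections.abc import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> list[list[int]]:
    """Build a 2x2 confusion matrix with rows = predicted, columns = actual.

    Assumes binary labels: 1 = positive, 0 = negative.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual not in (0, 1) or predicted not in (0, 1):
            raise ValueError("labels must be 0 or 1")
        if predicted == 1 and actual == 1:
            tp += 1  # top-left: correctly predicted positive
        elif predicted == 1 and actual == 0:
            fp += 1  # top-right: incorrectly predicted positive
        elif predicted == 0 and actual == 1:
            fn += 1  # bottom-left: incorrectly predicted negative
        else:
            tn += 1  # bottom-right: correctly predicted negative
    return [[tp, fp],
            [fn, tn]]
```

For example, `confusion_counts([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])` returns `[[2, 1], [1, 2]]`: two true positives, one false positive, one false negative and two true negatives. If you use scikit-learn's `confusion_matrix` instead, keep in mind that it places the known truth in the rows and, for 0/1 labels, lists the negative class first, so its cells are arranged as `[[TN, FP], [FN, TP]]`.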

To calculate sensitivity, we use the formula: sensitivity = true positives / (true positives + false negatives). This tells us what percentage of positive cases were correctly identified by the model.
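
As a minimal sketch, the sensitivity formula translates directly into Python. The function name and the choice to raise an error when there are no actual positives (where the ratio is undefined) are assumptions made for illustration.

```python
def sensitivity(tp: int, fn: int) -> float:
    """True positive rate: fraction of actual positives the model labels positive."""
    actual_positives = tp + fn
    if actual_positives == 0:
        # With no positive cases in the data, the ratio has no meaning.
        raise ValueError("sensitivity is undefined when there are no actual positives")
    return tp / actual_positives
```

For instance, `sensitivity(tp=8, fn=2)` returns `0.8`: the model caught 8 of the 10 actual positives.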

To calculate specificity, we use the formula: specificity = true negatives / (true negatives + false positives). This tells us what percentage of negative cases were correctly identified by the model.
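
The specificity formula can be sketched the same way. Again, the function name and the decision to raise an error when there are no actual negatives are illustrative assumptions.

```python
def specificity(tn: int, fp: int) -> float:
    """True negative rate: fraction of actual negatives the model labels negative."""
    actual_negatives = tn + fp
    if actual_negatives == 0:
        # With no negative cases in the data, the ratio has no meaning.
        raise ValueError("specificity is undefined when there are no actual negatives")
    return tn / actual_negatives
```

For instance, `specificity(tn=9, fp=1)` returns `0.9`: the model correctly cleared 9 of the 10 actual negatives.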

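In some cases, we may have more than two categories to choose from. The confusion matrix then has one row and one column per category. To calculate sensitivity and specificity in this scenario, we treat each category in turn as the "positive" class and all the other categories as "negative" (a one-vs-rest view). For a given category, the true positives are its diagonal entry, the false positives are the other entries in its row (predicted as that category but actually something else), the false negatives are the other entries in its column (actually that category but predicted as something else), and the true negatives are all remaining cases. Plugging these values into the same two formulas gives one sensitivity and one specificity per category.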
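
The one-vs-rest calculation can be sketched as follows. It assumes the matrix is a list of lists with predicted classes in the rows and actual classes in the columns, as in this article. `per_class_rates` is a hypothetical helper name, the 3x3 counts are made up for illustration, and undefined rates (a class that never occurs, or that makes up every case) are reported as `nan`.

```python
import math


def per_class_rates(matrix: list[list[int]]) -> dict[int, tuple[float, float]]:
    """Return {class index: (sensitivity, specificity)} for a square confusion matrix.

    matrix[i][j] counts cases predicted as class i whose actual class is j
    (rows = predicted, columns = actual).
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    rates = {}
    for k in range(n):
        tp = matrix[k][k]                              # predicted k, actually k
        fp = sum(matrix[k]) - tp                       # predicted k, actually another class
        fn = sum(matrix[i][k] for i in range(n)) - tp  # actually k, predicted another class
        tn = total - tp - fp - fn                      # neither predicted nor actually k
        sens = tp / (tp + fn) if tp + fn else math.nan
        spec = tn / (tn + fp) if tn + fp else math.nan
        rates[k] = (sens, spec)
    return rates


# Made-up counts for a three-class problem.
confusion = [
    [5, 1, 0],
    [2, 6, 1],
    [0, 1, 4],
]
for k, (sens, spec) in per_class_rates(confusion).items():
    print(f"class {k}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

With these counts the loop prints a sensitivity of 0.714, 0.750 and 0.800 and a specificity of 0.923, 0.750 and 0.933 for classes 0, 1 and 2 respectively. Each class gets its own pair of numbers, which is why multi-class results are often summarized by averaging them.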

In conclusion, sensitivity and specificity are important metrics that help us to understand how well a model is performing at identifying positive and negative cases. These metrics can be calculated using a confusion matrix and are useful for comparing the performance of different models.

Example:

Imagine that we are building a machine learning model to predict whether or not a person has a certain disease. We have a dataset of 1000 patients, 500 of whom have the disease and 500 of whom do not. We train our model and then use it to make predictions on a separate test dataset of 400 patients.

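After evaluating our model on the test dataset, we obtain the following confusion matrix (rows are the predicted outcomes, columns are the known truth):

Predicted positive: 150 true positives, 50 false positives
Predicted negative: 100 false negatives, 100 true negatives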

To calculate sensitivity, we use the formula: sensitivity = true positives / (true positives + false negatives). In this case, the true positives are 150 and the false negatives are 100, so our sensitivity is 150 / (150 + 100) = 0.6. This means that our model correctly identified 60% of the patients with the disease.

To calculate specificity, we use the formula: specificity = true negatives / (true negatives + false positives). In this case, the true negatives are 100 and the false positives are 50, so our specificity is 100 / (100 + 50) ≈ 0.67. This means that our model correctly identified about 67% of the patients without the disease.
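
The arithmetic in this example can be checked with a few lines of Python, using the four counts given in the text.

```python
# Counts from the worked example.
tp, fn = 150, 100  # patients with the disease: correctly flagged vs missed
tn, fp = 100, 50   # patients without the disease: correctly cleared vs falsely flagged

sens = tp / (tp + fn)  # 150 / 250
spec = tn / (tn + fp)  # 100 / 150

print(f"sensitivity = {sens:.2f}")  # sensitivity = 0.60
print(f"specificity = {spec:.2f}")  # specificity = 0.67
```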

Overall, our model has a sensitivity of 0.6 and a specificity of 0.67. This means that it is more effective at correctly identifying patients without the disease than patients with the disease. In a real-world scenario, we would need to consider the trade-offs between sensitivity and specificity in order to determine the most appropriate model for our needs.
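
One common place this trade-off shows up is the decision threshold applied to a model's predicted scores. The sketch below is purely illustrative: the scores and labels are invented, and `rates_at_threshold` is a hypothetical helper. It assumes the model outputs a score between 0 and 1 and that a patient is flagged as having the disease when the score reaches the threshold.

```python
# Invented model scores (estimated probability of disease) and true labels (1 = has disease).
scores = [0.95, 0.80, 0.70, 0.55, 0.40, 0.35, 0.20, 0.10]
labels = [1, 1, 0, 1, 0, 1, 0, 0]


def rates_at_threshold(threshold: float) -> tuple[float, float]:
    """Flag 'disease' when score >= threshold; return (sensitivity, specificity)."""
    pairs = list(zip(scores, labels))
    tp = sum(1 for s, y in pairs if s >= threshold and y == 1)
    fn = sum(1 for s, y in pairs if s < threshold and y == 1)
    tn = sum(1 for s, y in pairs if s < threshold and y == 0)
    fp = sum(1 for s, y in pairs if s >= threshold and y == 0)
    return tp / (tp + fn), tn / (tn + fp)


for threshold in (0.3, 0.5, 0.75):
    sens, spec = rates_at_threshold(threshold)
    print(f"threshold={threshold:.2f}  sensitivity={sens:.2f}  specificity={spec:.2f}")
```

With these made-up numbers, a threshold of 0.30 gives a sensitivity of 1.00 and a specificity of 0.50, 0.50 gives 0.75 for both, and 0.75 gives a sensitivity of 0.50 and a specificity of 1.00. Lowering the threshold catches more true cases at the cost of more false alarms; which point is appropriate depends on how costly each kind of error is in the application.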

Pasquale Di Lorenzo

As a physicist and data engineer, I share insights on AI and personal growth to inspire others to reach their full potential.