An Emphasis on the Minimization of False Negatives/False Positives in Binary Classification

Sanskriti Singh
5 min read · Jul 27, 2021

The minimization of specific cases in binary classification, such as false negatives or false positives, grows increasingly important as we build more machine learning into real products. One such field is healthcare, where minimizing False Negatives matters far more than minimizing False Positives when machine learning is used to diagnose patients. Similarly, in criminal justice, false positives are more dangerous than false negatives, since an innocent person could be convicted of a crime they didn't commit.

A basic binary confusion matrix

Current methods available to minimize cases like false negatives include:

  1. Altering Class Weights
  2. Data Augmentation
  3. Threshold Line
  4. Altering a portion of the data's real values and retraining the model

1. Altering Class Weights

As per Keras documentation, class weight change is

“Optional dictionary mapping class indices (integers) to a weight (float) value, used for weighting the loss function (during training only)”.

This parameter dictionary changes how each class contributes to the loss. More specifically, if we set class_weights = {0:1, 1:50}, then mistakes on class 1 (the true/positive class) contribute 50 times as much to the loss as mistakes on class 0 (the false/negative class). The same idea also extends beyond two classes.

The loss function in binary classification is:
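Loss = −[ y · log(y_hat) + (1 − y) · log(1 − y_hat) ]

where y is the real label (0 or 1) and y_hat is the model's predicted probability.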

When we add the class_weights, the loss function looks like this:
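Loss = −[ W1 · y · log(y_hat) + W0 · (1 − y) · log(1 − y_hat) ]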

W0 stands for the class weight of the “0” or False class.
W1 stands for the class weight of the “1” or True class.

By changing the loss function this way, we force the model to learn more from the class with the larger weight. Misclassifying an example from the heavily weighted class increases the loss far more than misclassifying one from the other class, so the optimizer works harder to get those cases right.
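In Keras, this only requires passing the dictionary to fit through the class_weight argument. A minimal sketch, assuming a small placeholder model and random placeholder data:

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# Placeholder data: 200 samples, 20 features, binary labels.
X_train = np.random.rand(200, 20)
y_train = np.random.randint(0, 2, size=200)

model = keras.Sequential([
    layers.Dense(16, activation="relu", input_shape=(20,)),
    layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])

# Mistakes on the positive class (1) now cost 50x more than mistakes on the
# negative class (0), pushing training toward fewer False Negatives.
model.fit(X_train, y_train, epochs=5, class_weight={0: 1, 1: 50})
```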

2. Data Augmentation

Data Augmentation is the most common method for dealing with imbalanced data, but it can also be used to deliberately imbalance the data and bias the model toward a certain case. With imbalanced data, the model becomes better at predicting the class that appears more often. While this may not seem like the most optimal way to emphasize minimizing a certain case, it is still a possibility.

Imbalanced data with more negative cases than positive.

An example of this is the crime field, where we would rather the model predict a guilty person as innocent than sentence an innocent person as guilty. If our model is trained on data with more innocent (negative) cases, its predictions will be slightly skewed toward predicting innocent over guilty.

When using this method to minimize a certain case, it is important not to imbalance the data too heavily, because that would hurt the overall performance of the model.
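A minimal sketch of this idea, using plain NumPy oversampling on placeholder data to skew the balance toward the negative class (as in the crime example above):

```python
import numpy as np

# Placeholder dataset: X holds the features, y holds the 0/1 labels.
X = np.random.rand(1000, 20)
y = np.random.randint(0, 2, size=1000)

# To bias the model toward predicting negative (fewer False Positives),
# duplicate the negative (0) cases so they outnumber the positive (1) cases
# roughly two to one.
neg_idx = np.where(y == 0)[0]
extra = np.random.choice(neg_idx, size=len(neg_idx), replace=True)

X_resampled = np.concatenate([X, X[extra]])
y_resampled = np.concatenate([y, y[extra]])
```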

3. Threshold Line

One of the easiest methods to minimize the outcomes of a certain case is simply moving the decision threshold from the default 0.5: above 0.5 when reducing False Positives, or below 0.5 when reducing False Negatives.

Normally, when you use your model to predict on new samples (such as images), it produces a value known as y_hat that ranges from 0 to 1. In most cases, when the value is above 0.5 the final prediction is considered 1, or True; when it is below 0.5, the final prediction is 0, or False. By moving the threshold away from 0.5, we reduce how often the model predicts a certain outcome.

For example, if we wanted to reduce the number of False Negatives, we would make the model predict fewer Negatives (0) and more Positives (1), so we would lower the threshold. To be even more specific, if the new threshold is 0.3, then everything above 0.3 is predicted positive (either True Positives or False Positives), and everything below 0.3 is predicted negative (True Negatives and False Negatives). Since the threshold has been lowered, the number of possible False Negatives has also decreased.

It should be noted that by doing this, the possibility of False Positives increases. In other words, by decreasing the False Negatives we are increasing the False Positives.
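A minimal sketch of applying a custom threshold to the raw y_hat values (the probabilities below are placeholders):

```python
import numpy as np

# y_hat: predicted probabilities from model.predict(), values between 0 and 1.
y_hat = np.array([0.15, 0.35, 0.48, 0.72, 0.91])

# Default decision boundary of 0.5.
default_preds = (y_hat >= 0.5).astype(int)   # -> [0, 0, 0, 1, 1]

# Lowering the threshold to 0.3 makes the model call more cases positive,
# which reduces False Negatives (at the cost of more False Positives).
lowered_preds = (y_hat >= 0.3).astype(int)   # -> [0, 1, 1, 1, 1]
```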

4. Altering a portion of the data's real values and retraining the model

To minimize the number of False Negatives (FN) or False Positives (FP), we can also retrain a model on the same data with some output values changed based on its previous results. This method involves training a model on a dataset until it has converged as well as it can. If we were trying to minimize False Negatives, we would then take some portion (how large a portion is a tunable choice) of the False Positives in the training data and change their real value to 1. The figure below draws this procedure out.

Taking a portion of the False Positives, setting their real value to 1, and then retraining the model.

This works because some of the False Positives share similar features with some of the False Negatives, so when the model is retrained on those False Positives labeled as "true," it learns a bias toward the positive class (in this example, images with pneumonia). In the figure below we can see that many of the incorrect cases lie close to each other, meaning the model sees similar features in both. If we train the model to treat those features as true (having pneumonia), it will put an emphasis on minimizing False Negatives. The reverse would be done if we wanted to reduce False Positives.

A 2-dimensional graph showcasing a possible prediction and decision boundary line.
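A minimal sketch of this procedure, assuming an already trained Keras model and placeholder names (model, X_train, y_train as NumPy arrays); how large a portion of the False Positives to relabel is the tunable part:

```python
import numpy as np

# Assumes `model` is an already trained Keras binary classifier and that
# X_train / y_train are the data it was trained on (all names are placeholders).
y_hat = model.predict(X_train).ravel()
preds = (y_hat >= 0.5).astype(int)

# False Positives: the model predicted 1 but the real label is 0.
fp_idx = np.where((preds == 1) & (y_train == 0))[0]

# Relabel a portion of those False Positives as 1, then retrain so the model
# leans toward positive predictions, reducing False Negatives.
chosen = np.random.choice(fp_idx, size=len(fp_idx) // 2, replace=False)
y_relabeled = y_train.copy()
y_relabeled[chosen] = 1

model.fit(X_train, y_relabeled, epochs=5)
```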

These are some ways we can create binary classification models with an emphasis on the minimization of a certain case.
