How to Use Label Smoothing for Regularization

What is label smoothing and how to implement it in PyTorch

Dimitris Poulopoulos
Mar 20 · 4 min read

Overfitting and poor probability calibration are two issues that arise when training deep learning models. Deep learning offers many regularization techniques to address overfitting: weight decay, early stopping and dropout are some of the most popular ones. Platt scaling and isotonic regression, on the other hand, are used for model calibration. But is there one method that fights both overfitting and over-confidence?

Label smoothing is a regularization technique that perturbs the target variable to make the model less certain of its predictions. It is viewed as a regularization technique because it restrains the largest logits fed into the softmax function from becoming much bigger than the rest. Moreover, the resulting model is better calibrated as a side effect.

In this story, we define label smoothing, implement a cross-entropy loss function that uses this technique, and put it to the test. If you want to read more about model calibration please refer to the story below.

Label Smoothing

Imagine that we have a multiclass classification problem. In such problems, the target variable is usually a one-hot vector, where we have 1 in the position of the correct class and 0s everywhere else.

Label smoothing changes the target vector by a small amount ε. Thus, instead of asking our model to predict 1 for the right class, we ask it to predict roughly 1-ε for the correct class and to spread the remaining ε evenly over the N classes, so that every other class gets ε/N instead of 0 and the targets still sum to 1. So, the cross-entropy loss function with label smoothing is transformed into the formula below.
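
Written out with the symbols defined in the next paragraph, and assuming ε is spread uniformly over all N classes, the smoothed loss can be expressed as follows (the name L_LS is introduced here only for convenience):

```latex
\mathcal{L}_{LS} = (1 - \varepsilon)\,\mathrm{ce}(i) + \frac{\varepsilon}{N} \sum_{j=1}^{N} \mathrm{ce}(j)
```

With ε = 0 this reduces to the standard cross-entropy ce(i).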

In this formula, ce(x) denotes the standard cross-entropy loss of x (i.e. -log(p(x))), ε is a small positive number, i is the correct class and N is the number of classes.

Intuitively, label smoothing restrains the logit value for the correct class so that it stays closer to the logit values for the other classes. In this way, it acts both as a regularization technique and as a method to fight model over-confidence.
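
To make this intuition concrete, here is a small numeric sketch. It assumes the smoothing mass ε is spread uniformly over all N classes; the values ε = 0.1 and N = 10 and the function name `optimal_logit_gap` are illustrative choices, not part of the original example. The smoothed loss is minimized when the predicted distribution equals the smoothed target, which caps the gap between the correct-class logit and any other logit at a finite value. With a hard one-hot target, that gap would have to grow without bound.

```python
import math


def optimal_logit_gap(epsilon: float, n_classes: int) -> float:
    """Gap between the correct-class logit and any other logit at the optimum
    of the smoothed loss (hypothetical helper, for illustration only)."""
    # Smoothed target: epsilon / N on every class, plus 1 - epsilon on the correct one.
    q_other = epsilon / n_classes
    q_correct = 1.0 - epsilon + q_other
    # Softmax probabilities have ratio exp(z_correct - z_other), so the gap is a log-ratio.
    return math.log(q_correct / q_other)


gap = optimal_logit_gap(epsilon=0.1, n_classes=10)
print(f"{gap:.2f}")  # log(0.91 / 0.01) = log(91), prints 4.51
```

A smaller ε allows a larger gap, and as ε approaches 0 the gap grows without limit, which is the over-confident regime label smoothing is meant to avoid.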

PyTorch Implementation

The implementation of a label smoothing cross-entropy loss function in PyTorch is pretty straightforward. First, let us use a helper function that computes a linear combination between two values:
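
One way such a helper could look is sketched below. The name `linear_combination` and the argument order are assumptions; the function weights its first argument by ε and its second by 1-ε, and works for plain floats as well as tensors.

```python
def linear_combination(x, y, epsilon):
    """Weight x by epsilon and y by (1 - epsilon)."""
    return epsilon * x + (1 - epsilon) * y


print(linear_combination(1.0, 0.0, 0.1))  # 0.1
```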

Next, we implement a new loss function as a PyTorch nn.Module.
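
A minimal sketch of such a module follows, assuming the smoothing mass is spread uniformly over the classes. The class name `LabelSmoothingCrossEntropy`, the `reduce_loss` helper and the default ε = 0.1 are assumptions made for illustration; the linear-combination helper is repeated so that the block runs on its own.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def linear_combination(x, y, epsilon):
    # Repeated here so the block is self-contained.
    return epsilon * x + (1 - epsilon) * y


def reduce_loss(loss, reduction="mean"):
    # Collapse per-sample losses according to the requested reduction.
    if reduction == "mean":
        return loss.mean()
    if reduction == "sum":
        return loss.sum()
    return loss


class LabelSmoothingCrossEntropy(nn.Module):
    def __init__(self, epsilon: float = 0.1, reduction: str = "mean"):
        super().__init__()
        self.epsilon = epsilon
        self.reduction = reduction

    def forward(self, preds, target):
        # preds: raw logits of shape (batch, N); target: class indices of shape (batch,)
        n = preds.size(-1)
        log_preds = F.log_softmax(preds, dim=-1)
        # Sum of ce(j) over all classes, divided by N (the uniform part).
        uniform = reduce_loss(-log_preds.sum(dim=-1), self.reduction) / n
        # ce(i) for the correct class (the standard cross-entropy part).
        nll = F.nll_loss(log_preds, target, reduction=self.reduction)
        return linear_combination(uniform, nll, self.epsilon)


criterion = LabelSmoothingCrossEntropy(epsilon=0.1)
logits = torch.randn(4, 3)           # 4 samples, 3 classes
labels = torch.tensor([0, 2, 1, 2])
loss = criterion(logits, labels)     # scalar tensor
```

With `epsilon=0.0` the module returns the ordinary cross-entropy, which is a convenient sanity check.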

We can now drop this class into our code as is. For this example, we use the standard pets example, where the task is to recognize different breeds of cats and dogs.
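
As a rough sketch of what plugging a smoothed loss into a training step looks like, the example below uses a tiny linear model on random tensors as a stand-in for the pets data and the ResNet (both stand-ins are assumptions made purely for illustration). To keep the block self-contained it relies on the `label_smoothing` argument that `nn.CrossEntropyLoss` accepts in PyTorch 1.10 and later; a custom loss module would simply replace the `criterion` line.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-ins for real data and a real network (illustrative only).
inputs = torch.randn(32, 20)          # 32 samples, 20 features
labels = torch.randint(0, 5, (32,))   # 5 classes
model = nn.Linear(20, 5)

# Built-in smoothed cross-entropy; swap in a custom loss module here if preferred.
criterion = nn.CrossEntropyLoss(label_smoothing=0.1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

losses = []
for _ in range(20):
    optimizer.zero_grad()
    loss = criterion(model(inputs), labels)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
```

Nothing else in the loop changes: the smoothed loss is a drop-in replacement for the usual cross-entropy criterion.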

We transform the data into a format ready to be used by the model, choose a ResNet architecture and aim to optimize the label smoothing cross-entropy loss. After four epochs the results are summarized below.

We get an error rate of 7.5%, which is more than acceptable for ten or so lines of code, where, for the most part, we use the default settings.

There are many things that we could tweak to make our model perform better: different optimizers, hyper-parameters, model architectures, etc. For instance, you can read how to take the ResNet architecture a bit further in the story below.


In this story, we saw what label smoothing is, when to use it and how to implement it in PyTorch. We then trained a state-of-the-art computer vision model to recognize different breeds of cats and dogs in ten lines of code.

Model regularization and calibration are two important concepts. Having a better understanding of the tools that combat variance and over-confidence will make you a better deep learning practitioner.

My name is Dimitris Poulopoulos and I’m a machine learning researcher at BigDataStack and PhD(c) at the University of Piraeus, Greece. I have worked on designing and implementing AI and software solutions for major clients such as the European Commission, Eurostat, IMF, the European Central Bank, OECD, and IKEA. If you are interested in reading more posts about Machine Learning, Deep Learning and Data Science, follow me on Medium, LinkedIn or @james2pl on Twitter.
