Previously, we introduced a bag of tricks to improve image classification performance with convolutional networks in Keras. This time, we will take a closer look at the last trick, called mixup.
What is mixup training?
The paper mixup: BEYOND EMPIRICAL RISK MINIMIZATION offers an alternative to traditional image augmentation techniques like zooming and rotation: it forms a new example through weighted linear interpolation of two existing examples.
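In the paper's notation, the interpolation is applied to both the inputs and their one-hot labels, where the tilde marks the newly formed example:

```latex
\begin{aligned}
\tilde{x} &= \lambda x_i + (1 - \lambda)\, x_j \\
\tilde{y} &= \lambda y_i + (1 - \lambda)\, y_j
\end{aligned}
```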
(x_i, y_i) and (x_j, y_j) are two examples drawn at random from our training data, and λ ∈ [0, 1]. In practice, λ is randomly sampled from the beta distribution, i.e. Beta(α, α).
α ∈ [0.1, 0.4] leads to improved performance. A smaller α creates less mixup effect, whereas for large α, mixup leads to underfitting.
As you can see in the following graph, given a small α = 0.2, the beta distribution samples more values close to either 0 or 1, making the mixup result closer to one of the two examples.
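To make the interpolation concrete, here is a minimal NumPy sketch that mixes a single pair of examples. The function name `mixup_pair` and the toy all-zeros and all-ones "images" are made up for illustration; they are not part of the paper or of Keras.

```python
import numpy as np

def mixup_pair(x_i, y_i, x_j, y_j, alpha=0.2, rng=None):
    """Blend two examples with a weight drawn from Beta(alpha, alpha).

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # lambda lies in [0, 1]
    x = lam * x_i + (1.0 - lam) * x_j     # interpolate the inputs
    y = lam * y_i + (1.0 - lam) * y_j     # interpolate the one-hot labels
    return x, y, lam

# Two toy "images" (all zeros and all ones) with one-hot labels for two classes.
x_a, y_a = np.zeros((2, 2, 3)), np.array([1.0, 0.0])
x_b, y_b = np.ones((2, 2, 3)), np.array([0.0, 1.0])

x_mix, y_mix, lam = mixup_pair(x_a, y_a, x_b, y_b, rng=np.random.default_rng(0))
print("lambda:", lam)
print("mixed label:", y_mix)
```

Because the labels are one-hot, the mixed label is simply (λ, 1 − λ), so it still sums to one and can be used with a categorical cross-entropy loss.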
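The shape of the distribution can also be checked numerically. The sketch below uses only the standard library and compares how often samples land within 0.1 of either end for α = 0.2 versus α = 1.0 (the uniform case). The helper name `fraction_near_ends`, the margin, and the sample count are arbitrary choices for illustration.

```python
import random

def fraction_near_ends(alpha, n=10_000, margin=0.1, seed=0):
    """Fraction of Beta(alpha, alpha) samples within `margin` of 0 or 1."""
    rng = random.Random(seed)
    samples = [rng.betavariate(alpha, alpha) for _ in range(n)]
    near = sum(1 for s in samples if s < margin or s > 1.0 - margin)
    return near / n

small = fraction_near_ends(0.2)   # U-shaped density: mass piles up near 0 and 1
flat = fraction_near_ends(1.0)    # Beta(1, 1) is uniform on [0, 1]
print(f"alpha=0.2: {small:.2f} of samples near the ends")
print(f"alpha=1.0: {flat:.2f} of samples near the ends")
```

With a small α, most sampled weights sit near the ends, so most mixed images stay close to one of the two originals.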
What are the benefits of mixup training?
While traditional data augmentation, like that provided by the Keras ImageDataGenerator class, consistently leads to improved generalization, the procedure is dataset-dependent and thus requires expert knowledge.
Besides, data augmentation does not model the relation across examples of different classes.
On the other hand,
- Mixup is a data-agnostic data augmentation routine.
- It makes decision boundaries transition linearly from class to class, providing a smoother estimate of uncertainty.
- It reduces the memorization of corrupt labels.
- It increases the robustness to adversarial examples and stabilizes the training of generative adversarial networks.
Mixup image data generator in Keras
Want to give mixup a spin? Let’s implement an image data generator that reads images from files and works with Keras model.fit_generator() out of the box.
The core of the mixup generator consists of a pair of iterators that sample images randomly from a directory, one batch at a time, with the mixup performed in the
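The idea of pairing two batch iterators can be sketched as follows. This is not the generator from the original post: it is a simplified stand-in that assumes any two iterators yielding `(images, one_hot_labels)` batches, and the names `MixupBatchIterator` and `toy_batches` are hypothetical. The toy source produces random arrays so the sketch runs without image files.

```python
import numpy as np

class MixupBatchIterator:
    """Mix batches drawn from two independent iterators of (images, labels).

    Hypothetical sketch; assumes images are (N, H, W, C) and labels are one-hot (N, K).
    """

    def __init__(self, iter_a, iter_b, alpha=0.2, seed=None):
        self.iter_a = iter_a
        self.iter_b = iter_b
        self.alpha = alpha
        self.rng = np.random.default_rng(seed)

    def __iter__(self):
        return self

    def __next__(self):
        x_a, y_a = next(self.iter_a)
        x_b, y_b = next(self.iter_b)
        n = min(len(x_a), len(x_b))          # guard against a shorter final batch
        lam = self.rng.beta(self.alpha, self.alpha, size=n)  # one weight per example
        lam_x = lam.reshape(n, 1, 1, 1)      # broadcast over height, width, channels
        lam_y = lam.reshape(n, 1)            # broadcast over classes
        x = lam_x * x_a[:n] + (1.0 - lam_x) * x_b[:n]
        y = lam_y * y_a[:n] + (1.0 - lam_y) * y_b[:n]
        return x, y

def toy_batches(batch_size, seed):
    """Stand-in for a shuffled directory iterator: random images, one-hot labels."""
    rng = np.random.default_rng(seed)
    while True:
        x = rng.random((batch_size, 8, 8, 3))
        y = np.eye(2)[rng.integers(0, 2, size=batch_size)]
        yield x, y

mixed = MixupBatchIterator(toy_batches(4, seed=1), toy_batches(4, seed=2), seed=0)
x_batch, y_batch = next(mixed)
print(x_batch.shape, y_batch.shape)
```

With real data, the two sources could be two separately shuffled iterators returned by ImageDataGenerator.flow_from_directory() with class_mode="categorical", so that each position in the batch pairs two randomly chosen images.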
Then you can create the training and validation generators for fitting the model. Notice that we don’t use mixup in the validation generator.
We can visualize a batch of mixup images and labels with the following snippet in a Jupyter notebook.
The following picture illustrates how mixup works.
Conclusion and further thoughts
You might think that mixing up more than two examples at a time would lead to better training. On the contrary, combinations of three or more examples, with weights sampled from the multivariate generalization of the beta distribution, do not provide further gain but increase the computational cost of mixup. Moreover, interpolating only between inputs with the same label does not lead to the performance gains of mixup.
Check out the full source code on my GitHub.
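For reference, the multivariate generalization of the beta distribution is the Dirichlet distribution. A combination of three or more examples could be sketched like this; the function name `mixup_many` and the toy inputs are hypothetical and only show what such a variant computes, not a recommended training setup.

```python
import numpy as np

def mixup_many(xs, ys, alpha=0.2, rng=None):
    """Convex combination of k examples with Dirichlet(alpha, ..., alpha) weights.

    Hypothetical helper for illustration only.
    """
    rng = np.random.default_rng() if rng is None else rng
    xs = np.asarray(xs, dtype=float)
    ys = np.asarray(ys, dtype=float)
    weights = rng.dirichlet([alpha] * len(xs))   # non-negative weights summing to 1
    x = np.tensordot(weights, xs, axes=1)        # weighted sum over the k examples
    y = np.tensordot(weights, ys, axes=1)
    return x, y, weights

# Three toy "images" filled with 0, 1 and 2, each with its own one-hot label.
xs = [np.full((2, 2, 3), v, dtype=float) for v in (0.0, 1.0, 2.0)]
ys = np.eye(3)

x_mix3, y_mix3, w = mixup_many(xs, ys, rng=np.random.default_rng(0))
print("weights:", w)
```

Every mixed example here needs k inputs instead of two, which is where the extra computation comes from.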