FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence: A Brief Summary

Kavishka Abeywardana
May 31, 2024


With increasing computational power and model sizes, overfitting becomes easy, so we need larger data sets. However, collecting labeled data is labor-intensive, whereas unlabeled data can be gathered cheaply from the internet or open databases.

Semi-supervised learning lets us train a model on a small labeled set while extracting as much signal as possible from the unlabeled data. This is especially useful in medical image classification, where we need human experts to label the data sets reliably.

As the name suggests, semi-supervised models use supervised and unsupervised learning components.

FixMatch combines consistency regularization and pseudo-labeling, which are standard methods used in semi-supervised learning.

Background

First, we define the data set.
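Following the paper's notation, each training batch contains B labeled examples and μB unlabeled examples, where μ controls the ratio of unlabeled to labeled data:

```latex
% Labeled batch: images x_b with one-hot labels p_b
\mathcal{X} = \{ (x_b, p_b) : b \in (1, \dots, B) \}
% Unlabeled batch, \mu times larger
\mathcal{U} = \{ u_b : b \in (1, \dots, \mu B) \}
```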

Now we can separately discuss the two fundamental components of FixMatch: consistency regularization and pseudo-labeling.

Consistency Regularization

We assume that the model should produce consistent predictions for different augmentations of the same image; this pushes the model toward a smooth decision boundary. Unlabeled data points can be used for this purpose. We can define the loss function in the following way.
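With p_m(y | x) denoting the model's predicted class distribution and α(·) a stochastic weak augmentation, the paper writes this as an L2 consistency loss over the unlabeled batch:

```latex
\sum_{b=1}^{\mu B} \left\| \, p_m\big(y \mid \alpha(u_b)\big) - p_m\big(y \mid \alpha(u_b)\big) \, \right\|_2^2
```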

We use soft perturbations (weak augmentations). The two terms inside the L2 error differ because each augmentation is drawn independently from the distribution α.

Pseudo-labeling

We use the model itself to derive artificial labels for unlabeled data. We retain an artificial (pseudo) label only if the largest predicted class probability exceeds a predefined threshold.
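In symbols, with q_b = p_m(y | u_b) the prediction on an unlabeled image, q̂_b = argmax(q_b) the hard pseudo-label, τ the confidence threshold, and H(·, ·) the cross-entropy, the pseudo-labeling loss from the paper is:

```latex
\frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\big( \max(q_b) \ge \tau \big) \, H\big( \hat{q}_b, \, q_b \big)
```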

FixMatch Algorithm

FixMatch computes two cross-entropy losses: one for the supervised part and one for the unsupervised part.

In the supervised component, we perform standard supervised classification: the loss is the usual cross-entropy, computed over weakly (softly) augmented versions of the labeled portion of our dataset.
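Concretely, the supervised loss from the paper is:

```latex
\ell_s = \frac{1}{B} \sum_{b=1}^{B} H\big( p_b, \; p_m\big(y \mid \alpha(x_b)\big) \big)
```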

The unsupervised portion of the model is the interesting part. For each unlabeled image, the model is given two views: a weakly augmented one and a strongly augmented one. The weakly augmented image is used to derive a one-hot pseudo-label, and the model must predict this label when the strongly augmented image is given as the input.

This forces the model to look past the augmentations and learn the essence of the images. Enforcing agreement between two augmented views of the same image is what ties FixMatch to the consistency regularization approach.

The loss is defined as follows.
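Here A(·) denotes strong augmentation, and q_b = p_m(y | α(u_b)) is the prediction on the weakly augmented view:

```latex
\ell_u = \frac{1}{\mu B} \sum_{b=1}^{\mu B} \mathbb{1}\big( \max(q_b) \ge \tau \big) \, H\big( \hat{q}_b, \; p_m\big(y \mid \mathcal{A}(u_b)\big) \big)
```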

Now we combine both supervised and unsupervised losses.
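The total loss is simply:

```latex
\ell = \ell_s + \lambda_u \ell_u
```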

λu is a fixed scalar hyperparameter that sets the relative weight of the unsupervised loss against the supervised loss.

For weak augmentation, the authors use a simple flip-and-shift strategy. For strong augmentation, they use RandAugment or CTAugment.

The authors observed that the Adam optimizer hurts performance, so they used stochastic gradient descent (SGD) with momentum instead.

Compared with other semi-supervised approaches, FixMatch is a simpler, easy-to-implement method.
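To make that concrete, here is a minimal sketch of one FixMatch training step in PyTorch. The names model, weak_augment, and strong_augment are assumed helpers (not from the paper); tau and lambda_u follow the notation above.

```python
import torch
import torch.nn.functional as F

def fixmatch_step(model, x_l, y_l, x_u, weak_augment, strong_augment,
                  tau=0.95, lambda_u=1.0):
    """One FixMatch training step (sketch): returns the combined loss."""
    # Supervised loss on weakly augmented labeled images.
    logits_l = model(weak_augment(x_l))
    loss_s = F.cross_entropy(logits_l, y_l)

    # Derive pseudo-labels from weakly augmented unlabeled images.
    with torch.no_grad():
        probs = torch.softmax(model(weak_augment(x_u)), dim=-1)
        max_probs, pseudo_labels = probs.max(dim=-1)
        mask = (max_probs >= tau).float()  # keep only confident predictions

    # Unsupervised loss: predict the pseudo-label from the strong view.
    logits_u = model(strong_augment(x_u))
    per_sample = F.cross_entropy(logits_u, pseudo_labels, reduction="none")
    loss_u = (per_sample * mask).mean()

    return loss_s + lambda_u * loss_u
```

Note that low-confidence unlabeled images contribute zero loss through the mask, so the unsupervised term naturally starts small early in training and grows as the model becomes more confident.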

Results

The standard CIFAR-10 data set contains 60,000 labeled images categorized into 10 classes. With only 250 labeled images, FixMatch achieved 94.93% accuracy; with only 4 labels per class (40 labeled images in total), it still achieved 88.61%.

The authors pushed the limits by providing only one labeled image per class, just 10 labeled images in total. The model was still able to achieve 78% accuracy!


References

Sohn, K., Berthelot, D., Li, C.-L., Zhang, Z., Carlini, N., Cubuk, E. D., Kurakin, A., Zhang, H., & Raffel, C. (2020). FixMatch: Simplifying Semi-Supervised Learning with Consistency and Confidence. arXiv:2001.07685. https://arxiv.org/abs/2001.07685
