What is Neural Collapse?

Sharad Joshi
2 min read · Feb 25, 2022

Terminal Phase of Training (TPT) — training beyond zero training error, i.e. the training error sits at 0 while we push the training loss further down. The aim is to reduce the loss as much as possible even though the misclassification rate is already 0. Why would someone do that?
One would expect such a model to be highly overfitted to the training data and noisy, but it has recently been shown empirically that the reverse is true.
Deep nets trained this way generalise well and gain adversarial robustness. This happens via a phenomenon called Neural Collapse.
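
To make TPT concrete, here is a minimal PyTorch-style sketch, assuming a `model` and a training `loader` are already defined (both are placeholders, as are the hyperparameters). The loop simply keeps minimising cross-entropy after the misclassification count hits 0, because the loss still rewards more confident, larger-margin logits:

```python
import torch
import torch.nn.functional as F

def train_into_tpt(model, loader, epochs=300, lr=0.1):
    """Keep optimizing cross-entropy even after training error reaches 0."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=0.9)
    for epoch in range(epochs):
        errors, loss_sum, n = 0, 0.0, 0
        for x, y in loader:
            logits = model(x)
            loss = F.cross_entropy(logits, y)
            opt.zero_grad()
            loss.backward()
            opt.step()
            errors += (logits.argmax(dim=1) != y).sum().item()
            loss_sum += loss.item() * len(y)
            n += len(y)
        # TPT: misclassification rate is 0, yet the loss can still decrease,
        # since cross-entropy keeps shrinking as logits grow more confident.
        if errors == 0:
            print(f"epoch {epoch}: in TPT, train error 0, loss {loss_sum / n:.4f}")
    return model
```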

Neural Collapse: Consider the last layer and the second-to-last layer (let's call it the feature, or activation, layer) of a deep neural net being trained on some classification task. If we track the first moment (the class means) and second moment (the covariances) of the feature layer over several epochs, we can see that the ratio of within-class covariance to between-class covariance of the features goes to 0 as we keep training the model, especially during TPT, i.e. the features within a class concentrate around the mean of that class's features, as the three-class classification diagram below illustrates.
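
One way to watch this collapse numerically is sketched below, assuming you have extracted a matrix of penultimate-layer `features` and their `labels` (both hypothetical inputs). It compares the within-class covariance against the between-class covariance via tr(Σ_W Σ_B†)/C, the statistic used in the original neural-collapse paper by Papyan, Han and Donoho; it tends to 0 during TPT:

```python
import numpy as np

def nc1_metric(features, labels):
    """Within- vs. between-class covariance of penultimate-layer features.

    features: (N, d) array of feature vectors; labels: (N,) int class ids.
    Returns tr(Sigma_W @ pinv(Sigma_B)) / C, which shrinks toward 0
    as neural collapse sets in.
    """
    classes = np.unique(labels)
    C, d = len(classes), features.shape[1]
    global_mean = features.mean(axis=0)
    sigma_w = np.zeros((d, d))  # avg covariance of features around their class mean
    sigma_b = np.zeros((d, d))  # covariance of class means around the global mean
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        centered = fc - mu_c
        sigma_w += centered.T @ centered / len(features)
        diff = (mu_c - global_mean)[:, None]
        sigma_b += diff @ diff.T / C
    return np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / C
```

Logging this value once per epoch alongside the training loss makes the collapse visible: it keeps falling long after the error rate has flatlined at 0.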

This means the class features form well-separated (even linearly separable) clusters in feature space (a simplex equiangular tight frame, to be precise). It also makes the last-layer classifier behave as a nearest-neighbour classifier…
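
Both claims are easy to probe once you have the per-class feature means. The sketch below (with hypothetical `class_means` and `features` arrays) checks the simplex-ETF geometry, in which the centered class means are pairwise equiangular with cosine −1/(C−1), and implements the nearest-class-mean rule that the last-layer classifier converges to:

```python
import numpy as np

def check_simplex_etf(class_means):
    """Pairwise cosines of centered class means; under neural collapse
    every off-diagonal entry approaches -1/(C-1)."""
    C = class_means.shape[0]
    centered = class_means - class_means.mean(axis=0)
    normed = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    cos = normed @ normed.T
    off_diag = cos[~np.eye(C, dtype=bool)]
    print(f"pairwise cosines: mean {off_diag.mean():.3f}, "
          f"simplex-ETF target {-1 / (C - 1):.3f}")

def nearest_class_mean_predict(features, class_means):
    """Assign each feature vector to its closest class mean -- the
    nearest-neighbour behaviour the trained classifier collapses to."""
    dists = np.linalg.norm(features[:, None, :] - class_means[None, :, :], axis=2)
    return dists.argmin(axis=1)
```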
