What Is CLIP and Why Is It Going Viral?
When a neural network uses so much data it becomes “universal”
Pre-defined classes and categories: this is the limitation where machine-learning models and neural networks can only recognise new classes after retraining. For a long time, this retraining and fine-tuning procedure has almost become "standard": it is such a common practice that people forget it is still an unsolved problem… at least until CLIP was introduced.
So, what exactly is CLIP?
CLIP (Contrastive Language-Image Pre-training) is a training procedure unlike common practice in the vision community. For a long time, the capabilities of models and training methods have been benchmarked on the ImageNet dataset, which spans 1,000 classes: we train on one subset of ImageNet and test on a held-out subset to measure how well a model generalises. While straightforward, this convention overlooks the exponentially growing image collections on the internet and the potential benefits they could bring; CLIP shows that there is a LOT we are missing out on.
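To make the contrastive idea concrete, here is a minimal sketch of CLIP-style zero-shot classification. The embeddings below are hard-coded toy vectors, not real model outputs: in the actual system, an image encoder and a text encoder (fed prompts like "a photo of a dog") would produce them. The mechanics, though, are the same: normalise both sides, compare by cosine similarity, and pick the closest class prompt.

```python
import numpy as np

def normalize(v):
    # Project embeddings onto the unit sphere so dot products become cosine similarities.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def zero_shot_classify(image_emb, text_embs, labels, temperature=0.07):
    """Pick the label whose text embedding is closest to the image embedding."""
    image_emb = normalize(image_emb)
    text_embs = normalize(text_embs)
    logits = image_emb @ text_embs.T / temperature  # scaled cosine similarities
    probs = np.exp(logits - logits.max())           # stable softmax
    probs /= probs.sum()
    return labels[int(np.argmax(probs))], probs

# Hypothetical 4-d embeddings standing in for encoded prompts
# like "a photo of a {label}".
labels = ["dog", "cat", "car"]
text_embs = np.array([[1.0, 0.1, 0.0, 0.0],
                      [0.1, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.2]])
image_emb = np.array([0.9, 0.2, 0.05, 0.0])  # toy vector, closest to "dog"

best, probs = zero_shot_classify(image_emb, text_embs, labels)
print(best)  # "dog"
```

Because classification is just nearest-neighbour search over text embeddings, swapping in a new class only requires writing a new prompt, with no retraining of the model itself.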

