Facebook & CMU’s Zero-Shot VideoCLIP Outperforms Fully-Supervised SOTA Methods for Video-Text Understanding

Pretrained large language models have revolutionized natural language processing (NLP) research, achieving state-of-the-art performance and enabling widespread, effective deployment in many real-world applications. One of the main drawbacks of such models, however, is that they require…