OpenAI’s unCLIP Text-to-Image System Leverages Contrastive and Diffusion Models to Achieve SOTA Performance

Contrastive vision-language models such as OpenAI’s CLIP (Contrastive Language–Image Pre-training, 2021) have garnered much attention in the computer vision research community thanks to their impressive zero-shot learning capabilities and their ability to learn robust image representations that capture both…