VQ-Diffusion: Microsoft’s New Text-To-Image AI Tool

Published in

CodeX

3 min read · Aug 18, 2022

Microsoft’s VQ Diffusion text-to-image AI model — Image created by Jim Clyde Monge

We’re on the cusp of a new era of artistic expression, one in which AI tools let us create realistic, artistically impressive images more easily than ever before.

With these AI tools at our disposal, the sky’s the limit when it comes to creating stunning visual masterpieces.

Many tech giants have recently announced their own text-to-image tools:

Google’s Parti
Meta’s Make-A-Scene
TikTok’s AI Greenscreen
OpenAI’s DALL·E 2

Now Microsoft is joining the party with its own text-to-image AI tool: VQ-Diffusion.

What Is VQ-Diffusion?

From their official GitHub page:

Vector Quantized Diffusion Model is based on a VQ-VAE whose latent space is modeled by a conditional variant of the recently developed Denoising Diffusion Probabilistic Model (DDPM). It produces significantly better text-to-image generation results when compared with Autoregressive models with similar numbers of parameters. Compared with previous GAN-based methods, VQ-Diffusion can handle more complex scenes and improve the synthesized image quality by a large margin.
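To make that description a little more concrete, here is a toy sketch of the two ideas it combines: vector quantization (turning image patches into discrete codebook indices, as a VQ-VAE does) and a diffusion-style noising step applied to those discrete tokens. This is not Microsoft's code. The `MASK` sentinel, the `quantize` and `corrupt` helpers, the tiny codebook, and the probabilities are all assumptions made up for illustration, and the corruption step is only loosely modeled on the mask-and-replace idea from the VQ-Diffusion paper.

```python
import random

MASK = -1  # hypothetical sentinel meaning "this token has been masked out"


def quantize(vectors, codebook):
    """Map each vector to the index of its nearest codebook entry
    (squared Euclidean distance), as a VQ-VAE does for latent vectors."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    return [
        min(range(len(codebook)), key=lambda k: sq_dist(v, codebook[k]))
        for v in vectors
    ]


def corrupt(tokens, num_codes, mask_prob, replace_prob, rng):
    """One toy forward-noising step on discrete tokens: each token is
    masked, swapped for a random code, or kept. Masked tokens stay masked."""
    noisy = []
    for t in tokens:
        r = rng.random()
        if t == MASK or r < mask_prob:
            noisy.append(MASK)
        elif r < mask_prob + replace_prob:
            noisy.append(rng.randrange(num_codes))
        else:
            noisy.append(t)
    return noisy


# A made-up 4-entry codebook and four 2-D "patch" vectors.
codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
patches = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]

tokens = quantize(patches, codebook)
noisy = corrupt(tokens, len(codebook), mask_prob=0.3, replace_prob=0.1,
                rng=random.Random(0))

print(tokens)  # discrete tokens for the clean "image"
print(noisy)   # the same tokens after one noising step
```

In a real model, a neural network would be trained to undo this corruption step by step, conditioned on the text prompt, and a decoder would then turn the recovered tokens back into pixels. The sketch only shows the forward (noising) direction.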


Written by Jim Clyde Monge