Google’s New AI Text-To-Video Is Revolutionary

Nov 6, 2022

Image excerpt from Google’s 2022 AI@ event

Google recently hosted its AI@ event, where the company shared its latest research in robotics, natural language understanding, accessibility, healthcare, and creativity.

I highly recommend watching the whole event, but there was one reveal that simply blew me away.

I know this technology already exists in some form, but seeing it demonstrated so simply drives home how magical it really is.

The technology I am referring to is the ability to generate long-form, high-resolution video entirely from a sequence of text prompts.

How Is This Done?

Google has been working on two complementary text-to-video models from two different research groups, and it has now merged the best of both.

The first is Imagen Video, a text-to-video generator that produces short video clips. Like Google’s Imagen text-to-image model, it is built on the diffusion technique.
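To make “diffusion” a little more concrete: a diffusion model is trained by gradually adding Gaussian noise to data and learning to reverse that process step by step. The sketch below shows only the generic forward-noising step from the standard DDPM formulation, applied to a toy four-value “frame”. The linear noise schedule, its endpoints, and the helper names are illustrative assumptions; this is not how Imagen Video is actually implemented.

```python
import math
import random


def linear_beta_schedule(num_steps: int, beta_start: float = 1e-4, beta_end: float = 0.02) -> list[float]:
    """Per-step noise variances beta_t, rising linearly from beta_start to beta_end (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas: list[float]) -> list[float]:
    """Cumulative products alpha_bar_t = prod_{s <= t} (1 - beta_s): how much original signal survives at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def noisy_sample(x0: list[float], t: int, abar: list[float], rng: random.Random) -> list[float]:
    """Forward diffusion: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps, with eps ~ N(0, 1)."""
    signal = math.sqrt(abar[t])
    noise = math.sqrt(1.0 - abar[t])
    return [signal * x + noise * rng.gauss(0.0, 1.0) for x in x0]


betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
rng = random.Random(0)
frame = [0.5, -0.2, 0.9, 0.1]  # hypothetical toy "frame" of four pixel values

print(noisy_sample(frame, 10, abar, rng))   # early step: still close to the original frame
print(noisy_sample(frame, 999, abar, rng))  # last step: almost pure noise
print(f"signal weight at t=999: {math.sqrt(abar[999]):.4f}")
```

A generator runs this in reverse: starting from pure noise, a trained network removes a little noise at each step until a clean frame (or, for video, a stack of frames) emerges.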

The second is Phenaki, a transformer-based model capable of realistic video synthesis from a sequence of text prompts.
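Phenaki treats a video as a story told over time: each prompt in the sequence drives the next stretch of footage. A minimal sketch of that control flow, under loud assumptions: `generate_segment` and `toy_segment` are hypothetical stand-ins for a real text-to-video model, frames are plain strings rather than the discrete video tokens Phenaki actually uses, and each new clip is conditioned on the last few frames generated so far, which is roughly how the Phenaki paper describes extending videos. None of these names come from Google’s code.

```python
from typing import Callable, List, Sequence

Frame = str  # hypothetical stand-in for a frame (a real model works on video tokens)


def generate_long_video(
    prompts: Sequence[str],
    generate_segment: Callable[[str, List[Frame]], List[Frame]],
    context_frames: int = 5,
) -> List[Frame]:
    """Generate one clip per prompt, conditioning each clip on the tail of the video so far."""
    video: List[Frame] = []
    for prompt in prompts:
        # Guard context_frames == 0: video[-0:] would return the whole list, not an empty one.
        context = video[-context_frames:] if context_frames > 0 else []
        video.extend(generate_segment(prompt, context))
    return video


def toy_segment(prompt: str, context: List[Frame], length: int = 8) -> List[Frame]:
    """Hypothetical model stub: labels each frame with its prompt and how many context frames it saw."""
    return [f"{prompt} (ctx={len(context)}) #{i}" for i in range(length)]


# Hypothetical three-prompt "story".
story = [
    "A teddy bear swims in the ocean",
    "The teddy bear walks onto the beach",
    "The camera zooms out to a sunset",
]
video = generate_long_video(story, toy_segment)
print(len(video))  # 3 prompts x 8 frames = 24
print(video[0])
print(video[8])
```

Every clip after the first sees five frames of context. In a real system, that shared context is what keeps characters and scenery consistent as the prompts change.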

When you combine the high-quality generative model of Imagen with the token sequence coherence of Phenaki, you get long-form high-resolution…

Written by Paul DelSignore