Let’s summarize what OpenAI has shared to see how this model actually works.

How Sora (actually) works

There’s a lot of disinformation about the most important video model out there. We don’t have to speculate.

Mike Young
7 min read · Feb 17, 2024


This week, the team at OpenAI introduced Sora, a large-scale video generation model that displays new capabilities for simulating basic aspects of our physical world. I’ve been following text-to-video generation for a long time, and I think this model represents a step-function increase in quality.

I’ve also seen a lot of speculation on Reddit and Twitter about how this model works, including some off-the-wall suggestions (does Sora run inside a game engine called Unreal?). When something this groundbreaking gets released, a lot of people want to look like they know what’s going on, or might even trick themselves into thinking they do, based on subtle clues and artifacts across a very small sample of released videos. The worst example I found was Dr. Jim Fan’s post claiming that “Sora is a data-driven physics engine,” which has been viewed about 4M times on Twitter. (It’s not a data-driven physics engine at all.)

Fortunately, OpenAI released a research post explaining their model’s architecture, so there’s no actual need to speculate if we read what they wrote. In this post, I’ve done that for you, and I’m going to walk you through what the OpenAI…

