What is Sora? Everything we know about OpenAI’s new text-to-video model
Included are examples of Sora creations
OpenAI dropped a bombshell on February 15th, revealing the visually stunning Sora model. This is what it can create…
What is Sora?
Sora is OpenAI’s newest and most ambitious model yet. This text-to-video model can create videos up to a minute long from a simple text prompt. Based on the videos shared by OpenAI, this model will likely revolutionize film production and the industry much like DALL-E changed the world of art creation. While words like revolutionary are thrown around whenever OpenAI ships something, this one stands apart. Each of the examples shared this week (and there have been a ton) shows a different, incredible aspect of what’s to come. Want to make a short film? Sora can create all the visuals you’ll need. Interested in learning about a new topic? Sora can bring ideas to life and help you see the content. The possibilities are limitless.

Note: If you want to see all the examples, head to the bottom of the article. I’ll share as many as I can.
We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction. -OpenAI blog
Okay, where do I sign up?
Well, you can’t. At least not yet. OpenAI understands that this is a powerful tool, both for positive and potentially negative change. For these reasons, they are completing additional testing and development to help ensure Sora is deployed safely. Currently, only a select group of industry experts and OpenAI’s “red teamers” have access to the model. What’s a red teamer? (I had the same question.) The Red Teaming Network is a group of individuals from diverse professional backgrounds and areas of expertise who engage with and test OpenAI’s models. Their goal is to ensure these products are safe, ethical, and generally ready for public use.
How does it work?
That’s a complex question with an even more complex answer. I’ll try not to get too technical in this section, but for those looking for a more comprehensive explanation, check out OpenAI’s research blog post.
Alright, here we go: OpenAI approached video generation similarly to how it built GPT-4. It starts by breaking a video down into “patches”. These patches are the video equivalent of tokens, and just as tokens let GPT models learn language, it’s the creation and “understanding” of patches that allows Sora to create videos. This is only a general explanation (the blog post above covers the topic in depth), but the sketch below gives a rough feel for the patch idea.
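To make the token/patch analogy a bit more concrete, here’s a minimal, hypothetical sketch in Python. The dimensions, function name, and reshape trick are my own illustration of the general idea (similar in spirit to how vision transformers cut images into patches), not OpenAI’s actual implementation.

```python
import numpy as np

# Conceptual sketch (not OpenAI's code): turning a video into "patches",
# the visual analogue of the text tokens described above.
# Toy assumption: 16 frames of 64x64 RGB video, split into 4x8x8 patches.

def video_to_patches(video, t=4, p=8):
    """Split a (frames, height, width, channels) video into flat
    spacetime patches, each covering t frames and a p x p pixel area."""
    T, H, W, C = video.shape
    assert T % t == 0 and H % p == 0 and W % p == 0
    # Group frames and pixels into blocks...
    patches = video.reshape(T // t, t, H // p, p, W // p, p, C)
    # ...bring the block indices to the front...
    patches = patches.transpose(0, 2, 4, 1, 3, 5, 6)
    # ...and flatten every block into a single vector (one "token").
    return patches.reshape(-1, t * p * p * C)

video = np.random.rand(16, 64, 64, 3)   # stand-in for real footage
tokens = video_to_patches(video)
print(tokens.shape)                      # (256, 768): 256 patch "tokens"
```

The numbers here are just toy values; the point is that a video, like text, can be turned into a sequence of chunks that a model can learn from and generate.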
What can it do?
Creating one-minute videos based on user prompts is just the beginning. The Sora model will allow users to upload any image (or a DALL-E-created image) along with a short prompt and create a video using that reference image. It’s worth taking a second to recognize that OpenAI will have protections in place to block the upload and creation of inappropriate content. You can learn more about their guidelines on their usage policy page. So if I upload an image of my dog, Sora could create a short video of him exploring ancient Rome. Here’s what this process looks like:
This is what Sora generates:
As someone without a video production background, I expect Sora to open up a new world, letting me effortlessly create videos and content I otherwise wouldn’t have access to.
I’ll share a few more Sora videos below!