What is Sora? Everything we know about OpenAI’s new text-to-video model

Included are examples of Sora creations

Chandler K
4 min read · Mar 6, 2024

OpenAI dropped a bombshell on February 15th, revealing the visually stunning Sora model. This is what it can create…

Prompt: Several giant wooly mammoths approach treading through a snowy meadow, their long wooly fur lightly blows in the wind as they walk, snow covered trees and dramatic snow capped mountains in the distance, mid afternoon light with wispy clouds and a sun high in the distance creates a warm glow, the low camera view is stunning capturing the large furry mammal with beautiful photography, depth of field.

What is Sora?

Sora is OpenAI’s newest and most ambitious model yet. This text-to-video model can currently create videos up to a minute long from text inputs. Based on the videos shared by OpenAI, this model will likely revolutionize film production and the industry, much as DALL-E changed the world of art creation. While words like revolutionary are commonplace when talking about OpenAI products, this one stands apart. Each of the examples shared this week (and there have been a ton) shows a different, incredible aspect of what’s to come. Want to make a short film? Sora can create all the visuals you’ll need. Interested in learning about a new topic? Sora can bring ideas to life and help you see the content. The possibilities are limitless. Note: If you want to see all the examples, head to the bottom of the article. I’ll share as many as I can.

Prompt: Aerial view of Santorini during the blue hour, showcasing the stunning architecture of white Cycladic buildings with blue domes. The caldera views are breathtaking, and the lighting creates a beautiful, serene atmosphere.

We’re teaching AI to understand and simulate the physical world in motion, with the goal of training models that help people solve problems that require real-world interaction. -OpenAI blog

Prompt: Photorealistic closeup video of two pirate ships battling each other as they sail inside a cup of coffee.

Okay, where do I sign up?

Well, you can’t. At least not yet. OpenAI understands that this is a powerful tool, capable of both positive and potentially negative change. For that reason, they are completing additional testing and development to help ensure Sora is deployed safely. Currently, only a few select industry experts and OpenAI’s “red teamers” have access to the model. What’s a red teamer? (I had the same question.) The Red Teaming Network is a group of individuals from diverse professional backgrounds and areas of expertise who engage with and test OpenAI’s models. Their goal is to ensure that these products are safe, ethical, and generally ready for public use.

Prompt: This close-up shot of a chameleon showcases its striking color changing capabilities. The background is blurred, drawing attention to the animal’s striking appearance.

How does it work?

That’s a complex question with an even more complex answer. I’ll try not to get too technical in this section but for those looking for a more comprehensive explanation, check out OpenAI’s research blog post.

Alright, here we go: OpenAI has approached video generation similarly to how it built GPT-4. It starts by breaking a video down into “patches.” These patches are the video equivalent of tokens. Just like tokens, it’s the creation and “understanding” of patches that allows Sora to create videos. While this is a general explanation, the blog post above covers the topic in depth.
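To make the token analogy concrete, here is a minimal sketch of what "patchifying" a video could look like, assuming a toy video stored as a numpy array. The patch sizes, shapes, and function name are all made up for illustration; Sora's actual architecture (compressing video into a latent space first, then extracting spacetime patches) is far more involved, as OpenAI's research post describes.

```python
import numpy as np

def video_to_patches(video, pt=2, ph=4, pw=4):
    """Split a video of shape (T, H, W, C) into flattened spacetime patches.

    Hypothetical illustration only: each patch spans `pt` frames and a
    `ph` x `pw` pixel region, and is flattened into one row, much like
    a tokenizer turns a span of text into one token.
    """
    T, H, W, C = video.shape
    assert T % pt == 0 and H % ph == 0 and W % pw == 0, "dims must divide evenly"
    patches = (
        video.reshape(T // pt, pt, H // ph, ph, W // pw, pw, C)
             .transpose(0, 2, 4, 1, 3, 5, 6)   # group the patch grid together
             .reshape(-1, pt * ph * pw * C)     # one row per spacetime patch
    )
    return patches

# A tiny fake "video": 8 frames of 16x16 RGB pixels.
video = np.random.rand(8, 16, 16, 3)
patches = video_to_patches(video)
print(patches.shape)  # (64, 96): 4*4*4 patches, each 2*4*4*3 values
```

The model then operates on this sequence of patches the way a language model operates on a sequence of tokens, which is what lets the same transformer machinery generalize from text to video.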

Prompt: The Glenfinnan Viaduct is a historic railway bridge in Scotland, UK, that crosses over the west highland line between the towns of Mallaig and Fort William. It is a stunning sight as a steam train leaves the bridge, traveling over the arch-covered viaduct. The landscape is dotted with lush greenery and rocky mountains, creating a picturesque backdrop for the train journey. The sky is blue and the sun is shining, making for a beautiful day to explore this majestic spot.

What can it do?

Creating one-minute videos based on user prompts is just the beginning. The Sora model will allow users to upload any image (including a DALL-E-created image) along with a short prompt and create a video using the reference image. It’s worth taking a second to recognize that OpenAI will have protections in place to stop the upload and creation of inappropriate content. You can learn more about their guidelines on their usage policy page. So if I upload an image of my dog, Sora could create a short video of him exploring ancient Rome. Here’s what this process looks like:

Combine the above image with the following: In an ornate, historical hall, a massive tidal wave peaks and begins to crash. Two surfers, seizing the moment, skillfully navigate the face of the wave.

This is what Sora generates:

As someone without a video production background, I’ll find that Sora opens up a new world, letting me effortlessly create videos and content that I otherwise wouldn’t have access to.

I’ll share a few more Sora videos below!

Prompt: minecraft with the most gorgeous high res 8k texture pack ever
Prompt: The camera follows behind a white vintage SUV with a black roof rack as it speeds up a steep dirt road surrounded by pine trees on a steep mountain slope, dust kicks up from it’s tires, the sunlight shines on the SUV as it speeds along the dirt road, casting a warm glow over the scene. The dirt road curves gently into the distance, with no other cars or vehicles in sight. The trees on either side of the road are redwoods, with patches of greenery scattered throughout. The car is seen from the rear following the curve with ease, making it seem as if it is on a rugged drive through the rugged terrain. The dirt road itself is surrounded by steep hills and mountains, with a clear blue sky above with wispy clouds.
Prompt: A close up view of a glass sphere that has a zen garden within it. There is a small dwarf in the sphere who is raking the zen garden and creating patterns in the sand.
Prompt: A beautiful silhouette animation shows a wolf howling at the moon, feeling lonely, until it finds its pack.

