How OpenAI’s Sora Is Changing the Game

MiniMe ai
Apr 9, 2024

When OpenAI introduced DALL-E in early 2021, everyone marvelled at how text-to-image generators could turn simple text prompts into visually stunning images.

In the months that followed, the technology improved significantly, and other platforms such as Midjourney and Stable Diffusion joined the ‘picture’.

The field of AI has evolved rapidly. As AI-generated images became more and more indistinguishable from real photographs, the scope of these models expanded to video generation.

But unlike images, video generation came with even more challenges, given the complexity of maintaining consistency and realism across the content. See these videos generated with Stable Diffusion’s text-to-video model about a year ago.

Quite impressive, right?

But you can still tell these are AI-generated videos: they lack nuanced expressions, smooth transitions and realistic physics.

Fast forward to today, and we have Sora, OpenAI’s game-changing video-generation tool, which has dramatically shifted the landscape.

Sora is a latent diffusion model. Using an encoder-decoder and a transformer, it transforms random noise into detailed, high-resolution videos of up to 1920x1080 pixels and up to one minute long.

Here are some videos generated using Sora:

Prompt: “A stylish woman walks down a Tokyo street filled with warm glowing neon and animated city signage. She wears a black leather jacket, a long red dress, and black boots, and carries a black purse. She wears sunglasses and red lipstick. She walks confidently and casually. The street is damp and reflective, creating a mirror effect of the colourful lights. Many pedestrians walk about.”

The detail in this video is impeccable, from the realistic pedestrians and bright neon lights down to the reflections in the sunglasses and on the wet road. Halfway through, the video also shifts its perspective to a close-up while keeping the details consistent, without any cuts or edits.

Prompt: “Historical footage of California during the gold rush.”

This drone shot shows a city in California during the 1800s, with people and horses moving through the streets. Every plank of the houses, the reflections on the water and the shrubbery in the landscape contribute to a near-perfect, strikingly realistic video from such a simple prompt.

Sora stands as a testament to how far AI has come, presenting a level of hyper-realism and fluidity that blurs the lines between AI-generated and real footage.

Sora hasn’t been released to the public yet. It is being made available to red teamers, who act as beta testers assessing critical areas for harms or risks, and to a number of visual artists, designers, and filmmakers, whose feedback will help shape the model to be most useful for creative professionals.

But it’s clear that we have barely begun to scratch the surface of its capabilities and applications.
