OpenAI introduces Sora: A text-to-video AI model

OpenAI’s Sora is an AI model that gives you the ability to create realistic and imaginative scenes from text instructions.

Zaid Pathan
LushBinary
Feb 15, 2024


Screenshot: OpenAI Demo Video

Key Features

  • Generate video from text prompt
  • Generate video from an image (with accuracy and attention to small details)

On Feb 15, OpenAI introduced Sora, a text-to-video diffusion model. It can generate high-quality videos up to a minute long while adhering to the user’s prompt.

Sora is currently being assessed by red teamers for potential harms and risks. OpenAI is also granting access to a number of visual artists, designers, and filmmakers to gather feedback on how the model can be most helpful to creative professionals.

It can generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user asks for in the prompt but also how those things exist in the physical world.

The model has a strong understanding of language, which lets it accurately interpret prompts and generate characters that express vibrant emotions. Sora can also create multiple shots within a single generated video while keeping characters and visual style consistent across them.

The current model still has weaknesses: it can struggle to accurately simulate the physics of complex scenes, though this is expected to improve over time.

OpenAI is taking several safety steps before releasing Sora to the public, including work to counter misinformation, hateful content, and bias.

OpenAI is building tools to help detect misleading content, such as a detection classifier that can identify whether a video was generated by Sora. It plans to include C2PA metadata in future deployments. OpenAI is also leveraging the existing safety methods it built for DALL·E 3.

Behind the scenes, Sora generates a video by starting with something that looks like static noise and gradually transforming it by removing that noise over many steps. Most importantly, by giving the model foresight of many frames at a time, OpenAI has solved a challenging problem: making sure a subject stays the same even when it temporarily goes out of view. That is impressive.
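To make the "start from noise, remove it step by step" idea concrete, here is a toy Python sketch. It is not Sora’s sampler, which OpenAI has not published. It assumes a cosine noise schedule and a deterministic DDIM-style update, and `toy_denoiser` is a hypothetical stand-in for a trained network that simply returns a known target video. What it does illustrate is that each step updates every frame of the video together, rather than generating frames one at a time.

```python
import math
import random
from typing import List

Video = List[List[float]]  # toy video: frames x pixels, one float per pixel


def alpha_bar(t: float) -> float:
    """Cumulative signal level at noise level t in [0, 1]: 1.0 = clean, ~0.0 = pure noise."""
    return math.cos(t * math.pi / 2) ** 2


def toy_denoiser(x: Video, t: float, target: Video) -> Video:
    """Hypothetical stand-in for a trained network that predicts the clean video.

    A real model would look at all frames of x (and the prompt) jointly; this
    oracle ignores x and t and just returns the known target.
    """
    return [frame[:] for frame in target]


def sample(target: Video, steps: int = 50, seed: int = 0) -> Video:
    rng = random.Random(seed)
    # Start from pure Gaussian noise with the same shape as the video.
    x = [[rng.gauss(0.0, 1.0) for _ in frame] for frame in target]
    ts = [1.0 - i / steps for i in range(steps + 1)]  # noise levels 1.0 -> 0.0
    for t, s in zip(ts[:-1], ts[1:]):
        x0_pred = toy_denoiser(x, t, target)  # prediction covers all frames at once
        a_t, a_s = alpha_bar(t), alpha_bar(s)
        new_x = []
        for frame, clean in zip(x, x0_pred):
            new_frame = []
            for xi, ci in zip(frame, clean):
                # Noise implied by the current sample and the predicted clean value.
                eps = (xi - math.sqrt(a_t) * ci) / math.sqrt(1.0 - a_t)
                # Deterministic step down to the lower noise level s.
                new_frame.append(math.sqrt(a_s) * ci + math.sqrt(1.0 - a_s) * eps)
            new_x.append(new_frame)
        x = new_x
    return x


if __name__ == "__main__":
    clean = [[0.0, 0.5, 1.0], [0.1, 0.6, 0.9]]  # two tiny "frames"
    print(sample(clean))
```

Because the oracle always predicts the right answer, the loop lands back on the target video; with a real network, the quality of each step depends entirely on how good its predictions are.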

Like GPT models, Sora uses a transformer architecture, which scales well as models and training data grow. OpenAI represents videos and images as collections of smaller units of data called patches, each of which is akin to a token in GPT.
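A rough sketch of what "patches as tokens" means: the Python function below cuts a video into small spacetime blocks (a few frames by a few rows by a few columns) and flattens each block into one token-like vector. The function name, the patch sizes, and the use of raw values are illustrative assumptions, not OpenAI’s code; OpenAI’s technical report describes extracting patches from a compressed latent representation rather than raw pixels.

```python
from typing import List

Video3D = List[List[List[float]]]  # [frames][height][width], one channel for simplicity


def patchify(video: Video3D, pt: int, ph: int, pw: int) -> List[List[float]]:
    """Split a video into non-overlapping pt x ph x pw spacetime patches.

    Each patch is flattened into a single list, playing the role of one token.
    Patches are emitted in time-major, then row, then column order.
    """
    T, H, W = len(video), len(video[0]), len(video[0][0])
    if T % pt or H % ph or W % pw:
        raise ValueError("video dimensions must be divisible by the patch size")
    tokens = []
    for t0 in range(0, T, pt):
        for y0 in range(0, H, ph):
            for x0 in range(0, W, pw):
                patch = [
                    video[t][y][x]
                    for t in range(t0, t0 + pt)
                    for y in range(y0, y0 + ph)
                    for x in range(x0, x0 + pw)
                ]
                tokens.append(patch)
    return tokens


if __name__ == "__main__":
    # 4 frames of 4x4 "pixels"; each value encodes its own (t, y, x) position.
    video = [[[t * 100 + y * 10 + x for x in range(4)] for y in range(4)] for t in range(4)]
    tokens = patchify(video, 2, 2, 2)
    print(len(tokens), tokens[0])
```

With 4 frames of 4×4 values and 2×2×2 patches, this yields 8 tokens of 8 values each. Because every token spans several frames, a transformer attending over the full token sequence can relate content across time, which is one way to think about the multi-frame foresight described above.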

Sora is built on past research in DALL·E and GPT models.

The Sora API is expected soon, and our experts at lushbinary.com are very excited to use it to build something great.

Book your 15-day free software development trial now!
