Startup Twelve Labs announces a new AI model for video analysis

Catherine Chef · Published in Startup Reviews · Oct 25, 2023 · 2 min read


Image credit: Twelve Labs

The startup Twelve Labs, which develops AI models for video analysis, has announced the release of its new multimodal model, Pegasus-1. The model can analyze videos and generate written reports on their content, break videos into logical sections, and perform other tasks.

The startup’s main goal, according to co-founder and CEO Jae Lee, is to train AI models that solve complex video-to-text problems. He told TechCrunch that the company was founded to build infrastructure for multimodal video understanding. Unlike large language models such as ChatGPT, its products are built and trained specifically to work with video, combining visual, audio, and speech understanding.

Twelve Labs’ models aim to interpret and describe video content in natural language, including actions, objects, and background sounds. Such tools help developers build applications for video search, speech transcription, auto-summarization, extraction of relevant information, and more.

A closed beta version of the Pegasus-1 model was released in May of this year. According to the description on the startup’s website, the model has about 80 billion parameters and consists of three jointly trained components: a video encoder, a video-to-text alignment model, and a language decoder. To train Pegasus-1, the team used 300 million hand-selected, diverse videos with text transcripts, as well as one billion images with natural-language descriptions.
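The three-component pipeline described above can be illustrated with a conceptual sketch. Everything here is an illustrative assumption for exposition only: the class names, embedding shapes, and the dummy logic inside each stage are invented and do not reflect Twelve Labs' actual architecture or code.

```python
# Conceptual sketch of a three-component video-language pipeline,
# loosely mirroring the Pegasus-1 description: video encoder ->
# video-to-text alignment model -> language decoder.
# All names, shapes, and logic are hypothetical stand-ins.

class VideoEncoder:
    """Turns raw frames into a sequence of embedding vectors."""
    def encode(self, frames):
        # Stand-in: one tiny 4-dim "embedding" per frame.
        return [[float(f), 0.0, 0.0, 0.0] for f in frames]

class AlignmentModel:
    """Projects video embeddings into the decoder's token space."""
    def align(self, video_embeddings):
        # Stand-in projection: scale each embedding.
        return [[x * 0.5 for x in emb] for emb in video_embeddings]

class LanguageDecoder:
    """Generates text conditioned on aligned video tokens and a prompt."""
    def generate(self, video_tokens, prompt):
        # Stand-in for autoregressive text generation.
        return f"{prompt}: report over {len(video_tokens)} video tokens"

def describe_video(frames, prompt="Summarize"):
    encoded = VideoEncoder().encode(frames)
    aligned = AlignmentModel().align(encoded)
    return LanguageDecoder().generate(aligned, prompt)
```

In a real system each stage would be a neural network trained jointly, so that gradients from the text objective shape the video representations; the sketch only shows how the three components hand data to one another.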

The model is not yet publicly available, but you can join a waiting list to request access to the tool.

As Lee explains, Twelve Labs’ technology can also be used in areas such as advertising and video content moderation. For example, if a video shows knives, the model can determine whether it is educational material (a cooking show, say) or depicts violence (footage of a crime). The tool can also be used for media analytics and tasks such as auto-generating video titles.

Twelve Labs isn’t the only company developing such tools. Similar multimodal models are being built by Google, Microsoft, and Amazon, as well as many smaller companies and startups. However, the company says Twelve Labs’ models differ from competing tools in both quality and a broader feature set that lets customers perform more detailed video analysis using their own data. According to TechCrunch, more than 17,000 corporate clients from various industries now work with the startup.

Alongside the Pegasus-1 announcement, the startup said it had closed a funding round in which it raised $10 million. Investors included Nvidia, Intel, and Samsung Next.
