Sight, Sound & Motion

Dextro
Sep 22, 2015


A powerful system that rapidly understands the totality of your video.

http://ssm.dextro.co/

Despite immense amounts of video content being uploaded every minute, the medium has been stuck in the Dark Ages, much like the web before Google came along. Today, editorial experts and curation teams are tasked with manually sifting through millions of videos to find content worth sharing. Even worse, some of the most innovative online video platforms don’t offer search functionality to users, or force users to rely on inadequate text descriptors, like hashtags, to discover video content.

#haveyouevertriedtosearchforsomethingusingahashtag? #withmisspellings?

By teaching computers to see, listen, and comprehend what’s happening in a video, we unlock that content and put it to its full potential. Dextro’s Sight, Sound & Motion (SSM) platform uses deep learning systems to analyze and categorize video based on its fundamental sensory components: what’s visually present (sight), what’s heard (sound), and what’s happening across time (motion).

How it works

We’ve built a unified topic model that integrates the three most descriptive components of video analysis to provide a comprehensive understanding of content. Our machine learning systems are the culmination of years of work trudging through millions of cluttered, non-iconic, real-world videos, and leverage the collective power of:

1. Sight

Our video-specific computer vision algorithms consider hundreds of variables to positively identify the visual elements in a video. From shapes and colors to contextual clues, we look at every visually present element onscreen to inform our analysis.
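
Dextro’s production vision models are proprietary, but the basic idea is easy to sketch. The toy example below samples frames from a clip and tags each one with an off-the-shelf image classifier (a pretrained torchvision ResNet standing in for our video-specific models); every name in it is illustrative rather than part of SSM.

```python
# A minimal sketch of per-frame visual tagging, not Dextro's actual pipeline:
# sample frames from a video and label each one with a generic pretrained classifier.
import cv2
import torch
from PIL import Image
from torchvision import models

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()
preprocess = weights.transforms()
labels = weights.meta["categories"]

def visual_tags(video_path, every_n_frames=30, top_k=3):
    """Return the top-k visual labels for every sampled frame of the clip."""
    cap = cv2.VideoCapture(video_path)
    tags, frame_idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if frame_idx % every_n_frames == 0:
            image = Image.fromarray(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            with torch.no_grad():
                probs = model(preprocess(image).unsqueeze(0)).softmax(dim=1)[0]
            top = probs.topk(top_k)
            tags.append([(labels[i], round(float(p), 3))
                         for p, i in zip(top.values, top.indices)])
        frame_idx += 1
    cap.release()
    return tags
```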

2. Sound

We then turn to the audio. Our NLP algorithms parse transcripts of what’s been said or heard to identify major audio topics, as well as cues that help further inform our findings from Sight.
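
The NLP side can be sketched just as loosely. Assuming speech-to-text has already produced transcript segments (our actual models and topic vocabulary are not shown here), a simple TF-IDF pass is enough to illustrate how characteristic terms, i.e. candidate audio topics, rise out of each segment.

```python
# A minimal sketch of pulling "audio topics" out of transcript text with TF-IDF.
# The transcript segments are assumed inputs; this is not Dextro's NLP stack.
from sklearn.feature_extraction.text import TfidfVectorizer

def audio_topics(transcript_segments, top_k=5):
    """Return the most characteristic terms for each transcript segment."""
    vectorizer = TfidfVectorizer(stop_words="english", ngram_range=(1, 2))
    tfidf = vectorizer.fit_transform(transcript_segments).toarray()
    terms = vectorizer.get_feature_names_out()
    topics = []
    for row in tfidf:
        best = row.argsort()[::-1][:top_k]
        topics.append([terms[i] for i in best if row[i] > 0])
    return topics

segments = [
    "the crowd marched down the avenue as police formed a line",
    "the quarterback threw a long pass late in the fourth quarter",
]
print(audio_topics(segments))
```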

3. Motion

Lastly, our deep learning systems use the way objects move across time to derive deeper meaning from the scene. We can determine, for instance, whether an object is stationary or moving, which direction it’s moving in, and how the scene evolves around it.
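
As a rough stand-in for those temporal models, dense optical flow already answers the simplest of these questions. The sketch below uses OpenCV’s Farneback flow, with a made-up motion threshold, to estimate whether a clip contains significant motion and in which average direction it points.

```python
# A minimal sketch of motion analysis with dense optical flow, not Dextro's
# temporal models: is anything moving, and what is the average direction?
import cv2
import numpy as np

def dominant_motion(video_path, motion_threshold=1.0):
    """Return (is_moving, mean_direction_in_degrees) for a clip."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        raise ValueError("could not read video: %s" % video_path)
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    magnitudes, mean_flows = [], []
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        magnitudes.append(np.linalg.norm(flow, axis=2).mean())
        mean_flows.append(flow.reshape(-1, 2).mean(axis=0))
        prev_gray = gray
    cap.release()
    if not mean_flows:
        return False, 0.0
    is_moving = bool(np.mean(magnitudes) > motion_threshold)
    mean_dx, mean_dy = np.mean(mean_flows, axis=0)
    return is_moving, float(np.degrees(np.arctan2(mean_dy, mean_dx)))
```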

Why it matters

For the first time ever, SSM allows online videos to be categorized and searched as seamlessly as text-based web pages. But unlike keyword matching against metadata, the text-based approach to video search, SSM builds a comprehensive understanding of what’s actually happening in the video itself. That analysis cuts through the junk to surface a subset of the best video available.
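
Concretely, a “comprehensive understanding” means folding the three evidence streams into one set of topic scores per video. The toy fusion below is purely illustrative: the weights, topic names, and label formats are invented for the example and are not Dextro’s unified topic model.

```python
# A toy fusion of sight, sound, and motion evidence into per-video topic scores.
# Weights and vocabulary are invented for illustration only.
from collections import defaultdict

def fuse_topics(sight_tags, audio_terms, motion_label,
                weights=(0.5, 0.3, 0.2)):
    """Combine per-modality evidence into a topic -> confidence mapping."""
    w_sight, w_sound, w_motion = weights
    scores = defaultdict(float)
    for label, prob in sight_tags:        # e.g. [("skateboard", 0.82), ...]
        scores[label] += w_sight * prob
    for term in audio_terms:              # e.g. ["kickflip", "street contest"]
        scores[term] += w_sound
    if motion_label:                      # e.g. "fast pan"
        scores[motion_label] += w_motion
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))

print(fuse_topics([("skateboard", 0.82)], ["kickflip"], "fast pan"))
```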

Companies will no longer have to waste time sifting through noisy, irrelevant content. In a matter of seconds, they can leverage our system to automatically understand or curate countless hours of prerecorded videos. People who need to stay on top of what’s going on, like editorial teams and content curators that manage large volumes of video files, can now locate, filter, and serve their users videos based on precise search parameters.
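
To make those “precise search parameters” concrete: once every video carries topic scores like the ones above, filtering a catalogue becomes a short query. The index layout and video ids below are hypothetical, not SSM’s API.

```python
# A hypothetical catalogue search over per-video topic scores, ranked by
# confidence. The index format and video ids are invented for illustration.
def search_videos(index, topic, min_confidence=0.6):
    """index: video_id -> {topic: confidence}. Return matching ids, best first."""
    hits = [(vid, topics[topic]) for vid, topics in index.items()
            if topics.get(topic, 0.0) >= min_confidence]
    return [vid for vid, _ in sorted(hits, key=lambda kv: kv[1], reverse=True)]

index = {
    "clip_001": {"protest": 0.91, "crowd": 0.74},
    "clip_002": {"soccer": 0.88},
}
print(search_videos(index, "protest"))   # -> ['clip_001']
```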

For these reasons, Mic.com uses SSM. Over the past year, several of the biggest news stories that Mic.com published originated from individual users who uploaded real-time footage as stories unfolded. These citizen journalists generated a tremendous amount of video content, but the sheer volume and scattershot tagging made it difficult for Mic.com to efficiently isolate the most newsworthy segments. With help from our development team, Mic.com employed the SSM platform to analyze, discover, and feature the most relevant and newsworthy videos on social media. Today, Mic.com still uses SSM to parse the deluge of video and publish more great stories.
