Machine learning and video recognition: an important frontier

Enrique Dans
3 min read · Mar 9, 2017


Google announced during its Google Cloud Next conference that it has developed the ability to search for and locate objects in a video, and is now offering this capability as a Cloud Video Intelligence API available to developers, as it has done with the rest of its machine learning APIs. Along with Google’s acquisition of Kaggle, the platform for predictive modeling and analytics competitions, this is easily the most important event of the week in a field that lately makes headlines pretty much every day.

The announcement, made by Fei-Fei Li, Stanford professor and head of machine learning and AI at Google Cloud, may seem relatively trivial: for some time now we have been accustomed to searching for keywords in a collection of photos and getting results that come not from manual tagging but from recognition of the objects in the images. Nevertheless, building the same function for video is much more complex, and the possibilities it opens for the company that owns the largest video repository in the world are enormous.

What happens when an algorithm can watch a video, understand it and recognize the objects in it? Until now, a video was a closed container: the only way to locate something in it was by its title or the keywords it had been tagged with. For many years this has seemed normal, given the limitations of the technology.

The number of videos online keeps growing, but for the moment the ability to index them is limited. What happens when machine learning algorithms can, on the one hand, recognize the words spoken in a video, process them as text and let us search them, while other algorithms let us search the images themselves? We will be able to ask a search engine to locate mentions or appearances of something (an object, a brand, a logo, a person) in a video repository, and receive a results page listing the videos that meet all those criteria. How many new possibilities and avenues for innovation does something like that open up?
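To make the idea concrete, here is a minimal sketch of that kind of combined query, assuming each video has already been run through speech-to-text (giving a transcript) and visual label detection (giving a list of recognized objects). The index structure, video IDs and sample data are all invented for illustration; a real system would query an API or search index rather than a dictionary.

```python
# Hypothetical sketch of multi-modal video search. Assumes each video
# has been pre-processed into a transcript (speech recognition) and a
# list of visual labels (object recognition). All data is invented.

def search_videos(index, spoken=None, seen=None):
    """Return IDs of videos whose transcript mentions `spoken` and whose
    detected visual labels include `seen`. Either filter is optional."""
    results = []
    for video_id, meta in index.items():
        if spoken and spoken.lower() not in meta["transcript"].lower():
            continue
        if seen and seen.lower() not in {l.lower() for l in meta["labels"]}:
            continue
        results.append(video_id)
    return results

# Toy index: transcripts from speech-to-text, labels from object detection.
video_index = {
    "vid-001": {"transcript": "Today we review the new ACME laptop",
                "labels": ["laptop", "desk", "person"]},
    "vid-002": {"transcript": "A safari through the national park",
                "labels": ["tiger", "grass", "jeep"]},
    "vid-003": {"transcript": "Our ACME logo redesign explained",
                "labels": ["logo", "whiteboard"]},
}

print(search_videos(video_index, spoken="acme"))            # brand is mentioned
print(search_videos(video_index, seen="tiger"))             # a tiger appears
print(search_videos(video_index, spoken="acme", seen="laptop"))  # both at once
```

The point of the sketch is the combination: neither filter on its own needs machine learning applied to the pixels, but the `seen` filter only becomes possible once an algorithm can recognize objects in the frames themselves.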

For Google, as for other players in cloud computing, competing is not simply about offering more for less, a classic cost leadership strategy, but about offering more sophisticated features. A capability like this automatically makes Google’s cloud, until now seen as lagging behind Amazon’s or Microsoft’s, a more interesting option for players for whom video plays a fundamental role and who can now think about offering new services built on their repositories. It also lets competitors develop new services of their own, and could drive the adoption of platform models by those who see video as the center of their business. It will make it possible to index millions of hours of video of every kind, and to process them in far more interesting ways.

Now a machine can watch a video of a tiger, understand that there is a tiger in it, and find other videos in which tigers appear, all without anyone having labeled or titled those videos as containing a tiger. The best thing about studying machine learning and artificial intelligence is that it never ceases to amaze.

(In Spanish, here)


Enrique Dans

Professor of Innovation at IE Business School and blogger (in English here and in Spanish at enriquedans.com)