Describing Videos with Neural Networks
Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. Recent advances are starting to enable machines to describe images with full sentences. This experiment uses neural networks to automatically describe the content of videos.
This line of work has been the subject of multiple academic papers from the research community over the last year. Some of the proposed approaches have been implemented and are available as open source, among them NeuralTalk.
NeuralTalk is fascinating overall. With the right selection of inputs, it works with astounding accuracy and generates informative sentences. When it fails, the results are often unintentionally funny. The inputs and outputs shown here are cherry-picked to balance accuracy against comedy.
NeuralTalk's model generates natural language descriptions of images. It leverages large datasets of images and their sentence descriptions to learn the correspondences between language and visual data.
The model is based on a combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural Networks over sentences, and a structured objective that aligns the two modalities. For more insights, read this great blog post: Image captioning for mortals.
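To make the architecture a little more concrete, here is a minimal sketch of the caption-generation side: an image feature vector from a CNN conditions an RNN that emits a sentence word by word. This is not NeuralTalk's actual code (NeuralTalk itself is plain Python/numpy); the PyTorch module below, its class name, and all dimensions are illustrative assumptions.

```python
# Minimal sketch of a CNN-feature + RNN-decoder captioner in PyTorch.
# NOT NeuralTalk's actual implementation; names and sizes are illustrative.
import torch
import torch.nn as nn

class CaptionSketch(nn.Module):
    def __init__(self, vocab_size, feat_dim=512, embed_dim=256, hidden_dim=256):
        super().__init__()
        # Image features would come from a pretrained CNN; here we just
        # project a precomputed feature vector into the word-embedding space.
        self.img_proj = nn.Linear(feat_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        # Prepend the image embedding as the first "token" of the sequence,
        # so the RNN generates the sentence conditioned on the image.
        img = self.img_proj(img_feats).unsqueeze(1)   # (B, 1, E)
        words = self.embed(captions)                  # (B, T, E)
        seq = torch.cat([img, words], dim=1)          # (B, T+1, E)
        hidden, _ = self.rnn(seq)
        return self.out(hidden)                       # word logits per step

# Toy usage: batch of 2 images, captions of length 5, vocabulary of 1000 words.
model = CaptionSketch(vocab_size=1000)
feats = torch.randn(2, 512)
caps = torch.randint(0, 1000, (2, 5))
logits = model(feats, caps)
print(logits.shape)  # torch.Size([2, 6, 1000])
```

Training such a model amounts to maximizing the likelihood of each ground-truth word given the image and the words before it; generation then runs the RNN forward, feeding each predicted word back in.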
The NeuralTalkAnimator is a Python helper that creates captioned videos. It takes a folder with videos and returns a folder with processed videos. It's open source on GitHub. Thanks to @karpathy for releasing NeuralTalk! Send input video requests to @samim (<3 min, YouTube 720p).
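For a rough idea of what such a helper has to do, here is a hypothetical sketch of a NeuralTalkAnimator-style loop using OpenCV: read each video in a folder, caption a frame every so often, burn the caption into the frames, and write the result out. The caption_image() stub, the folder names, and the frame-sampling interval are all assumptions, not NeuralTalkAnimator's actual code.

```python
# Hypothetical sketch of a NeuralTalkAnimator-style pipeline using OpenCV.
import os
import cv2

def caption_image(frame):
    # Placeholder: a real pipeline would run the frame through NeuralTalk.
    return "a person riding a horse on a beach"

def caption_videos(in_dir, out_dir, every_n=30):
    os.makedirs(out_dir, exist_ok=True)
    for name in os.listdir(in_dir):
        cap = cv2.VideoCapture(os.path.join(in_dir, name))
        fps = cap.get(cv2.CAP_PROP_FPS) or 25  # fall back if FPS is unknown
        w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
        h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
        fourcc = cv2.VideoWriter_fourcc(*"mp4v")
        out = cv2.VideoWriter(os.path.join(out_dir, name), fourcc, fps, (w, h))
        text, i = "", 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if i % every_n == 0:  # re-caption every N frames, not every frame
                text = caption_image(frame)
            cv2.putText(frame, text, (10, h - 20),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.7, (255, 255, 255), 2)
            out.write(frame)
            i += 1
        cap.release()
        out.release()

caption_videos("videos_in", "videos_out")
```

Captioning only every Nth frame keeps the pipeline fast and the on-screen text stable, since consecutive frames rarely change enough to warrant a new sentence.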
The rate of innovation in machine captioning of images is astounding. While the results can still be inaccurate at times, they are certainly entertaining. The next generation of networks, trained on even bigger datasets, will undoubtedly run faster and caption more precisely.
Emerging approaches such as Describing Videos by Exploiting Temporal Structure, Action-Conditional Video Prediction using Deep Networks in Atari Games, and Searchable Video are highly fascinating. Exciting times!
Keep up with developments at GitXiv.com.