What is Optical Flow and why does it matter in deep learning

Mark Gituma
Published in The Startup
12 min read, Jun 20, 2019

Dancelogue (https://dancelogue.com/) is an AI-first company whose main objective is to understand and classify human movement in dance. To this end, being able to understand video structure is vitally important.

The main thing that separates videos from images is that videos have a temporal structure in addition to the spatial structure found in images. Videos also have other modalities, such as sound, but these are ignored for now. A video is just a collection of images played at a specific temporal resolution, i.e. frames per second. This means that information in a video is encoded not only spatially (i.e. in the objects or people in a video) but also sequentially, in a specific order, e.g. catching a ball vs. throwing a ball, or dancing salsa vs. hugging. This extra information is what makes classifying videos interesting and yet challenging at the same time.
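
As a rough illustration of why frame order matters, the sketch below builds a tiny synthetic clip with NumPy and plays it backwards. The helper names `make_clip` and `ball_columns` are hypothetical and exist only for this example, and the single bright pixel standing in for a "ball" is an assumption made to keep things small. Both clips contain exactly the same frames, yet the motion they encode is opposite.

```python
import numpy as np


def make_clip(num_frames=5, size=8):
    """Synthetic grayscale clip: one bright pixel moving left to right.

    Hypothetical helper for illustration; assumes num_frames <= size.
    """
    clip = np.zeros((num_frames, size, size), dtype=np.uint8)
    for t in range(num_frames):
        clip[t, size // 2, t] = 255  # the "ball" sits in column t of frame t
    return clip


def ball_columns(clip):
    """Column of the brightest pixel in each frame, in playback order."""
    return [int(np.argmax(frame.max(axis=0))) for frame in clip]


clip = make_clip()
reversed_clip = clip[::-1]  # same frames, opposite temporal order

# A video is a stack of images: (frames, height, width).
print(clip.shape)

# Ignoring order, both clips hold exactly the same set of frames...
same_frames = sorted(f.tobytes() for f in clip) == sorted(
    f.tobytes() for f in reversed_clip
)
print(same_frames)

# ...but the ball moves in opposite directions.
print(ball_columns(clip))
print(ball_columns(reversed_clip))
```

A model that looks at each frame independently cannot tell these two clips apart, which is the gap that temporal methods aim to close.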

Background

There are quite a few deep learning algorithms that are applied to the spatial domain, ranging from classification and segmentation to scene understanding. However, algorithms that perform well in the temporal domain are fewer and less developed than their spatial counterparts. This is because adding time to the equation makes the problem considerably more complicated.
