Published in The Startup

[Paper] Deep Video: Large-scale Video Classification with Convolutional Neural Networks (Video Classification)

Outperforms Hand-Crafted Feature Approaches

Sports-1M Dataset

Outline

1. Fusion Strategies
2. Input Resolution Strategies
3. Experimental Results

1. Fusion Strategies

Fusion Strategies (Red, green and blue boxes indicate convolutional, normalization and pooling layers respectively.)

1.1. Single-frame
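
The single-frame baseline, as described in the paper, looks at one frame at a time. Its architecture is given in shorthand as C(96, 11, 3)-N-P-C(256, 5, 1)-N-P-C(384, 3, 1)-C(384, 3, 1)-C(256, 3, 1)-P-FC(4096)-FC(4096) on 170 × 170 × 3 inputs, where C(d, f, s) is a convolution with d filters of size f × f and stride s, N is a normalization layer, and P is 2 × 2 pooling. The minimal sketch below tallies the convolutional weights implied by that spec. It assumes every filter spans all input channels (no grouped convolutions) and ignores biases, so the totals are illustrative only.

```python
# Conv layers of the single-frame baseline as (filters d, filter size f, stride s).
# N (normalization) and P (2x2 pooling) layers have no weights, so they are omitted.
SINGLE_FRAME_CONVS = [(96, 11, 3), (256, 5, 1), (384, 3, 1), (384, 3, 1), (256, 3, 1)]


def conv_weight_counts(convs, in_channels=3):
    """Return per-layer weight counts f*f*C_in*d (assumes no grouping, no biases)."""
    counts = []
    for filters, size, _stride in convs:
        counts.append(size * size * in_channels * filters)
        in_channels = filters  # the next layer sees this layer's output channels
    return counts


counts = conv_weight_counts(SINGLE_FRAME_CONVS)
print(counts)       # [34848, 614400, 884736, 1327104, 884736]
print(sum(counts))  # 3745824
```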

1.2. Early Fusion
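
Early Fusion combines a whole time window at the pixel level: the paper extends the first-layer filters of the single-frame model to 11 × 11 × 3 × T, with T = 10 frames (roughly a third of a second). One way to see this, sketched below under the assumption that a clip is stored as nested lists indexed [t][y][x][c], is to stack the T frames along the channel axis so that an ordinary 2D convolution with 3T-deep filters covers the entire clip at once. The helper name `early_fuse` and the toy clip are hypothetical.

```python
T = 10  # temporal extent used for Early Fusion in the paper


def early_fuse(clip):
    """Stack a clip indexed [t][y][x][c] into one frame indexed [y][x][3*t + c].

    A 2D convolution over this stacked frame with filters of depth 3*T computes
    the same sums as 11 x 11 x 3 x T filters spanning the whole clip.
    """
    height, width = len(clip[0]), len(clip[0][0])
    return [
        [[clip[t][y][x][c] for t in range(len(clip)) for c in range(3)] for x in range(width)]
        for y in range(height)
    ]


# First-layer weights: 96 filters of size 11 x 11 x 3 x T (biases ignored).
first_layer_weights = 11 * 11 * 3 * T * 96
print(first_layer_weights)  # 348480

# Toy clip: T frames of 2 x 2 RGB pixels; each value encodes 10 * t + c.
clip = [[[[10 * t + c for c in range(3)] for _x in range(2)] for _y in range(2)] for t in range(T)]
fused = early_fuse(clip)
print(len(fused[0][0]))  # 30, i.e. 3 * T channels per pixel
```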

1.3. Late Fusion
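
Late Fusion, per the paper, places two single-frame towers with shared parameters (up to the last convolutional layer C(256, 3, 1)) on frames 15 frames apart and merges them at the first fully connected layer. Neither tower can detect motion on its own; only the fully connected layer can compare the two. The sketch below shows only this wiring: `tower` and `late_fusion_features` are hypothetical names, and the tower is a toy linear map standing in for the real convolutional stack.

```python
FRAME_GAP = 15  # the two towers look at frames 15 frames apart


def tower(frame, weights):
    """Hypothetical stand-in for the shared conv tower (up to C(256, 3, 1)).

    It is just one weighted sum per output unit, so the example stays runnable.
    """
    return [sum(w * p for w, p in zip(row, frame)) for row in weights]


def late_fusion_features(clip, start, weights):
    """Apply the SAME tower weights to two distant frames and concatenate the outputs.

    The concatenated vector is what the first fully connected layer receives,
    so that layer is the first place where the two time steps can interact.
    """
    first = tower(clip[start], weights)
    second = tower(clip[start + FRAME_GAP], weights)
    return first + second  # list concatenation, not element-wise addition


clip = [[t, 2 * t] for t in range(20)]  # 20 toy "frames", each a 2-pixel vector
identity = [[1, 0], [0, 1]]
print(late_fusion_features(clip, 0, identity))  # [0, 0, 15, 30]
```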

1.4. Slow Fusion
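
Slow Fusion mixes temporal information gradually. In the paper's model, the first convolutional layer applies filters of temporal extent T = 4 with stride 2 to a 10-frame clip, producing 4 responses in time; the second and third layers repeat this with T = 2 and stride 2, so the third layer has access to all 10 frames. The sketch below checks that arithmetic, assuming valid (unpadded) temporal convolutions and the standard receptive-field recurrence; `temporal_trace` is a hypothetical helper.

```python
CLIP_LENGTH = 10
# (temporal extent, temporal stride) of the first three conv layers in Slow Fusion.
SLOW_FUSION_TIME = [(4, 2), (2, 2), (2, 2)]


def temporal_trace(n_frames, layers):
    """Yield (output length, receptive field in frames) after each valid temporal conv."""
    length, field, jump = n_frames, 1, 1
    for extent, stride in layers:
        length = (length - extent) // stride + 1
        field += (extent - 1) * jump  # widen the receptive field by the new filter
        jump *= stride                # spacing, in input frames, between adjacent outputs
        yield length, field


print(list(temporal_trace(CLIP_LENGTH, SLOW_FUSION_TIME)))  # [(4, 4), (2, 6), (1, 10)]
```

The same helper applied to a single layer with extent 10 and stride 1 reproduces Early Fusion's single temporal output covering all 10 frames.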

2. Input Resolution Strategies

Input resolution strategies

2.1. Fovea Stream
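
The multiresolution networks in the paper take 178 × 178 frames, and the fovea stream receives the central 89 × 89 region at the original resolution, exploiting the tendency of online videos to keep the object of interest near the middle of the frame. A minimal center-crop sketch, assuming frames are stored as lists of rows (`center_crop` is a hypothetical helper):

```python
FRAME_SIZE = 178  # multiresolution input frames are 178 x 178 in the paper
FOVEA_SIZE = 89   # the fovea stream keeps the central 89 x 89 region at full resolution


def center_crop(frame, size):
    """Return the central size x size window of a frame stored as a list of rows."""
    top = (len(frame) - size) // 2
    left = (len(frame[0]) - size) // 2
    return [row[left:left + size] for row in frame[top:top + size]]


# Each toy "pixel" records its own (row, column) so the crop window is visible.
frame = [[(y, x) for x in range(FRAME_SIZE)] for y in range(FRAME_SIZE)]
fovea = center_crop(frame, FOVEA_SIZE)
print(fovea[0][0], fovea[-1][-1])  # (44, 44) (132, 132)
```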

2.2. Context Stream
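
The context stream in the paper receives the full 178 × 178 frame downsampled to half resolution (89 × 89). The exact downsampling filter is not given here, so the sketch below assumes simple 2 × 2 block averaging on a single-channel frame; `downsample_by_2` is a hypothetical helper. Since both streams work on 89 × 89 inputs, together they carry half as many pixels as one full-resolution frame.

```python
def downsample_by_2(frame):
    """Halve a frame's resolution by averaging non-overlapping 2 x 2 blocks (assumed filter)."""
    return [
        [
            (frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1]) / 4
            for x in range(0, len(frame[0]) - 1, 2)
        ]
        for y in range(0, len(frame) - 1, 2)
    ]


frame = [[float(x) for x in range(178)] for _y in range(178)]  # toy horizontal gradient
context = downsample_by_2(frame)
print(len(context), len(context[0]))  # 89 89
print(context[0][0], context[0][-1])  # 0.5 176.5
print(2 * 89 * 89, 178 * 178)         # 15842 31684 (two streams vs. one full frame)
```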

3. Experimental Results

3.1. Sports-1M

Results on the 200,000 videos of the Sports-1M test set.
Filters learned on the first layer of a multiresolution network. Left: context stream; right: fovea stream.
Some examples of predictions on the Sports-1M test data.
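
For video-level results, the paper samples 20 clips per video, averages each clip's predictions over 4 crops and flips, and then averages the clip predictions across the video. Accuracy is reported as hit@k, the fraction of test videos with at least one ground-truth label among the top k predictions. The sketch below covers only the averaging and hit@k steps; the function names and toy probabilities are hypothetical.

```python
def average_predictions(prob_lists):
    """Element-wise mean of several class-probability vectors (e.g. clips -> video)."""
    n = len(prob_lists)
    return [sum(col) / n for col in zip(*prob_lists)]


def hit_at_k(video_probs, true_labels, k):
    """Fraction of videos with at least one ground-truth label among the top-k classes."""
    hits = 0
    for probs, labels in zip(video_probs, true_labels):
        top_k = sorted(range(len(probs)), key=lambda c: probs[c], reverse=True)[:k]
        hits += any(label in top_k for label in labels)
    return hits / len(video_probs)


# Two toy videos over 3 classes; clip-level probabilities are made up for illustration.
video_a = average_predictions([[0.7, 0.2, 0.1], [0.3, 0.5, 0.2]])  # about [0.5, 0.35, 0.15]
video_b = average_predictions([[0.1, 0.2, 0.7]])
labels = [[1], [2]]  # ground-truth class indices; a list allows multi-label videos
print(hit_at_k([video_a, video_b], labels, k=1))  # 0.5
print(hit_at_k([video_a, video_b], labels, k=2))  # 1.0
```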

3.2. UCF-101

Results on UCF-101 for various transfer learning approaches using the Slow Fusion network
Mean Average Precision of the Slow Fusion network on UCF-101 classes broken down by category groups
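
The caption above reports mean average precision (mAP) per category group. The AP variant used for this breakdown is not spelled out here, so the sketch below assumes the common non-interpolated definition (mean precision at the rank of each positive example), with mAP as the mean of per-class APs within a group. All names and scores are hypothetical.

```python
def average_precision(scores, is_positive):
    """Non-interpolated AP: mean precision at the rank of each positive example."""
    ranked = sorted(zip(scores, is_positive), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_score, positive) in enumerate(ranked, start=1):
        if positive:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


def mean_average_precision(per_class):
    """Mean of per-class APs; per_class maps class name -> (scores, is_positive)."""
    aps = [average_precision(s, p) for s, p in per_class.values()]
    return sum(aps) / len(aps)


per_class = {
    "class_a": ([0.9, 0.8, 0.7, 0.6], [True, False, True, False]),  # AP = (1/1 + 2/3) / 2
    "class_b": ([0.9, 0.4, 0.2], [True, True, False]),              # AP = 1.0
}
print(mean_average_precision(per_class))  # about 0.9167 (= 11/12)
```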
