I spy with my virtual eye — detection of objects in videos using Videoflow and GPUs

Ramnath Nayak
Published in cloudnativeinfra
4 min read · Jul 28, 2019
Attendees at a rock concert filming on their mobile phones (Photo by Noiseporn on Unsplash)

Content Proliferation

We now live in the Information Age, where the rate of creation of content is growing exponentially.

In video production, the fuse of explosive growth has been lit from both the consumer and the commercial ends of the industry.

On the consumer end, large-scale adoption of mobile phones capable of capturing 4K UHD video has contributed to the rapid growth. On the professional media production end, the bar for content has always been high, and live telecasts of events in 4K are pushing the envelope even further.

There is also a third, prosumer front emerging, where drones are deployed to survey sites and the relayed video needs to be processed to gain insight into the activities the drone captured.

High-quality, high-volume data is now produced at a much more rapid rate than we can consume using conventional data consumption paradigms. This calls for a radically new approach to analysing video content: automatically identifying, analysing and tagging what videos contain. Machine Learning is rising to the occasion to address these problems.

For the consumer market, ML may not have gone much beyond auto-enhance, but for the prosumer and commercial media industries, ML can save the tremendous effort that used to go into generating metadata for content, saving significant amounts of time while drastically reducing costs.

Typical use cases include generating metadata for later search and retrieval, generating ratings, detecting potentially offensive content, and supporting commentary teams by assisting with player/celebrity recognition.

Outside the media industry, law enforcement agencies too have started running classification on video streams to detect and prevent crimes, and to identify and prosecute criminals.

ML to the rescue

TensorFlow is an extremely powerful tool for developing Machine Learning applications. Though it is a generic platform that can be used for a wide variety of use cases, its low-level nature can make it a bit daunting to use.

Enter Videoflow, a video processing framework built on top of TensorFlow. Videoflow provides a higher-level abstraction for you to work with, making it much easier to build a video stream processing pipeline (link to Videoflow at the bottom of this page).

For example, this block of Python code using the Videoflow framework parses an mp4 file called input.mp4, runs TensorFlow object detection against it and generates an output avi file called output.avi with the predictions drawn onto each frame. It is a lightly modified clone of the sample from the Videoflow site; let us save it as vidflow.py:

import videoflow
import videoflow.core.flow as flow
from videoflow.core.constants import BATCH
from videoflow.consumers import VideofileWriter
from videoflow.producers import VideofileReader
from videoflow.processors.vision.detectors import TensorflowObjectDetector
from videoflow.processors.vision.annotators import BoundingBoxAnnotator
from videoflow.utils.downloader import get_file

class FrameIndexSplitter(videoflow.core.node.ProcessorNode):
    def __init__(self):
        super(FrameIndexSplitter, self).__init__()

    def process(self, data):
        # The reader emits (frame_index, frame) tuples; keep only the frame
        index, frame = data
        return frame

input_file = "input.mp4"
output_file = "output.avi"

reader = VideofileReader(input_file)
frame = FrameIndexSplitter()(reader)
detector = TensorflowObjectDetector()(frame)
annotator = BoundingBoxAnnotator()(frame, detector)
writer = VideofileWriter(output_file, fps=30)(annotator)

fl = flow.Flow([reader], [writer], flow_type=BATCH)
fl.run()
fl.join()
Sample output generated by the videoflow script

Now that we have an app that can use GPUs, we can containerise it using NVIDIA Docker to make it easy to build and deploy. If you want to run a prebuilt GPU container, just pull lhr.ocir.io/intrnayak/videoflow-gpu and run it using this command:

nvidia-docker run --rm -u $(id -u):$(id -g) -v $(pwd):/usr/src/app lhr.ocir.io/intrnayak/videoflow-gpu python vidflow.py

Or if you want to build your own containerised version, here is the Dockerfile I used:

FROM tensorflow/tensorflow:latest-gpu-py3
RUN echo "deb http://security.ubuntu.com/ubuntu xenial-security main" \
    | tee -a /etc/apt/sources.list
RUN echo "deb http://ppa.launchpad.net/jonathonf/ffmpeg-3/ubuntu xenial main" \
    | tee -a /etc/apt/sources.list \
    && apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 4AB0F789CBA31744CC7DA76A8CF63AD3F06FC659
ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update && apt-get install -y \
    pkg-config \
    python-dev \
    python-opencv \
    libopencv-dev \
    libav-tools \
    libjpeg-dev \
    libpng-dev \
    libtiff-dev \
    libjasper-dev \
    python-numpy \
    python-pycurl
COPY . /videoflow
RUN pip install /videoflow --find-links /videoflow
RUN mkdir -p /usr/src/app
WORKDIR /usr/src/app
CMD ["python", "/videoflow/examples/object_detector.py"]
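With this Dockerfile sitting in a checkout of the Videoflow source, the build-and-run sequence might look like the following (the videoflow-gpu tag is my choice here, not something the article prescribes; adjust paths to taste):

```shell
# Build the GPU image from the directory containing the Dockerfile
docker build -t videoflow-gpu .

# Mount the current directory, which holds vidflow.py and input.mp4,
# and run the script inside the container with GPU access
nvidia-docker run --rm -u $(id -u):$(id -g) \
    -v $(pwd):/usr/src/app videoflow-gpu python vidflow.py
```

The -u $(id -u):$(id -g) flag makes the container write output.avi as your own user rather than root, so you can open and delete it without sudo.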

CPU vs GPU

To see the real difference in performance between the CPU and GPU for workloads like this, you can build a non-GPU-enabled container, run it, and compare the results yourself.

To build a CPU version that does not use the GPU, just replace the latest-gpu-py3 tag in the Dockerfile with latest-py3 and use docker instead of nvidia-docker.
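Concretely, that two-line change looks like this (videoflow-cpu is an illustrative tag, assumed here for the image built from the modified Dockerfile):

```shell
# Build from the Dockerfile whose first line now reads:
#   FROM tensorflow/tensorflow:latest-py3
docker build -t videoflow-cpu .

# Plain docker, no nvidia-docker, since no GPU is passed through
docker run --rm -u $(id -u):$(id -g) \
    -v $(pwd):/usr/src/app videoflow-cpu python vidflow.py
```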

In tests I ran on the OCI VM.GPU3.1 shape, a short clip that takes a mere 43 seconds on the GPU instance takes 1 minute 30 seconds without the GPU.
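That works out to roughly a 2x speedup on this single clip; a quick sanity check of the arithmetic:

```python
# Timings from the VM.GPU3.1 test above
gpu_seconds = 43
cpu_seconds = 1 * 60 + 30  # 1 minute 30 seconds

speedup = cpu_seconds / gpu_seconds
print(f"GPU is {speedup:.2f}x faster")  # roughly 2.09x
```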

That is a significant difference in performance; imagine the order-of-magnitude difference when you run large-volume workloads at scale!

References

Videoflow on GitHub: https://github.com/videoflow/videoflow

Outbound Product Manager at Oracle Cloud Infrastructure