Computer Vision on IoT Edge Devices

Reginald Garnepudi
Centrica Data Science Blog
Nov 23, 2018 · 3 min read

What is Computer Vision?

Computer vision is an interdisciplinary field concerned with how computers can be made to understand and extract information from digital images or videos. From an engineering perspective, it seeks to automate tasks that the human visual system can do.

As trivial as it may sound, achieving human-level performance in this field is a mammoth task. There has been a tremendous amount of research in recent years, especially with the growing popularity of deep learning techniques.

In this article, we show how computer vision tasks can be performed on low-powered edge devices.

Challenges of running on low-powered edge devices

In this context, a low-powered edge device is an IoT device that has low processing power, low RAM and low storage. A specific example is the Raspberry Pi 3 Model B+, which has a 1.4 GHz quad-core processor and 1 GB of RAM.

Typical deep learning models used for computer vision are memory-, CPU- and GPU-intensive. Below is a comparison of running an image classification task on a Raspberry Pi 3B+ and on a GPU.

[Figure: time taken to run an image classification model on different devices]

As the figure shows, the performance on a Raspberry Pi alone is excruciatingly poor and practically unfit for the purpose.

What are Vision Processing Units?

A vision processing unit (VPU) is an emerging class of microprocessor: a type of AI accelerator designed specifically for machine vision tasks.

They are designed to consume very little power while parallelising computations, which is essential for running deep learning models.

A few examples of VPUs are the Movidius Myriad, Eyeriss and the JeVois Smart Vision Camera.

Movidius Neural Compute Stick

After a quick feasibility study comparing the available VPUs, we settled on the Movidius Neural Compute Stick, which houses a Movidius Myriad VPU in an easy-to-use USB stick form factor. The Neural Compute Stick is designed to run on-device deep learning workloads with ultra-low power consumption.

Using the SDK provided (the NCSDK), we could easily convert trained Caffe and TensorFlow models into NCSDK graphs that run on the Neural Compute Stick, greatly improving computational speed compared to running the models natively on the Raspberry Pi.

How it's done

Training the model

There are many deep learning architectures for computer vision out there: ResNet, AlexNet, GoogLeNet, MobileNet, VGG, etc.

Our specific use case was classifying images from a camera feed to identify potential safety hazards in real time. This meant that the model, running on the Raspberry Pi, needed to be both fast (> 15 fps) and accurate.

We chose the Caffe version of the MobileNet v1 architecture because of its runtime speed. After modifying the architecture to suit the use case, we took the provided pre-trained weights and fine-tuned them with our own training dataset.
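
As a rough sketch, the fine-tuning step can be driven through Caffe's Python interface. The file names below (solver.prototxt, mobilenet_v1.caffemodel) are hypothetical placeholders for the solver definition and the pre-trained MobileNet weights:

```python
# A rough sketch of fine-tuning in Caffe via pycaffe. File names are
# hypothetical: solver.prototxt (solver definition pointing at our
# modified MobileNet) and mobilenet_v1.caffemodel (pre-trained weights).
import caffe

caffe.set_mode_gpu()  # training happens on a GPU machine, not the Pi

# Build the solver (learning rate, schedule, snapshots, etc.)
solver = caffe.SGDSolver('solver.prototxt')

# Initialise the network from the pre-trained weights; Caffe copies
# weights by layer name, so only matching layers are filled in
solver.net.copy_from('mobilenet_v1.caffemodel')

# Run the optimisation for the number of iterations set in the solver
solver.solve()
```

Because Caffe copies weights by layer name, any layers we renamed or added start from fresh initialisation while the rest inherit the pre-trained weights, which is what makes this fine-tuning rather than training from scratch.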

Converting the model to an NCSDK graph

Once the model is trained, the next step is to compile it into an NCSDK graph (a binary file that runs on the Neural Compute Stick), using the tools provided in the NCSDK.
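
The tool for this is the NCSDK's mvNCCompile command-line compiler. Below is a minimal sketch of invoking it from Python; the file names and the input/output layer names ("data", "prob") are hypothetical and depend on your prototxt:

```python
# A rough sketch of compiling the trained Caffe model into an NCSDK
# graph with the NCSDK's mvNCCompile tool, invoked here via subprocess.
# File names and the "data"/"prob" layer names are hypothetical.
import subprocess

subprocess.run(
    [
        "mvNCCompile", "deploy.prototxt",        # network definition
        "-w", "mobilenet_finetuned.caffemodel",  # trained weights
        "-s", "12",      # number of SHAVE cores to use on the Myriad
        "-in", "data",   # name of the input layer
        "-on", "prob",   # name of the output layer
        "-o", "graph",   # file name of the compiled binary graph
    ],
    check=True,  # raise if compilation fails
)
```

Asking for all 12 SHAVE cores lets the Myriad spread the network across its vector processors, which is where the speed-up comes from.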

Using the compiled graph

Using the mvnc Python API (part of the NCSDK), we capture images from the camera (using OpenCV), load the compiled graph onto the Neural Compute Stick and run inference on each frame.
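
A minimal sketch of that loop, using the NCSDK v1 mvnc API, is below. The graph file name, 224x224 input size and pixel scaling are assumptions for a MobileNet-style classifier:

```python
# A rough sketch of the capture-and-classify loop using the NCSDK v1
# mvnc API and OpenCV. The graph file name, input size and pixel
# scaling are assumptions for a MobileNet-style classifier.
import cv2
import numpy as np
from mvnc import mvncapi as mvnc

# Open the first attached Neural Compute Stick
devices = mvnc.EnumerateDevices()
if not devices:
    raise RuntimeError('No Neural Compute Stick found')
device = mvnc.Device(devices[0])
device.OpenDevice()

# Load the compiled binary graph onto the stick
with open('graph', 'rb') as f:
    graph = device.AllocateGraph(f.read())

cap = cv2.VideoCapture(0)  # the camera feed
try:
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        # Resize and scale the frame to the network's expected input
        img = cv2.resize(frame, (224, 224)).astype(np.float32)
        img = (img - 127.5) / 127.5  # scale pixels to [-1, 1]
        # The stick expects half-precision tensors
        graph.LoadTensor(img.astype(np.float16), None)
        output, _ = graph.GetResult()
        print('predicted class:', int(np.argmax(output)))
finally:
    cap.release()
    graph.DeallocateGraph()
    device.CloseDevice()
```

LoadTensor and GetResult block on each frame, so one way to push towards the frame-rate target is to overlap capture and inference in separate threads.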

An end-to-end example of how to deploy the graph is described in this blog post.

Results

We will let the numbers speak for themselves.

[Figure: image classification task timings, including the Neural Compute Stick]

Vision processing units are very useful for realising IoT use cases on low-powered devices like the Raspberry Pi, enabling them to perform computationally intensive tasks without consuming much power.
