An introduction to the MXNet API — part 6

In part 5, we used three different pre-trained models for object detection and compared them using a couple of images.

One of the things we learned is that models have very different memory requirements, the most frugal being Inception v3 at “only” 43MB. Obviously, this raises the question: “can we run this on something really small, say a Raspberry Pi?”. Well, let’s find out!

Building MXNet on a Pi

There’s an official tutorial, but I found it to be missing some steps, so here’s my version. It works fine on a Raspberry Pi 3 running the latest Raspbian.

$ uname -a
Linux raspberrypi 4.4.50-v7+ #970 SMP Mon Feb 20 19:18:29 GMT 2017 armv7l GNU/Linux

First, let’s add all necessary dependencies.

$ sudo apt-get update
$ sudo apt-get -y install git cmake build-essential g++-4.8 c++-4.8 liblapack* libblas* libopencv* python-opencv libssl-dev screen

Then, let’s clone the MXNet repository and check out the latest stable release. Don’t skip this last step, as I found HEAD to be broken most of the time (Update 30/04/17: the MXNet dev team got in touch and informed me that Continuous Integration is now in place. I can confirm that HEAD now builds fine. Well done, guys).

$ git clone https://github.com/dmlc/mxnet.git --recursive
$ cd mxnet
# List tags: v0.9.3a is the latest at the time of writing
$ git tag -l
$ git checkout tags/v0.9.3a

MXNet is able to load and save data in S3, so let’s enable this feature; it might come in handy later on. MXNet also supports HDFS, but you’d need to install Hadoop locally, so… no :)
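Once the library is built with USE_S3=1, reading and writing NDArrays straight from a bucket is just a matter of using an s3:// path. Here’s a quick sketch (the bucket name is a placeholder, and your AWS credentials need to be available in the environment):

import mxnet as mx

# Hypothetical bucket: replace with one you own.
# Requires MXNet built with USE_S3=1 and AWS credentials in the environment.
a = mx.nd.ones((2, 3))
mx.nd.save('s3://my-mxnet-bucket/ones.ndarray', [a])

b = mx.nd.load('s3://my-mxnet-bucket/ones.ndarray')[0]
print(b.asnumpy())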

We could just run make but, given the limited processing power of the Pi, the build is gonna take a while: you don’t want it to be interrupted if your SSH session times out! Screen is going to solve this.

To speed things up a little, we can run a parallel make on 2 cores (out of 4). I wouldn’t recommend using more, as my Pi became unresponsive when I tried it.

$ export USE_S3=1
$ screen make -j2

This should take about an hour. The last step is to install the library and its Python bindings.

$ cd python
$ sudo python setup.py install
$ python
Python 2.7.9 (default, Sep 17 2016, 20:26:04)
[GCC 4.9.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import mxnet as mx
>>> mx.__version__
'0.9.3a'

Loading models

Once we’ve copied the model files to the Pi, we need to make sure that we can actually load them. Let’s reuse the exact same code we wrote in part 5. For the record, the Pi is in CLI mode with about 580MB of free memory. All data is stored on a 32GB SD card.
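As a reminder, here’s roughly what those two helpers look like (a condensed sketch of the part 5 code, assuming the model files and their synset.txt category file sit in the current directory, and that all three models take 224x224 input):

import mxnet as mx
import numpy as np
import cv2, time
from collections import namedtuple

Batch = namedtuple('Batch', ['data'])

def init(model_name, input_size=224):
    # Load the symbol and the pre-trained weights (epoch 0 checkpoint)
    t0 = time.time()
    sym, arg_params, aux_params = mx.model.load_checkpoint(model_name, 0)
    mod = mx.mod.Module(symbol=sym)
    mod.bind(for_training=False,
             data_shapes=[('data', (1, 3, input_size, input_size))])
    mod.set_params(arg_params, aux_params)
    print("Loaded in %.2f milliseconds" % ((time.time() - t0) * 1000))
    # The ImageNet category names ship with the models as synset.txt
    categories = [line.rstrip() for line in open('synset.txt')]
    return mod, categories

def predict(filename, model, categories, n):
    # Load the image, resize it and reorder it to (batch, channel, height, width)
    t0 = time.time()
    img = cv2.cvtColor(cv2.imread(filename), cv2.COLOR_BGR2RGB)
    img = cv2.resize(img, (224, 224))
    img = np.swapaxes(np.swapaxes(img, 0, 2), 1, 2)[np.newaxis, :]
    # Forward the batch and grab the softmax output
    model.forward(Batch([mx.nd.array(img)]))
    prob = model.get_outputs()[0].asnumpy().squeeze()
    print("Predicted in %.2f milliseconds" % ((time.time() - t0) * 1000))
    # Return the top n categories and their probabilities
    top = np.argsort(prob)[::-1][:n]
    return [(prob[i], categories[i]) for i in top]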

Let’s try to load VGG16.

>>> vgg16,categories = init("vgg16")
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

Ouch! VGG16 is too large to fit in memory. Let’s try ResNet-152.

>>> resnet152,categories = init("resnet-152")
Loaded in 11056.10 milliseconds
>>> print predict("kreator.jpg",resnet152,categories,5)
Predicted in 7.98 milliseconds
[(0.87835813, 'n04296562 stage'), (0.045634001, 'n03759954 microphone, mike'), (0.035906471, 'n03272010 electric guitar'), (0.021166906, 'n04286575 spotlight, spot'), (0.0054096784, 'n02676566 acoustic guitar')]

ResNet-152 loads successfully in about 11 seconds and predicts in less than 10 milliseconds. Let’s move on to Inception v3.

>>> inceptionv3,categories = init("Inception-BN")
Loaded in 2137.62 milliseconds
>>> print predict("kreator.jpg",inceptionv3,categories,5)
Predicted in 2.35 milliseconds
[(0.4685601, 'n04296562 stage'), (0.40474886, 'n03272010 electric guitar'), (0.073685646, 'n04456115 torch'), (0.011639798, 'n03250847 drumstick'), (0.011014056, 'n02676566 acoustic guitar')]

On a constrained device like the Pi, model differences are much more obvious! Inception v3 loads much faster and predicts in a few milliseconds. Even with the model loaded, there’s plenty of RAM left on the Pi to run an actual application, so it’s definitely an interesting candidate for embedded apps. Let’s keep going :)

Capturing images using the Pi camera

One of the best gadgets you can add to the Raspberry Pi is a camera module. It couldn’t be simpler to use!

>>> inceptionv3,categories = init("Inception-BN")
>>> import picamera
>>> camera = picamera.PiCamera()
>>> filename = '/home/pi/cap.jpg'
>>> camera.capture(filename)
>>> print predict(filename, inceptionv3, categories, 5)

Here’s an example.

Predicted in 12.90 milliseconds
[(0.95071173, 'n04074963 remote control, remote'), (0.013508897, 'n04372370 switch, electric switch, electrical switch'), (0.013224524, 'n03602883 joystick'), (0.00399205, 'n04009552 projector'), (0.0036674738, 'n03777754 modem')]

Really cool!

Adding a couple of Amazon AI services, because why not?

Of course, I cannot resist running the same picture through Amazon Rekognition using the Python scripts I wrote a while ago (article, code).

$ ./rekognitionDetect.py jsimon-public cap.jpg copy
Label Remote Control, confidence: 94.7508468628
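For the record, rekognitionDetect.py essentially wraps the Rekognition DetectLabels API through boto3. A minimal equivalent, assuming the image has already been copied to an S3 bucket, looks something like this:

import boto3

# The image must already be in S3; bucket and key match the command above
bucket, key = 'jsimon-public', 'cap.jpg'

rekognition = boto3.client('rekognition')
response = rekognition.detect_labels(
    Image={'S3Object': {'Bucket': bucket, 'Name': key}},
    MaxLabels=5)
for label in response['Labels']:
    print("Label %s, confidence: %s" % (label['Name'], label['Confidence']))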

Good job, Rekognition. Now… wouldn’t it be nice if the Pi actually told us what the picture is about? It’s not too complicated to add Amazon Polly to the mix (article).

Amazon Rekognition and Amazon Polly are managed services based on Deep Learning technology. We don’t have to worry about models or infrastructure: all we have to do is to invoke an API.
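As an illustration (not the exact code from the article linked above), a minimal Polly call that makes the Pi speak a label could look like this, assuming a command-line MP3 player such as mpg123 is installed:

import subprocess
import boto3

polly = boto3.client('polly')

def say(text, filename='/tmp/speech.mp3'):
    # Ask Polly for an MP3 stream, save it and play it with mpg123
    response = polly.synthesize_speech(Text=text, OutputFormat='mp3', VoiceId='Joanna')
    with open(filename, 'wb') as f:
        f.write(response['AudioStream'].read())
    subprocess.call(['mpg123', '-q', filename])

say("I see a remote control")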

So, here’s a video of my Raspberry Pi performing real-time object detection with the Inception v3 model running in MXNet, and describing what it sees with Amazon Polly.
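The main loop behind that video boils down to capture, predict, speak. Here’s a sketch, assuming the init() and predict() helpers from part 5 and the say() helper sketched above are already defined in the session:

import time
import picamera

# Assumes init(), predict() (part 5 helpers) and say() (Polly sketch above)
# are already defined or imported in this session.
camera = picamera.PiCamera()
inceptionv3, categories = init("Inception-BN")
filename = '/home/pi/cap.jpg'

while True:
    camera.capture(filename)
    prob, label = predict(filename, inceptionv3, categories, 1)[0]
    # Labels look like 'n04074963 remote control, remote': drop the synset id
    say("I see a %s" % label.split(' ', 1)[1].split(',')[0])
    time.sleep(1)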


Well, we’ve come a long way! In these 6 articles, we learned how to:

  • manage data with NDArrays,
  • define models with Symbols,
  • run predictions with Modules,
  • load and compare pre-trained models for object detection,
  • use a pre-trained model on a Raspberry Pi in real-time.

We focused on Convolutional Neural Networks for object detection, but there is much more to MXNet, so expect more articles!

This concludes this series. I hope you enjoyed it and learned useful stuff along the way.