Apache MXNet (Chen et al., 2015) is an open source library for Deep Learning. Thanks to a high-level API available in several languages (Python, C++, R, etc.), software developers and researchers can build Deep Learning models that help their applications make smarter decisions.
However, many state-of-the-art models have hefty compute, storage and power consumption requirements, which make them impractical, or even impossible, to use on resource-constrained devices. For example, the 500MB+ VGG-16 Convolutional Neural Network (CNN) model is too large to fit on a Raspberry Pi 3, which is hardly a tiny IoT device. Other models may fit, but they usually suffer from slow inference times, a significant problem when fast processing is required.
Is IoT then doomed to helplessly watch the AI revolution go by?
Of course not. Apache MXNet is IoT-friendly in several ways, and several AWS services make it easy to deploy MXNet models at the Edge. Let’s dive deeper.
One of the key features of Apache MXNet is lazy evaluation: data processing operations are only executed when strictly necessary. This allows many optimization techniques to be applied, such as avoiding unnecessary calculations or reusing memory buffers. Obviously, this behavior greatly contributes to speed and memory efficiency, both key advantages for IoT projects.
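To make the idea concrete, here is a minimal sketch of lazy evaluation in plain Python. It is not MXNet's engine: the `Lazy` class, the `op` helper and the operation names are hypothetical and only illustrate the principle that building a computation does not run it, and that results nobody asks for are never computed.

```python
class Lazy:
    """A deferred computation: nothing runs until value() is called."""

    def __init__(self, fn, *deps):
        self.fn = fn        # function producing this node's result
        self.deps = deps    # upstream Lazy nodes this node depends on
        self._cache = None
        self._done = False

    def value(self):
        # Evaluate upstream nodes on demand, then memoize the result.
        if not self._done:
            self._cache = self.fn(*(d.value() for d in self.deps))
            self._done = True
        return self._cache


calls = []  # records which operations were actually executed


def op(name, fn, *deps):
    """Hypothetical helper: wrap fn in a Lazy node that logs its execution."""
    def traced(*args):
        calls.append(name)
        return fn(*args)
    return Lazy(traced, *deps)


a = op("load", lambda: [1.0, 2.0, 3.0])
b = op("scale", lambda x: [2 * v for v in x], a)
c = op("unused", lambda x: [v ** 2 for v in x], a)  # never requested
d = op("sum", lambda x: sum(x), b)

assert calls == []   # building the graph executed nothing
result = d.value()   # only now do load, scale and sum run
```

After the last line, `calls` contains only the three operations that `d` depends on: the `unused` node was declared but never evaluated.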
In the last few years, significant advances have been made in shrinking Deep Learning models without losing accuracy. Thanks to operations like pruning (removing useless connections), quantization (representing weights and activation values with fewer bits) and compression (encoding weights and activation values more compactly), researchers have managed to compress large CNNs by a factor of 35 or more. Even complex models now end up in the 5–10MB range, definitely within reach of smaller IoT devices.
Amazingly, some of these bleeding-edge techniques are available in Apache MXNet. You can use mixed precision training, which relies on 16-bit floats instead of 32-bit floats to deliver 2x compression with no loss of accuracy (Micikevicius et al., 2017). Thanks to a recent research project, it’s also possible to use Binary Neural Networks, where weights are encoded using only +1 and -1 values. This technique yields 20x to 30x compression, with only limited loss of accuracy (Yang et al., 2017).
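As a rough illustration of two of these operations, the sketch below applies magnitude pruning and 8-bit linear quantization to a handful of made-up weights. The function names, the threshold and the sample values are assumptions chosen for the example; published compression pipelines are considerably more elaborate (retraining after pruning, learned codebooks, entropy coding).

```python
def prune(weights, threshold):
    """Magnitude pruning: zero out weights whose absolute value is below threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]


def quantize(weights, bits=8):
    """Linear quantization to signed integers sharing a single scale factor."""
    qmax = 2 ** (bits - 1) - 1                        # 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax or 1.0  # fall back to 1.0 if all zero
    return [round(w / scale) for w in weights], scale


def dequantize(q, scale):
    """Recover approximate float weights from their integer codes."""
    return [v * scale for v in q]


weights = [0.82, -0.03, 0.01, -0.47, 0.002, 0.31]   # made-up sample values

pruned = prune(weights, threshold=0.05)   # small weights become exact zeros
q, scale = quantize(pruned, bits=8)       # each weight now fits in one byte
restored = dequantize(q, scale)           # what inference would actually use

# Rounding error per weight is bounded by half a quantization step.
max_error = max(abs(a - b) for a, b in zip(pruned, restored))
```

Zeroed weights can be stored sparsely, and the integer codes take one byte each instead of four, which is where the size reduction comes from.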
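The storage arithmetic behind these figures can be checked with the standard library alone. The sketch below is not MXNet or BMXNet code and does not perform any training: it only packs a few made-up weights as 32-bit floats, as 16-bit floats, and as one sign bit each. The `binarize` and `pack_bits` helpers are hypothetical names introduced for the example.

```python
import struct

weights = [0.75, -1.5, 0.1, -0.02, 2.0, -0.3, 0.0, 1.25]  # made-up sample values

# "f" is an IEEE 754 single-precision float, "e" a half-precision float.
fp32 = struct.pack(f"<{len(weights)}f", *weights)  # 4 bytes per weight
fp16 = struct.pack(f"<{len(weights)}e", *weights)  # 2 bytes per weight


def binarize(ws):
    """Map each weight to +1 or -1 according to its sign (zero maps to +1)."""
    return [1 if w >= 0 else -1 for w in ws]


def pack_bits(signs):
    """Store one bit per weight: 1 encodes +1, 0 encodes -1."""
    out = bytearray((len(signs) + 7) // 8)
    for i, s in enumerate(signs):
        if s == 1:
            out[i // 8] |= 1 << (i % 8)
    return bytes(out)


signs = binarize(weights)
packed = pack_bits(signs)

fp16_ratio = len(fp32) / len(fp16)      # storage ratio, float32 vs float16
binary_ratio = len(fp32) / len(packed)  # storage ratio, float32 vs 1 bit
```

For these eight weights the raw ratios are 2x for 16-bit floats and 32x for single bits. The 20x to 30x quoted above is the figure reported for whole models, which is lower than the per-weight upper bound.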
Optimizing math operations to speed up inference
Apache MXNet can also be built with math acceleration libraries, notably the Intel Math Kernel Library (MKL) and NNPACK. These libraries provide accelerated math processing routines that are critical to the performance of Deep Learning. Both support the Intel architecture, and NNPACK also supports ARM v7 and v8, two popular choices for embedded applications. In the same vein, MXNet can leverage performance-oriented libraries for image processing and memory allocation, namely libjpeg-turbo, gperftools and jemalloc.
All of these need to be enabled at build time. You’ll find detailed instructions on the MXNet website.
Deploying MXNet models at the Edge with AWS services
Performance is great, but what about deployment? How can IoT devices living at the edge of the network use Deep Learning capabilities?
MXNet models can be combined with AWS Lambda, a compute service that lets you run code without provisioning or managing servers. Lambda functions can be triggered by many different types of events, such as an IoT message matching a rule defined on the AWS IoT gateway. Thus, embedded applications that may be too constrained or too rigid to support on-device deployment can rely on cloud-based models in a simple and scalable way.
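A Lambda function of this kind could look roughly like the sketch below. The `lambda_handler(event, context)` signature is the standard entry point for Python functions on AWS Lambda; everything else is an assumption made for the example: the `pixels` and `device` message fields are hypothetical, and `predict` is a trivial stand-in for the forward pass of a real MXNet model, so that the code runs without any dependency.

```python
def predict(pixels):
    """Hypothetical stand-in for an MXNet model's forward pass."""
    score = sum(pixels) / (255.0 * len(pixels))
    label = "bright" if score > 0.5 else "dark"
    return {"label": label, "score": round(score, 3)}


def lambda_handler(event, context):
    """Entry point called by AWS Lambda; event holds the forwarded IoT message."""
    pixels = event.get("pixels")
    if not pixels:
        return {"ok": False, "error": "missing 'pixels' field"}
    prediction = predict(pixels)
    return {"ok": True, "device": event.get("device"), "prediction": prediction}


# Local invocation with a made-up message, shaped as an IoT rule might forward it.
response = lambda_handler({"device": "sensor-1", "pixels": [200, 220, 250, 180]}, None)
error_response = lambda_handler({"device": "sensor-1"}, None)
```

In a real function, the model would typically be loaded once at module level, outside the handler, so that warm invocations reuse it instead of reloading it for every message.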
On more powerful IoT devices, AWS Greengrass provides a local Lambda execution environment able to sync back and forth with the Cloud when network connectivity is available: you’ll find a detailed example in this blog post. This service can be used to deploy and update MXNet models embedded in Lambda functions, allowing IoT devices to run inference locally without any cloud-based support.
As you can see, Deep Learning and IoT are not worlds apart. Quite the contrary, in fact. We can’t wait to see the devices and applications that will be built on top of Apache MXNet and AWS services. Science is definitely catching up with fiction: exciting times!
Thank you for reading.
Chen et al. «MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems», 2015.
Micikevicius et al. «Mixed Precision Training», 2017.
Yang et al. «BMXNet: An Open-Source Binary Neural Network Implementation Based on MXNet», 2017.