AI on Edge Devices: The New Wave in Machine Learning - Part 1

AiOTA LABS
Apr 24, 2018


Edge devices are seeing a recent surge in AI workloads, and researchers are burning the midnight oil to meet the compute requirements while maintaining a power- and memory-efficient footprint.

As Jem Davies, VP of ARM's machine learning division, notes:

“People ask me which segments will be affected by ML, and I respond that I can’t think of one that won’t be. Moreover, it will be done at the edge wherever possible, and I say this because I have the laws of physics, the laws of economics and many laws of the land on my side. The world doesn’t have the bandwidth to cope with real-time analysis of all the video being shot today, and the power and cost of transmitting that data to be processed in the cloud is simply prohibitive.”

However, present state-of-the-art Deep Neural Network (DNN) based AI applications require absurdly high amounts of memory and computation, leading to power consumption that forbids running them on edge devices. This calls for techniques that either compress the present DNN architecture or make the DNN compact right from scratch.

DNN compression has recently seen a lot of research, owing to the seminal paper of Denil et al. “https://arxiv.org/abs/1306.0543”, where the authors showed that up to 95% of the parameters in a Convolutional Neural Network (CNN) are redundant, i.e., they can be predicted from the remaining few!
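To build intuition for that claim, here is a minimal NumPy sketch (my own illustration, not the authors' code) of the idea behind parameter prediction: a trained layer's weight matrix is close to low rank, so a truncated SVD can store a small fraction of the parameters and reconstruct the rest. The matrix below is synthetic, and its size and rank are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for a trained layer's 1024x1024 weight matrix. Trained
# layers are empirically close to low rank (Denil et al.'s point),
# so we mimic that with a rank-32 matrix plus small noise.
W = rng.standard_normal((1024, 32)) @ rng.standard_normal((32, 1024))
W += 0.01 * rng.standard_normal((1024, 1024))

# Truncated SVD: keep only the top-k singular directions.
U, s, Vt = np.linalg.svd(W, full_matrices=False)
k = 32
W_hat = (U[:, :k] * s[:k]) @ Vt[:k, :]

kept = U[:, :k].size + k + Vt[:k, :].size
print(f"stored parameters: {kept}/{W.size} = {kept / W.size:.1%}")
print(f"relative error: {np.linalg.norm(W - W_hat) / np.linalg.norm(W):.3f}")
```

Here roughly 6% of the entries suffice to reconstruct the matrix almost exactly; on real trained layers the recoverable fraction varies, but Denil et al. report cases where about 95% of the weights are predictable.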

Researchers and AI practitioners normally adopt one of the following paths to make an edge-friendly DNN-based AI application:

  1. Design a highly optimized hardware accelerator dedicated to one particular DNN architecture,
  2. Make a compact network from scratch,
  3. Compress an already trained network, and
  4. The AiOTA Labs technology solution.

In Part 1 of this two-part blog, I will mainly summarize option #1 and option #2 of the available methods and discuss their pros and cons. The metrics on which I will evaluate the various technologies, together with my own weightings, are listed below (a small scoring sketch follows the list):

  1. Memory footprint -> 15%
  2. Number of accesses to memory -> 20% (the dominant power consumption factor)
  3. Number of operations, in terms of MACs or FLOPs (float and/or int) -> 15%
  4. Power saving -> 15%
  5. Portability -> 15%
  6. Cost of development -> 20%
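As a transparency note, the grade I quote for each option is just the weighted sum of its per-metric scores. Here is a minimal Python sketch; the weights are the ones listed above, while the per-metric scores are hypothetical placeholders, not measured values.

```python
# Weights from the list above (fractions of the final grade).
WEIGHTS = {
    "memory_footprint": 0.15,
    "memory_accesses": 0.20,
    "operations": 0.15,
    "power_saving": 0.15,
    "portability": 0.15,
    "development_cost": 0.20,
}

def grade(scores: dict) -> float:
    """Weighted sum of per-metric scores, each in [0, 1]."""
    return sum(WEIGHTS[m] * s for m, s in scores.items())

# Hypothetical scores for a dedicated-hardware option: strong on the
# efficiency metrics, weak on portability and development cost.
option1 = {
    "memory_footprint": 0.8,
    "memory_accesses": 0.8,
    "operations": 0.8,
    "power_saving": 0.8,
    "portability": 0.1,
    "development_cost": 0.1,
}
print(f"grade: {grade(option1):.0%}")
```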

Let’s start with the evaluation of option #1. Recently there have been lots of start-ups working to develop super-efficient chips tailored for DNN-based applications. Big players like ARM have recently introduced Project Trillium, a suite of machine learning IP that aims to power neural engines as they migrate to the edge. Along with these chips, vendors also provide custom-tailored DNN software that works efficiently on the particular hardware for which it is tuned.

Unfortunately, portability is an issue with this solution, apart from the chip development cost. An application developed on one platform won’t work efficiently on another. So the time and money invested in developing a DNN AI application are confined to one particular solution, and developers need to redo everything from scratch if business needs demand a change of hardware/software framework. All of the metric scores are highly dependent on the framework chosen, so I personally grade option #1 at 50% on my evaluation metrics.

Now let’s talk about option #2.

After realizing that DNN architectures contain much redundancy, from 2016 onward there was a flurry of research papers aiming to redefine the AlexNet/VGG-16 architectures. One pioneering work, SqueezeNet by Iandola et al. “https://arxiv.org/abs/1602.07360”, reported 50X compression while maintaining the same level of accuracy as AlexNet, which is commendable performance. However, while SqueezeNet is a good memory-saving technique, Tien-Ju Yang et al. “https://arxiv.org/pdf/1611.05128.pdf” reported that it offers no savings in operations (in fact, they reported 1.2X more operations and 1.32X more power consumption than AlexNet).
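For readers unfamiliar with SqueezeNet, its memory savings come from the Fire module: a 1x1 "squeeze" convolution that cuts the channel count, followed by parallel 1x1 and 3x3 "expand" convolutions. Below is a minimal PyTorch sketch of that block, my own paraphrase of the paper's idea; the channel sizes in the usage example mirror an early Fire module from the paper.

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """SqueezeNet Fire module: squeeze (1x1) then expand (1x1 + 3x3)."""

    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        # The 1x1 squeeze layer cuts channels before the costlier expand layers.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch,
                                   kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Concatenate the two expand branches along the channel axis.
        return torch.cat([self.relu(self.expand1x1(x)),
                          self.relu(self.expand3x3(x))], dim=1)

# Example: 96 input channels squeezed to 16, expanded back to 64 + 64.
block = Fire(96, 16, 64, 64)
out = block(torch.randn(1, 96, 56, 56))
print(out.shape)  # torch.Size([1, 128, 56, 56])
```

Note that the squeeze layer saves parameters but not necessarily operations, since every feature map is still convolved at full spatial resolution, which is consistent with Yang et al.'s observation above.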

The SqueezeNet authors also found difficulty in deploying the ResNet technique of He et al. “https://arxiv.org/abs/1512.03385”, which teaches how to train deep networks efficiently. SqueezeNet reported a 1.67X increase in parameters after deploying ResNet-style bypass connections.
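For context, the ResNet technique referred to here is the bypass (shortcut) connection: a block learns a residual F(x) and outputs F(x) + x. A minimal PyTorch sketch, assuming an identity shortcut with matching channel counts:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Basic ResNet-style block: output = F(x) + x (identity bypass)."""

    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # The identity shortcut adds no parameters of its own.
        return self.relu(self.body(x) + x)

out = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
print(out.shape)  # torch.Size([1, 64, 56, 56])
```

The identity shortcut itself is parameter-free; extra parameters appear only when a 1x1 convolution must be inserted to match mismatched channel counts, which is the "complex bypass" variant that SqueezeNet experimented with.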

I further note that people find training SqueezeNet very difficult because there are too many hyperparameters to play with; although Forrest Iandola discusses these parameters in great detail in his paper, practitioners report otherwise.

All in all, using my evaluation metrics, I find that SqueezeNet reduces the memory footprint, poses no portability problems and carries no hardware development cost, so I grade it 50% on my metrics.

In Part 2, I will discuss the remaining options and grade them on my evaluation metrics.

Till then, happy reading, and do visit www.aiotalabs.com, where we are on a constant endeavor to make DNNs run right on edge devices.
