AI on Edge Devices: The New Wave in Machine Learning, Part 2

AiOTA LABS
Apr 30, 2018 · 5 min read


Image credit: ARM

In Part 1 (here), I talked about the pros and cons of developing a custom machine learning ASIC (viz. Trillium, Nervana and other start-up ventures), and we also discussed developing a highly optimized neural network framework from scratch, like SqueezeNet.

In Part 2, I will discuss the remaining route: the available compression techniques, and then the AiOTA Labs solution.

Available compression techniques: Recently there has been a flurry of compression techniques. I will only cover the main points of each and give references to the literature if you want to learn about them in more detail. The first major beauty of compression is that it works on an already trained network, so you do not need to create everything from scratch. There are various approaches to compressing the pretrained weights. Zhang et al. used compression by matrix factorization and reported a 3.8X whole-model acceleration with an increase in top-5 error of only 0.3%; however, they only exploited the redundancy between filters. In my metric of evaluation, I grade this approach at 70%!!

Matrix factorization compression technique by Zhang et al. Top: baseline. Bottom: compressed convolution.
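To make the idea concrete, here is a minimal sketch of low-rank factorization, the general principle behind Zhang et al.-style compression: a weight matrix is approximated by the product of two thinner matrices. The layer shape and the rank below are illustrative assumptions, not figures from the paper.

```python
import numpy as np

def factorize(W, rank):
    """Approximate W (out x in) by the product of two thinner matrices."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * S[:rank]      # out x rank
    B = Vt[:rank, :]                # rank x in
    return A, B

W = np.random.randn(256, 512).astype(np.float32)   # a flattened conv kernel
A, B = factorize(W, rank=64)

orig_params = W.size
new_params = A.size + B.size
print(f"params: {orig_params} -> {new_params} ({orig_params / new_params:.1f}x fewer)")
print("relative reconstruction error:",
      np.linalg.norm(W - A @ B) / np.linalg.norm(W))
```

Because the two thinner layers replace the original one, both storage and multiply-accumulate count shrink, which is why this family of methods scores well on my metric.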

Pruning by Han et al. focused on memory reduction by indexing the non-zero-valued weights, so that the weight values are accessed indirectly through an index. But the zero values need to be restored for inference, so the runtime memory usage cannot be reduced in practice. In my metric of evaluation, I grade this approach at 15%!!

Pruning approach by Han et al.
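The following is a minimal sketch of what index-based pruned storage looks like, in the spirit of Han et al.: only the surviving weights and their indices are stored, yet a full dense tensor has to be rebuilt before the convolution can run. The pruning threshold and shapes are my own illustrative assumptions.

```python
import numpy as np

W = np.random.randn(64, 128).astype(np.float32)
mask = np.abs(W) > 1.0                       # prune small-magnitude weights

values = W[mask]                             # what actually gets stored
indices = np.flatnonzero(mask).astype(np.int32)

stored_bytes = values.nbytes + indices.nbytes
print(f"stored: {stored_bytes} B vs dense {W.nbytes} B")

# At inference time the dense matrix is restored from the indices,
# so the working-set memory matches the uncompressed baseline.
W_restored = np.zeros(W.size, dtype=np.float32)
W_restored[indices] = values
W_restored = W_restored.reshape(W.shape)
```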

Liu et al. solved this problem in Han et al.'s approach by using sparse calculation, so that the zero values are never included in the inference computation; but they assume a specific CPU architecture, which reduces the generality of the approach. In my metric of evaluation, I grade this approach as 15%!!
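For contrast with the sketch above, here is what sparse inference looks like conceptually: the pruned weights stay in a sparse format and the zeros are never multiplied. I use scipy.sparse purely for illustration; the speedups reported in the paper depend on a hand-tuned sparse kernel for a specific CPU, which is exactly the generality problem mentioned above.

```python
import numpy as np
from scipy.sparse import csr_matrix

W = np.random.randn(64, 128).astype(np.float32)
W[np.abs(W) < 1.0] = 0.0                     # pruned weights stay zero

W_sparse = csr_matrix(W)                     # only non-zeros are stored
x = np.random.randn(128).astype(np.float32)

y = W_sparse @ x                             # zeros never enter the compute
```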

The other extreme of quantization is the binary network by Rastegari et al., where the network weights are quantized to either -1 or +1. It achieved high memory compression with more or less the same accuracy as the baseline, but only a modest computation-cost compression (~2X). XNOR-Net, which quantizes both inputs and weights, achieves a high compression rate for both memory and computation cost, but accuracy dropped by more than 10%. In my metric of evaluation, I grade this approach as 50%!!

Binary and XNOR-Network
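A minimal sketch of binary-weight quantization in the spirit of Rastegari et al.: each filter is replaced by its sign pattern plus a single real-valued scaling factor (the mean absolute weight), so storage drops from 32 bits to roughly 1 bit per weight. The tensor shape below is an illustrative assumption.

```python
import numpy as np

W = np.random.randn(64, 3, 3, 3).astype(np.float32)    # out x in x kH x kW

alpha = np.abs(W).mean(axis=(1, 2, 3), keepdims=True)  # one scale per filter
B = np.sign(W)                                          # values in {-1, +1}
W_binary = alpha * B                                    # used in place of W

print("relative approximation error:",
      np.linalg.norm(W - W_binary) / np.linalg.norm(W))
```

XNOR-Net goes one step further and binarizes the activations as well, which is what buys the large compute savings but also the accuracy drop.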

Deep Compression by Han et al. combined pruning, quantization and encoding and reported a 50X reduction in memory storage (however, people trying to replicate the method have rarely achieved more than 10X compression). In the inference phase, though, the weights need to be restored indirectly through the indices of pruning, quantization and encoding, which increases the pre-processing time. After restoring the original network, the convolution calculation remains unchanged; therefore the usage memory and computation are the same as the baseline uncompressed network. In my metric of evaluation, I grade this approach as 50%!!

Deep Compression by Han et al.
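Here is a minimal sketch of just the weight-sharing step of that pipeline: surviving weights are mapped to a small codebook and stored as short indices, and the dense float weights have to be looked up again before the convolution runs, which is the restoration overhead noted above. The cluster count, the crude evenly-spaced codebook (the paper uses k-means) and the shapes are illustrative assumptions.

```python
import numpy as np

W = np.random.randn(256, 256).astype(np.float32)
n_clusters = 16                               # 4-bit indices

# crude codebook: evenly spaced centroids over the weight range
codebook = np.linspace(W.min(), W.max(), n_clusters).astype(np.float32)
indices = np.abs(W[..., None] - codebook).argmin(axis=-1).astype(np.uint8)

stored_bits = indices.size * 4 + codebook.size * 32
print(f"stored ~{stored_bits // 8} B vs dense {W.nbytes} B")

# Restoration at inference time: the full-size float matrix reappears,
# so usage memory and compute match the uncompressed baseline.
W_restored = codebook[indices]
```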

Another powerful approach, by Chen et al., transforms the filters from the spatial domain to the frequency domain and compresses the network based on the importance of the filter values. They use hashing to compress the high-frequency components heavily and the low-frequency components lightly. But it suffers from the same pre-processing problem as quantization. There is also a memory-overhead issue, because kernels in the frequency domain need more memory than kernels in the spatial domain. In my metric of evaluation, I grade this approach as 70%!!

Convolution in frequency domain
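A minimal sketch of the frequency-domain idea behind Chen et al.: a filter is transformed with a 2-D DCT, and the less important high-frequency coefficients are compressed far more aggressively than the low-frequency ones. Here "compression" is crudely simulated by zeroing the high-frequency corner; the actual method hashes coefficients into shared buckets, so this is illustrative only.

```python
import numpy as np
from scipy.fft import dctn, idctn

kernel = np.random.randn(3, 3).astype(np.float32)

freq = dctn(kernel, norm="ortho")            # spatial -> frequency domain
freq[1:, 1:] = 0.0                            # discard high-frequency corner
kernel_compressed = idctn(freq, norm="ortho") # back to spatial domain

print("relative reconstruction error:",
      np.linalg.norm(kernel - kernel_compressed) / np.linalg.norm(kernel))
```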

So far, then, no single compression technique delivers the benefits in their totality, which is what would make a network truly edge-ready.

AiOTA Labs, however, has developed a compression technology which, on our metric of evaluation, stands in the top position. It achieves power savings, compute-resource savings, a reduced memory footprint and fewer memory interactions, while actually speeding up the network!!

To showcase the efficacy of our technology, we ran it on the state-of-the-art ResNet; the results are below.

Summary of AiOTA Labs technology applied to ResNet-18

Below is the detailed report on how our proprietary technology compresses the various convolution layers of the ResNet-18 network, achieving a phenomenal ~8X memory-storage compression, 7X computation-cost compression and a 2X speed-up, with a negligible drop in accuracy from the baseline!!! Now it's your turn to grade the result on the metric of evaluation.

Detailed layer-wise comparison of ResNet-18 and the AiOTA Labs compressed ResNet-18

It is worth highlighting that any trained network can go through push-button-style compression with AiOTA Labs technology. There is no need to redefine the application from scratch, no need to change the existing framework, and no need to invest time in retraining the network. Just push your existing, already well-trained DNN into our technology, do a small fine-tune, and your big, fat network is edge-ready. What's more, you can still use the existing techniques described above on top of our compressed network for further gains. For example, you can achieve 28X compression of the parameters by applying quantization on top of our compression technology, as sketched below!!!
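As a rough illustration of that stacking, here is a minimal sketch of 8-bit post-training quantization applied to an already compressed weight tensor: going from float32 to int8 contributes roughly another 4X on stored parameters on top of the structural compression. The symmetric per-tensor scheme and the stand-in weights below are my own assumptions, not the AiOTA Labs pipeline.

```python
import numpy as np

def quantize_int8(W):
    """Symmetric per-tensor quantization of float32 weights to int8."""
    scale = np.abs(W).max() / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

W_compressed = np.random.randn(64, 64).astype(np.float32)  # stand-in weights
q, scale = quantize_int8(W_compressed)

print(f"{W_compressed.nbytes} B -> {q.nbytes} B "
      f"(max error {np.abs(W_compressed - dequantize(q, scale)).max():.4f})")
```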

AiOTA Labs has prepared demo versions of the technology in TensorFlow, Caffe, Keras and PyTorch for your own evaluation. Please visit our site www.aiotalabs.com to download the evaluation demo.

What's next: In the next blog, I will discuss Recurrent Neural Networks (RNN)/LSTM/GRU and the compression technologies available for them, and show how AiOTA Labs again stands as the winner in that league as well.

Stay tuned for further updates.
