Efficient AI For Edge Computing — Part 1: Background

Chen-Ping Yu
Dec 14, 2021 · 9 min read

Introduction

Artificial Intelligence (AI) is one of the most exciting technologies today, and AI couldn't have gotten to where it is without the help of Deep Learning: artificial neural networks with many (deep) layers that perform a type of computation called convolution to carry out tasks such as image recognition in computer vision (CV) and language translation in natural language processing (NLP). Because convolution is the basic computation block used in these networks, they are commonly referred to as Convolutional Neural Networks (CNNs). The recent successes of Deep Learning/CNNs and their applications have been so overwhelming that the technology has become synonymous with the word AI itself.
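To make the idea of convolution a bit more concrete, here is a minimal sketch (assuming PyTorch; the layer sizes are illustrative, not taken from any particular network) of a single convolutional layer sliding a set of learned filters over an image:

```python
import torch
import torch.nn as nn

# One convolutional layer: 16 learned 3x3 filters slide over a
# 3-channel (RGB) image and produce 16 feature maps of the same size.
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1)

image = torch.randn(1, 3, 224, 224)   # one dummy 224x224 RGB image
features = conv(image)
print(features.shape)                 # torch.Size([1, 16, 224, 224])

# Even this tiny layer has 16*3*3*3 weights + 16 biases = 448 parameters;
# stacking many such layers is how full CNNs reach millions of parameters.
print(sum(p.numel() for p in conv.parameters()))  # 448
```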

However, AI (CNNs) typically requires high-performance GPUs to support its computation needs: networks usually have millions of parameters, and a GPU's massive multi-threading ability is needed to make them practical in the real world. Furthermore, to scale AI to even more applications, models need to run directly on edge devices, which usually do not have high-performance GPUs onboard. Therefore, making AI efficient without a significant accuracy trade-off has been a hot topic in recent years, and this is what Phiar excels at: our AI has to be highly efficient to run entirely on a consumer-grade vehicle's processors, in real time, all while maintaining the high accuracy level needed for all of the related warning and AR in-cockpit experiences. This is the first in a multi-part blog series looking at some of the most important concepts and methods for making your AI more efficient, both at a high level and through selective deep dives.

Background

CNNs have been the talk of the town in recent years, but the technology is not new; it endured a long, slow development. For example, training artificial neural networks with back-propagation was proposed in the 1980s, training networks with convolutional layers followed in the 1990s, and many more CNN-related papers appeared in the computer vision field during the 2000s. Several issues held the technology back: it often required weeks or even months to finish training a network due to its high model complexity, mostly from the use of convolutions (more on this in the next part of this series); CNN accuracies were no better than those of statistical machine learning-based methods, which were much easier to train and compute (minutes to hours); and finally, there was no practical way to deploy trained CNNs in any real-world application, because it took too long for a trained CNN to produce even a single prediction.

LeNet’s convolutional neural network architecture, from LeCun et al., 1998.

These issues hampered the CNN community until 2012, when AlexNet won the ImageNet large-scale image recognition challenge by a wide margin over the rest of its competitors, bringing CNNs into the spotlight; the field has never looked back. To summarize some of AlexNet's breakthrough contributions in solving the issues that had plagued prior CNNs' advancement and development:

  • using GPUs to concurrently optimize a single CNN's millions of parameters, allowing a network, even one with many layers, to be trained in days instead of weeks or months.
  • introducing the ReLU activation (see the sketch below), which allowed the network to overcome the vanishing gradient issue that comes with many (deep) layers, so that the network could be optimized properly to produce highly accurate results.
  • using GPUs to train a network faster also means GPUs can be used to compute the network's results at deployment much more quickly than before (sub-second instead of minutes), which allows CNNs to be practically used in real-world setups equipped with GPUs.
In the 2012 ImageNet competition results, AlexNet was the only deep learning/CNN-based entry, and it was significantly more accurate than the rest.
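To illustrate the ReLU point above, here is a minimal sketch (assuming PyTorch; a toy depth of 20 activations, not AlexNet itself) of why stacking saturating activations such as the sigmoid shrinks gradients, while ReLU lets them pass through:

```python
import torch

# Push a value through 20 sigmoid activations: each one multiplies the
# gradient by sigmoid'(z) <= 0.25, so the gradient shrinks exponentially.
x = torch.ones(1, requires_grad=True)
y = x
for _ in range(20):
    y = torch.sigmoid(y)
y.backward()
print("gradient after 20 sigmoids:", x.grad.item())  # vanishingly small

# The same depth with ReLU: its derivative is exactly 1 for positive
# inputs, so the gradient arrives at the early layers intact.
x2 = torch.ones(1, requires_grad=True)
y2 = x2
for _ in range(20):
    y2 = torch.relu(y2)
y2.backward()
print("gradient after 20 ReLUs:", x2.grad.item())  # stays at 1.0
```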

While AlexNet had solved some of CNNs' most difficult problems, several restrictions still stood in the way of CNNs becoming more scalable in real-world applications (these restrictions also created many lucrative opportunities):

  • a CNN has millions (most of the time, even more!) of parameters, so it requires a large number of images to train on without overfitting the data. For example, the benchmark-setting ImageNet dataset back in 2012 already contained 1.2 million images across 1,000 categories, and even that wasn't enough for a CNN like AlexNet to train on: the authors had to perform data augmentation (artificially increasing the amount of data by randomly manipulating the given images through rotation, flipping, cropping, etc.) to expand the training data to roughly 10x its original size (see the sketch after this list). This has always been a major issue for any deep learning-based approach, because the development team has to spend a great deal of time collecting data that covers various conditions and then getting that data hand-labeled for network training. This is a highly resource-consuming process for any company or research institution, especially for smaller organizations such as startups and academic labs.
  • while the massive training data requirement is an issue for most, if not all, deep learning/CNN-based approaches, it has also created lucrative business opportunities for companies that provide data labeling and management services; the most notable example is Scale AI, which has raised over $600 million in just five years, with an estimated market size of $10 billion over the next five years.
Example of a labeled image with semantic segmentation information. Image from BDAN.
  • while a high-performance GPU's powerful multi-threaded processing remains the go-to choice for running a CNN in both training and deployment, such GPUs are typically expensive and require desktop- to server-like setups, which makes them more suitable for enterprise and cloud-based solutions. However, more and more AI use cases are now mobile-related and need AI models to run on edge devices such as smartphones, vehicles, and cameras, and edge devices typically have neither the physical space nor the BOM (bill of materials) cost budget for such high-performance GPUs.
  • because most CNNs require a high-performance GPU for its multi-threading capabilities, companies such as Nvidia have grown quickly by expanding their GPUs from their original gaming applications into AI applications; their annual revenue grew from $4 billion in 2012 to more than $24 billion in 2022, a 6-fold increase. In fact, I can't think of any other brand of GPU being used for training deep learning networks today.
Network inference time (lower is better): CPU and GPU comparison. Image from Stanford University.
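As a rough sketch of the data augmentation mentioned in the first bullet above (assuming torchvision; the exact transforms and the file name example.jpg are illustrative, not AlexNet's actual pipeline), random crops, flips, and rotations can multiply the effective size of a labeled dataset:

```python
from PIL import Image
from torchvision import transforms

# A typical augmentation pipeline: every call produces a slightly different
# version of the same source image, artificially enlarging the training set.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224),        # random crop, resized to 224x224
    transforms.RandomHorizontalFlip(p=0.5),   # mirror the image half the time
    transforms.RandomRotation(degrees=15),    # small random rotation
    transforms.ToTensor(),
])

image = Image.open("example.jpg")  # hypothetical labeled training image

# Generating ten variants per image is one way to approach the roughly
# 10x data expansion described above.
variants = [augment(image) for _ in range(10)]
print(len(variants), variants[0].shape)  # 10 torch.Size([3, 224, 224])
```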

Cloud AI vs Edge AI

Given that CNNs typically require high-performance GPUs, that requirement has kept most serious AI deployments in the cloud, where racks of GPU servers can be set up to provide cloud AI services, something that is simply not physically possible on edge devices. But why should we care about running AI directly on edge devices such as smartphones, cameras, and vehicles rather than relying on cloud AI to provide services remotely from its servers?

  • if the computation is done in the cloud, the data first needs to be transferred to the cloud and then back to the end device for use, adding round-trip latency and a dependence on network connectivity. AI running at the edge does not have this issue because there is no need to transmit data to and from the cloud.
  • because the data is transmitted wirelessly to and from the cloud, it is exposed to interception and manipulation along the way. This issue can be especially detrimental to safety-critical systems such as ADAS and self-driving-related services. Once again, because AI running at the edge does not transmit data to and from the cloud, the chance of being hacked and manipulated is greatly reduced.
  • for a cloud AI service provider, expanding the business means physically adding more servers and processing capacity, which is costly and hard to scale. However, if AI is computed directly on the edge devices, then the AI service provider only needs to focus on the software, not the hardware.
  • cloud-based AI typically operates on a B2B basis, where the cloud AI provider sells services to a customer, which then uses those services to build its own business- or consumer-facing products; but if AI services run directly at the edge on devices such as smartphones and security cameras, then the AI service provider can ship its AI with the devices themselves, allowing the business to directly reach billions of end devices.
Image source: devopedia

AI Needs To Be Efficient To Run At The Edge

There are clear benefits to running AI at the edge, but most deep learning AI/CNNs need a high-performance GPU for fast inference (getting results). These GPUs are bulky and expensive and are not practical for AI deployment at the edge, so what do we do? Well, let's see: many edge devices do come with GPUs as part of their compute, so can't we just use those onboard GPUs to run the AI? Yes and no. Mobile-grade GPUs are far less powerful than their desktop counterparts: a state-of-the-art CNN for image recognition may take about 0.1 seconds to make an inference on an Nvidia Titan Xp GPU, but more than 1 second to do the same on a mobile-grade GPU. Not to mention that we usually need a CNN to run in real time (typically 30 images/frames per second (FPS)) for important applications, meaning it would need to make an inference in less than 0.03 seconds! To make matters worse, onboard GPUs are usually already heavily utilized by background and video-related applications, so there isn't much compute left over anyway. This is also why some edge devices now include dedicated AI accelerator silicon: it acts as a separate processor so that AI computations don't have to contend for computing resources with other applications all using the same GPU.
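To put that 0.03-second budget in perspective, here is a minimal sketch (assuming PyTorch and torchvision; ResNet-50 stands in for whatever model is actually deployed) of how one might time single-frame inference and compare it against a 30 FPS real-time budget:

```python
import time
import torch
import torchvision.models as models

# A standard image-recognition CNN as a stand-in for the deployed model.
model = models.resnet50(weights=None).eval()
frame = torch.randn(1, 3, 224, 224)  # one camera frame, batch size 1

with torch.no_grad():
    for _ in range(5):        # warm-up runs
        model(frame)
    runs = 50
    start = time.perf_counter()
    for _ in range(runs):
        model(frame)
    latency = (time.perf_counter() - start) / runs  # seconds per frame

budget = 1.0 / 30  # ~0.033 s per frame for 30 FPS real-time processing
print(f"latency: {latency * 1000:.1f} ms, budget: {budget * 1000:.1f} ms")
print("real-time capable" if latency < budget else "too slow for real time")
```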

Therefore, the best way forward is to somehow make the AI models themselves far more efficient (if you don't want to wait years for those edge processors to finally become powerful enough, if ever). CNNs are slow at inference because their inherent design requires a lot of computation, but there are a number of ways to significantly improve their efficiency, allowing the end result to be edge-capable at satisfactory speeds. We will discuss some of the most popular methods in the next part of this A Guide To Developing Efficient AI For Edge Computing series.

Phiar

Artificial intelligence to guide you anywhere.