FPGAs, Deep Learning, Software Defined Networks and the Cloud: A Love Story Part 2

Digging into FPGAs and how they are being utilized in the cloud.

FPGAs have been around since the 1980s but have been enjoying a resurgence of late. In the past year, most of the major cloud providers, including Amazon, Microsoft, Aliyun, Baidu and Huawei, have announced FPGA offerings. Why all of the sudden interest in FPGAs? This is a continuation of Part 1, where we explored some of the key benefits driving adoption. In Part 2 we will explore how FPGAs are being used to improve deep learning workloads.


Re-Introduction to FPGAs

Too lazy…I mean busy to check out Part 1 of this article? No worries, we’ll do a quick recap here. Field Programmable Gate Arrays (FPGAs), which were first invented in the 1980s, have been making a bit of a comeback as of late. As compute demands have become increasingly complex, general-purpose processors like CPUs alone are no longer the optimal choice for the job. Enter FPGAs. Some key value propositions for FPGAs are:

  1. Speed. Purpose-built hardware provides better performance than general-purpose hardware.
  2. Efficiency & Scale. Increased efficiency means serving more customers with less, allowing the hardware to service a larger scale of workloads.
  3. Cost. Improving speed and efficiency while reducing power consumption lowers cost.

For these and other reasons FPGAs have gained popularity. In part 2 we’ll take a look into artificial intelligence, and see what role FPGAs are playing in this 4th Industrial Revolution.


Artificial Intelligence Workloads with FPGAs

Another area where the use of FPGAs is currently being explored is the field of Artificial Intelligence (AI). Some key drivers for FPGA adoption in AI are:

  • Flexibility. FPGAs are ideal for adapting to rapidly evolving machine learning workloads, since you can reprogram the chip to optimize it for whichever workload you need to run on it.
  • Latency. FPGAs are well suited to the latency-sensitive, real-time inference required in tasks like autonomous driving, speech recognition, anomaly detection and more.
  • Precision. FPGAs let you tailor the numerical precision of particular layers in your Deep Neural Networks (DNNs). As an example, NVIDIA’s Pascal and Volta GPUs allow you to use both 8- and 16-bit integer values. For a DNN responsible for assessing a person’s sex you just need two values, male or female (3rd coming soon), making the 16- and 8-bit integer values overkill. With an FPGA, a DNN designer can model each layer in the net with 2 bits instead of 16 or 8 bits, which has a significant impact on efficiency and on performance in tera-operations per second, as the chart in Figure 4 shows.
Figure 4: Narrow Precision Inference on FPGAs
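
To make the precision trade-off a bit more concrete, here is a minimal Python sketch. It is not how any FPGA toolchain or the chart in Figure 4 works; it simply assumes a made-up set of random "layer weights" in the range -1 to 1 and a plain uniform quantizer (the function names quantize and mean_abs_error are my own), then compares storage cost and rounding error at 16, 8 and 2 bits.

```python
import random


def quantize(values, bits):
    """Snap floats in [-1, 1] to the nearest of 2**bits evenly spaced levels."""
    levels = 2 ** bits
    step = 2.0 / (levels - 1)  # spacing between representable values
    quantized = []
    for v in values:
        clipped = max(-1.0, min(1.0, v))
        index = round((clipped + 1.0) / step)  # nearest level, 0..levels-1
        quantized.append(-1.0 + index * step)
    return quantized


def mean_abs_error(original, approx):
    """Average absolute difference between the original and quantized values."""
    return sum(abs(a - b) for a, b in zip(original, approx)) / len(original)


random.seed(0)
# Hypothetical stand-in for one layer's weights; real weights are not uniform.
weights = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

report = {}
for bits in (16, 8, 2):
    approx = quantize(weights, bits)
    report[bits] = {
        "storage_bits": bits * len(weights),
        "mean_abs_error": mean_abs_error(weights, approx),
    }
    print(bits, "bits:", report[bits])
```

The 2-bit version needs one eighth of the storage of the 16-bit version, at the price of a larger rounding error. Whether that error is acceptable depends entirely on the layer and the model, which is why being able to pick the width per layer is attractive.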

Now that we’ve covered the high-level benefits, let’s get into a specific example where FPGAs are being used to significantly increase the performance of deep learning.

Real-Time AI

One of the best examples of applying an FPGA to deep learning is Real-Time Artificial Intelligence. Just as there are countless definitions of AI, I’m sure there are many definitions of real-time AI, but for this article we’ll define it simply as an artificial intelligence system that can process and transmit data as fast as it comes in, with ultra-low latency. This applies to processing of live data streams such as search queries, videos or sensor streams.

How is this achieved with FPGAs? Well, to boost performance, DNN processors typically use high degrees of batching. While very effective for training and throughput-based architectures, batching is less effective for real-time AI. With large batches, the first query in the batch has to wait for all queries to complete. Microsoft’s FPGA-based Project Brainwave is able to handle complex, memory-intensive models such as long short-term memory (LSTM) networks without using batching to increase throughput. The lack of batching means it’s possible for the hardware to handle requests as they come in, providing real-time insights for machine learning systems. To demonstrate this, they ran a large Gated Recurrent Unit (GRU) model, 5x larger than ResNet-50, with no batching and achieved record-setting performance. Even with processing time averaging a couple of hundred microseconds (yes micro, not milli) per request, Microsoft still expects to significantly increase performance on the architecture.
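
If the batching point feels abstract, the toy simulation below may help. It is only a sketch under assumptions I made up for illustration: requests arrive every 1 ms, a batched server waits for 4 requests and then spends 0.25 ms per item, and an unbatched server spends 0.5 ms per request. None of these numbers come from Project Brainwave or any real hardware, and the function name batched_latencies is hypothetical.

```python
def batched_latencies(arrival_times, batch_size, per_item_time):
    """Per-request latency when the server waits for a full batch and
    returns every result in the batch at the same moment."""
    latencies = []
    server_free_at = 0.0
    for start in range(0, len(arrival_times), batch_size):
        batch = arrival_times[start:start + batch_size]
        # Work begins once the last request of the batch has arrived
        # and the server has finished the previous batch.
        begin = max(batch[-1], server_free_at)
        finish = begin + per_item_time * len(batch)
        server_free_at = finish
        latencies.extend(finish - arrived for arrived in batch)
    return latencies


# Hypothetical workload: 8 requests, one arriving every millisecond.
arrivals_ms = [float(i) for i in range(8)]

# Batched: cheaper per item (0.25 ms), but requests wait for the batch to fill.
with_batching = batched_latencies(arrivals_ms, batch_size=4, per_item_time=0.25)

# Unbatched: a batch size of 1 means each request is handled as it arrives.
without_batching = batched_latencies(arrivals_ms, batch_size=1, per_item_time=0.5)

print("batched worst case (ms):  ", max(with_batching))
print("unbatched worst case (ms):", max(without_batching))
```

Even though the batched server does less work per item in this setup, the first request in each batch waits for the other three to show up, so its latency is several times worse than in the unbatched case. That waiting time is the cost that serving requests one at a time avoids.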

Why Not Just an ASIC for AI?

You may be asking, what about Deep Learning ASICs like Google’s TPU? Surely they will perform better and don’t require you to deal with the logic and programming of an FPGA? Yes, generally ASICs come pre-programmed, allowing you to focus directly on your deep learning workload and not the processor logic. But with a fixed-function chip, if the TPU isn’t optimized for your specific workload and you need optimizations at the chip level, you’re out of luck. The FPGA gives Microsoft the ability to keep optimizing the hardware with a quick turnaround. They acknowledged that ASICs like Google’s TPU can provide an extremely fast machine learning accelerator at lower per-unit cost, but the development process can be long and costly, and it results in a fixed-function chip that can’t adapt as deep learning algorithms evolve. Microsoft states,

“We can incorporate research innovations into the hardware platform quickly (typically a few weeks), which is essential in this fast-moving space.”

By comparison, Google announced its first-generation TPU in May 2016 and the second generation, with updated logic, a year later in May 2017. Once again, depending on the workload, the cloud you are running in and your ability to program an FPGA, the best processor for your needs is relative. It is still early in this space, but it seems Microsoft isn’t alone in its approach, as most cloud providers have opted to go the route of FPGAs instead of ASICs. If you want to dive deeper into the subject of FPGAs and TPUs in data centers, I recommend reading the published papers Can FPGAs Beat GPUs in Accelerating Next-Generation Deep Neural Networks? and In-Datacenter Performance Analysis of a Tensor Processing Unit.


The Wrap Up

While I’ve detailed FPGA use with Artificial Intelligence here and with SDNs in Part 1, the benefits extend way past just those two use cases. Some amazing advancements are being made across many industries through the use of FPGAs to accelerate workloads. Some examples include, but aren’t limited to:

  • Genomics. Using FPGAs to reduce the cost, power demands and storage requirements of genome processing, and to cut genome analysis time to minutes.
  • Aerospace & Defense. Using radiation-tolerant FPGAs along with intellectual property for image processing, waveform generation, and partial reconfiguration for Software Defined Radios.
  • Automotive. FPGAs are driving (pun intended) innovation of next-gen safety and autonomous driving systems and in-vehicle infotainment.
  • Consumer Electronics. Converged handsets (phones that can be PCs), digital flat panel displays, home networking, and residential set top boxes all powered by FPGAs.
  • Finance. FPGAs enabling dramatic improvements in risk modeling and analysis, transaction analysis for security, and high-frequency trading.
  • Video & Image Processing. FPGAs lowering non-recurring engineering costs for tasks such as gamma correction, 2D/3D filtering and chroma re-sampling.
  • Online Search. A 1,600-FPGA cluster running in production, dedicated to accelerating feature extraction of documents for the search engine.
Ross Freeman (1948–1989): Electrical Engineer and Inventor

I hope you walk away from this excited about FPGAs (as much as one can be) and their potential to accelerate all kinds of applications and workloads. With everyone from customers to the cloud providers themselves using FPGAs for record-setting innovation, it’s clear that in 1985 Ross Freeman was not only a trendsetter for stylish mustaches and hairstyles, but also onto something great.


Interested in learning more about Jamal Robinson or want to work together? Reach out to him on Twitter or through LinkedIn.