Use of Graphics Processing Units on the Rise

André Karpištšenko
Published in Future Tech · May 16, 2017

This overview of this year’s GPU Technology Conference (GTC) covers the world of GPU-driven deep learning and real-world applications of AI.

GPUs are today’s workhorses of accelerated computing for analytics and engineering. Proekspert has been at the cutting edge of smart machines and software for 24 years and is actively investing in data science software and infrastructure. In the spirit of Genchi Genbutsu (“go to the source and see it for yourself”), we visited this year’s GTC. Here is a recap of the zeitgeist at the event.

While the CPU outperforms the GPU in latency and energy efficiency, the GPU is the way forward for high-throughput, massively parallel computing (growing 1.5x year over year), keeping pace with data growth and closing the compute gap left by CPUs. John Hennessy of Stanford University has declared 2017 the start of a new era in computing. The underlying core concept is CUDA (Compute Unified Device Architecture), a decade-old parallel computing platform and programming model suited to accelerating common tensor operations (matrix multiplication and summation), for example in deep learning. With CUDA 9, synchronization across multiple GPUs enables computing at any scale, a step towards an operating system for accelerated computing. The GPU Open Analytics Initiative is working to push the entire data science stack onto GPUs, with the Anaconda data science distribution, the H2O data science platform and the MapD database providing the basis.
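To make the idea concrete, here is a minimal sketch (my own, not from the conference) of offloading one such tensor operation to the GPU with PyTorch. It assumes a CUDA-capable device and the torch package, and falls back to the CPU otherwise.

```python
# A minimal sketch: running a common tensor operation, matrix multiplication,
# on the GPU with PyTorch. The matrix sizes are illustrative assumptions.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Two large random matrices, allocated directly on the GPU when available.
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

# Matrix multiplication and summation run as massively parallel CUDA kernels.
c = a @ b
total = c.sum()

print(device, total.item())
```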

One of the fields benefiting most from this trend is narrow AI and its applications. At GTC, deep learning and AI made up well over half of the content.

Deep Learning

The time when more software is written by software than by humans is no longer so distant. The forefront of this movement is driven by the five tribes of machine learning, a subfield of AI: symbolists, Bayesians, analogizers, evolutionaries and, most prominently, connectionists, known in the mainstream as deep learning.

Deep Learning Frameworks supporting the most advanced GPUs

For high-end development of deep learning models, numerous frameworks support the most advanced data center GPUs. If you are an engineer making decisions about your technology stack, there is ample choice. The Microsoft Cognitive Toolkit (CNTK), which focuses on scalability and performance, Facebook’s highly customizable PyTorch and production-ready Caffe2, Google’s popular TensorFlow, the academic Theano and the collaborative endeavour MXNet provide the basis for adding intelligent features related to computer vision, text, speech, images, videos, time series and more. Symbolic loops over sequences with dynamic scheduling, turning graphs into parallel programs through mini-batching, and reduced communication overhead are but a few of the features available at production quality. For example, building a leading image-classification ResNet, which performs better than humans at a 3.5% error rate, is estimated to be a 30-minute task with the new frameworks. Deep learning has become a popular choice thanks to its Lego-like building blocks, which can be rearranged into specialized network architectures. There are many use cases for the method.
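As a rough illustration of how little code such a task now takes, here is a hedged sketch that loads a pretrained ResNet-50 image classifier with PyTorch and torchvision. The input file name is a placeholder, and the exact calls are my assumption rather than anything shown at GTC.

```python
# A minimal sketch of standing up a ResNet image classifier with a modern
# framework. Assumes the torchvision package and a downloadable checkpoint.
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Load a ResNet-50 pretrained on ImageNet and switch to inference mode.
model = models.resnet50(pretrained=True)
model.eval()

# Standard ImageNet preprocessing.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

# "cat.jpg" is a placeholder input image, not a file from the article.
image = preprocess(Image.open("cat.jpg")).unsqueeze(0)

with torch.no_grad():
    logits = model(image)
print("Predicted ImageNet class index:", logits.argmax(dim=1).item())
```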

As a specific example of networks inspired by game theory, generative adversarial networks are starting to find new applications: simulating data, working with missing data, realistic generation tasks, image-to-image translation (e.g. from day to night), simulation by prediction in particle physics, learning useful embeddings of images, and more. Networks are strong at perceiving and learning, but not at abstracting and reasoning. This is being addressed by the new wave of AI for contextual adaptation, which combines the statistical learning approach with handcrafted knowledge. The need for samples is decreasing considerably, both for networks and for the new wave of models. For example, the new models can be trained with tens of labels on a handwritten-digit dataset instead of the previous 60,000.
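For readers unfamiliar with the adversarial setup, the sketch below shows the game-theoretic idea in miniature: a generator learns to mimic a simple 1-D Gaussian while a discriminator learns to tell real samples from generated ones. All architectures and hyperparameters are illustrative assumptions, not a recipe from the talks.

```python
# A minimal GAN sketch: generator vs. discriminator on a 1-D Gaussian.
import torch
import torch.nn as nn

torch.manual_seed(0)

generator = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(),
                              nn.Linear(16, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(generator.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
bce = nn.BCELoss()

for step in range(2000):
    # Real samples come from N(3, 1); fake samples come from the generator.
    real = torch.randn(64, 1) + 3.0
    fake = generator(torch.randn(64, 8))

    # Discriminator: label real samples as 1 and generated samples as 0.
    opt_d.zero_grad()
    loss_d = bce(discriminator(real), torch.ones(64, 1)) + \
             bce(discriminator(fake.detach()), torch.zeros(64, 1))
    loss_d.backward()
    opt_d.step()

    # Generator: try to make the discriminator output 1 on generated data.
    opt_g.zero_grad()
    loss_g = bce(discriminator(fake), torch.ones(64, 1))
    loss_g.backward()
    opt_g.step()

print("generated mean:", generator(torch.randn(1000, 8)).mean().item())
```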

Autonomous Vehicles

Udacity is democratizing development skills and knowledge for building autonomous cars

Not limited to deep learning, the rising professional application of GPUs is narrow AI. A prominent field here is autonomous cars, where custom L3/L4 autonomy can be bought without having to build the physical infrastructure. The Nvidia Drive PX2 and the modular, scalable DriveWorks SDK make advanced tasks like calibration, sensor fusion, free-space detection, lane detection, object detection (cars, trucks, traffic signs, cyclists, pedestrians, etc.) and localization fast and easy. Developers of autonomous vehicles can focus on their applications instead of the highly complex development of the base components.
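DriveWorks itself is a C/C++ SDK and its API is not reproduced here. Purely to illustrate what one such building block, object detection on a camera frame, involves, the hedged sketch below uses a generic pretrained detector from torchvision as a stand-in.

```python
# A rough stand-in for an object detection building block; this is NOT the
# DriveWorks API. The input file name is a placeholder camera frame.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

frame = to_tensor(Image.open("road_scene.jpg"))

with torch.no_grad():
    detections = model([frame])[0]

# Each detection has a bounding box, a class label and a confidence score.
for box, label, score in zip(detections["boxes"],
                             detections["labels"],
                             detections["scores"]):
    if score > 0.8:
        print(label.item(), round(score.item(), 2), box.tolist())
```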

AR/VR

Virtual reality and augmented reality are finding their first scalable business cases

Moving closer to the roots of GPUs, namely computer graphics, another maturing trend was well represented at GTC. The devices for AR and VR have matured considerably in the decades since their inception. Novel directions like AI in VR are being explored for interactive speech interfaces, visual recognition, data analysis and collaborative sharing. Corporate R&D teams are working on early-stage concepts for metaverse-native generations. A step in this direction is Nvidia’s Holodeck, a photorealistic, collaborative virtual reality environment that conveys a feeling of real-world presence through sight, sound and haptics. The state of the art can handle products as complex as the new electric Koenigsegg car design. By fitting the entire dataset into the GPU, multi-caching technologies enable interactive slice-and-dice queries and visualizations of fairly large datasets (384 GB as of May 2017) in milliseconds.
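The sketch below conveys the GPU-resident slice-and-dice idea in miniature, using CuPy as a stand-in: the data stays in GPU memory, and filters and aggregates run there as parallel kernels. The dataset, column semantics and sizes are illustrative assumptions; the multi-caching engines mentioned above are far more sophisticated.

```python
# A rough sketch of GPU-resident "slice and dice": keep the data on the GPU
# and run filters and aggregates there, avoiding host round trips.
import cupy as cp

# A synthetic table of 10M rows with two columns, resident in GPU memory.
price = cp.random.uniform(0, 500, size=10_000_000)
quantity = cp.random.randint(1, 10, size=10_000_000)

# "Slice": select only expensive items; "dice": aggregate over the selection.
mask = price > 400
revenue = (price[mask] * quantity[mask]).sum()

print("revenue of items priced over 400:", float(revenue))
```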

Looking Forward

Many industries are affected by the rising trend of GPUs; for example, companies focused on healthcare, materials, agriculture, maritime, retail, the elderly, mapping, localization, self-driving, graphics, analytics, games and music are discovering and inventing new ways of interacting with the new era of abundant computing power. While I/O is still the bottleneck, we are entering a new era of craftsmanship-focused work at the intersection of art, science and engineering. This is evident from the 11x rise in GPU developers over the past five years.

The frontier is about finding better ways to manage the explosion in model and experiment complexity. For example, in 2017 Google’s NMT required 105 exaFLOPS of training compute with 8.7B parameters; in 2016, Baidu’s Deep Speech 2 needed 20 exaFLOPS and 300M parameters; and in 2015, Microsoft’s ResNet required 7 exaFLOPS with 60M parameters. As of May 2017, one exaFLOPS is equivalent to running all the supercomputers in the world for one second. Proekspert is evaluating how this trend affects data scientists in general and what tools a data scientist needs to achieve and maintain high performance and productivity in the new era.
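Some back-of-the-envelope arithmetic (my own, assuming a single 2017-era data center GPU sustains roughly 10 teraFLOPS) puts those training-compute figures in perspective:

```python
# How long one GPU, assumed to sustain ~10 TFLOPS, would need to deliver
# the training compute quoted above. The sustained rate is an assumption.
SUSTAINED_FLOPS = 10e12  # assumed throughput of a single GPU

workloads_exaflops = {
    "Microsoft ResNet (2015)": 7,
    "Baidu Deep Speech 2 (2016)": 20,
    "Google NMT (2017)": 105,
}

for name, exa in workloads_exaflops.items():
    seconds = exa * 1e18 / SUSTAINED_FLOPS
    print(f"{name}: {seconds / 86400:.0f} days on one GPU")
```

On these assumptions, even the 2015 workload corresponds to roughly a week of a single GPU running flat out, which is why multi-GPU scaling and better experiment management matter.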
