The Exponential Growth of AI Compute: Powering the Next Era of Generative AI and Accelerated Computing | NVIDIA Blackwell Architecture

Nipunika Jain
Published in Nimbus Niche
8 min read · Mar 20, 2024

Introduction

The rapid advancement of hardware and software technologies has fuelled an exponential growth in computational power, paving the way for groundbreaking developments in artificial intelligence (AI). This surge in compute has been particularly transformative in the realm of generative AI and large language models (LLMs), enabling organisations to tackle increasingly complex problems and unlock new possibilities across various industries.

At the heart of this computational revolution lies the concept of accelerated computing, which leverages specialized hardware, such as graphics processing units (GPUs), to perform massive parallel processing. By harnessing the power of accelerated computing, researchers and organisations can train and deploy AI models with unprecedented speed and efficiency.

The landscape of AI compute has been reshaped by advanced architectures like NVIDIA’s Hopper and the recently unveiled Blackwell. These architectures are purpose-built to handle the immense computational demands of modern AI workloads, especially generative AI and LLMs.

Understanding Compute

It’s essential to understand what “compute” refers to in this context. Compute is a broad term that encompasses the processing capabilities of a computer system, including hardware components, computational tasks, and performance metrics.

Hardware Components:

  • Central Processing Unit (CPU): The primary component responsible for executing instructions and performing calculations, consisting of multiple cores for independent execution.
  • Graphics Processing Unit (GPU): Specialized processors designed for rendering graphics and performing parallel computations, excelling at tasks with large data parallelism.
  • Tensor Processing Unit (TPU): Custom-built ASICs developed by Google for accelerating machine learning workloads, particularly neural network inference and training.
  • Field-Programmable Gate Array (FPGA): Programmable hardware devices that can be configured for specific computational tasks, offering flexibility and high performance.

Computational Tasks:

  • Floating-Point Operations (FLOPs): Measure of the number of floating-point calculations a system can perform per second, commonly used to quantify computational performance.
  • Parallel Computing: Breaking down tasks into smaller subtasks for simultaneous execution on multiple processing units, improving performance and scalability.
  • Vectorization: Performing multiple computations simultaneously by applying the same operation to elements of vectors or arrays, leveraging SIMD instructions.
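To make the vectorization idea concrete, here is a minimal NumPy sketch contrasting an element-by-element Python loop with a single vectorized call that NumPy dispatches to optimized (SIMD-capable) C code. The function names are illustrative, not from any particular library:

```python
import time
import numpy as np

n = 1_000_000
a = np.random.rand(n)
b = np.random.rand(n)

# Scalar approach: one multiplication per loop iteration in Python.
def scalar_mul(a, b):
    out = np.empty_like(a)
    for i in range(len(a)):
        out[i] = a[i] * b[i]
    return out

# Vectorized approach: the entire operation runs in one call,
# letting NumPy apply the same instruction across whole arrays.
def vector_mul(a, b):
    return a * b

t0 = time.perf_counter(); scalar_mul(a, b); t_scalar = time.perf_counter() - t0
t0 = time.perf_counter(); vector_mul(a, b); t_vector = time.perf_counter() - t0
print(f"scalar loop: {t_scalar:.3f}s, vectorized: {t_vector:.4f}s")
```

On typical hardware the vectorized version is orders of magnitude faster, even though both compute identical results.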

Performance Metrics:

  • Clock Speed: The speed at which a processor executes instructions, measured in gigahertz (GHz).
  • Throughput: The rate at which a system can process data or execute tasks, often measured in operations per second (OPS) or FLOPs.
  • Latency: The time delay between initiating a task and receiving the result, crucial for real-time applications.
  • Power Efficiency: The amount of computational work a system can perform per unit of energy consumed, important for mobile devices and data centers.
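These metrics combine in simple ways. As a back-of-envelope sketch (all numbers below are hypothetical, not the specs of any real chip), theoretical peak throughput is clock speed times core count times FLOPs issued per cycle, and power efficiency is that throughput divided by power draw:

```python
def peak_flops(clock_ghz, cores, flops_per_cycle):
    """Theoretical peak = clock rate * cores * FLOPs per cycle per core."""
    return clock_ghz * 1e9 * cores * flops_per_cycle

# Hypothetical chip: 2.0 GHz, 64 cores, 32 FLOPs per cycle per core.
peak = peak_flops(2.0, 64, 32)
print(f"peak: {peak / 1e12:.1f} TFLOPS")  # 4.1 TFLOPS

# Power efficiency: computational work per watt.
watts = 300
print(f"efficiency: {peak / watts / 1e9:.1f} GFLOPS/W")
```

Real chips rarely sustain their theoretical peak; achieved throughput also depends on memory bandwidth and how well the workload parallelizes.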

The Exponential Growth of AI Compute

To grasp the remarkable growth of AI compute, let’s examine the trend in GPU price-performance over the past 15 years. A comprehensive analysis of 470 GPU models released between 2006 and 2021 reveals that the number of floating-point operations per second (FLOPS) available per dollar doubles approximately every 2.5 years.

Image source: https://epochai.org/blog/trends-in-gpu-price-performance

This 2.5-year doubling time is slightly slower than the 2-year cadence associated with Moore’s Law for CPUs, but considerably faster than previous estimates for GPU price-performance improvements. Notably, GPUs commonly used in machine learning research exhibit an even faster rate, with FLOPS per dollar doubling every 2.07 years.
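The doubling times above compound quickly. A short sketch of the implied improvement over a 15-year span, using the two doubling times cited:

```python
def flops_per_dollar_growth(years, doubling_time):
    """Multiplicative improvement in FLOPS per dollar after `years`,
    given a constant doubling time (both in years)."""
    return 2 ** (years / doubling_time)

# General GPUs: doubling roughly every 2.5 years.
print(f"{flops_per_dollar_growth(15, 2.5):.0f}x over 15 years")   # 64x
# ML-focused GPUs: doubling roughly every 2.07 years.
print(f"{flops_per_dollar_growth(15, 2.07):.0f}x over 15 years")  # 152x
```

In other words, a dollar of ML-GPU compute in 2021 bought on the order of a hundred times more FLOPS than in 2006, assuming the trend held steadily over the period.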

Image source: https://epochai.org/blog/trends-in-gpu-price-performance

FLOPS per dollar over time for ML GPUs, with trendline showing the 2.07-year doubling time

The relentless progress in GPU performance has been a key enabler of the AI revolution. With each new generation of GPUs, researchers can train larger and more sophisticated models, pushing the boundaries of what’s possible with AI.

The Rise of Accelerated Computing Architectures

NVIDIA GB200 Superchip with two Blackwell GPUs and one Grace CPU | Image source: https://nvdam.widen.net/s/xqt56dflgh/nvidia-blackwell-architecture-technical-brief

NVIDIA’s Blackwell Architecture represents a major leap in GPU technology, packing 208 billion transistors onto TSMC’s 4NP process. This makes Blackwell the largest GPU ever built, delivering up to 20 petaFLOPS of compute and setting new standards in computational performance.

Key Innovations

  • Unified GPU Design: Blackwell merges two dies into a unified GPU, interconnected by a high-speed 10 TB/s chip-to-chip interface, so the two dies operate as a single, fully coherent processor. This design maximizes computational efficiency and scalability.
  • Transformer Engine Advancements: Blackwell introduces the second-generation Transformer Engine, tailored for enhancing inference and training tasks for large language models (LLMs) and Mixture-of-Experts (MoE) models. Leveraging custom Blackwell Tensor Cores and innovative precision formats, this engine achieves high accuracy and throughput, complemented by advanced dynamic range capabilities.

Breakthrough Features

  • World’s Most Powerful Chip: Packed with 208 billion transistors, Blackwell GPUs deliver an unprecedented 20 petaFLOPS of compute power, unlocking new frontiers in computational capability.
  • Second-Generation Transformer Engine: Optimized for LLMs and MoE models, this engine supports 4-bit floating point (FP4) precision, doubling performance and model size, thus revolutionizing the landscape of AI computation.
  • Fifth-Generation NVLink: With a bidirectional throughput of 1.8 TB/s per GPU, Fifth-Generation NVLink facilitates seamless communication among hundreds of GPUs, enabling the training and deployment of massive models at unprecedented speeds.
  • RAS Engine: Blackwell incorporates a robust RAS (Reliability, Availability, and Serviceability) engine, providing in-depth diagnostic information for efficient maintenance and issue resolution. This engine reduces turnaround time by quickly identifying and addressing potential issues, thereby minimizing downtime and optimizing system reliability.
  • Secure AI and Decompression Engine: Advanced confidential computing capabilities ensure the security of sensitive data and AI models without compromising performance. Additionally, the Decompression Engine, in conjunction with Spark RAPIDS libraries, delivers unparalleled database performance, powering data analytics applications with unmatched efficiency.

Introducing the NVIDIA GB200 NVL72

The GB200 NVL72 connects 36 GB200 Superchips, comprising 36 Grace CPUs and 72 Blackwell GPUs, into a single rack-scale cluster. The design is liquid-cooled, ensuring optimal performance and reliability for the 72-GPU NVLink domain.

One of the most remarkable features of the GB200 NVL72 is its ability to function as a unified, massive GPU, offering a substantial improvement in real-time inference capabilities compared to previous generations. Specifically, it can achieve up to 30 times faster real-time inference for trillion-parameter LLMs, showcasing the tremendous computational power and efficiency of this innovative design.

Complete architecture details and performance benchmarks can be found here: NVIDIA Blackwell Architecture Technical Brief

Empowering the Next Generation of AI Models

The rapid growth of AI compute has been closely intertwined with the increasing size and complexity of AI models. Examining the training computation requirements of notable AI systems over time reveals an astounding trend.

The chart above illustrates the exponential growth in training computation, with the most advanced models like GPT-4 and Gemini Ultra requiring over 1e22 FLOP of training compute, more than twenty orders of magnitude beyond the earliest systems of the 1960s. This trend underscores the critical role of accelerated computing in enabling the development and deployment of such large-scale AI systems.
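A common rule of thumb (popularized by LLM scaling-law work, and an approximation rather than an exact accounting) estimates total training compute as roughly 6 FLOP per parameter per training token. A small sketch with illustrative numbers:

```python
def training_flop(params, tokens):
    """Rough estimate: total training compute ~ 6 * N * D FLOP,
    where N is parameter count and D is training tokens."""
    return 6 * params * tokens

# Illustrative example: a 70-billion-parameter model on 2 trillion tokens.
c = training_flop(70e9, 2e12)
print(f"{c:.1e} FLOP")  # 8.4e+23
```

At that scale, even a sustained exaFLOPS (1e18 FLOP/s) of effective throughput would need on the order of ten days of continuous training, which is why accelerated, massively parallel hardware is indispensable.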

Interestingly, the data also reveals a shift in the landscape of AI research. While academia led the charge in the early decades, industry and academia-industry collaborations have increasingly dominated the high-compute regime in recent years. This trend highlights the importance of collaboration and the pooling of resources to tackle the most demanding AI challenges.

Supercharging Scientific Discovery and Technological Advancement

The exponential growth of AI compute extends beyond language models and generative AI. Across various domains, from drug discovery and protein folding to climate modelling and astrophysics, the ability to harness vast computational power is revolutionizing scientific discovery and technological advancement.

Supercomputers, equipped with cutting-edge accelerators like NVIDIA’s Blackwell GPUs, are pushing the boundaries of what’s possible in computational science. The chart above depicts the exponential growth in peak performance of the world’s fastest supercomputers, with exascale systems now capable of quintillions of operations per second.

This immense computational power is being leveraged to tackle some of humanity’s greatest challenges. From designing more efficient renewable energy systems to accelerating the development of life-saving drugs, the convergence of AI and high-performance computing is ushering in a new era of scientific breakthroughs.

Conclusion

The exponential growth of AI compute, fuelled by relentless advancements in accelerated computing architectures like NVIDIA’s Blackwell, is powering the next era of generative AI and transforming industries worldwide. As computational power continues to grow at an astonishing pace, we can expect AI models to become even more sophisticated, unlocking new possibilities and reshaping our world in profound ways.

However, this rapid progress also brings forth important considerations around energy efficiency, sustainability, and responsible development of AI systems. As we push forward into this exciting future, it is crucial that we navigate these challenges thoughtfully, ensuring that the benefits of AI are harnessed for the greater good of humanity.

The age of generative AI and accelerated computing is upon us, and with architectures like Blackwell leading the charge, we stand at the precipice of a new era of technological advancement. The only question that remains is: how far will our imagination and ingenuity take us?
