
Unraveling NVIDIA’s rise and dominance over the GPU, cloud AI and GenAI industry

7 min read · Mar 10, 2025


Before focusing on NVIDIA, a few basic concepts need to be clarified in order to fully understand the topics addressed below. If you already know about GPUs and how they impacted AI and GenAI, you can skip directly to the NVIDIA’s Rise to Leadership section.

What is a GPU?

First, GPU stands for Graphics Processing Unit: a type of computer chip designed to perform many calculations in parallel, very quickly and efficiently. It was originally created to render images and videos for displays, but has evolved into a powerhouse for a wide range of computing tasks, especially in fields like artificial intelligence (AI), machine and deep learning, and scientific computing.

Let’s go back to the basics, and why CPUs and GPUs are different.

CPU vs. GPU

Fig 1 : GPU vs CPU architecture
Source: A New Morphological Anomaly Detection Algorithm for Hyperspectral Images and its GPU Implementation, Abel Paz and Antonio Plaza

DRAM stands for “dynamic random access memory,” a specific type of RAM (random access memory). All computers have RAM, and DRAM is the kind found in modern desktops and laptops. In GPUs, dedicated DRAM is integrated directly into the architecture, which significantly accelerates operations (see the “Why GPUs are faster than CPUs for certain tasks” section).

ALU stands for Arithmetic Logic Unit: the part of a processor that carries out arithmetic and logic operations on the operands in computer instruction words.

A CPU (Central Processing Unit) is like the brain of a computer. It’s optimized to handle a wide variety of tasks sequentially (one at a time, or a few tasks in parallel).

  • Strengths: Versatility, handling complex tasks like running an operating system, and decision-making.
  • Weaknesses: Limited in parallel processing (it can only handle a few tasks simultaneously).

A GPU is designed differently. It has thousands of smaller cores that can work on many tasks in parallel. This makes it perfect for tasks that involve lots of repetitive calculations, like rendering millions of pixels for an image or processing data-heavy computations.

  • Strengths: Massive parallelism and speed in certain tasks.
  • Weaknesses: Less versatile than a CPU for general tasks.

In simple terms, a CPU can be seen as a few powerful cores working on tasks independently, a GPU as thousands of smaller cores working on one task together.
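
As a loose analogy in code (a minimal sketch: NumPy’s vectorized operations still run on the CPU, but they illustrate the “apply one instruction to many data elements at once” style that GPUs take to the extreme):

```python
import time
import numpy as np

data = np.random.rand(1_000_000)

# Sequential mindset: process elements one at a time.
start = time.perf_counter()
total = 0.0
for x in data:
    total += x * x
print(f"Python loop: {time.perf_counter() - start:.3f}s")

# Data-parallel mindset: one operation applied to all elements at once.
start = time.perf_counter()
total = np.dot(data, data)
print(f"Vectorized:  {time.perf_counter() - start:.3f}s")
```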

GPUs’ early applications: Video Rendering and Gaming

In the early days of computing, GPUs were developed to handle graphics (hence the name) and rendering for video games and visual applications. Tasks like rendering 3D models, textures, and smooth animations required immense amounts of mathematical calculation: processing the pixels of 3D objects is all about matrix computations, which demand a heavy computing load. Video game publishers pushed for higher frame rates and more realistic graphics, requiring GPUs to evolve rapidly. Better chips meant a better ability to handle repetitive mathematical operations such as:

  • Vector calculations: calculating angles and directions for movement.
  • Matrix transformations: moving objects in 3D space and projecting their representations onto the image plane for visual rendering (a code sketch follows Fig 2).
Fig 2 : 3D-object model and concepts for visual rendering
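
To make this concrete, here is a minimal NumPy sketch of the idea (the cube, focal length, and pinhole camera model are simplified assumptions for illustration, not a production renderer): rotating a set of 3D vertices and projecting them onto a 2D image plane comes down to exactly the matrix operations listed above.

```python
import numpy as np

def rotation_y(theta):
    """3x3 rotation matrix around the Y axis."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0, s],
                     [ 0, 1, 0],
                     [-s, 0, c]])

# A batch of 3D vertices (N x 3): the eight corners of a cube.
vertices = np.array([[x, y, z] for x in (-1, 1)
                               for y in (-1, 1)
                               for z in (-1, 1)], dtype=float)

# Matrix transformation: rotate every vertex in one operation.
rotated = vertices @ rotation_y(np.pi / 6).T

# Perspective projection onto the image plane (simplified pinhole camera).
focal = 2.0
depth = rotated[:, 2] + 5.0                          # push the cube in front of the camera
projected = focal * rotated[:, :2] / depth[:, None]  # (N x 2) screen coordinates

print(projected)
```

A GPU performs transformations like these on millions of vertices and pixels per frame, which is why the hardware was built around fast, massively parallel matrix arithmetic.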

GPUs in AI and High Technology applications

As technology advanced, more and more scientific applications required high-performance computation and parallel computing:

  1. AI: Training models, and especially neural networks, requires performing millions of matrix calculations repeatedly: teaching an AI model to recognize a cat involves processing vast amounts of data through neural networks, which rely heavily on matrix multiplication (see the sketch after this list).
  2. High-Performance Computing (HPC): In scientific simulations (like weather modeling or molecular simulations), GPUs excel at processing huge datasets and complex equations quickly.
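
To make the link between neural networks and matrix multiplication concrete, here is a minimal NumPy sketch of a single dense layer’s forward pass (the dimensions are made-up placeholders). Training repeats operations like this millions of times, which is exactly the workload GPUs parallelize:

```python
import numpy as np

batch, n_in, n_out = 64, 784, 128        # e.g. a batch of flattened 28x28 images

x = np.random.rand(batch, n_in)          # input activations
W = np.random.randn(n_in, n_out) * 0.01  # layer weights
b = np.zeros(n_out)                      # biases

# The core of a dense layer's forward pass: one big matrix multiplication.
z = x @ W + b
a = np.maximum(z, 0)                     # ReLU activation

print(a.shape)                           # (64, 128)
```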

Why GPUs are faster than CPUs for certain tasks

  1. Parallel Architecture: GPUs can handle thousands of operations simultaneously, whereas CPUs focus on sequential processing.
  2. Specialized Cores: GPU cores are optimized for tasks like matrix multiplication, essential in ML/AI, or vector addition, used in physics simulations.
  3. Memory Bandwidth: GPUs can handle large data transfers more efficiently.
    As an example, the effective data transfer rate from RAM to CPU cores varies between 20 and 100 GB/s, or more with high-end modules. NVIDIA’s H100 GPU memory bandwidth stands at 3.35 TB/s: roughly 30 to 150 times faster in terms of memory throughput. This makes a huge difference for high-performance tasks (a timing sketch follows this list).
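
A rough way to see this gap for yourself is to time the same matrix multiplication on both processors. This is a minimal sketch assuming PyTorch and a CUDA-capable GPU are available; the exact speedup depends heavily on the hardware:

```python
import time
import torch

n = 4096
a = torch.randn(n, n)
b = torch.randn(n, n)

# CPU matrix multiplication.
start = time.perf_counter()
a @ b
cpu_s = time.perf_counter() - start

# Same multiplication on the GPU, if one is available.
if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    torch.cuda.synchronize()             # finish transfers before timing
    start = time.perf_counter()
    a_gpu @ b_gpu
    torch.cuda.synchronize()             # GPU calls are asynchronous; wait for the result
    gpu_s = time.perf_counter() - start
    print(f"CPU: {cpu_s:.3f}s  GPU: {gpu_s:.3f}s  ({cpu_s / gpu_s:.0f}x faster)")
else:
    print(f"CPU: {cpu_s:.3f}s (no CUDA GPU found)")
```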

In 2012, AlexNet won the ImageNet Large Scale Visual Recognition Challenge with a top-5 error rate of 15.3%, significantly outperforming the runner-up by 10.8 percentage points. This success was largely due to the depth of the model and the use of GPUs for training, which made the computationally expensive process feasible. AlexNet was trained on 1.2 million images for 90 epochs, taking five to six days on two NVIDIA GTX 580 GPUs. This achievement highlighted the effectiveness of GPUs in AI and led to widespread adoption of CNNs and GPUs in computer vision research. As of mid-2024, the AlexNet paper has been cited over 157,000 times.

Now that we’ve reviewed the basic GPU concepts, let’s focus on NVIDIA!

NVIDIA’s Rise to Leadership

In the early 2000s, NVIDIA was a well-known chip manufacturer alongside AMD and Intel. But before its competitors, NVIDIA recognized the potential for GPUs to go beyond gaming. By the 2020s, it had cemented its position as a leader in AI and HPC, mainly through two key innovations:

  1. CUDA (Compute Unified Device Architecture): NVIDIA introduced CUDA in 2006, a pioneering programming interface that allowed developers to use GPUs for general-purpose computing, not just graphics. It gave scientists and developers a user-friendly way to program GPUs for complex tasks, which greatly simplified model configuration. CUDA played a pivotal role in NVIDIA’s success, as it had been actively developed and supported for over a decade before the AI boom truly began. NVIDIA later developed other major software tools, like TensorRT and cuDNN, which made it easier for developers to harness GPU power (see the kernel sketch after this list).
  2. Betting early on AI & scientific calculations: NVIDIA anticipated the explosion of AI and supercomputing demands by focusing on developing hardware tailored for highly intensive calculations and investing in specialized GPUs (like the NVIDIA A100 and H100) optimized for AI and scientific computing.
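
To give a feel for what CUDA programming looks like, here is the classic vector-add kernel, written with Numba’s Python bindings rather than CUDA C (a minimal sketch assuming a CUDA-capable GPU and the numba package): each GPU thread computes one element of the result.

```python
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)            # this thread's global index
    if i < out.size:            # guard threads past the end of the array
        out[i] = a[i] + b[i]

n = 1_000_000
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)
out = np.zeros_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](a, b, out)   # launch the kernel on the GPU

assert np.allclose(out, a + b)
```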

NVIDIA’s GPUs were unmatched in speed and efficiency for AI tasks. The H100, released in 2022 with a Transformer Engine specially optimized for training and running large AI models, became the gold standard. This striking performance is driven by several technical choices, two of which are particularly noteworthy:

  1. Gaining speed by taking advantage of sparsity in network weights, first in the A100’s and then the H100’s Tensor Cores.
  2. Reducing unnecessary computation by switching the H100’s Tensor Cores from the 32-bit floating-point precision format (FP32) to 16-bit (FP16) and 8-bit (FP8) formats. By default, NVIDIA’s TensorFlow and PyTorch containers use the 32-bit format.

By selectively reducing precision, NVIDIA’s Transformer Engine made it possible to train larger networks faster without compromising accuracy, resulting in a competitive advantage for AI and GenAI applications.
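
In practice, developers tap into this reduced-precision hardware through mixed-precision training. Here is a minimal PyTorch sketch (assuming a CUDA-capable GPU; the model and tensor sizes are made-up placeholders): the forward pass runs in FP16 where it is safe, while a gradient scaler guards against FP16 underflow.

```python
import torch

model = torch.nn.Linear(1024, 1024).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)
scaler = torch.cuda.amp.GradScaler()     # rescales gradients to avoid FP16 underflow

x = torch.randn(64, 1024, device="cuda")
y = torch.randn(64, 1024, device="cuda")

# Autocast runs eligible operations (like matrix multiplies) in FP16.
with torch.autocast(device_type="cuda", dtype=torch.float16):
    loss = torch.nn.functional.mse_loss(model(x), y)

scaler.scale(loss).backward()
scaler.step(optimizer)
scaler.update()
```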

Fig 3 : AI models’ exploding computational requirements.
Source : NVIDIA

The resources required to train a model are measured in FLOP (Floating Point Operations). FLOPS (Floating Point Operations per Second) measures a computer’s performance: the number of floating-point arithmetic calculations a processor can perform in one second.
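
As a back-of-the-envelope example (assuming the standard convention of counting one multiply and one add per output term), multiplying an m×k matrix by a k×n matrix costs roughly 2·m·k·n FLOP, which lets you estimate how long a processor of a given FLOPS rating would need:

```python
# FLOP cost of one matrix multiplication: ~2*m*k*n (one multiply + one add per term).
m, k, n = 4096, 4096, 4096
flop = 2 * m * k * n
print(f"{flop / 1e9:.0f} GFLOP")                   # ~137 GFLOP

# At 1 TFLOPS (10**12 FLOP per second), this single product takes:
print(f"{flop / 1e12 * 1000:.0f} ms at 1 TFLOPS")  # ~137 ms
```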

In 2022, Transformers were reshaping the AI industry, and their computational needs grew more rapidly than those of other forms of AI. The H100 blew away the competition with its ability to dynamically choose the precision needed for each layer of a neural network at each step of training. Fig 4 shows, for the H100, how much sparsity (marked with *) and floating-point (FP) precision influence operation speed.

By optimizing operations, minimizing unnecessary computations, and reallocating resources more effectively, NVIDIA’s chips were able to train considerably larger models than competitors’ chips without requiring additional processing units.

Fig 4 : H100 SXM processor performance specifications
Source : NVIDIA

NVIDIA is not just about chips

NVIDIA is, at its core, a chip manufacturer: ultimately, its revenue comes from selling chips. But its dominance in AI and high-performance computing lies in the creation of a comprehensive software ecosystem that lets scientists and developers leverage its complex hardware efficiently. NVIDIA understood that investing in software tools like CUDA, and in AI-focused software libraries like cuDNN and TensorRT, was key to building a developer-friendly environment with robust support and scalable solutions for both research and industry.

To conclude…

NVIDIA developed, sooner than its competitors, the software tools, libraries, and platforms that made it easier for developers to harness GPU power. Its early recognition of AI’s potential for GPUs, and its early investment in tools for these specific applications, placed it at the forefront of this revolution. One important thing to remember is that in the business world, computer chips are often treated as if they were a generic commodity… However, there are tens of thousands of different components, each designed as a specialized tool for specific tasks. NVIDIA’s corporate history positioned it to create the best chips, with the right architecture for advanced scientific computing, a field that exploded with GenAI in the early 2020s, placing NVIDIA in a league of its own.

Today, NVIDIA’s solutions for AI go beyond hardware. Its ecosystem of software solutions and its cloud services, available worldwide, make it a key player for advanced AI and GenAI applications. Whether you’re a tech giant, an AI-focused company, or a scientist seeking higher performance, NVIDIA’s GPUs and software tools deliver unparalleled capabilities.

Written by Alexandre Orhan

