What you should know about GPUs Part 2 — The Rise of AI

Alex Reinhart
8 min read · Dec 26, 2023

--

Part 1 — History & Industry

Part 2 — The Rise of AI

Part 3 — The Paradigm Shift of Compute

The Rise of AI and GPUs

In the past year, AI has been in the spotlight like never before. It’s easy to forget that this wow moment came after decades of hard-fought development: these models have been around since the early 2000s.


Understanding that AI’s growth has been hard fought also helps us understand that the hardware for AI has been evolving for some time as well. GPUs continue to be the hardware backbone of AI systems, but this may be changing with the advent of specialized AI chips.

Specialized AI Chips

AMD vs NVIDIA

In the last five years, the ‘AI chip’ design frontier has grown crowded, with Meta, Alphabet, and Microsoft joining the fabless chip designer club. Needless to say, this has been very good for TSMC.

Are ‘AI Chips’ fundamentally different from GPUs?

In short, the answer is no — at least not as of today.

AI chips are really a subcategory of GPUs in which the ratios of components (processing cores and memory) are tuned to the application. ML tasks require more memory and specific types of cores: tensor cores, which are optimized for tensor multiplication (matrix multiplication). Certain hardware components are also modified for the application; for example, ML tasks tolerate lower precision, so memory and cores are tuned to use 16-bit or 8-bit data rather than the typical 32-bit.
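To make the precision point concrete, here is a minimal PyTorch sketch (the matrix sizes are arbitrary, and float16 on GPU vs. bfloat16 on CPU is just a fallback so the snippet runs anywhere): the same matrix multiply with 16-bit operands moves half as many bytes as 32-bit, and on an AI-tuned GPU it is dispatched to tensor cores.

```python
import torch

# Toy illustration of reduced precision, not a benchmark.
device = "cuda" if torch.cuda.is_available() else "cpu"
low = torch.float16 if device == "cuda" else torch.bfloat16  # CPU fallback dtype

a = torch.randn(1024, 1024, device=device)  # float32: 4 MB per matrix
b = torch.randn(1024, 1024, device=device)

c_full = a @ b                    # standard 32-bit multiply
c_half = a.to(low) @ b.to(low)    # 16-bit multiply: half the memory traffic,
                                  # eligible for tensor cores on supported GPUs

print(c_full.dtype, c_half.dtype)
```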

The NVIDIA H100 is a GPU; it’s just a GPU optimized for AI. It is so optimized for AI, in fact, that it performs poorly for gaming compared to off-the-shelf GPUs.

Trends

Training and Inference

Training an ML model is a very different process from inference.

Training and inference comparison

In training, you both compute forward and propagate information backward to update edge weights, which requires significantly more calculation and more memory. Inference is a forward-only process, and there are techniques that help minimize the calculations and memory it needs, like sparsification (removing unused weights).
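A minimal PyTorch sketch of the difference (the model and data here are placeholders, not a real workload): training runs a forward pass plus a backward pass that updates weights, while inference is the forward pass alone, and sparsification can shrink the deployed model further.

```python
import torch
import torch.nn as nn
from torch.nn.utils import prune

model = nn.Linear(128, 10)  # stand-in for a real network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()

x = torch.randn(32, 128)         # dummy batch of 32 examples
y = torch.randint(0, 10, (32,))  # dummy labels

# Training step: forward pass, then a backward pass that updates weights.
# Activations are kept in memory for backpropagation, which is why training
# needs far more compute and memory than inference.
optimizer.zero_grad()
loss = loss_fn(model(x), y)
loss.backward()
optimizer.step()

# Inference: a forward-only pass. No gradients tracked, no activations stored.
with torch.no_grad():
    predictions = model(x).argmax(dim=1)

# Sparsification: zero out the 30% smallest-magnitude weights before deployment.
prune.l1_unstructured(model, name="weight", amount=0.3)
```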

Large Language Models (LLMs) require huge amounts of compute for training, and an AI chip (a specialized GPU) is the best solution. AI chips today are tuned for training, not for inference.

Inference is largely run on CPUs rather than GPUs. GPU optimization for inference may make sense at the edge (on device) and/or at very high speeds (for example, self-driving). The current state of the art for inference is hardware optimized for the exact type of calculation needed: an ASIC (Application-Specific Integrated Circuit). To employ ASICs, you need to design a chip for your use case. For example, the TPU (Tensor Processing Unit) created by Google is an ASIC for ML.

Digital vs Analog

There has been a long debate over whether compute should proceed in the analog or digital domain.

Analog signals are continuous signals, most often used to represent physical measurements; they use a continuous range of values to represent information. Think of a temperature sensor: it provides a constant stream of measurement relaying 32 degrees. Digital instead represents data in binary, so 32 would read 100000. Counterintuitively, analog was the first compute technology, but it was found to be unreliable for computation due to its sensitivity to noise. For example, if you are reading 50.29, noise can easily distort your result, whereas there is virtually no risk of misreading a 1 or a 0.
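Here is a toy sketch of that noise argument (the noise level is made up purely for illustration): the same perturbation that permanently corrupts an analog value is absorbed when a digital bit is re-quantized against a threshold.

```python
import random

random.seed(0)
noise = lambda: random.gauss(0, 0.2)  # illustrative noise, arbitrary magnitude

# Analog: noise lands directly on the value you read, and the error sticks.
analog_reading = 50.29 + noise()      # e.g. 50.27 or 50.34 instead of 50.29

# Digital: each bit only has to land on the correct side of a threshold,
# so the same noise is absorbed when the bit is re-quantized.
bit_sent = 1.0
bit_read = 1 if bit_sent + noise() > 0.5 else 0  # still 1 unless noise exceeds 0.5

print(analog_reading, bit_read)
```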

Many advocate for analog computing’s return. It is still used frequently in sensors that require continuous streams of data, but it hasn’t returned to general compute. A few startups think analog is the answer to the memory bandwidth issues in AI computing, because it requires fewer conversions and fewer intermediate numbers saved to memory.

GPU Startup Landscape

A few startups have raised significant amounts of money to challenge NVIDIA in creating the best AI chip. SambaNova Systems has raised $1.1B since its founding in 2017, with the goal of providing a full-stack AI service for enterprise LLMs. Their novel SN40L chip has memory access innovations that aim to alleviate the memory bottleneck. Their technical advantage is nothing to write home about, but they are providing an easy-to-use compute service in a time of scarcity.

SambaNova Team

Cerebras took a different approach to AI hardware: they are making chips as big as current manufacturing processes allow, wafer-sized. Compared to NVIDIA’s 5,120-core chip, Cerebras’s has 850,000 cores; to support that many cores, Cerebras has also made advancements in I/O and memory access. They have raised $715M since their founding in 2016, and today have a live training and inference cloud service as well as hardware available for purchase.

Cerebras chip size comparison

Graphcore, a UK-based AI chip company founded in 2016, has racked up $682M in funding. Graphcore’s technical edge is new hardware they call the IPU (Intelligence Processing Unit). They partnered with TSMC to bring a new 3D manufacturing technology into their IPUs. They tout 5x faster training and 40% faster inference, and offer both cloud services and hardware for purchase.

Graphcore IPU

The CUDA Moat

There are a few commonalities amongst these well-funded NVIDIA competitors: they all continue to leverage standard silicon wafers, and they all have a massive accessibility challenge to overcome in NVIDIA’s proprietary CUDA interface.

‘Graphcore, Cerebras, Tenstorrent, all of these ML accelerator companies, they built chips, the chips were good, the software was terrible.’ — George Hotz, founder of Tiny Corp.

There was really only one viable strategy: create your own CUDA (your own hardware-to-ML-library interface). To mimic the competitive moat NVIDIA obtained, each of these interfaces continues to be developed in-house and remains proprietary.

If we zoom out from the AI hardware race, there is one startup taking on the interface challenge. Modular, founded in 2022, has raised $130M to create a universal interface between AI hardware and software, which they call the MAX Engine. The MAX Engine aims to bridge the powerful moat that NVIDIA has created with CUDA. Modular is also creating a language called Mojo, a superset of Python, which makes it extremely easy to adopt in existing code. Mojo replaces the need to write C++ or CUDA and makes it easier to write code that interfaces with specialized hardware.

Events, Acquisitions, etc.

Not all Rainbows and Butterflies

Not many AI hardware startups have gone to zero, but 2022 was a challenging time for many of them, as NVIDIA came to dominate AI hardware and most venture funding dried up.

Rain AI, a company trying to create chips that mimic neurons in the brain (I would push back on this statement, since neuron activity is electrochemical rather than purely electronic, but alas), had sourced funding from notable AI figures including Sam Altman, but faced regulatory action after accepting money from a Saudi investor. It goes to show that the AI race is becoming a subject of national security, or, more aptly, an issue of US superiority. Rain AI was saved by a deal with OpenAI to buy $51M in AI chips.

Rain, pls use some of that 51M to re-do your website, it looks like a scam

In a similar vein, AI inference chip company Mythic ran out of funds after raising $70M. After some floundering, they were able to raise another $13M with a new CEO, Dave Rick, stepping up. Their aim is to create an analog chip, targeting energy efficiency in edge computing. I covered the edge inference space less thoroughly in this article, but it’s crowded: SiMa.ai, Axelera, Flex Logix, NeuReality, EnCharge, Hailo, and Kneron. The fact that Mythic is targeting analog means they’re bringing something new to the table; I’m excited to see how this plays out.

Even though there haven’t been many disastrous failures so far in the AI hardware space, it has become a particularly challenging space in which to raise money. You may have noticed that between SambaNova, Cerebras, and Graphcore, $2.5B of venture money is locked up, with at least another billion spread over other AI hardware startups that didn’t make this list. There is only a small collection of VCs with the aptitude and appetite for investing in high-risk, long-horizon hardware products, and most have made their bets for now.

Top Acquisitions:

Habana Labs was an Israeli company making record-breaking AI hardware in 2019. Habana had released the Goya chip for inference and the Gaudi chip for training, and they were swiftly acquired by Intel for $2B. There were particularly strong synergies between Habana Labs and Intel because both were creating separate technologies for inference and training, which has proven to be a leading philosophy.

In March 2019, NVIDIA was betting on data center expansion and anticipating that models would continue increasing in size, which would begin to require sharding (breaking models apart to run on different systems). Maintaining speed in these cross-system processes requires high-speed networking, which inspired NVIDIA’s acquisition of Mellanox, a high-performance networking company, for $6.9 billion. NVIDIA couldn’t have made a better bet: if you look back at the revenue chart, you can see that data center revenue has been on a steep upward trajectory ever since.

NVIDIA Quarterly Revenue by Category
