What’s the Fuss About GPUs and TPUs?

Rex Orioko
Google Cloud - Community
3 min read · Feb 23, 2024

Let’s go back in time for a moment and understand how the ancestor of GPUs and TPUs worked: the CPU. CPUs were the traditional way of processing input on a computer, and they follow the Von Neumann architecture, which consists of a processing unit, a control unit, memory, and external mass storage.

One of the biggest benefits of a CPU is its flexibility: it can handle all kinds of computing tasks, from mathematical workloads, to flying drones, to image classification within a neural network. The trade-off is that the CPU has to read its operands from memory and write its result back to memory for every single calculation. Over time this becomes the downfall of the CPU for compute-intensive work. Ever had your laptop freeze back in the day when you booted up Android Studio 😀 and the fans spun so hard? This is why: memory and compute capacity were exhausted by so many operations, so other applications and tasks were starved of resources and became frozen or unresponsive.
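To make that concrete, here is a toy Python sketch of the one-at-a-time pattern described above. It is purely illustrative (real CPUs pipeline and cache heavily), but it shows the shape of the Von Neumann bottleneck: memory traffic on every step.

```python
# Toy sketch of CPU-style execution: every step loads its operands from
# memory, performs a single ALU operation, and writes the result back
# before moving on. (Illustrative only; real CPUs pipeline and cache.)
a = [1.0, 2.0, 3.0, 4.0]
b = [5.0, 6.0, 7.0, 8.0]
out = [0.0] * len(a)

for i in range(len(a)):
    x = a[i]        # load operand 1 from memory
    y = b[i]        # load operand 2 from memory
    out[i] = x * y  # one multiply, then the result goes back to memory

print(out)  # [5.0, 12.0, 21.0, 32.0]
```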

How About a GPU?

GPU stands for graphics processing unit. To gain accelerated processing speed, a GPU packs many more ALUs into its processing unit. ALU stands for arithmetic logic unit, the component holding the controls for the multipliers and adders; each one accesses memory and executes a multiply or an add one at a time, which limits throughput and consumes significant amounts of energy. This should help you understand why Bitcoin mining consumes so much energy and is frowned upon by regulators.

A modern GPU houses anywhere from 2,500 to 5,000 ALUs in a single processor, enabling it to process complex calculations with very high throughput. We are talking night and day compared to a CPU when it comes to training deep learning models.
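As a rough illustration, here is a minimal JAX sketch (assuming the jax library is installed; the device list printed is whatever your machine reports). The same multiply is expressed over whole arrays, so the runtime can fan the work out across those thousands of ALUs instead of looping one element at a time:

```python
import jax
import jax.numpy as jnp

@jax.jit  # compile once; runs on a GPU/TPU if attached, otherwise CPU
def multiply(a, b):
    return a * b  # element-wise: different ALUs handle different elements

x = jnp.arange(1_000_000, dtype=jnp.float32)
y = jnp.arange(1_000_000, dtype=jnp.float32)

result = multiply(x, y)
print(result.shape, jax.devices())  # shows which device did the work
```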

Unfortunately, GPUs are still built for general-purpose computing and have to support many different applications and kinds of software, meaning they are not purpose-built for training compute-intensive ML workloads.

Google’s TPU

TPU stands for tensor processing unit. TPUs are custom-built processors optimized specifically for training and inference of large AI models, and they are designed to scale cost-efficiently across a wide range of AI workloads: training, fine-tuning, and inference. Google’s TPUs were designed as matrix processors; they are not general purpose, so they cannot run popular applications like Word or execute bank transactions the way CPUs can.
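As a small, hedged sketch of what that means in practice: on a Cloud TPU VM with JAX installed, a single jnp.dot maps onto the TPU’s matrix hardware (the 128×128 shapes and bfloat16 dtype here are illustrative choices, not requirements):

```python
import jax
import jax.numpy as jnp

key_a, key_b = jax.random.split(jax.random.PRNGKey(0))
# bfloat16 is the numeric format TPUs are designed around
a = jax.random.normal(key_a, (128, 128), dtype=jnp.bfloat16)
b = jax.random.normal(key_b, (128, 128), dtype=jnp.bfloat16)

c = jnp.dot(a, b)  # one op covering thousands of multiply-adds
print(c.shape, jax.devices()[0].platform)  # reports 'tpu' on a TPU host
```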

In a TPU, thousands of multipliers and adders are connected directly to one another, forming a large physical matrix of the single operators that exist in a CPU; this approach is known as the systolic array architecture.

During these massive calculations, no memory access is required for the intermediate results: partial sums flow directly from one unit to the next instead of being written back to memory, enabling extreme throughput and highly efficient use of energy.
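Here is a toy Python simulation of that data flow (purely illustrative, not Google’s actual hardware design): each cell multiplies the value streaming through it, adds the incoming partial sum, and passes the sum along, so nothing touches memory until the final result leaves the array.

```python
def systolic_matvec(weights, x):
    """Matrix-vector product the way a chain of systolic cells would do it:
    partial sums flow cell-to-cell instead of through memory."""
    outputs = []
    for row in weights:              # one chain of cells per output value
        partial_sum = 0.0
        for w, x_i in zip(row, x):   # x_i streams through the cells
            partial_sum += w * x_i   # each cell: multiply, add, pass along
        outputs.append(partial_sum)  # only the final sum leaves the array
    return outputs

W = [[1.0, 2.0],
     [3.0, 4.0]]
print(systolic_matvec(W, [5.0, 6.0]))  # [17.0, 39.0]
```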

As you think about where to run your machine learning workloads, whether you serve your LLMs on GKE for the more savvy development teams, use a managed MLOps platform like Vertex AI, or take advantage of Google’s latest and greatest LLM, Gemini, know that GCP is leading the conversation from the ground up by investing in the future of AI infrastructure. Knowing that your platform of choice is prioritizing resources for the infrastructure that powers all of your workloads provides a sense of confidence going into the future. Google plans to serve, run, and manage LLMs securely and sustainably by following our AI principles.

To learn more about TPUs, visit the official Google Cloud documentation page.
