What Does It Actually Take to Run ChatGPT?

Krish Bhoopati · Insights of Nature · 10 min read · Apr 1, 2024

Normally, I start by introducing and explaining what my article is going to be about. But I think we all know what ChatGPT is.

Fun Fact: ChatGPT receives more than 12 million queries a day, meaning it executes tasks more than 12 million times. Imagine if your job was to do that every day 😥. Thankfully, we have AI to help us out😏. But the AI’s salary is a lot higher than you may think; after seeing it, maybe you would want to take that job after all. Plus, some might say the salary is worth it because of how much people use it. I hate to admit it, but I may or may not be part of that group.

I’m just working smarter, not harder right?

Let’s start with the basics

ChatGPT is the publicly accessible chatbot variant of GPT-3.5, a Large Language Model (LLM) from OpenAI, an AI research and deployment company with the mission to ensure that artificial general intelligence benefits all of humanity, run by super smart tech people and the one and only Sam Altman.

GPT is an acronym for Generative Pre-trained Transformer: a text generator that’s considered pre-trained because it is trained before it is let loose, with revolutionary technology inside. The function of any large language model is to train on a substantial amount of text and then generate output from an input that sounds like the training text. According to the paper behind GPT-3, ChatGPT comes from a model that was trained on over 500 gigabytes of text data. That’s equivalent to a couple of billion human-written web pages containing trillions of words of text.
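
ChatGPT itself is closed-source, but the basic “predict the next word” loop is easy to see with an open model. Here’s a minimal sketch using the small open-source GPT-2 through Hugging Face’s transformers library. To be clear, this is not OpenAI’s model or serving stack, just an illustration of how an LLM turns a prompt into text one token at a time.

```python
# Minimal sketch of autoregressive text generation using the open-source GPT-2.
# This is NOT ChatGPT or OpenAI's serving stack -- just an illustration of how
# an LLM turns a prompt into text, one predicted token at a time.
# pip install transformers torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "Large language models are trained on"
inputs = tokenizer(prompt, return_tensors="pt")

# The model repeatedly predicts the most likely next token and appends it.
output_ids = model.generate(**inputs, max_new_tokens=30, do_sample=False)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```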

Now, training ChatGPT with all that takes a LOT of time🕒 and money💵. They had to run trillions of words, the equivalent of 300 years’ worth, through supercomputers processing in parallel for months. In the end, the model ended up with around 170 billion connections between all these words, and all of those connections have to be calculated whenever anyone asks ChatGPT anything. That makes this a billion-dollar training effort for an LLM, and it’s why serving the 100+ million active users a month costs more than $100K a day.

CRAZY RIGHT!

The AI Supercomputer Infrastructure

There has been a huge rise in AI capability over the last 10 years📆, driven largely by the rise of GPUs and cloud-scale infrastructure. The key thing to support with LLMs is their self-supervised learning, where they learn about a language by examining billions of pages of information over and over again. The problem at such a large scale is that they’re very resource-intensive and expensive to run, so it’s super critical to have efficient infrastructure at that size. You have to deal with tons of failures that happen regularly, like a server dying or a network link starting to flap, and one of the major goals behind the infrastructure is to be able to diagnose and fix them very quickly🛠️.

One of the many companies working on that is Microsoft Azure. They’ve invested a lot in their data center infrastructure, with state-of-the-art hardware run globally in Azure for raw compute horsepower. A key piece is clusters of GPUs (graphics processing units) connected by a high-bandwidth network, which provides the connectivity required to train these models efficiently. On top of that, Azure layers software platform optimizations that enlighten the platform and hypervisor to access those networked GPUs, so performance is like running on bare metal. Higher up the stack, they’ve integrated open-source frameworks like ONNX for model portability and the DeepSpeed framework, which helps componentize these models so they can train across interconnected GPUs at greater speed and scale.
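
To give a feel for what “ONNX for model portability” means in practice, here’s a tiny, hedged example that exports a toy PyTorch model to the ONNX format so other runtimes and accelerators can run it. The TinyNet model is made up purely for illustration and is nothing like the models Azure actually trains.

```python
# Toy example of ONNX model portability: export a (made-up) PyTorch model to
# ONNX so any ONNX-compatible runtime or accelerator can run it.
# pip install torch onnx
import torch
import torch.nn as nn

class TinyNet(nn.Module):          # stand-in model, not a real LLM
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(16, 4)

    def forward(self, x):
        return torch.relu(self.fc(x))

model = TinyNet().eval()
dummy_input = torch.randn(1, 16)   # example input that defines the graph shape

# Trace the model and write a portable .onnx file.
torch.onnx.export(model, dummy_input, "tiny_net.onnx",
                  input_names=["input"], output_names=["output"])
print("Exported tiny_net.onnx")
```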

So to train the models, they use something called data parallelism: training many instances of the model at the same time on small batches of data. After each batch is processed, the GPUs exchange information and then proceed to the next batch. This is why you need systems that large.
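
Here’s a rough sketch of that idea in plain NumPy (not Azure’s actual training stack): each “GPU” processes its own slice of the batch and computes gradients locally, then the gradients are averaged so every replica applies the same weight update.

```python
# Rough sketch of data parallelism with plain NumPy (not a real training stack).
# Each "GPU" processes its own slice of the batch, then gradients are averaged
# (the information-exchange step) so every replica applies the same update.
import numpy as np

rng = np.random.default_rng(0)
n_gpus, batch_per_gpu, dim = 4, 8, 16
w = np.zeros(dim)                              # model: simple linear regression weights

def local_gradient(w, X, y):
    """Mean-squared-error gradient on one GPU's mini-batch."""
    pred = X @ w
    return X.T @ (pred - y) / len(y)

for step in range(100):
    grads = []
    for g in range(n_gpus):                    # in reality these run in parallel
        X = rng.normal(size=(batch_per_gpu, dim))
        y = X @ np.ones(dim)                   # synthetic targets
        grads.append(local_gradient(w, X, y))
    avg_grad = np.mean(grads, axis=0)          # the "exchange information" step
    w -= 0.1 * avg_grad                        # identical update on every replica

print("learned weights ~ 1:", np.round(w, 2))
```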

To optimize the hardware, InfiniBand is used in the HPC clusters because it provides better cost performance than Ethernet, which makes Azure unique compared to the others. The many GPUs required to train LLMs don’t all sit on the same board or even in the same rack; due to power consumption and heat, they’re spread out across the data center, and Azure networks and clusters as many of them as it can. NVIDIA’s H100 GPUs let you cluster up to 8 GPUs per VM and scale out to thousands as demand grows.

There’s almost no limit to where this goes.

Some say Mr. Bean took the first step toward self-driving cars before Elon

A company currently using the Azure supercomputer is Wayve, a UK leader in autonomous driving technology. If you’ve read my previous articles👀, which I highly recommend you do, you know autonomous vehicles are right up my wheelhouse🛞🏠. Back to Wayve: they specialize in distributed, AI-powered autonomous systems for self-driving vehicles, primarily using vision-based machine learning. Wayve uses Azure’s supercomputer to gather, manage, and train those models on millions of hours of driving data per year. That’s a HUGE amount of data, petabytes of it, with images, GPS data, and sensor data. So many industries rely on these data centers, which makes taking care of them even more important.

Ok, enough with Microsoft for now.

Microsoft, don’t come for me🙏

The Hardware

Ever wonder how many GPUs it takes to run ChatGPT? We don’t know the exact architecture of ChatGPT beyond what OpenAI has told us.

It is fine-tuned from a variant of GPT-3.5 — OpenAI

But it has around 175 billion parameters, which is MASSIVE. Parameters are the variables the model learns during training; they’re adjusted to minimize the error between the model’s predictions and its training data. The number of parameters indicates the model’s complexity: the more you have, the better the model can capture finer nuances. Training and running a model with 175 billion parameters requires immense computational resources: powerful CPUs, GPUs, and extensive memory. A 3-billion-parameter model can generate a token in about 6 ms on an A100 GPU, so ChatGPT’s 175-billion-parameter model should take roughly 350 ms per token on an A100. On the Azure cloud, each A100 card costs about $3 an hour, which works out to roughly $0.0003 per word, putting ChatGPT’s cost at around $3 million per month. The actual costs are probably a lot higher, because parallelization is not 100% efficient and the GPUs are not 100% utilized either.
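
Here’s the same back-of-the-envelope math in code, so you can see where the numbers come from. The words-per-day figure is my own assumption, used only to bridge from the per-word cost to the roughly $100K/day and $3M/month estimates; none of this is OpenAI’s real bill.

```python
# Back-of-the-envelope ChatGPT serving cost, using the estimates in this article.
# These are rough public estimates, not OpenAI's actual numbers.

ms_per_token_3b = 6            # ~6 ms per token for a 3B-parameter model on one A100
params_ratio = 175 / 3         # scale latency roughly linearly to 175B parameters
ms_per_token_175b = ms_per_token_3b * params_ratio           # ~350 ms per token

a100_cost_per_hour = 3.0       # approximate Azure price per A100, USD
cost_per_word = (ms_per_token_175b / 1000) * (a100_cost_per_hour / 3600)
print(f"cost per word:  ${cost_per_word:.5f}")               # ~$0.0003

# Assumption (mine, for illustration): ~330 million words generated per day.
words_per_day = 330e6
daily_cost = cost_per_word * words_per_day
print(f"daily cost:    ${daily_cost:,.0f}")                   # ~$100K per day
print(f"monthly cost:  ${daily_cost * 30:,.0f}")               # ~$3M per month
```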

NVIDIA’s V100 GPUs (Volta) vs NVIDIA A100 (Ampere)

The V in V100 stands for Nvidia’s Volta architecture, and it’s a really interesting design🤔, because it was a major architectural change over all previous Nvidia GPUs, even though it’s fairly old hardware by now. These GPUs are based on Nvidia’s GV100, an 815 mm² silicon chip with 21.1 billion transistors, produced by TSMC on a 12 nm process. The predecessor was the Pascal-based GP100, which was cheaper but didn’t have the tensor cores the GV100 introduced. Tensor cores aren’t too different from the traditional cores on a GPU, but they excel at matrix processing. In simpler words, they can run🏃‍♂️ a lot of parallel computations, but they’re limited to basic multiply-accumulate calculations. Volta was the first GPU architecture designed to accelerate AI workloads, making AI inference 6x faster and AI training 12x faster. Volta is what made training a model as large as GPT-3 possible; without it, this wouldn’t have happened, and at the time, Volta was what OpenAI had to go with.
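
To make “multiply-accumulate” concrete: a tensor core essentially computes D = A × B + C on small matrix tiles in one shot (Volta works on 4×4 half-precision tiles with single-precision accumulation). Here’s the same math in NumPy, which runs on the CPU; it only shows the operation, not the hardware.

```python
# The core operation a tensor core accelerates: a fused matrix multiply-accumulate,
# D = A @ B + C, on small tiles (Volta: 4x4 FP16 tiles with FP32 accumulation).
# This NumPy version only illustrates the math; it runs on the CPU, not a tensor core.
import numpy as np

rng = np.random.default_rng(42)
A = rng.normal(size=(4, 4)).astype(np.float16)   # low-precision inputs
B = rng.normal(size=(4, 4)).astype(np.float16)
C = rng.normal(size=(4, 4)).astype(np.float32)   # higher-precision accumulator

D = A.astype(np.float32) @ B.astype(np.float32) + C   # multiply, then accumulate
print(D)
```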

GPT-3 was a much larger general model with many features, like translating and summarizing text, similar to ChatGPT. But it wouldn’t be a good chatbot, because it wasn’t trained to give human-like answers, and it was a large model that required a lot of compute. This is why ChatGPT was born: an ML model focused on natural text-based chat conversations that needs less compute. A100 GPU clusters were released to Azure customers like OpenAI on June 1st, 2021, and ChatGPT was trained on Azure’s infrastructure and supercomputers after that. The A100 is an 826 mm² die with 54.2 billion transistors, produced by TSMC on a 7 nm process. With this boost, tensor performance went up to roughly 300 teraflops, about 2.5x the speed of the V100, while having fewer tensor cores.

Megatron-Turing NLG

In October 2021, Nvidia and Microsoft announced a supercomputer they used to train a new large NN (neural network) called Megatron-Turing NLG, with 530 billion parameters.

That’s a lot more than GPT-3’s 175 billion parameters. The supercomputer was made up of Nvidia DGX A100 servers, each containing 8 A100 GPUs, meaning 4,480 A100 GPUs across 560 servers. It was built specifically to train a 530-billion-parameter NN like Megatron-Turing NLG, which makes it more than capable of handling the smaller ChatGPT model. It’s possible that hardware similar to what trained Megatron-Turing NLG also trained ChatGPT.

ChatGPT inference currently runs on Microsoft’s Azure servers, and either Nvidia’s DGX or HGX A100 systems should be enough to serve it. Semianalysis came up with a hypothesis that, at the current scale, it would cost $694,936 per day in compute hardware to operate ChatGPT, and OpenAI would need 3,617 A100 servers with 28,936 A100 GPUs just to provide inference for ChatGPT.
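
You can sanity-check that headline number yourself. If you assume something on the order of $1 per A100 per hour in amortized hardware cost (my assumption here, not a quoted price), the GPU count alone lands you in the same ballpark:

```python
# Sanity check on the Semianalysis-style estimate. The ~$1/GPU-hour figure is an
# assumption for illustration (amortized hardware + power), not a quoted price.
servers = 3_617
gpus_per_server = 8
gpus = servers * gpus_per_server                 # 28,936 A100 GPUs
cost_per_gpu_hour = 1.0                          # USD, assumed

daily_cost = gpus * 24 * cost_per_gpu_hour
print(f"{gpus:,} GPUs -> roughly ${daily_cost:,.0f} per day")   # ~$694K per day
```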

U sure it’s $694,936?

Now, as I leave you with my last thoughts: although we don’t know exactly how much it costs to run ChatGPT, and these numbers are estimates based on the facts we do know, it sure costs them a lot of money. And I think we should be thankful to OpenAI for letting us use ChatGPT for free (ignoring the paid version). This industry is just starting up and it’s about to BOOM💥. As I go deeper into AI, I’m going to keep an eye on companies like Nvidia, Microsoft, Intel, TSMC, and all the other companies involved in the process.

As a Thank You😄 for reading to the end of this article, I wanted to share a project I’ve been working on for the past month: a consulting project with Microsoft where we tackled the concerns that come with AI growing at such a fast rate, and how we can continue this growth while saving our planet and optimizing costs. Check it out here. 👀

TL;DR:

Training Process and Costs: ChatGPT is trained on over 500 gigabytes of text data, equivalent to billions of web pages, utilizing trillions of words and making 170 billion connections. Training such a model requires substantial time and resources, with estimated running costs exceeding $100K per day.

The AI Supercomputer Infrastructure: The rise in AI capability over the last decade has been fueled by advancements in GPUs and cloud-scale infrastructure. Microsoft Azure’s data center infrastructure, equipped with state-of-the-art hardware and software optimizations, supports efficient training and inference for large language models.

Hardware Specifications: ChatGPT, fine-tuned from GPT-3.5, boasts around 175 billion parameters, requiring immense computational resources for training and inference. NVIDIA’s V100 and A100 GPUs have played pivotal roles in accelerating AI workloads, with the A100 offering significant improvements in tensor performance.

Megatron-Turing NLG: Nvidia and Microsoft’s collaboration led to the development of a supercomputer for training the Megatron-Turing NLG, a neural network with 530 billion parameters. While the exact hardware used for ChatGPT’s inference is undisclosed, hypotheses suggest it may utilize Nvidia’s DGX or HGX A100 servers.

Cost Analysis: Semianalysis estimates ChatGPT’s daily hardware costs at $694,936, requiring thousands of A100 servers and GPUs to meet demand. Despite the significant expenses, OpenAI’s provision of ChatGPT for free underscores its commitment to advancing AI accessibility.

Sources:

https://www.youtube.com/watch?v=4q9-yf1eU8c

https://www.youtube.com/watch?v=-4Oso9-9KTQ

https://www.youtube.com/watch?v=Rk3nTUfRZmo

https://www.ciocoverage.com/openais-chatgpt-reportedly-costs-100000-a-day-to-run/#:~:text=Since%20the%20latest%20version%2C%20ChatGPT,15%2D20%20words%20per%20second.

https://hackernoon.com/a-deep-dive-into-how-many-gpus-it-takes-to-run-chatgpt

https://www.semianalysis.com/p/the-inference-cost-of-search-disruption

My name is Krish, I’m a 17-year-old high school student with a passion for automated vehicles and AI. If you have any questions, suggestions, or comments, I would love to hear them! You can reach out to me on LinkedIn or Twitter, and make sure to follow my newsletter to stay updated with what I’m working on. Thank you for taking the time to read my article and I hope you learned something new!
