ChatGPT Explained (for Non-technical Audience)

Jay Chung
Published in USF-Data Science
4 min read · Feb 22, 2024

How does ChatGPT work?

This is a question many of us have probably asked since ChatGPT’s first release, awed by its seemingly magical capabilities.

Last month, the University of San Francisco had the honor of hosting Josh Starmer — a YouTube star in the statistics, data science, and machine learning world with over 1.1M subscribers on his StatQuest channel. Josh is well known for democratizing deep technical knowledge, making it approachable through simple examples worked through step-by-step, using pictures (and often original songs) to make sure every main idea is easy to follow.

Josh and his ukulele. I can confirm he writes and sings all his songs!

Below is my humble attempt at summarizing how Large Language Models (LLMs) like ChatGPT work for a non-technical audience, inspired by Josh’s approach of explaining complex concepts in a simple way.

Model training

Training dataset

First, LLMs, like ChatGPT, are fed huge training datasets from the web.

Transform words into vectors

Each word in the training dataset’s vocabulary gets assigned a unique word embedding (a vector of numbers that represents the word). Words with similar meanings get numbers that are close to each other.

Drawn in 2D for simplicity, but LLMs typically use really high-dimensional vectors, which can’t be represented visually (just try imagining something in 4D or 500D space if you don’t believe me).
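
If you’re curious what that looks like in code, here is a tiny toy sketch in Python. The words and all the numbers are made up for illustration; a real model learns millions of these numbers automatically during training.

```python
# A toy, made-up example of word embeddings: each word is just a list of numbers.
# Real LLMs learn hundreds or thousands of numbers per word; these are invented
# purely for illustration.
embeddings = {
    "awesome":  [0.90, 0.80],
    "great":    [0.85, 0.75],   # close to "awesome" because the meanings are similar
    "pizza":    [-0.20, 0.60],
    "dinosaur": [-0.30, 0.55],
}

def distance(word_a, word_b):
    """Smaller distance = more similar meaning (in this toy example)."""
    a, b = embeddings[word_a], embeddings[word_b]
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

print(distance("awesome", "great"))   # small number: similar words
print(distance("awesome", "pizza"))   # bigger number: unrelated words
```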

Maintain word order

Not only do LLMs keep track of each word’s numerical representation, they also keep track of the order of the words in a document (scary technical term: positional encoding). This helps differentiate the meaning between sentences like:

“The dinosaur ate the pizza.”

–Versus–

“The pizza ate the dinosaur.”

The outcome for the dinosaur would be very unfortunate in the latter case.
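
A toy way to picture this: keep each word together with its position in the sentence. Real positional encodings are extra numbers added to the word vectors, but the intuition is the same; the sketch below is only an illustration.

```python
# The same words in a different order produce a different sequence of
# (position, word) pairs, so the model can tell the two sentences apart.
sentence_a = "The dinosaur ate the pizza".split()
sentence_b = "The pizza ate the dinosaur".split()

print(list(enumerate(sentence_a)))  # [(0, 'The'), (1, 'dinosaur'), (2, 'ate'), ...]
print(list(enumerate(sentence_b)))  # [(0, 'The'), (1, 'pizza'), (2, 'ate'), ...]

# Same set of words...
print(sorted(w.lower() for w in sentence_a) == sorted(w.lower() for w in sentence_b))  # True
# ...but a different order, which is exactly the information positional encoding keeps.
print(sentence_a == sentence_b)  # False
```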

Relationship between words

On top of taking care of numerical representations and word order, the model must also care about how words relate to each other.

Let’s say the large training dataset includes my YouTube comment on Josh’s latest video:

“I love StatQuest videos. They’re awesome at helping me understand complicated concepts!”

The model also figures out the relationship between words (scary technical term: self-attention scores) for all word pairs. The model learns this through trial and error (scary technical term: backpropagation).

It learns that “awesome” has a strong relationship with “StatQuest”.
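
Here is a toy sketch of what those relationship scores might look like. The numbers are invented for illustration; a real model learns them from its training data.

```python
# Invented "attention scores": for each pair of words, a number that says how
# strongly they relate. A real model learns these values via backpropagation.
attention_scores = {
    ("awesome", "StatQuest"): 0.92,   # strong relationship
    ("awesome", "videos"):    0.40,
    ("awesome", "at"):        0.05,   # weak relationship
}

# Which word does "awesome" relate to most strongly?
strongest_pair = max(attention_scores, key=attention_scores.get)
print(strongest_pair)  # ('awesome', 'StatQuest')
```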

Guessing the next word

In training, the model processes the text iteratively, word-by-word, and it tries to guess what the next word will be based on numerical representation, position, and word relationship strength.

“I <blank>”

“I love <blank>”

“I love StatQuest <blank>”

Eventually the model will come across:

“I love StatQuest videos. They’re <blank>”

While hiding the later part of the sentence from itself, the model considers words from its vocabulary as candidates for the next word.

The model knows that the next word is “awesome” and it uses the word vectors (from earlier) to calculate the error for each of the guesses — the closer the words, the smaller the error.

This is done at an enormous scale across all of the training data, and the model keeps adjusting itself so that its guesses over the whole dataset are as accurate as possible.
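
Here is a toy sketch of that scoring step, following the simplified description above (real models score their guesses a bit differently, but the idea of “closer guess, smaller error” is the same). The vectors are invented for illustration.

```python
# Score a few candidate guesses for the blank in
# "I love StatQuest videos. They're <blank>", where the true word is "awesome".
# All numbers are invented for illustration.
embeddings = {
    "awesome": [0.90, 0.80],
    "great":   [0.85, 0.75],   # close to "awesome" -> small error if guessed
    "edible":  [-0.70, 0.10],  # far from "awesome" -> large error if guessed
}

def error(guess, truth="awesome"):
    """Distance between the guessed word's vector and the true word's vector."""
    g, t = embeddings[guess], embeddings[truth]
    return sum((x - y) ** 2 for x, y in zip(g, t)) ** 0.5

for guess in ["awesome", "great", "edible"]:
    print(guess, round(error(guess), 3))
# Training (backpropagation) nudges the model so low-error guesses become more likely.
```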

Model in action

Now that the model is trained, consider this simple example of an exchange between ChatGPT and me:

In short, ChatGPT generates the output by guessing the next word, just like it did during training. In other words, if it saw many comments saying how awesome StatQuest is, it will associate those words more closely together and have a higher probability of saying it’s awesome rather than saying, “StatQuest is edible.”

That is how ChatGPT generates its answer, “Awesome!!!” It is simply saying that, based on what it saw during training (for example, YouTube comments), “awesome” is a highly probable word to follow “StatQuest is”.

If it’s a longer answer like “Awesome YouTube channel by Josh Starmer on statistics and machine learning fundamentals”, it generates its answer, guessing one word at a time.

“What is StatQuest?” “<blank>”

“What is StatQuest?” “Awesome <blank>”

“What is StatQuest?” “Awesome YouTube channel by Josh Starmer on statistics and machine learning <blank>”
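
Here is a toy sketch of that word-by-word generation loop. The probability tables are invented; a real model computes them on the fly from everything it absorbed during training.

```python
# Toy word-by-word generation: given the text so far, look up invented
# probabilities for the next word, pick one, append it, and repeat.
next_word_probs = {
    "What is StatQuest?":                 {"Awesome": 0.80, "Edible": 0.01, "A": 0.19},
    "What is StatQuest? Awesome":         {"YouTube": 0.90, "song": 0.10},
    "What is StatQuest? Awesome YouTube": {"channel": 0.95, "comment": 0.05},
}

answer = "What is StatQuest?"
while answer in next_word_probs:
    probs = next_word_probs[answer]
    # Pick the most likely next word and append it.
    # (Real models usually add a bit of randomness to this choice.)
    best_word = max(probs, key=probs.get)
    answer = answer + " " + best_word

print(answer)  # -> "What is StatQuest? Awesome YouTube channel"
```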

And I for one agree that StatQuest is awesome!

Josh goes much deeper into this topic than I do in his video Decoder-Only Transformers, ChatGPTs specific Transformer, Clearly Explained!!!, so check out his awesome videos, where you can get a good grasp of various machine learning concepts. BAM!


Jay Chung
USF-Data Science

Data + AI Product Manager. I'm passionate about ungatekeeping AI and write about AI for a non-technical audience.