How does ChatGPT know stuff? And why is it important?

Christopher Martlew · Published in On Being Agile · 5 min read · Feb 7, 2023


Why is it important?

This new application of AI is a game-changer on the scale of the PC/Windows, the internet/web and the iPhone/mobile.

GPT is important because there are unlikely to be more than a handful of these massive AI engines on the planet. They are simply too big, too complex and too expensive to build and run.

A future clutch of AI titans may well form the backbone of a global AI infrastructure — in rather the same sense that network protocols, email, the web and social media defined our internet experience of the last 25 years.

So the battle is on to get into this space, and big tech will be spending billions to catch up with ChatGPT.

Both ChatGPT and Google’s Bard are based on the same Transformer (the ‘T’ of GPT) technology. The transformer architecture originated in Google’s labs (making it rather sour for them that OpenAI is now eating their lunch) and was published in the paper Attention Is All You Need (Vaswani et al., 2017).

OpenAI CEO Sam Altman says (in an interview with Reid Hoffman) that a lot of value will be created in a new ‘middle layer’ on top of GPT. We’re already seeing a growing ecosystem of applications built on the ChatGPT foundation — some of these will be bandwagon-jumpers, others may be great value creators.

What might the new middle layer comprise? In short, almost anything a human can do: computer programming, medical diagnosis, biotech, financial advice, share price guidance, customer service chatbots (that actually work) and answering exam questions.

Microsoft will be launching numerous features this year based on its cooperation with OpenAI, like automated meeting minutes and action lists (how awesome is that?), or dynamically generated images for PowerPoint based on OpenAI’s DALL-E (hopefully better than the image below).

The cooperation with Microsoft will almost certainly consolidate ChatGPT’s position as one of the global leaders in the field. Then there’s the involvement of OpenAI’s early backer Elon Musk, although he has been distancing himself lately, noting that OpenAI was started as open-source and non-profit: “Neither are still true.” There could be a fly in the ointment there; never underestimate Mr. Musk.

The connection with Microsoft is an existential threat for Google: overlay ChatGPT onto Teams, Office 365, Bing and Microsoft’s global reach, and Google may rightly be a little worried.

Google made its name (and became a verb) by offering a simple, clean web page with a search box in the middle. This was in stark contrast to the cacophony of other search engines with flashing, ad-covered pages. History may be repeating itself.

Image generated by OpenAI’s DALL-E. Shutterstock (source of the cover image) doesn’t need to worry just yet, or maybe…?

How does ChatGPT know stuff?

Well… health warning: often it doesn’t, and it can get things spectacularly and hilariously wrong, notwithstanding its ability to pass MBA and other university-level exam questions.

But it can be a lot easier and more user-friendly than old-fashioned search with ads. Often 90% accurate is good enough as a starter.

GPT stands for Generative Pre-trained Transformer. Generative means it can generate reasonable, human-like responses. Pre-trained means it learns relatively little in real time; most of what it knows was learned up front during training. The transformer architecture means it is pretty good at translating what it “knows” into human language, and vice versa, including translation between languages.

ChatGPT is a conversational interface built on an underlying engine called GPT-3. GPT-3 is a deep neural network, loosely modelled on the human brain. It has ‘artificial neurons’ which create ‘artificial intelligence’. The artificial neurons are organized into layers, and each layer processes information in a way loosely analogous to how neurons process information in our brains.

Neural networks are trained by passing data through learning algorithms. These algorithms strengthen or weaken the links between neurons, effectively switching pathways ‘on’ or ‘off’. As with our brains, the number of neurons is not the driving factor behind intelligence; it’s the links between the neurons.
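To make that concrete, here’s a minimal sketch of a single artificial neuron in plain Python. The input values, weights and bias below are made-up illustrative numbers; in a real network, training would adjust the weights and bias automatically.

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of its inputs plus a bias,
    squashed through a sigmoid activation into the range (0, 1)."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))

# Made-up example values; training adjusts the weights and bias.
output = neuron(inputs=[0.5, 0.8], weights=[0.9, -0.3], bias=0.1)
print(round(output, 3))  # → 0.577
```

The weights here are the ‘links’ in the analogy above: a large positive weight means one neuron strongly excites the next, a negative weight means it inhibits it.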

The numbers that shape these pathways of neurons and links are called parameters.

A parameter is a weight or bias in the learning model. It’s the parameters that are adjusted (that ‘learn’) by processing large amounts of data. The more data and the more parameters, the better the engine’s performance and accuracy can be.
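As a back-of-the-envelope illustration (the layer sizes below are made up; GPT’s real architecture is far more complex), the parameter count of one fully connected layer is simply its weights plus its biases — which is why counts balloon so quickly:

```python
def dense_layer_params(n_inputs, n_outputs):
    """A fully connected layer has one weight per input-to-output link,
    plus one bias per output neuron."""
    return n_inputs * n_outputs + n_outputs

# A toy layer: 1,000 inputs feeding 1,000 neurons.
print(dense_layer_params(1000, 1000))  # → 1001000
```

Just one modest 1,000-by-1,000 layer already holds over a million parameters; stack hundreds of much wider layers and the billions add up fast.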

Reportedly, GPT-3 has 175 billion parameters. GPT-4 is reported to have over a trillion parameters and to have cost $100 million to train. Bard is reputed to have 137 billion parameters.

Most of GPT’s knowledge recipe comes from Common Crawl (basically, almost everything on the internet). But ‘everything on the internet’ is not very high quality, so it is augmented by WebText2, a corpus based on links recommended on Reddit (so curated to some extent). Then add in two corpora of books (rather cryptically known as Books1 and Books2). Leave to simmer for a few weeks and then add in the whole of Wikipedia (in English).
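The approximate mix, as reported in OpenAI’s GPT-3 paper (Brown et al., 2020), looks something like this. The percentages are the paper’s rounded sampling weights, so they don’t sum to exactly 100:

```python
# Approximate training-mix weights reported for GPT-3
# (Brown et al., 2020); rounded, so they don't sum to exactly 100.
training_mix = {
    "Common Crawl (filtered)": 60,
    "WebText2": 22,
    "Books1": 8,
    "Books2": 8,
    "Wikipedia": 3,
}

for source, share in training_mix.items():
    print(f"{source:24s} ~{share}% of training tokens")
```

Note that the weights are sampling proportions, not raw sizes: the curated sources are deliberately over-sampled relative to their share of the raw data.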

Following that, the model was refined by human beings on typical questions and answers.

It’s maybe not surprising that the machine reflects the inputs it’s trained on and has a bias towards the English language and US/Anglo-Saxon culture.

Learning takes a while, and the current version was trained on data up to June 2021, so the model is not good on current affairs. Bard is likely to be more real-time based on Google’s underlying search and web crawl technology.

Next possible gamechanger? Look out for new entrants and a successor to the transformer architecture. Watch this space, but don’t hold your breath on the latter.

Having said that, and to paraphrase Bill Gates: we tend to overestimate the short-term impact of new technology, and underestimate the long-term impact.

If you’d like to join the conversation please share via the clapping-hands button below.

Also at:

amazon.com | amazon.co.uk | bol.com | blog

#OnBeingAgile #mindoftheorg


Chris Martlew is a Technology Executive, author and speaker.