ChatGPT Demystified: Behind the Scenes of a Conversational AI (Pt.1)

Hicham EL BOUKKOURI
LegalTech Chronicles
7 min read · Mar 5, 2024
“Cartoon of a diverse group of Humans talking to an Artificial Intelligence” — by DALL·E 3

If you’ve skimmed through any amount of tech-related news lately then chances are you are familiar with the phenomenon that is ChatGPT.

Maybe you’ve had some fun with the public website, asked a number of deep or otherwise completely silly questions; and maybe you’ve even gotten some surprising answers back. Nevertheless, you may still feel that the inner workings remain somewhat elusive, and perhaps you’re finding yourself eager to delve deep and unravel some of that AI mystery.

If this sounds familiar, then you’ve come to the right place. So grab a drink, get comfortable and prepare for a rundown of everything ChatGPT.

The following is the first post of a multi-part series about ChatGPT. As the other parts are progressively put out, you’ll be able to find them below:

  • ChatGPT Demystified: Behind the Scenes of a Conversational AI (Pt.1)
  • ChatGPT Demystified: Behind the Scenes of a Conversational AI (Pt.2)
  • ChatGPT Demystified: Behind the Scenes of a Conversational AI (Pt.3)

So without further ado, let’s get started!

Understanding ChatGPT requires us to first grapple with its intelligent nature. Throughout this post, we will gradually build towards a clearer understanding of how human-like behaviour can be achieved by computer software; and in fact, we will come to realise that this is not a sudden breakthrough but rather the culmination of a pursuit spanning decades: a meticulous journey towards creating something astounding.

ChatGPT: Artificial Intelligence

Artificial Intelligence (or AI) is a broad term that is used whenever intelligent behaviour is exhibited by a non-human entity, specifically machines and computers. In this sense, defining “what is an AI” is as complex as defining intelligence itself, which can be seen as a general attribute encompassing diverse phenomena, from the ability to reason all the way up to having and expressing emotions.

Computer Programs Displaying Intelligent Behaviours

How can a piece of computer software be “intelligent”?

In practice, there are many ways an algorithm can achieve the “AI” status. One way, dating back to the early days of the field, consists of using pre-defined rules. These rule-based systems would have access to a large amount of prior information on a specific topic (e.g. a knowledge base) and could use reasoning algorithms to effectively make decisions within a well-defined scope. And to the uninitiated, these systems may seem quite intelligent, as they are indeed able to reason and even resolve conflicts within these specific contexts.

This is ELIZA, a ChatBot built in the 60s that could play the role of a therapist. (source: Wikipedia)
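To make this idea concrete, here is a minimal, hypothetical sketch (in Python) of how an ELIZA-style chatbot can work. The patterns and responses below are made up for illustration, but the principle is the real one: matching the input against pre-defined rules, with no learning involved.

```python
import re

# Hypothetical rules in the spirit of ELIZA: each maps a regex pattern
# to a templated response. The system never learns; it only matches.
RULES = [
    (r"I feel (.*)", "Why do you feel {0}?"),
    (r"I am (.*)", "How long have you been {0}?"),
    (r".*\bmother\b.*", "Tell me more about your family."),
]

def respond(message: str) -> str:
    for pattern, template in RULES:
        match = re.match(pattern, message, re.IGNORECASE)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # fallback when no rule matches

print(respond("I feel anxious about work"))
# -> "Why do you feel anxious about work?"
```

Anything the rules’ author did not anticipate falls through to the generic fallback, which is precisely the limitation discussed next.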

While rule-based systems can be very effective, especially when a clear path to the solution is already known, this approach is often quite lacking when it comes to figuring out the solution on the go or quickly adapting to new scenarios. Real-life situations, however, are often complex enough that we can only formulate the desired outcome and do not have access to a step-by-step process for reaching the solution.

To address more advanced tasks, a different approach is used; one that has more to do with showing the system what we want and less with telling it how to get to it. This approach is called: Machine Learning.

Smarter Machines Learning Through Examples

Nowadays, many Artificial Intelligence systems fall under the umbrella of Machine Learning (or ML, for short).

In Machine Learning, the expert does not provide any explicit rules but rather curates a number of relevant examples (dataset), defines a way in which to interpret the data (model and features) and provides the system with a general strategy for learning from those examples (optimisation) in order to reach a suitable solution.

Thanks to Machine Learning, identical models can follow the same optimisation strategy and, simply by using different datasets, produce good solutions to widely different problems. For instance, the same ML model (e.g. a logistic classifier) can be trained on examples of spam and non-spam e-mails, leading to a spam detector; then, this same model can be trained on different examples, this time of positive and negative reviews, leading to a sentiment analysis system.
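As an illustration, here is a toy sketch using scikit-learn with tiny made-up datasets: the exact same model class and training procedure yield two very different systems, depending only on the examples they are shown.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny, made-up datasets for illustration only.
spam_texts = ["win money now", "claim your free prize",
              "meeting at noon", "see you tomorrow"]
spam_labels = [1, 1, 0, 0]  # 1 = spam, 0 = not spam

review_texts = ["great movie, loved it", "what a fantastic film",
                "terrible plot, boring", "awful acting"]
review_labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

# Same model, same optimisation strategy; only the data differs.
spam_detector = make_pipeline(CountVectorizer(), LogisticRegression())
spam_detector.fit(spam_texts, spam_labels)

sentiment_classifier = make_pipeline(CountVectorizer(), LogisticRegression())
sentiment_classifier.fit(review_texts, review_labels)

print(spam_detector.predict(["free money prize"]))        # likely [1]: spam
print(sentiment_classifier.predict(["boring and awful"]))  # likely [0]: negative
```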

Some Machine Learning models aim to find abstract lines that separate the different categories in their dataset of examples. Any new items are then compared and automatically categorised accordingly.

However, while Machine Learning is more flexible, letting the models figure out solutions autonomously and providing only examples of the desired outcome, it still suffers from downsides similar to those of the rule-based methods mentioned earlier. Namely, the models can only see the data through the lens of hand-crafted features (e.g. whether an image contains shapes resembling ears, or maybe a snout, when detecting cats or dogs). These features therefore need to be relevant enough for the task at hand to allow for a suitable solution, just like rules in rule-based systems. Naturally, when more complex problems arise (perhaps problems for which it is unclear how the model is supposed to process the data), ML practitioners inevitably provide poor features, leading, in turn, to unreliable systems.

NOTE: the practitioner’s involvement in selecting data features might seem somewhat constraining at first, and, to some extent, it is. However, this has historically been a way to cope with the inherent limitations of previous generations of models, which, at the time, were unable to handle intricate data forms such as images, texts, and sounds.

In simpler cases such as tabular data or spreadsheets, ML practitioners can effectively use all available variables as features. Yet, when dealing with more complex data types, the reliance on hand-crafted features (perhaps biased, and generally insufficient) becomes a necessary proxy that ultimately enables the model to start learning.
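For instance, a hypothetical set of hand-crafted features for spam detection might look like the sketch below. The specific features are invented for illustration, but they show the key point: the practitioner, not the model, decides what the data “looks like”.

```python
import re

# A hypothetical feature extractor for spam detection. The model never
# reads the raw text; it only sees the numbers this function produces.
def extract_features(email: str) -> list[float]:
    return [
        float(len(email)),                # message length
        float(email.count("!")),          # number of exclamation marks
        float(bool(re.search(r"\bfree\b", email, re.IGNORECASE))),  # says "free"?
        float(sum(word.isupper() for word in email.split())),       # shouty words
    ]

print(extract_features("Claim your FREE prize NOW!!!"))
# -> [28.0, 3.0, 1.0, 2.0]
# If these features are poorly chosen for the task, no amount of
# training can make the resulting system reliable.
```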

Holistic Systems Addressing Complex Problems

ChatGPT is not just a Machine Learning model; it is in fact an application of what is called “Deep Learning”.

Deep Learning focuses on a class of models called Neural Networks that were originally designed to emulate biological neurons. This involves:

  • transforming input data into raw signals that are comprehensible to a computer (e.g. RGB pixel values when dealing with images);
  • guiding these signals through various layers of artificial “neurons”, with each layer fine-tuning and modifying distinct facets of the input, either amplifying or diminishing the signals;
  • ultimately, the final layer of neurons consolidates all incoming information, generating a conclusive output presented to the user.
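The following minimal sketch illustrates this flow of signals through successive layers. The layer sizes are made up, and the weights are random placeholders (in a real network they would be learned during training), but the structure is the one described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(signals, n_out):
    # In a real network these weights are learned; here they are random
    # placeholders, just to show how signals flow from layer to layer.
    weights = rng.normal(size=(signals.shape[-1], n_out))
    return np.maximum(0.0, signals @ weights)  # ReLU: amplify or silence signals

pixels = rng.random(784)          # e.g. a flattened 28x28 grayscale image
hidden_1 = layer(pixels, 128)     # early layer: low-level patterns
hidden_2 = layer(hidden_1, 64)    # deeper layer: combinations of patterns
scores = hidden_2 @ rng.normal(size=(64, 10))  # final layer: one score per category

print(scores.argmax())  # index of the category with the highest score
```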


Evolution of a neural network’s features throughout the various layers. Each row of images shows the features that are detected by the same layer. The rows above feed their features to the rows below (source: distill.pub).

If one were interested, for instance, in recognizing vehicle IDs from photos of license plates, then stacking many layers of artificial neurons like pancakes into a “deep” neural network is often a very effective approach.

In fact, instead of painstakingly specifying features such as the shape, color or size of the various objects appearing in the image, neural networks only require raw instances of license plates to start learning their own set of features. Specifically, the very first layers may compare every pixel to its neighbours: is it brighter? More colorful? The next layers may then use this information to detect basic features such as edges and simple curves. These low-level attributes, once combined in subsequent layers, may start forming more complex shapes, like loops and strokes. And eventually, the final layers can integrate these shapes into abstract yet comprehensive digit representations, effectively enabling the model to detect license IDs.
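In a modern framework such as PyTorch, such a “deep” stack might be sketched as follows. The layer sizes are arbitrary and the network is untrained, but the structure mirrors the edges-to-curves-to-shapes story above.

```python
import torch
import torch.nn as nn

digit_recognizer = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1),   # compare pixels to neighbours
    nn.ReLU(),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # detect edges and simple curves
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Conv2d(32, 64, kernel_size=3, padding=1),  # combine curves into loops, strokes
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(64 * 7 * 7, 10),                    # integrate shapes into 10 digit scores
)

scores = digit_recognizer(torch.zeros(1, 1, 28, 28))  # one blank 28x28 "image"
print(scores.shape)  # torch.Size([1, 10]): one score per digit
```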

In short, deep learning essentially cares about composing signals. It first ingests the raw data, unaltered, then processes it into progressively more intricate features. Fundamentally, this is a way to leverage the advanced computational capabilities of machines to explore larger sets of solutions that may not immediately be evident to us humans.

How Does ChatGPT Fit into This Picture?

Understanding a bit more about Artificial Intelligence, Machine Learning and, specifically, Deep Learning allows us to chip away at the complexity of defining what ChatGPT is. By being more comfortable with these ideas, we can now readily embrace that:

ChatGPT is a complex system; a system that is essentially based on highly capable algorithms — neural networks.

ChatGPT’s neural network has been meticulously crafted to generate appropriate answers from input text queries. This was achieved through various processes, including Language Modeling and a special type of Reinforcement Learning (known as Reinforcement Learning from Human Feedback, or RLHF). Practical details of the model’s implementation aside, ChatGPT is also a system that was able to benefit from contemporary innovations that enabled training large models at scale: faster hardware, more efficient neural architectures, readily available datasets of text, etc.
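As a small taste of Language Modeling ahead of the next post, here is a toy, made-up sketch: it “predicts” the next word using simple counts over a single sentence. ChatGPT pursues the same kind of objective with a massive neural network over enormous amounts of text, but the underlying idea (guessing what comes next) is analogous.

```python
from collections import Counter, defaultdict

# Toy corpus for illustration only.
corpus = "the cat sat on the mat and the cat slept".split()

# Count, for each word, which words tend to follow it.
next_word_counts = defaultdict(Counter)
for current, following in zip(corpus, corpus[1:]):
    next_word_counts[current][following] += 1

def predict_next(word: str) -> str:
    # Return the most frequent continuation seen in the corpus.
    return next_word_counts[word].most_common(1)[0][0]

print(predict_next("the"))  # -> "cat" (seen twice, vs "mat" once)
```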

In the next post, we will explore the remaining facets of ChatGPT, deepening our understanding of the topic and moving closer to a complete picture of this technology. Unlike this preliminary post, which focused on providing foundational tools for further exploration, the next one will delve into more specific aspects of ChatGPT that set it apart from all previous attempts at creating reliable conversational AIs.

So if that sounds like fun and you would like to unveil some of that tech behind the talk, make sure to subscribe and stay tuned for the next one!
