How Generative A.I. Works

Data Decoder
14 min read · Jan 12, 2024


One of the biggest things to happen in 2023 was the boom in generative artificial intelligence (generative AI or GenAI). Arguably, this all started with the release of ChatGPT by OpenAI on November 30th, 2022. By January 2023 ChatGPT had become one of the fastest-growing software applications in history, gaining over 100 million users. Since then, there’s been increasing talk around Industry 4.0, a surge of new GenAI products and tools, and organisations across the globe racing to adopt it.

Photo by Jonathan Kemper on Unsplash

With artificial intelligence (AI) becoming such a big part of our everyday lives it seems only right to talk about it and really get under the hood of what it is. To do that we’re going to start by clarifying the difference between AI and GenAI…yes, they are slightly different things. We’re then going to go through a short history lesson on how AI has evolved over time, explore how generative AI works and consider what possibilities it could bring.

The Difference Between Artificial Intelligence and Generative AI

To start off we need to understand that artificial intelligence is an area of study. If we look at the textbook definition of artificial intelligence it says:

“The theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.”

From this definition we can understand that artificial intelligence is the study of developing ways for computers and programs to be as smart as people. There are also four levels of artificial intelligence: reactive machines, limited memory machines, theory of mind and self-awareness. You can likely tell that as you pass through each level the artificial intelligence comes closer and closer to human intelligence.

Reactive machines are the most basic as they perform tasks that are based on a set of pre-defined rules. Limited memory machines are a bit more advanced because they can take things a step further and use past experiences to inform future decisions. Theory of mind is the next level on, at which point machines are able to understand the emotions, beliefs, needs and thought processes of others in their responses. The final level is self-awareness, which as the name suggests, is the point at which machines have their own consciousness, self-awareness and sentience.

Image generated by DALL.E

Needless to say, the study of artificial intelligence has yet to evolve to theory of mind or self-awareness. At the moment, artificial intelligence has only really evolved to level two — limited memory machines. Things like generative AI would fall within this level.

At this point we should consider the definition of generative AI, which refers to:

Models that can generate high quality text, images and other content based on the data they were trained on.

The key distinguishing feature of generative AI is in its name. Generative, meaning that it is able to create something new. But how does it do this? Well, in order to answer that question we need to go back to how the study of artificial intelligence has evolved and the sub-fields that have arisen over the years.

The Origin of Artificial Intelligence

Considering how futuristic the concept of AI seems you could be forgiven for thinking that it is a very new thing. The truth is that it isn’t. While it might not be as old as Math or English, to say that it’s only existed since the 2020s would be a mistake. AI has been around, and you have been using (maybe even relying on) it, for a lot longer than you realise.

Isaac Asimov, photo by Phillip Leonian from New York World-Telegram & Sun, Public domain, via Wikimedia Commons

There is an argument to say that artificial intelligence was first explored as a concept in September 1940. This is when author Isaac Asimov introduced the idea of the “positronic brain” that provides robots with a consciousness like humans. Regardless, credit for coining the term ‘artificial intelligence’ goes to John McCarthy, who first used it in a proposal he wrote with several others in the Summer of 1956. In that proposal he and his colleagues suggested that they would attempt “to find how to make machines use language, form abstractions and concepts, solve kinds of problems now reserved for humans, and improve themselves”.

In the 68 years that followed this proposal, artificial intelligence has continued to advance rapidly from science fiction to reality. In the 1960s MIT professor Joseph Weizenbaum created the first chatbot ELIZA and had it pose as a psychotherapist. In the 70s a Stanford project developed an AI tool that first demonstrated how a computer program could perform complex reasoning tasks.

In the 80s engineer Ernst Dickmanns fitted a Mercedes van with two cameras and computers to drive itself along 20 kilometres of a German highway at more than 55 mph. On and on, AI has continued to evolve, to the extent that we now use it to tell us whether the emails we receive are spam or not, to translate the words we speak into different languages, to get directions to our desired destination and even to suggest the next word we should use in our WhatsApp messages.

The Fields within Artificial Intelligence

Now that we have explored the difference between AI and GenAI, as well as how it first arose as an area of study, we can begin to explore the fields within it that make the kinds of applications we see today possible. Most institutions will suggest that there are between 6 and 12 fields of AI. These can include machine learning, computer vision, robotics, cognitive computing, data science and big data analytics, as well as AI ethics and policy. There isn’t a general consensus on this, particularly as AI continues to mature and evolve as a subject. It’s also important to remember that these fields are not mutually exclusive; there can be areas of overlap.

For our purposes we are focussing on the two fields most responsible for the advent of generative AI: machine learning and natural language processing.

Machine Learning

The textbook definition suggests that machine learning is:

“The use and development of computer systems that are able to learn and adapt without following explicit instructions, by using algorithms and statistical models to analyse and draw inferences from patterns in data.”

In simple terms, machine learning is when we develop models or algorithms that allow machines to learn from and make predictions or decisions based on data. Machine learning can be broken down into several different types, which include supervised learning, semi-supervised learning, unsupervised learning, reinforcement learning and, more recently, deep learning. It is possible to distinguish between the different types of machine learning based on the kind of data that is being put into a model and the way in which the data is being processed.

For example, supervised learning involves using labelled data (i.e. that includes tags or ‘labels’ providing information about its characteristics) to teach a machine to answer a question to which the correct output is known. A model learns by comparing its answer to the question with the correct output and then making adjustments to itself so that it can come to the right answer.

A great example of supervised learning is spam detection. Every time you check your emails and find that some emails have been automatically shifted to your spam folder this is because a machine learning algorithm has been trained to identify certain traits that determine whether an email should be junked.
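To make this concrete, here is a minimal sketch of what training a spam classifier might look like in Python using scikit-learn. The handful of example emails and their labels are invented purely for illustration; a real spam filter would learn from millions of labelled messages.

```python
# A minimal sketch of supervised learning: a toy spam classifier.
# The example emails and labels below are invented for illustration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

emails = [
    "Win a free prize now",            # spam
    "Meeting rescheduled to Monday",   # not spam
    "Claim your free reward today",    # spam
    "Here are the notes from class",   # not spam
]
labels = [1, 0, 1, 0]  # 1 = spam, 0 = not spam

# Turn each email into word counts (the 'features' the model learns from)
vectoriser = CountVectorizer()
features = vectoriser.fit_transform(emails)

# Fit a simple model that compares its predictions against the known labels
model = MultinomialNB()
model.fit(features, labels)

# Predict on a new, unseen email
new_email = vectoriser.transform(["Free prize waiting for you"])
print(model.predict(new_email))  # [1] -> flagged as spam
```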

The opposite of supervised learning is unsupervised learning. This is where unlabelled data (i.e. data with no information on features or characteristics provided) is given to a machine, which then examines and organises the data in order to discover underlying structures or patterns. Unsupervised learning is something that is frequently used by large companies that sell specific products or services. These companies will build a large database of customer information and then use unsupervised learning to find patterns or similarities among customers. These patterns could include purchasing behaviours, demographics or preferences. Using this information they are then able to design well-tailored marketing strategies or ad campaigns for their consumer base.
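As a rough illustration, the sketch below uses scikit-learn’s k-means clustering to group made-up customers by their spending behaviour without ever being told which group each one belongs to. The spending figures are invented for illustration.

```python
# A minimal sketch of unsupervised learning: grouping customers by behaviour.
# The spending figures below are made up purely for illustration.
import numpy as np
from sklearn.cluster import KMeans

# Each row is a customer: [annual spend, purchases per year]
customers = np.array([
    [200, 3], [220, 4], [250, 5],        # infrequent, low spenders
    [1500, 25], [1600, 30], [1450, 28],  # frequent, high spenders
])

# No labels are provided; the algorithm finds the groups itself
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
segments = kmeans.fit_predict(customers)

print(segments)  # e.g. [0 0 0 1 1 1] -> two customer segments
```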

A more advanced form of machine learning is reinforcement learning. This is when a model is taught optimal actions through trial and error in an environment, using feedback in the form of rewards and penalties. In simple terms, reinforcement learning is when we set a framework for the machine to train itself. That framework will set out the actions the machine can take, the rules it must adhere to when taking these actions as well as what makes up a good outcome and a bad outcome resulting from these actions.

Photo by Bram Van Oost on Unsplash

This is actually how self-driving cars work. They are given a set of actions such as accelerating and braking as well as a set of rules such as the car’s speed and distance from other objects. The AI controlling the car is also told that reaching a destination safely and efficiently is a good outcome whereas unsafe behaviours or traffic violations are bad outcomes. Using this information, the AI then learns the best actions to take in various driving scenarios.
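The sketch below is a toy version of this idea using tabular Q-learning. The ‘road’ is just five positions in a line rather than a real driving environment, and the rewards and penalties are made up, but it shows how a model can learn good actions purely from feedback.

```python
# A toy sketch of reinforcement learning (tabular Q-learning).
# The 'road' is just 5 positions in a line; the goal is the final position.
# This illustrates rewards and penalties, not a real driving simulator.
import numpy as np

n_states, n_actions = 5, 2      # actions: 0 = stay, 1 = move forward
q_table = np.zeros((n_states, n_actions))
alpha, gamma, epsilon = 0.1, 0.9, 0.2
rng = np.random.default_rng(0)

for episode in range(500):
    state = 0
    while state != n_states - 1:
        # Explore occasionally, otherwise take the best known action
        if rng.random() < epsilon:
            action = int(rng.integers(n_actions))
        else:
            action = int(np.argmax(q_table[state]))

        next_state = min(state + action, n_states - 1)
        # Reward reaching the goal, penalise wasting time
        reward = 10 if next_state == n_states - 1 else -1

        # Q-learning update: learn from the outcome of the chosen action
        q_table[state, action] += alpha * (
            reward + gamma * np.max(q_table[next_state]) - q_table[state, action]
        )
        state = next_state

print(np.argmax(q_table, axis=1))  # best action learned at each position (1 = move forward)
```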

The most recent and rapidly evolving area of machine learning is that of deep learning and neural networks. Deep learning is defined as:

“A type of machine learning based on artificial neural networks in which multiple layers of processing are used to extract progressively higher-level features from data.”

In very simple terms deep learning involves developing models or algorithms that imitate the structure of the human brain and how it processes information. Deep learning models can work with unstructured data in its raw form (e.g. text or images), and they can automatically determine the set of features which distinguish different categories of data from one another. In the same way that the human brain has neurons that exchange information with one another so that we can process data and take actions, neural networks are made up of artificial neurons, or nodes, that each individually process different bits of information.
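As a rough illustration, here is a minimal neural network defined in PyTorch. The layer sizes and input data are arbitrary; the point is simply that information flows through layers of artificial neurons, each building on the output of the one before.

```python
# A minimal sketch of a deep learning model: a small neural network in PyTorch.
# The layer sizes and data here are arbitrary, chosen only for illustration.
import torch
import torch.nn as nn

# Layers of artificial 'neurons'; each layer passes its output to the next
model = nn.Sequential(
    nn.Linear(10, 32),   # input layer: 10 features in, 32 neurons out
    nn.ReLU(),
    nn.Linear(32, 16),   # hidden layer: extracts higher-level features
    nn.ReLU(),
    nn.Linear(16, 2),    # output layer: e.g. two classes ('match' / 'no match')
)

x = torch.randn(1, 10)          # one example with 10 input features
scores = model(x)
print(scores.softmax(dim=1))    # probabilities over the two classes
```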

A common use case of deep learning models is facial recognition software. If you consider your iPhone as an example, when you use Face ID your phone is examining different features across your face and cross-referencing this with a database of facial information in order to determine whether you are in fact the user of that phone.

Photo by Miguel Tomás on Unsplash

Machine learning, in all its forms, is an essential component of what makes generative AI possible. These models and algorithms are used to develop the ‘brain’ of the generative AI tool that you work with. Alongside machine learning another prominent field within AI is natural language processing.

Natural Language Processing

The textbook definition for natural language processing states that it is:

“The application of computational techniques to the analysis and synthesis of natural language and speech.”

In simple terms it is a way for us to train machines so that they are able to understand text and the spoken word in the same way that humans can. Natural language processing is an advanced area of AI study that involves a combination of computational linguistics (i.e. rule-based modelling of human language) as well as statistical, machine learning and deep learning models.

The computational linguistics element of this field uses complex analytical methods that help machines understand conversational human language. Machine learning and deep learning algorithms are also used to teach machines how to understand features of human speech such as sarcasm, metaphors and sentence structures as well as recognising complex patterns that might exist in speech.

Photo by Howard Bouchevereau on Unsplash

While natural language processing might appear to be a far-removed concept it is actually something that most people engage with in their daily lives. For example, Google Translate is a widely used natural language processing tool. Similarly, virtual assistants such as Apple’s Siri or Amazon’s Alexa will use natural language processing to recognise voice commands and generate appropriate responses or actions.

The way in which these models are able to do so is really straightforward. When a natural language processing model is being developed it first needs to be ‘trained’. In order to train the model ‘pre-processed’ data is required; this involves taking data — such as sentences — and breaking them down into individual units of words or phrases. Words are then simplified into their ‘root forms’. This means that a word like ‘starting’ is converted to ‘start’ so that the model can better recognise it. Next, any words that do not add meaning to a sentence such as ‘for’ or ‘with’ are removed. Once these actions have been taken the data has been pre-processed. It can now be used to train the model.
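The sketch below walks through those same pre-processing steps in plain Python: breaking a sentence into individual words, simplifying them to root forms and removing words that add little meaning. The stop-word list and suffix rules are deliberately tiny and only for illustration; real systems use far more sophisticated tokenisers and stemmers.

```python
# A minimal sketch of the pre-processing steps described above, in plain Python.
# The stop-word list and suffix rules are deliberately tiny, for illustration only.
stop_words = {"for", "with", "the", "a", "is", "to"}

def to_root_form(word):
    # Crude stemming: strip a couple of common suffixes ('starting' -> 'start')
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

sentence = "Starting the engine is easy with practice"

tokens = sentence.lower().split()                    # 1. break into individual words
roots = [to_root_form(t) for t in tokens]            # 2. simplify to root forms
cleaned = [t for t in roots if t not in stop_words]  # 3. drop words with little meaning

print(cleaned)  # ['start', 'engine', 'easy', 'practice']
```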

This training takes place using machine learning, and through it the model is taught to perform specific tasks when provided with certain textual information. Once the model has been trained and tested it is ready to be deployed into the website or device that it has been prepared for. It will then receive inputs from users of that environment and provide outputs for the use it was designed for.

Natural language processing is a key component of developing generative AI models. In all cases, it is required to help machines understand and interpret human prompts and depending on the type of model being developed it will also be used to help the AI generate responses to humans.

Building Generative AI

We’ve now covered the origin of AI as an area of study and core fields within it that make developing generative AI tools possible. We can now consider how these things come together to create tools like ChatGPT. At its core a generative AI model is one that uses very large quantities of raw data to ‘learn’ and then generates statistically probable outputs when ‘prompted’ with inputs.

Photo by Markus Spiske on Unsplash

What this means is that ChatGPT and other similar generative AI models are effectively probability models. They begin by learning patterns and associations from data, which they use to generate new, similar data. They do this by analysing the words that you feed into them and then using probability to determine the most appropriate combination of words, images, audio or video with which to respond to your input. To achieve this GenAI tools use machine learning and deep learning as part of their training as well as natural language processing to undertake tasks like text generation.
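A toy illustration of this ‘statistically probable output’ idea is a bigram model, which predicts the next word purely from how often pairs of words appeared in its training text. The training sentences below are invented and real models learn from vastly more data with far richer patterns, but the principle of choosing the most probable continuation is the same.

```python
# A toy illustration of the 'probability model' idea: a bigram model that
# predicts the next word based on how often word pairs appear in training text.
# The training sentences are invented; real models use vastly more data.
from collections import Counter, defaultdict

training_text = (
    "the cat sat on the mat . "
    "the cat sat on the sofa . "
    "the dog sat on the mat ."
).split()

# Count which word follows which
next_word_counts = defaultdict(Counter)
for current, following in zip(training_text, training_text[1:]):
    next_word_counts[current][following] += 1

def most_probable_next(word):
    counts = next_word_counts[word]
    total = sum(counts.values())
    # Convert counts into probabilities and pick the most likely continuation
    best = max(counts, key=counts.get)
    return best, counts[best] / total

print(most_probable_next("the"))  # ('cat', 0.33...) -> 'cat' is the likeliest next word
```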

This is a rapidly growing field and there are many different types of generative AI model; however, five specific types are the most widely used. These include generative adversarial networks, which are often used to create realistic images and videos, as well as variational autoencoders, which are also used to generate text and images. These models use neural networks to effectively create two sides of a brain that form a feedback loop. These two sides are known as the ‘generator’ and ‘discriminator’ or the ‘encoder’ and ‘decoder’, depending on the model.

Image generated by DALL.E

In a generative adversarial network a generator will generate images that a discriminator will challenge until the machine is content that the image it has generated is realistic. Think of an artist that produces a painting and then looks for feedback from a friend or critic. The artist may then make changes to the painting based on that feedback.
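Below is a condensed sketch of what that generator-versus-discriminator loop can look like in PyTorch. The networks are tiny fully-connected layers and the ‘real’ data is just random numbers, so it is an illustration of the feedback loop rather than something that would produce convincing images.

```python
# A condensed sketch of a generative adversarial network in PyTorch,
# assuming tiny fully-connected networks and made-up 'real' data.
import torch
import torch.nn as nn

generator = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 2))
discriminator = nn.Sequential(nn.Linear(2, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=1e-3)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-3)
loss_fn = nn.BCELoss()

for step in range(200):
    real_data = torch.randn(32, 2) + 3.0   # stand-in for 'real' examples
    noise = torch.randn(32, 16)
    fake_data = generator(noise)

    # 1. The discriminator (the 'critic') learns to tell real from generated
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real_data), torch.ones(32, 1)) + loss_fn(
        discriminator(fake_data.detach()), torch.zeros(32, 1)
    )
    d_loss.backward()
    d_opt.step()

    # 2. The generator (the 'artist') learns to fool the critic
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake_data), torch.ones(32, 1))
    g_loss.backward()
    g_opt.step()
```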

In the case of variational autoencoders the encoder will gather data and convert it into a simpler form that can be stored. When prompted the decoder will then reconstruct that data from where it has been stored and present it back in its original form. Where a new image is being generated the decoder will usually introduce some variability or randomness to the data to generate a unique image. As an example, think of a chef that spends years learning different recipes and how to use different ingredients, who then produces a new recipe when given a set of ingredients.
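And here is a condensed sketch of the encoder-decoder idea behind a variational autoencoder, again in PyTorch with arbitrary sizes and made-up data. Notice the small amount of randomness added between encoding and decoding; that is what lets the model produce variations rather than exact copies.

```python
# A condensed sketch of a variational autoencoder in PyTorch: the encoder
# compresses the input, the decoder reconstructs it, and a little randomness
# is injected in between. Sizes and data are arbitrary, for illustration.
import torch
import torch.nn as nn

encoder = nn.Linear(8, 4)                               # compress 8 features down to...
to_mean, to_logvar = nn.Linear(4, 2), nn.Linear(4, 2)   # ...a 2-number summary
decoder = nn.Sequential(nn.Linear(2, 4), nn.ReLU(), nn.Linear(4, 8))

x = torch.randn(1, 8)            # a made-up input example

hidden = torch.relu(encoder(x))
mean, logvar = to_mean(hidden), to_logvar(hidden)

# Sample around the learned summary: this randomness is what lets the
# decoder produce variations rather than an exact copy
z = mean + torch.exp(0.5 * logvar) * torch.randn_like(mean)

reconstruction = decoder(z)
print(reconstruction.shape)      # torch.Size([1, 8]) -> same shape as the input
```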

The next generative AI model we will consider is known as the transformer model. This type of model was actually first introduced in 2017 when a team at Google Brain developed an approach to determine the weight placed on words in a sentence. These models use deep learning to predict new text based on sequential data (i.e. where data is arranged in sequences where order matters). They are used in natural language processing to consider entire sentences and the emphasis placed within certain sections or words within a sentence.
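At the heart of a transformer is the attention calculation, in which each word scores how relevant every other word in the sentence is and then blends their information accordingly. Here is a bare-bones version of that calculation in NumPy, with a made-up four-word ‘sentence’.

```python
# A bare-bones sketch of the attention calculation inside transformer models:
# each word scores how relevant every other word in the sentence is.
# The 4-word, 8-dimensional example below is arbitrary.
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

words, dim = 4, 8
queries = np.random.randn(words, dim)
keys = np.random.randn(words, dim)
values = np.random.randn(words, dim)

# Scores: how much each word should 'attend to' every other word
scores = queries @ keys.T / np.sqrt(dim)
weights = softmax(scores)   # each row sums to 1: the emphasis placed on each word
output = weights @ values   # each word becomes a weighted mix of the others

print(weights.round(2))
```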

This model effectively revolutionised generative AI because prior to its development models could only consider words in isolation and could not learn the meaning we derive from how we emphasise certain words within human speech. For this very reason, Google went on to replace their existing AI model for Google Translate with a transformer model, noting that it generated significantly better outputs.

Google Inc., Public domain, via Wikimedia Commons

Similar to transformer models, flow-based models are also used to consider sequential data. They are most often utilised in image or audio generation as they take complex distributions (i.e. where data points have intricate and complex relationships) and convert them into simple distributions (i.e. with more straightforward and linear patterns) or vice versa.

In simple terms, this means that flow-based models can transform data in a way that can be reversed and through which no data is lost. To do this, flow-based models are first shown complex data, which they then learn to transform. They also learn to reverse that transformation at the same time. When the model is prompted for an output it takes the learned transformation, reverses it and applies it to a simple pattern to generate new content.
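The sketch below illustrates the reversible-transformation idea with a deliberately simple example: a scaling-and-shifting transform that can be applied and then exactly undone, with no information lost. A real flow-based model learns far more complex invertible transformations, but generation works the same way: start from a simple random pattern and run the transformation in reverse.

```python
# A toy illustration of the flow-based idea: a transformation that can be
# applied and then exactly reversed, so no information is lost.
# A real flow model learns far more complex invertible transformations.
import numpy as np

scale, shift = 2.5, 1.0   # parameters a flow model would learn from data

def forward(x):           # complex data -> simple representation
    return (x - shift) / scale

def inverse(z):           # simple representation -> data (used for generation)
    return z * scale + shift

data = np.array([3.0, 7.5, 11.0])
z = forward(data)
recovered = inverse(z)

print(np.allclose(data, recovered))   # True: the transformation is fully reversible

# Generating new content: start from simple random numbers and reverse the flow
new_sample = inverse(np.random.randn(3))
print(new_sample)
```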

Photo by Omid Armin on Unsplash

The final model we will consider is known as the recurrent neural network. This is an artificial neural network (i.e. brain) used to process and generate sequential or time series data, like a sentence. Recurrent neural networks are often used for ordinal or temporal problems such as language translation, natural language processing, speech recognition and image captioning. The key distinguishing feature of these models is their ‘memory’, as they take information from prior inputs to influence the current input and output. They are most often used in applications like Apple’s Siri to help with speech recognition.
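Here is a minimal recurrent neural network in PyTorch. It reads a sequence one step at a time and carries a hidden ‘memory’ forward from step to step, which is the distinguishing feature described above. The sequence length and sizes are arbitrary.

```python
# A minimal sketch of a recurrent neural network in PyTorch: it reads a
# sequence one step at a time, carrying a 'memory' (hidden state) forward.
# The sequence length and sizes below are arbitrary.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=10, hidden_size=16, batch_first=True)

sequence = torch.randn(1, 5, 10)   # one sequence of 5 steps, 10 features each
outputs, final_memory = rnn(sequence)

print(outputs.shape)        # torch.Size([1, 5, 16]) -> one output per step
print(final_memory.shape)   # torch.Size([1, 1, 16]) -> the memory after the last step
```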

Re-cap and Summary

We’ve now covered all the basics of how generative AI works. If we were to summarise this, we would say that artificial intelligence is the study of creating machines that can replicate human intelligence. The concept of artificial intelligence started in the 1940s as the subject of science fiction and evolved into an area of study in the mid-1950s. Since then, sub-fields of artificial intelligence such as machine learning and natural language processing have emerged that have made it possible for us to create generative AI models. These models are unique because they can generate new information, ideas and concepts.

GenAI models rely heavily on vast amounts of information from which they learn. They learn by using artificial brains that are created using models and algorithms based on how the human brain processes information. These models rely on statistics and probability, which they use to formulate responses to the statements or questions that you put before them. They already form a big part of our everyday lives and are linked to various forms of technology we use on a day-to-day basis, such as our phones, cars and laptops.

