Generative AI 101: What’s up with ChatGPT?

Emile Languepin
Beaucoup Data
8 min read · Mar 5, 2023

You’re about to miss an opportunity as large as the internet.

Imagine having recognized the internet’s revolutionary potential early on. It would have been like being given a Ferrari to enter a sack race: you’re fast, they’re furious.

Over the past few years, breakthroughs in Generative Artificial Intelligence (GenAI) have put its capabilities within everyone’s reach.

And they are astounding: it can generate new content, partially replicate human thought processes and language, and complete complex tasks that have traditionally required human intelligence.

However, adopting GenAI comes with many challenges, from learning how to interact with it, to ensuring it creates relevant content, to navigating new ethical debates. Experts will have their hands full.

In this article, we’ll provide a down-to-earth introduction to these subjects that will set you on course to exploit this game-changing tech.

Ready?

What is Generative AI?

Generative AI is a type of artificial intelligence that uses machine learning to create original content, like videos, music, text, images, code and more: it can output a brand new image of a cat or a dog. Fundamentally, it uses algorithms to analyse large datasets, which then enables it to generate new content from what it has learnt.

It contrasts with the traditional use cases solved by AI, which focus on classifying data points or inferring something about them: for instance, a model that can recognize whether a picture shows a cat or a dog. Up until 10–15 years ago, the field of AI was heavily invested in those use cases. Since then, technological limitations have been overcome, propelling advances in GenAI.

While both approaches to AI are valuable, Generative AI is an exciting area of development because of its unique abilities.

Traditional approaches, on the other hand, have more established use cases which benefit from extensive research and documentation.
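To make the contrast between classifying and generating concrete, here is a minimal Python sketch. The pet weights are made-up illustration data, and `classify` and `generate` are hypothetical helper names. Real systems use far richer models than a mean and a standard deviation, so treat this as an analogy rather than as how GenAI products are built.

```python
import random
import statistics

# Made-up training data: pet weights in kilograms (illustrative only).
WEIGHTS = {
    "cat": [3.5, 4.0, 4.5, 5.0],
    "dog": [18.0, 20.0, 22.0, 25.0],
}

# "Learning" here is just summarising each class by its mean and spread.
STATS = {
    label: (statistics.mean(values), statistics.stdev(values))
    for label, values in WEIGHTS.items()
}


def classify(weight_kg):
    """Traditional use case: pick the known label whose mean is closest."""
    return min(STATS, key=lambda label: abs(weight_kg - STATS[label][0]))


def generate(label, rng):
    """Generative use case: sample a new, plausible data point for a label."""
    mean, spread = STATS[label]
    return rng.gauss(mean, spread)


rng = random.Random(0)
print(classify(4.2))         # label of the nearest class mean
print(generate("cat", rng))  # a newly sampled cat weight
```

The same learned summary serves both purposes: the classifier maps an input to an existing label, while the generator goes the other way and produces a new data point from a label.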

Interacting with Generative AI

One of the main changes brought on by GenAI lies in the way we interact with it, which is called Prompting. This means anyone with a computer and an internet connection can now use GenAI models for their own benefit. Still, we believe Prompting is going to become a discipline in its own right, as the quality of the outputs ranges widely with the user’s expertise.

Prompting basically involves giving the AI model a context or starting point to situate itself, as well as an objective or a set of requirements, while minimising unnecessary or confusing details. You can read the following article for a deeper dive into prompting.
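As a minimal sketch of that structure in Python, the helper below assembles a prompt from a context, an objective and a list of requirements. The function name `build_prompt` and the "Context / Objective / Requirements" labels are our own illustrative conventions, not something any particular model requires, and the bakery scenario is invented for the example.

```python
def build_prompt(context, objective, requirements=()):
    """Assemble a prompt from a context, an objective and optional requirements."""
    lines = [
        f"Context: {context.strip()}",
        f"Objective: {objective.strip()}",
    ]
    if requirements:
        lines.append("Requirements:")
        lines.extend(f"- {item.strip()}" for item in requirements)
    return "\n".join(lines)


# Hypothetical scenario, for illustration only.
prompt = build_prompt(
    context="You are a copywriter for a small bakery.",
    objective="Write a two-sentence announcement for a new sourdough loaf.",
    requirements=["Friendly tone", "No more than 40 words"],
)
print(prompt)
```

Keeping the three parts separate makes it easy to iterate: you can tighten a requirement or change the context without rewriting the whole prompt.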

For instance, when prompting a Large Language Model (LLM) like OpenAI’s ChatGPT, one could use the following prompt:

And the output:

If that doesn’t fill you with a sense of wonder, I don’t know what will!

However, a rap fan might point out that the styles of these two rappers are not really identifiable, likely because “prompting” and “GenAI” aren’t part of either Lil Nas X’s or Biggie’s lyrics. Prompts like this can be iterated on and re-fed to the same model for improvement. Alternatively, you can iterate on the output itself by prompting the model to add more slang, or anything else you feel the output is missing.

Let’s take a second example, prompting Midjourney’s proprietary image-generating model:

A portrait of a rapper.
He looks like a mix of Lil Nax x and Notorious BIG.

And the output, before any enhancements:

Midjourney also lets you iterate on the output by enhancing quality, backgrounds, facial features and more. At each step, prompting is the key to unlocking the desired result.

Main types of models used in Generative AI

While there are many potential types of underlying models powering systems like ChatGPT or Midjourney, most of the recent literature about GenAI focuses on three: Generative Adversarial Networks (GAN), Generative Pre-Trained Transformers (GPT) and Variational Auto-Encoders (VAE).

You can think about them through this lens: the same way an artist uses different colours, brushes and techniques to create a painting, Data Scientists use different architectures, machine learning models and mathematical principles.

Generative Adversarial Networks (GAN)

GANs consist of two parts: a generator and a discriminator. Think of the generator as a painter who creates fake paintings, and the discriminator as an art critic who decides if the painting is real or fake.

These models are trained through repeated competition between these two parts. The generator keeps trying to create more convincing fakes and the discriminator tries to catch them. At each iteration, both parts get better at their job: the generator at creating realistic fakes, the discriminator at identifying them. This happens through a process called backpropagation, which essentially penalises each part for its mistakes and lets it adjust its behaviour, much like humans learn from feedback. Eventually, the generator becomes so good at creating realistic data that it can fool humans into thinking it’s real.
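The painter-versus-critic loop can be sketched in a few lines of Python. This is a deliberately tiny toy under our own assumptions, not how production GANs are built: the "paintings" are single numbers drawn around 4.0, the generator and discriminator each have just two parameters, the gradients are written by hand, and the `decay` term is something we added to keep the two players from oscillating.

```python
import math
import random


def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))


def train_toy_gan(steps=5000, lr=0.05, decay=1.0, seed=0):
    """Train a one-dimensional toy GAN with hand-written gradients.

    Real data:      x ~ Normal(4.0, 0.5)  (assumed target, for illustration)
    Generator:      G(z) = a * z + b, with noise z ~ Normal(0, 1)
    Discriminator:  D(x) = sigmoid(w * x + c), its estimate that x is real
    """
    rng = random.Random(seed)
    a, b = 1.0, 0.0  # generator parameters (the "painter")
    w, c = 0.0, 0.0  # discriminator parameters (the "art critic")

    for _ in range(steps):
        x_real = rng.gauss(4.0, 0.5)
        z = rng.gauss(0.0, 1.0)
        x_fake = a * z + b

        # Discriminator step: raise D(x_real) and lower D(x_fake).
        d_real = sigmoid(w * x_real + c)
        d_fake = sigmoid(w * x_fake + c)
        grad_w = (1.0 - d_real) * x_real - d_fake * x_fake
        grad_c = (1.0 - d_real) - d_fake
        # The decay term is an added assumption that damps oscillations.
        w += lr * (grad_w - decay * w)
        c += lr * (grad_c - decay * c)

        # Generator step: adjust a and b so the critic rates the fake higher.
        d_fake = sigmoid(w * x_fake + c)
        grad_x = (1.0 - d_fake) * w  # derivative of log D(x_fake) w.r.t. x_fake
        a += lr * grad_x * z
        b += lr * grad_x

    return a, b, w, c


a, b, w, c = train_toy_gan()
print(f"generator offset b = {b:.2f} (real data is centred on 4.0)")
```

With a critic this simple, the generator only learns where the real data is centred, not how spread out it is. Real GANs use deep neural networks on both sides so that much richer patterns can be learned.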

Generative Pre-Trained Transformers (GPT)

GPTs are inherently language prediction models. They are essentially deep learning algorithms that predict the most likely continuation of an input text. A helpful image is that of a language tutor who has read so many books that they have a deep understanding of a language’s structure, vocabulary and style.

These models are trained on extremely large amounts of text data (GPT-3, the predecessor of the GPT-3.5 models behind ChatGPT, was trained using Wikipedia as one of its sources) to learn the patterns and relationships between words. This training is self-supervised: the model is asked to predict the next word of a passage whose continuation we already know, so the text itself supplies the correct output. Whenever the model predicts incorrectly, its parameters are adjusted, until it can generate human-like text by predicting the most likely next word based on the context it has learned from its pre-training.
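The core idea of learning from text to predict what comes next can be illustrated with something far simpler than a transformer. The sketch below is a bigram counter: it only remembers which word most often follows each word in a tiny made-up corpus. It is a stand-in we chose for illustration, not the GPT architecture, and `train_bigram`, `predict_next` and `generate` are hypothetical helper names.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count, for every word, which words follow it in the training text."""
    counts = defaultdict(Counter)
    words = text.lower().split()
    for current, following in zip(words, words[1:]):
        counts[current][following] += 1
    return counts


def predict_next(counts, word):
    """Return the most frequent follower of `word`, or None if it is unseen."""
    followers = counts.get(word.lower())
    if not followers:
        return None
    return followers.most_common(1)[0][0]


def generate(counts, start, length):
    """Extend `start` by repeatedly appending the predicted next word."""
    words = [start.lower()]
    for _ in range(length):
        following = predict_next(counts, words[-1])
        if following is None:
            break
        words.append(following)
    return " ".join(words)


# Tiny made-up corpus, for illustration only.
corpus = "the cat sat on the mat the cat ate the fish the dog sat on the rug"
model = train_bigram(corpus)
print(predict_next(model, "sat"))
print(generate(model, "dog", 4))
```

A GPT model replaces the lookup table with a neural network that takes the whole preceding context into account, which is what lets it produce coherent paragraphs rather than short word chains.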

Variational Auto-Encoders (VAE)

VAEs are built around an encoder-decoder pair. The encoder’s role is to take a set of data and reduce it to its key attributes, producing a compressed representation that only the decoder it is paired with can turn back into data.

During the training process, each decoded output is compared against the original (pre-encoding) data, and the difference is backpropagated to the pair iteratively until the best encoding-decoding scheme is achieved.

From there, the decoder of the pair can act very much like the generator within the GAN framework; you can feed it any “fake” encoded data and it will create brand new decoded content simply based on the patterns it has learned.
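The encode, decode, compare and adjust loop can be sketched in Python as follows. To keep it short we made a deliberate simplification: this is a plain linear autoencoder, without the probabilistic ("variational") part that gives VAEs their name. The data (points on the line y = 2x), the starting weights and the function names are all assumptions made for illustration.

```python
def encode(enc, point):
    """Compress a 2D point into a single number (its "key attribute")."""
    return enc[0] * point[0] + enc[1] * point[1]


def decode(dec, code):
    """Expand a single number back into a 2D point."""
    return [dec[0] * code, dec[1] * code]


def train_autoencoder(points, epochs=50, lr=0.02):
    """Fit a linear 2D -> 1D -> 2D autoencoder with plain gradient descent."""
    enc = [0.1, 0.1]  # encoder weights (arbitrary small starting values)
    dec = [0.1, 0.1]  # decoder weights (arbitrary small starting values)
    for _ in range(epochs):
        for point in points:
            code = encode(enc, point)
            recon = decode(dec, code)
            # Compare the decoded output with the original point.
            err = [2.0 * (recon[i] - point[i]) for i in range(2)]
            # Backpropagate the squared reconstruction error to both parts.
            grad_code = err[0] * dec[0] + err[1] * dec[1]
            for i in range(2):
                dec[i] -= lr * err[i] * code
                enc[i] -= lr * grad_code * point[i]
    return enc, dec


# Made-up training data: 201 points on the line y = 2x.
points = [(i / 100 - 1.0, 2.0 * (i / 100 - 1.0)) for i in range(201)]
enc, dec = train_autoencoder(points)

print(decode(dec, encode(enc, (0.5, 1.0))))  # reconstruction of a known point
print(decode(dec, 0.3))                      # new point from a made-up code
```

Once trained, the decoder alone acts as a small generator: any code you feed it comes back as a point that follows the pattern of the training data. A real VAE additionally learns a probability distribution over the codes, so that randomly drawn codes reliably decode to realistic outputs.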

Now, the next time you meet with one of your Data Scientist friends, make sure to blow their mind by asking them when the last time they used a GAN model was. Guaranteed effect!

Immediate Applications

Your first step with GenAI will most likely be to create pictures of a cat on a surfboard, or to write your next wedding speech. And we don’t blame you — we still spend a ridiculous amount of time doing exactly that.

However, we believe that GenAI will trigger groundbreaking change in how we go about working, creating and interacting with technology over the next decades. There are already many domains where smart, inquisitive individuals like you are taking advantage of it.

Here are a few major ones:

  • Content creation: GenAI can be leveraged to produce high-quality written, visual or audio content that is used in sectors ranging from marketing to product development, or even by artists.
  • Coding: Large Language Models based on the GPT architecture (like ChatGPT) are being trained to act as coding “assistants” for developers, writing entire pieces of code for specific use cases, or correcting existing code to make it more efficient.
  • Drug design: GenAI is already used to design drugs within months (vs traditional timelines of 3–6 years), offering pharma companies significant opportunities to reduce both the costs and timelines of drug discovery.
  • Personalization: GenAI can be relied upon to create personalised recommendations and experiences for customers based on their data and behaviour (think human-like chatbots).
  • Synthetic data: One of the keys to AI is data. However, modern data is often private or needs to be anonymized. GenAI is able to generate new data sets, mimicking the patterns of proprietary data sets, that can then be used for research and analysis.
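To give a feel for the synthetic data idea, here is a minimal Python sketch. It is not a GenAI model: it is a simple frequency-based sampler we chose as a stand-in, and it treats every column independently, so it ignores relationships between columns that a real generative model would try to preserve. The customer records and the helper names `fit_columns` and `sample_rows` are invented for the example.

```python
import random
from collections import Counter


def fit_columns(rows):
    """Record how often each value appears in each column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, Counter())[value] += 1
    return columns


def sample_rows(columns, n, rng):
    """Draw n synthetic rows, sampling every column independently."""
    rows = []
    for _ in range(n):
        row = {}
        for name, counts in columns.items():
            values = list(counts.keys())
            weights = list(counts.values())
            row[name] = rng.choices(values, weights=weights, k=1)[0]
        rows.append(row)
    return rows


# Invented "private" records, for illustration only.
private_rows = [
    {"plan": "free", "country": "FR"},
    {"plan": "free", "country": "CA"},
    {"plan": "pro", "country": "FR"},
    {"plan": "free", "country": "FR"},
]

synthetic_rows = sample_rows(fit_columns(private_rows), 5, random.Random(0))
for row in synthetic_rows:
    print(row)
```

The synthetic rows follow the same overall proportions as the originals without copying any single record on purpose, which is the property that makes synthetic data attractive for research and analysis.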

Firms are already investing heavily ($1.7B over the last three years) in the continued development of these solutions, so expect progress to be exponential!

Ethics and Generative AI

Along with the rapidly growing usage of GenAI will come an increasing number of ethical challenges. Most of them are not new, but will take on different forms. With the list below, we aim to provide a baseline for reflection so you can form your own opinion about the future that this technology is shaping!

  • Bias: GenAI models are essentially a reflection of the data they are trained on. Hence it is critical to understand that human decisions shape what models learn. As such, they could perpetuate or amplify societal prejudices and inequalities. Emphasis on could, as they could also play the exact opposite role and counteract those prejudices, provided we train them with the right data. The question is: right for whom?
  • Malicious usage and manipulation: GenAI produces highly realistic fake content, such as images, videos, and text, which in turn could be leveraged for malicious purposes, such as spreading misinformation or manipulating public opinion. Basically, as long as someone has a recording of your voice and a few pictures, they can create a video of you saying anything they want.
  • Ownership and copyright: With the ease of generating new content using these models, how do we know who owns the rights to the output? The prompter, the AI, the creator of the AI or the creators of the training data?

There are no immediate or easy answers to these questions. However, a collective awareness of them will, in our opinion, help steer us towards the right path.

Conclusion

We hope this introductory article got you as excited as we are about the potential of Generative AI, and that you’re now eager to explore what it can do for you!

If you’re interested in learning more, stay tuned for our upcoming articles where we will explore these topics in greater depth and discuss cool practical applications.

Oh and of course, please reach out if you’d be interested in training around AI, or if you simply want your business to benefit from some new awesome data-driven strategies.

Written by Gregory Belhumeur and Emile Languepin for Beaucoup Data: Innovative, data-driven strategies to supercharge your growth, powered by Data Science & ML.
