Introduction to Generative AI

Generative AI Learning Path : Lecture 1

Monit Sharma
11 min read · Jun 10, 2023

This is an introductory-level course explaining what Generative AI is, how it is used, and how it differs from traditional machine learning methods. It also touches on some Google tools that can help you develop your own Gen AI apps.

In this course you’ll learn how to:

  1. Define Generative AI
  2. Explain how Generative AI works
  3. Describe Generative AI model types
  4. Describe Generative AI applications

Generative AI is a type of artificial intelligence technology that can produce various types of content including text, imagery, audio and synthetic data.

What is Artificial Intelligence? How is it different from Machine Learning?

AI is a discipline, like physics. Specifically, it is a branch of Computer Science that deals with the creation of intelligent agents: systems that can reason, learn, and act autonomously. Essentially, AI concerns the theory and methods used to build machines that think and act like humans.

Machine Learning, a subfield of AI, is a program or system that trains a model from input data. The trained model can make useful predictions on new, never-before-seen data drawn from the same distribution as the data used to train it. In other words, Machine Learning gives computers the ability to learn without explicit programming. Two of the most common classes of machine learning models are:

  1. Unsupervised Machine Learning
  2. Supervised Machine Learning

The key difference between the two is that supervised models use labels. Labelled data comes with a tag, such as a name, a type, or a number; unlabelled data comes with no such tag.

Example of Supervised Machine Learning Model (image taken from Google AI course)

This graph is an example of the sort of problem a supervised model might try to solve. Say you own a restaurant and have historical data on bill amounts and how much different people tipped, broken down by order type: pickup or delivery. In supervised learning, the model learns from past examples to predict future values, in this case tips. Here the model uses the total bill amount, together with whether the order was picked up or delivered, to predict the future tip amount.
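As a rough sketch, this tip-prediction setup can be reduced to fitting a line to historical (bill, tip) pairs. The numbers below are made up for illustration:

```python
import numpy as np

# Hypothetical historical data: total bill amounts and the tips received.
bills = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
tips  = np.array([ 1.5,  3.0,  4.5,  6.0,  7.5])

# Fit a line tip = w * bill + b by ordinary least squares.
w, b = np.polyfit(bills, tips, deg=1)

# Predict the tip for a new, unseen bill amount.
predicted_tip = w * 25.0 + b
print(round(predicted_tip, 2))  # 3.75 for this perfectly linear toy data
```

A real model would also encode the pickup-versus-delivery flag as an extra input feature, but the idea is the same: learn a mapping from labelled past examples.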

Example of an Unsupervised Machine Learning model. (image taken from Google AI course)

This is an example of the sort of problem an unsupervised model might try to solve. Here you look at tenure and income and then group, or cluster, employees to see whether someone is on the fast track. Unsupervised problems are all about discovery: looking at the raw data and seeing whether it naturally falls into groups.
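A minimal sketch of this kind of clustering, using a hand-rolled k-means loop on made-up (tenure, income) data, not the course's code:

```python
import numpy as np

# Hypothetical (tenure, income) points for two informal groups of employees.
X = np.array([[1.0, 30.0], [2.0, 32.0], [1.5, 31.0],   # early career
              [8.0, 90.0], [9.0, 95.0], [8.5, 92.0]])  # "fast track"

# A minimal k-means loop: assign each point to its nearest centroid,
# then move each centroid to the mean of its assigned points.
centroids = X[[0, 3]].copy()  # start from two arbitrary data points
for _ in range(10):
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    centroids = np.array([X[labels == k].mean(axis=0) for k in range(2)])

print(labels)  # the first three points share one cluster, the last three the other
```

No labels were provided anywhere; the grouping emerges from the data alone, which is exactly the "discovery" character of unsupervised learning.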

Let’s get a little deeper and show this graphically as understanding these concepts is the foundation for our understanding of Generative AI.

Side-by-Side Comparison of Supervised and Unsupervised Machine Learning

In supervised learning, input data values, or X, are fed into the model, and the model outputs a prediction. That prediction is compared with the actual label from the training data. If the predicted and actual values are far apart, that difference is called the error, and the model adjusts itself to reduce the error until the predicted and actual values are closer together. This is a classic optimization problem.
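The error-reduction loop can be sketched as gradient descent on the squared error. This is a toy illustration with invented data, not any particular framework's training loop:

```python
import numpy as np

# Toy training data generated by the true rule y = 2x + 1.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0

w, b, lr = 0.0, 0.0, 0.05
for _ in range(2000):
    pred = w * x + b                 # model prediction for each example
    err = pred - y                   # gap between predicted and actual values
    w -= lr * (2 * err * x).mean()   # step both parameters downhill
    b -= lr * (2 * err).mean()       # on the mean squared error

print(round(w, 3), round(b, 3))  # close to the true values 2 and 1
```

Each pass shrinks the error a little; "training" is just repeating this until predictions and labels agree closely.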

What is Deep Learning?

While Machine Learning is a broad field encompassing many different techniques, Deep Learning is a type of Machine Learning that uses artificial neural networks, allowing it to process more complex patterns than traditional machine learning techniques.

A deep neural network.

Artificial Neural Networks are inspired by the human brain. They are made up of many interconnected nodes, or neurons, that learn to perform tasks by processing data and making predictions. Deep Learning models typically have many layers of neurons, which allows them to learn more complex patterns than traditional machine learning models. Neural networks can use both labelled and unlabelled data; this is called semi-supervised learning. Here, a neural network is trained on a small amount of labelled data and a large amount of unlabelled data. The labelled data helps the network learn the basic concepts of the task, while the unlabelled data helps it generalize to new examples.
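One simple flavour of semi-supervised learning is self-training: fit a model on the small labelled set, use it to pseudo-label the unlabelled data, then refit on everything. A toy nearest-centroid sketch with made-up 1-D data:

```python
import numpy as np

# A tiny labelled set and a larger unlabelled set (1-D toy features).
X_lab = np.array([0.0, 1.0, 9.0, 10.0])
y_lab = np.array([0, 0, 1, 1])
X_unl = np.array([0.5, 1.5, 2.0, 8.0, 8.5, 9.5])

# Step 1: learn class centroids from the few labelled points.
centroids = np.array([X_lab[y_lab == k].mean() for k in (0, 1)])

# Step 2: pseudo-label the unlabelled data with the current model.
pseudo = np.abs(X_unl[:, None] - centroids[None, :]).argmin(axis=1)

# Step 3: refit the centroids on labelled plus pseudo-labelled data.
X_all = np.concatenate([X_lab, X_unl])
y_all = np.concatenate([y_lab, pseudo])
centroids = np.array([X_all[y_all == k].mean() for k in (0, 1)])
print(centroids)
```

The labelled points anchor what each class means, while the unlabelled points refine the estimate, which is the division of labour the paragraph above describes.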

Generative AI is a subset of Deep Learning which means it uses Artificial Neural Networks that can process both labelled and unlabelled data using supervised, unsupervised and semi-supervised methods.

Generative AI is a subset of Deep Learning.

Large Language Models are also a subset of Deep Learning.

LLMs (like ChatGPT) are a subset of Deep Learning.

Deep Learning models, and Machine Learning models in general, can be divided into two types:

  1. Generative:
  • Generates new data that is similar to the data it was trained on.
  • Understands the distribution of the data and how likely a given example is.
  • Can, for example, predict the next word in a sequence.

  2. Discriminative:
  • Used to classify or predict.
  • Typically trained on a dataset of labelled data.
  • Learns the relationship between the features of the data points and the labels.

In more detail:

A Discriminative model is a type of model used to classify or predict labels for data points. Discriminative models are typically trained on a dataset of labelled data points, and they learn the relationship between the features of the data points and the labels. Once a Discriminative model is trained, it can be used to predict the label for new data points. A Generative model, in contrast, generates new data instances based on a learned probability distribution of existing data; thus, Generative models generate new content.

Take this example: the Discriminative model learns the conditional probability distribution, the probability of Y (the label) given X (the input), that this image is of a dog, and classifies it as a dog rather than a cat. The Generative model learns the joint probability distribution, the probability of X and Y together, and having learned what dog images look like, it can then generate a new picture of a dog.

So to summarize Generative models can generate new data instances while Discriminative models discriminate between different kinds of data instances.
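The distinction can be illustrated with a toy 1-D dataset: a generative approach estimates each class's distribution (and can sample brand-new points from it), while a discriminative approach only needs the boundary between classes. Everything below is invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy 1-D data: class 0 centred at 0, class 1 centred at 5.
x0 = rng.normal(0.0, 1.0, 200)
x1 = rng.normal(5.0, 1.0, 200)

# Generative view: estimate each class's distribution (here, just its mean),
# which lets us both classify via Bayes' rule and sample new data points.
mu0, mu1 = x0.mean(), x1.mean()
new_point = rng.normal(mu1, 1.0)   # "generate" a fresh class-1 example

# Discriminative view: only learn the boundary that separates the classes.
boundary = (mu0 + mu1) / 2.0
label = int(new_point > boundary)  # can classify, but cannot generate
print(round(boundary, 2), label)
```

The generative side had to model what the data looks like, so sampling falls out for free; the discriminative side learned only where the dividing line is.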

The top image (a) shows a traditional machine learning model, which attempts to learn the relationship between the data and the label, that is, what we want to predict. The bottom image (b) shows a generative AI model, which attempts to learn patterns in content so that it can generate new content.

A good way to distinguish what is Generative AI and what is not is shown in this illustration:

It is not Generative AI when the output, Y, is a number, a class (for example, spam or not spam), or a probability. It is Generative AI when the output is natural language, such as speech or text, an image, or audio.

For Example: Visualizing this mathematically would look like this:

The equation y = f(x) calculates the dependent output of a process given different inputs: y stands for the model output, f embodies the function used in the calculation, and x represents the input or inputs. So the model output is a function of the inputs. If y is a number, like predicted sales, it is not Generative AI. If y is a sentence, like the answer to "Define sales", it is generative: the question elicits a text response based on all the massive data the model was already trained on.

The traditional supervised and unsupervised learning process takes training code and labelled data to build a model. Depending on the use case or problem, the model can give you a prediction, classify something, or cluster something.

Generative AI Supervised, Semi-Supervised and Unsupervised Learning.

The Generative AI process, by contrast, can take training code, labelled data, and unlabelled data of all data types and build a foundation model. The foundation model can then generate new content, for example text, code, images, audio, and video.

We’ve come a long way from traditional programming to neural networks to generative models.

In traditional programming, we had to hard-code the rules for distinguishing a cat: type: animal, legs: 4, ears: 2, fur: yes, and so on.

Traditional Programming
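Such hard-coded rules might look like the following (a deliberately naive sketch):

```python
# A hand-written rule set for "is this a cat?". Every rule is authored
# by a human, and anything the rules don't anticipate is misclassified.
def is_cat(animal: dict) -> bool:
    return (animal.get("type") == "animal"
            and animal.get("legs") == 4
            and animal.get("ears") == 2
            and animal.get("fur") is True)

print(is_cat({"type": "animal", "legs": 4, "ears": 2, "fur": True}))   # True
print(is_cat({"type": "animal", "legs": 4, "ears": 2, "fur": False}))  # False: a hairless cat breaks the rules
```

The brittleness is the point: a dog also satisfies these rules and a hairless cat fails them, which is why the field moved toward learning the rules from data instead.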

In the neural network wave, we could instead give the network pictures of cats and dogs, ask "Is this a cat?", and it would learn to predict the answer.

Neural Network

In the generative wave, we as users can generate our own content, whether it be text, images, audio, or video. For example, models like PaLM (Pathways Language Model) or LaMDA (Language Model for Dialogue Applications) ingest very large amounts of data from multiple sources across the internet and build foundation language models. We can simply test them by asking a question, whether by typing it into a prompt or speaking it aloud. So when you ask "What's a cat?", the model can give you everything it has learned about cats.

Generative Models

What is Generative AI?

Generative AI is a type of Artificial Intelligence that creates new content based on what it has learned from existing content. The process of learning from existing content is called training, and it results in the creation of a statistical model. When given a prompt, Generative AI uses the model to predict what an expected response might be, and in doing so generates new content.

Essentially it learns the underlying structure of the data and can then generate new samples that are similar to the data it was trained on. As previously mentioned a generative language model can take what it has learned from the examples it’s been shown and create something entirely new based on that information. Large Language Models are one type of generative AI since they generate novel combinations of text in the form of natural-sounding language.

Type of Generative AI based on Image

A generative image model takes an image as input and can output text, another image, or video. For example, with text output you get visual question answering; with image output, image completion; and with video output, animation.

A generative language model takes text as input and can output more text, an image, audio, or decisions. For example, with text output, an answer to a question is generated, and with image output, an image is generated.

Types of Generative AI Based on Data

We’ve stated that generative language models learn about patterns in language through training data, and then given some text they predict what comes next. Thus generative language models are pattern-matching systems. They learn about patterns based on the data you provide.

The power of Generative AI comes from the use of Transformers. The Transformer architecture, introduced in 2017, produced a revolution in natural language processing. At a high level, a Transformer model consists of an encoder and a decoder. The encoder encodes the input sequence and passes it to the decoder, which learns how to decode that representation for a relevant task.
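The core operation inside both the encoder and the decoder is attention. Below is a minimal NumPy sketch of scaled dot-product attention with made-up shapes; a real Transformer stacks many of these with learned projection weights:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: each query produces a weighted mix of
    # the value vectors V, weighted by how well it matches each key in K.
    scores = Q @ K.T / np.sqrt(K.shape[-1])
    return softmax(scores) @ V

rng = np.random.default_rng(0)
Q = rng.normal(size=(4, 8))   # 4 query positions, dimension 8
K = rng.normal(size=(6, 8))   # 6 key/value positions (e.g. encoder output)
V = rng.normal(size=(6, 8))

out = attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed value vector per query position
```

This is how the decoder "looks back" at the encoder's representation: each output position attends over all input positions at once, rather than reading them one at a time.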

How does a Transformer model work?

In Transformers, hallucinations are outputs generated by the model that are nonsensical, grammatically incorrect, or factually wrong. Hallucinations can be caused by a number of factors, including:

  1. the model is not trained on enough data,
  2. the model is trained on noisy or dirty data
  3. the model is not given enough context,
  4. the model is not given enough constraints.

Hallucinations can be a problem for Transformers because they can make the output text difficult to understand. They can also make the model more likely to generate incorrect or misleading information.

A prompt is a short piece of text given to a large language model as input, and it can be used to control the output of the model in a variety of ways. Prompt design is the process of creating a prompt that will generate the desired output from a large language model.
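Prompt design often amounts to assembling a template: a task description, a few worked examples, and the new input. A minimal sketch; the structure below is illustrative, not a prescribed format:

```python
# Assemble a few-shot prompt from a task description, worked examples,
# and the new input the model should complete.
def build_prompt(task: str, context: str, examples: list) -> str:
    shots = "\n".join(f"Input: {q}\nOutput: {a}" for q, a in examples)
    return f"{task}\n\n{shots}\n\nInput: {context}\nOutput:"

prompt = build_prompt(
    task="Classify the sentiment of each review as positive or negative.",
    context="The food was cold and the service was slow.",
    examples=[("Loved every minute of it.", "positive"),
              ("Would not come back.", "negative")],
)
print(prompt)
```

The resulting string ends at "Output:", inviting the model to continue the pattern; changing the task line or the examples steers what it generates.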

What is a Prompt?

Generative AI depends heavily on the training data you feed it. It analyzes the patterns and structures of the input data and learns from them, but with access to a browser-based prompt, you, the user, can generate your own content.

The backend of a LLM

Based on the type of input data, there are several associated model types:

1. Text to Text

Text-to-Text models take a natural language input and produce text output. These models are trained to learn the mapping between a pair of texts. For Example, translation from one language to another.

2. Text to Image

Text-to-Image models are trained on a large set of images each captioned with a short text description. Diffusion is one method used to achieve this.
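The forward half of diffusion gradually destroys an image with Gaussian noise; a text-to-image model is trained to reverse this process, conditioned on the caption. A toy sketch of the forward direction on a stand-in "image":

```python
import numpy as np

rng = np.random.default_rng(0)
image = np.ones((8, 8))               # a stand-in for a real training image

# Forward diffusion: repeatedly blend the image toward pure Gaussian noise.
# A diffusion model learns to undo these steps one at a time, which is how
# it can start from noise and arrive at a novel image.
beta = 0.05                           # fraction of noise mixed in per step
x = image.copy()
for _ in range(100):
    noise = rng.normal(size=x.shape)
    x = np.sqrt(1 - beta) * x + np.sqrt(beta) * noise

print(x.mean(), x.std())              # the original signal is mostly gone
```

Real systems use a schedule of betas and a neural network to predict the noise at each step, but the degrade-then-learn-to-restore idea is the same.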

3. Text to Video

Text-to-Video models aim to generate a video representation from text input. The input text can be anything from a single sentence to a full script, and the output is a video that corresponds to the input text. Similarly, text-to-3D models generate three-dimensional objects corresponding to a user's text description; these can be used, for example, in games or other 3D worlds.

4. Text to Task

Text-to-Task models are trained to perform a defined task or action based on text input. The task can range widely: answering a question, performing a search, making a prediction, or taking some other action. For example, a text-to-task model could be trained to navigate a web UI or make changes to a document.
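As a caricature, a text-to-task system maps free text to an action. A real model learns this mapping from data; the hand-wired dispatcher below, with invented action names, only illustrates the input/output shape:

```python
# A toy "text to task" dispatcher: map a free-text command to an action.
# The action names are made up for illustration.
def run_task(text: str) -> str:
    text = text.lower()
    if "search" in text:
        return "ACTION: web_search"
    if "open" in text:
        return "ACTION: navigate_ui"
    if "summarize" in text or "summarise" in text:
        return "ACTION: edit_doc"
    return "ACTION: answer_question"

print(run_task("Search for nearby restaurants"))   # ACTION: web_search
print(run_task("Summarize this document for me"))  # ACTION: edit_doc
```

A learned model replaces the brittle keyword rules with a classifier (or generator) trained on examples of commands paired with the actions they should trigger.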

A foundation model is a large AI model pre-trained on a vast quantity of data and designed to be adapted, or fine-tuned, to a wide range of downstream tasks, such as sentiment analysis, image captioning, and object recognition. Foundation models have the potential to revolutionize many industries, including healthcare, finance, and customer service. They can be used, for example, to detect fraud or to provide personalized customer support.

Congratulations!

You have finished the first part of the Generative AI learning path.

In the next lecture of this series, we will learn about Large Language Models in detail, what they are, where they can be utilized and how you can use prompt tuning to enhance LLM performance.

Till then you can look at my other works and learn Computational Physics with Python, or Computational Linear Algebra, or maybe solve some Leetcode problems.
