Generative AI — Introduction

Gurleen Sodhi
6 min read · May 9, 2024

Generative AI (Gen AI) is a type of AI technology that can produce various types of content, including text, imagery, audio, and synthetic data.

First, let's discuss the difference between AI and ML.

  • AI is a discipline (just as physics is a discipline of science). It deals with the creation of intelligent agents: systems that can reason, learn, and act autonomously. In short, machines that can think like a human.
  • ML is a subfield of AI. It is a program or system that trains a model from input data. The trained model can make useful predictions on new, never-before-seen data drawn from the same distribution as the data used to train it.

A little bit about machine learning models

Two of the most common classes are:

  1. Supervised (labeled data): each example comes with a tag, such as a name, a type, or a number.
  2. Unsupervised (unlabeled data): the examples come without any tag.

Let’s consider a common example of a supervised learning model: predicting house prices based on features such as size, number of bedrooms, location, etc.

Here, you’d start with a dataset that includes information about houses sold previously. You’d split the dataset into two subsets: a training set and a test set. The training set would be used to train your model, while the test set would be used to evaluate its performance. During training, the model learns the relationship between the features (size, number of bedrooms, etc.) and the target variable (house price) by adjusting its internal parameters.

Once trained, the model outputs a predicted price for a new house based on the patterns it learned from the training data.
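
As a rough illustration of the train/test workflow described above, here is a minimal sketch in plain Python. The data is made up (prices are generated from an assumed rule, price = 50 + 3 × size, plus noise), only one feature is used, and the helper names `fit_line` and `predict` are hypothetical. A real project would normally use more features and a library such as scikit-learn.

```python
import random

# Hypothetical data: (size in square metres, sale price in thousands).
# Prices follow price = 50 + 3 * size plus some noise, so we know
# roughly what the model should learn.
random.seed(0)
houses = [(size, 50 + 3 * size + random.uniform(-10, 10))
          for size in range(40, 200, 4)]

# Split the dataset into a training set (80%) and a test set (20%).
random.shuffle(houses)
split = int(0.8 * len(houses))
train, test = houses[:split], houses[split:]


def fit_line(data):
    """Least-squares fit of price = intercept + slope * size."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
             / sum((x - mean_x) ** 2 for x, _ in data))
    return mean_y - slope * mean_x, slope


# "Training": the internal parameters here are just intercept and slope.
intercept, slope = fit_line(train)


def predict(size):
    """Predicted price for a house of the given size."""
    return intercept + slope * size


# Evaluate on houses the model has never seen.
mean_abs_error = sum(abs(predict(x) - y) for x, y in test) / len(test)
print(f"slope={slope:.2f}, intercept={intercept:.2f}, test MAE={mean_abs_error:.2f}")
```

The learned slope should land close to the value of 3 used to generate the data, and the error on the test set gives an estimate of how well the model handles new houses.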

Next, let’s look at an example of an unsupervised model: income vs. job tenure. Imagine we have a dataset containing information about individuals’ incomes and their job tenures (the amount of time they have been working at their current job). The dataset might look something like this:

We can use an unsupervised learning technique like clustering to group individuals based on similarities in their income and job tenure. One common clustering algorithm is K-means clustering.

After performing K-means clustering, we might obtain clusters like this:

Cluster 1: Individuals with relatively low incomes and short job tenures.

Cluster 2: Individuals with moderate incomes and moderate job tenures.

Cluster 3: Individuals with high incomes and relatively long job tenures.

We can then analyze each cluster to understand the characteristics of individuals within it and potentially derive insights or make decisions based on those findings.
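
To make the clustering idea concrete, here is a small K-means sketch in plain Python. The nine (income, tenure) records are invented for illustration, and the starting centroids are fixed by hand so the result is repeatable. In practice they are usually chosen at random, and features are usually rescaled first so that income does not dominate the distance.

```python
# Hypothetical (income in thousands, job tenure in years) records.
people = [
    (25, 1), (28, 2), (30, 1),         # low income, short tenure
    (55, 5), (60, 6), (58, 7),         # moderate income, moderate tenure
    (110, 15), (120, 18), (115, 20),   # high income, long tenure
]


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    distances = [sum((p - c) ** 2 for p, c in zip(point, centroid))
                 for centroid in centroids]
    return distances.index(min(distances))


def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            clusters[nearest(point, centroids)].append(point)
        # Update step: move each centroid to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Assumed starting centroids: one record picked from each rough group.
centroids, clusters = k_means(people, [people[0], people[3], people[6]])
for number, cluster in enumerate(clusters, start=1):
    print(f"Cluster {number}: {cluster}")
```

No labels are supplied anywhere: the grouping comes only from how close the records are to one another.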

Let’s briefly explore deep learning

Deep learning (DL) is a subset of ML. It uses artificial neural networks, which allow it to process more complex patterns. Neural networks can learn from both labeled and unlabeled data, an approach called semi-supervised learning.

In a neural network, information is passed from one layer to the next over connecting channels. These are called weighted channels because each has a value (a weight) attached to it.

Each neuron has its own number called a bias. The bias is added to the weighted sum of the inputs reaching the neuron, and an activation function is then applied to the result. The output of that function determines whether the neuron is activated. Every activated neuron passes information on to the following layer. This continues up to the second-to-last layer. The last layer, the output layer, produces the outputs for the program.
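
The weighted sum, bias, and activation described above can be sketched in a few lines of Python. The weights, biases, and inputs below are arbitrary illustrative numbers, and the sigmoid is just one common choice of activation function (it gives a graded value between 0 and 1 instead of a hard on/off decision).

```python
import math


def sigmoid(x):
    """A common activation function that squashes any number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron(inputs, weights, bias):
    """Weighted sum of the inputs, plus the bias, passed through the activation."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return sigmoid(weighted_sum)


def layer(inputs, weight_rows, biases):
    """A layer is a list of neurons that all receive the same inputs."""
    return [neuron(inputs, weights, bias)
            for weights, bias in zip(weight_rows, biases)]


# A tiny network: 2 inputs -> 2 hidden neurons -> 1 output neuron.
# All weights and biases are made-up values, not trained ones.
inputs = [0.5, -1.0]
hidden = layer(inputs, [[0.8, 0.2], [-0.4, 0.9]], [0.1, -0.3])
output = layer(hidden, [[1.5, -2.0]], [0.0])
print("hidden layer:", hidden)
print("output layer:", output)
```

Training a network means adjusting these weights and biases until the output layer produces useful values. Here they are fixed by hand.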

In semi-supervised learning, a neural network learns the basic concepts of a task from labeled data, while unlabeled data helps it generalise to new examples. Now we finally get to where Gen AI fits into the AI discipline.

Gen AI uses artificial neural networks to process both labeled and unlabeled data using supervised, unsupervised, and semi-supervised methods.

There are two categories of Gen AI: LLMs and image-based models.

1. Large language models (LLMs): trained on very large datasets, with billions of parameters. They are also generic in nature. Examples include ChatGPT (OpenAI) and Llama (Meta). They can be used as a chatbot, a question-answering system, or any support system.

ChatGPT is a type of Generative Pre-trained Transformer (GPT), which is a type of LLM (a massive computer-based representation of natural language examples), which is a type of general-purpose transformer (an ANN language processor), which is a type of Artificial Neural Network (an ML approach inspired by how the human brain works), which is a type of Machine Learning (an approach to AI that uses algorithms to improve performance from data).

Note: Any use of an LLM out of the box is known as prompt engineering. To know more about it, visit “What is Prompt Engineering?”

Issues around training a text GPT

Before a text GenAI can generate text, it first has to be trained. This involves the tool being provided with, and processing, huge amounts of data scraped from the internet and elsewhere. It is reported, but not confirmed by OpenAI, that the training of GPT-4 involved a million gigabytes of data. Processing this data involves identifying patterns, such as which words typically go together (e.g. “Happy” is often followed by “Birthday”).
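
The idea of learning "which words typically go together" can be illustrated with a simple word-pair (bigram) count. This is only a toy sketch over a made-up one-line corpus. Real GPT training uses neural networks over far larger datasets instead of a lookup table like this.

```python
from collections import Counter, defaultdict

# A tiny made-up corpus standing in for "huge amounts of data".
corpus = "happy birthday to you . happy birthday dear friend . happy new year to you"
words = corpus.split()

# For every word, count which words follow it.
following = defaultdict(Counter)
for current, nxt in zip(words, words[1:]):
    following[current][nxt] += 1


def next_word_probabilities(word):
    """Turn the follow-up counts for a word into probabilities."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


print(next_word_probabilities("happy"))
```

In this corpus "happy" is followed by "birthday" two times out of three and by "new" once, so "birthday" gets the higher probability.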

Human costs

Once the text GenAI model is trained but before it is used, it is often checked and refined in a process known as Reinforcement Learning from Human Feedback (RLHF). In RLHF, text GenAI responses are reviewed and validated by human reviewers. These human reviewers ensure that the GenAI responses are appropriate, accurate, and align with the intended purpose.

How a GPT generates text

Once the GPT has been trained, generating a text response to a prompt involves the following steps:

1. The prompt is broken down into smaller units (called tokens) that are input into the GPT.

2. The GPT uses statistical patterns to predict likely words or phrases that might form a coherent response to the prompt.

  • The GPT identifies patterns of words and phrases that commonly co-occur in its prebuilt large data model (which comprises text scraped from the Internet and elsewhere).
  • Using these patterns, the GPT estimates the probability of specific words or phrases appearing in a given context.
  • Beginning with a random prediction, the GPT uses these estimated probabilities to predict the next likely word or phrase in its response.

3. The predicted words or phrases are filtered through what are known as ‘guardrails’ to remove any offensive content.

4. Steps 2 to 3 are repeated until a response is finished. The response is considered finished when it reaches a maximum token limit or meets predefined stopping criteria.

5. The response is post-processed to improve readability by applying formatting, punctuation, and other enhancements (such as beginning the response with words that a human might use, such as “Sure,” or “Certainly,” or “I’m sorry”).
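
The five steps above can be sketched as a small loop. Everything here is a stand-in: `NEXT_TOKEN` is a hypothetical hand-written probability table playing the role of the trained model, tokens are whole words split on spaces (real systems use sub-word tokens), the "model" looks only at the last token, and the guardrail is a simple blocklist applied before sampling.

```python
import random

# Hypothetical next-token probabilities standing in for a trained model.
NEXT_TOKEN = {
    "happy": {"birthday": 0.6, "holidays": 0.3, "darn": 0.1},
    "birthday": {"to": 0.8, "<end>": 0.2},
    "holidays": {"to": 0.6, "<end>": 0.4},
    "to": {"you": 1.0},
    "you": {"<end>": 1.0},
}
BLOCKED = {"darn"}  # toy guardrail: tokens that must never be emitted


def generate(prompt, max_tokens=10, seed=0):
    rng = random.Random(seed)
    tokens = prompt.lower().split()              # step 1: break prompt into tokens
    last = tokens[-1] if tokens else "<end>"
    response = []
    while len(response) < max_tokens:            # step 4: stop at the token limit
        # Step 2: look up the probabilities of possible next tokens.
        probs = NEXT_TOKEN.get(last, {"<end>": 1.0})
        # Step 3: guardrail, drop any blocked candidates.
        allowed = {t: p for t, p in probs.items() if t not in BLOCKED}
        if not allowed:
            break
        token = rng.choices(list(allowed), weights=list(allowed.values()))[0]
        if token == "<end>":                     # step 4: stopping criterion
            break
        response.append(token)
        last = token
    # Step 5: post-process for readability.
    text = " ".join(response)
    return text.capitalize() + "." if text else ""


print(generate("say happy"))
```

Because the next token is sampled according to its probability, different seeds can give different responses to the same prompt, which is also why a real chatbot can answer the same question in more than one way.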

2. Image-based models: image GenAI and music GenAI often use a different type of ANN known as a Generative Adversarial Network (GAN). Many newer image generators use diffusion models instead.

GANs have two parts (two ‘adversaries’), the ‘generator’ and the ‘discriminator’. The generator creates a random image in response to the human-written prompt, and the discriminator tries to distinguish between this generated image and real images. The generator then uses the result of the discriminator to adjust its parameters, in order to create another image.

This process is repeated, possibly thousands of times, with the generator making more and more realistic images that the discriminator is increasingly less able to distinguish from real images.

For example, a successful GAN trained on a dataset of thousands of landscape photographs might generate new but unreal images of landscapes that are almost indistinguishable from real photographs.
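
The generator/discriminator loop can be shown with a deliberately tiny example that works on single numbers instead of images. This is a toy sketch under several assumptions: "real" data is drawn from a normal distribution centred on 4, the generator is a straight line applied to random noise, the discriminator is a single sigmoid unit, there is no text prompt, and the gradients are written out by hand. Real GANs use deep networks and a framework such as PyTorch or TensorFlow.

```python
import math
import random


def sigmoid(x):
    """Numerically stable sigmoid."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def train_toy_gan(steps=500, lr=0.05, seed=0):
    rng = random.Random(seed)
    a, b = 1.0, 0.0   # generator parameters: fake = a * noise + b
    w, c = 0.0, 0.0   # discriminator parameters: D(x) = sigmoid(w * x + c)
    for _ in range(steps):
        real = rng.gauss(4.0, 0.5)       # a sample of "real" data
        noise = rng.gauss(0.0, 1.0)
        fake = a * noise + b             # the generator's attempt

        # Discriminator step: push D(real) toward 1 and D(fake) toward 0
        # (gradient descent on -log D(real) - log(1 - D(fake))).
        d_real = sigmoid(w * real + c)
        d_fake = sigmoid(w * fake + c)
        w -= lr * ((d_real - 1.0) * real + d_fake * fake)
        c -= lr * ((d_real - 1.0) + d_fake)

        # Generator step: use the discriminator's verdict to adjust a and b
        # so that the fake sample looks more real (descent on -log D(fake)).
        d_fake = sigmoid(w * fake + c)
        grad_fake = (d_fake - 1.0) * w
        a -= lr * grad_fake * noise
        b -= lr * grad_fake
    return (a, b), (w, c)


(a, b), (w, c) = train_toy_gan()
print(f"generator: fake = {a:.2f} * noise + {b:.2f}")
```

The generator starts out producing numbers around 0 and is nudged by the discriminator's feedback toward the region where the real data lives. Simple GANs like this one can oscillate instead of settling, so the final values should be read as illustrative only.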

Prompt engineering is a relatively new discipline for developing and optimizing prompts to efficiently use language models (LMs) for a wide variety of applications and research topics. Prompt engineering skills help to better understand the capabilities and limitations of large language models (LLMs).
