Understanding Generative AI

Feng Li
6 min read · Jan 19, 2024


Little Rouge Creek, Rouge National Urban Park, Toronto, ON, Jan 1 2024

This is essentially a study note from related short courses on Deeplearning.ai, plus my own understanding. I highly recommend them, and Andrew Ng, as always!

1 Concepts

1.1 AI/ML

AI, in brief, is using software to imitate human behaviors. The software usually needs to learn first and then mimic.

Machine learning (ML), as the foundation of AI, is how software learns patterns from given information, aka datasets. The outcome of this learning is called a model, which is trained to act based on given inputs. The model can then be used to make judgements and take actions automatically when new information comes in. We have discussed how to use Snowpark to train a classification model that can predict Iris species.

ML (supervised learning, unsupervised learning, reinforcement learning) has evolved from classic learning methods/algorithms (Naive Bayes, logistic regression, KNN, etc.) to more advanced algorithms (neural networks, aka deep learning), so AI applications can mimic more advanced human behaviors: vision and language. We have discussed a simple NLP example that predicts the next word.

1.2 LLM

While this has been going on for several years, the breakthrough came with the rise of ChatGPT (Generative Pre-trained Transformer) by OpenAI, which took off in 2023.

ChatGPT is “Prompt” + “LLM”. “Prompt” is about how input is fed to the backend “LLM”. The LLM (“Large Language Model”) is a model that responds with “generated” information which doesn’t exist in advance.

Traditional search finds the proper “existing information” for you, while an LLM generates “proper” info based on your requests.

An LLM has two key aspects: message vectorization, and large scale in parameters/dimensions and in the amount of information it is trained on.

(1) An LLM breaks text into tokens (a token is usually about 3/4 of a word) and encodes each token into an embedding (a multi-dimensional vector of floats). A sentence embedding spanning multiple tokens can be calculated as an average/sum of the individual token embeddings.
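As a quick, hedged illustration (it assumes the openai and tiktoken packages and an OPENAI_API_KEY in the environment; the model names are just examples), here is how text can be tokenized and turned into an embedding vector:

# Tokenize a sentence and fetch an embedding vector for it.
import tiktoken
from openai import OpenAI

enc = tiktoken.encoding_for_model("gpt-3.5-turbo")  # tokenizer used by the GPT-3.5/4 family
tokens = enc.encode("Generative AI creates new content from learned patterns.")
print(len(tokens), [enc.decode([t]) for t in tokens])  # token count and the token strings

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-ada-002",
    input="Generative AI creates new content.",
)
vector = resp.data[0].embedding
print(len(vector))  # embedding dimensionality, e.g. 1536 floats for this model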

(2) An LLM’s parameter count relates to the dimensionality and number of weights used to represent tokens. The more parameters, the more granularly the model can represent/understand a token/word/sentence. So we usually consider a 1B-parameter LLM relatively less capable than a 10B-parameter LLM. GPT-4 is reported to have about 1.76 trillion parameters!

Meanwhile, the GPT-3/4 models are trained on huge amounts of data scraped from the Internet; they are closed models, so we don’t know many details about them. There are open-source LLMs hosted on Hugging Face, and some of those models disclose what data they were trained on. Snowflake’s container services, now in public preview, can be used to deploy open-source models and run inference.

By the way, in our simple NLP post mentioned above, we only had a very small amount of pre-trained bi-gram/tri-gram data in our model’s knowledge base.

1.3 Generative AI

The intelligent part of an LLM is that when it responds to inquiries, it generates messages that don’t exist in advance. LLM-backed applications are also called AI-driven applications.

In comparison, a regular search engine as an app can only match your query to existing information it collected in advance.

How can an LLM “create” info? It’s because the LLM is trained, and runs inference, using vector data. (Compare this to a Naive Bayes model, which is trained on structured datasets.)

Information/input data is first broken into tokens before training. The tokens are then fed into a neural network with random initial weights/biases, passing through multiple layers to be encoded into embeddings. By learning from labeled data, the LLM is trained to place embeddings with closer semantic meanings closer together. When new information comes in for inference, its embedding is calculated and compared (for example, via vector cosine similarity) against that learned space to find the most relevant region. The LLM can then organize tokens from this space to generate a message and return it.
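To make the “closer semantic meanings sit closer together” idea concrete, here is a tiny sketch of cosine similarity with made-up toy vectors (real embeddings have hundreds or thousands of dimensions; numpy is assumed):

import numpy as np

def cosine_similarity(a, b):
    # Cosine of the angle between two vectors: 1.0 means the same direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

dog   = np.array([0.9, 0.1, 0.30])   # toy embedding for "dog"
puppy = np.array([0.8, 0.2, 0.35])   # semantically close, so similarity is high
car   = np.array([0.1, 0.9, 0.50])   # semantically far, so similarity is lower

print(cosine_similarity(dog, puppy))  # ~0.99
print(cosine_similarity(dog, car))    # ~0.33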

So, vector databases are how an LLM’s knowledge base is stored.

However, a general LLM like GPT-4 or Llama 2 essentially just predicts the next words; there is no magic. The following methods are used to shift an LLM’s responses from “predicting next words” to “being specific to what’s asked”.

These methods are prompting, Retrieval-Augmented Generation (RAG), fine-tuning, and Reinforcement Learning from Human Feedback (RLHF). This area keeps developing.

1.3.1 Prompt

Prompting an LLM is about the style in which input messages are constructed: how to be clear and specific so the LLM can understand the question and provide accurate responses.

See section 2.2 below for one prompting tactic. LangChain libraries can be used to help construct prompts for LLMs, as sketched below.
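A minimal sketch of what that looks like (it assumes the langchain package; the template wording is illustrative, not from the course):

from langchain.prompts import PromptTemplate

# A reusable prompt template; only the input is reshaped, no new knowledge is added.
template = PromptTemplate.from_template(
    "Rewrite the following text in a {style} tone:\n{text}"
)
prompt = template.format(style="calm and professional", text="This order is late again!!")
print(prompt)  # this string is what gets sent to the LLM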

GitHub Copilot is another example of prompt crafting. Copilot takes your coding context and constructs a clear prompt that instructs the backend model to respond with relevant suggestions.

So, prompting does not add any new information to the existing LLM.

1.3.2 Retrieval Augmented Generation (RAG)

RAG uses additional data from outside the LLM to prompt it, so the LLM can give responses specifically related to that additional information.

Before chatting with the LLM, a RAG-enabled tool provides a way to upload custom data files and associate them with the LLM. The tool then searches the custom data for information related to the question and constructs an enhanced prompt. With this prompt, the LLM is able to answer with information specific to your custom data.

See below 2.3 for more explanation.
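Here is a purely illustrative sketch of the RAG idea: retrieve the most relevant snippet from the custom data and prepend it to the prompt. Real tools use embeddings and a vector database rather than this toy keyword-overlap score, and the documents below are made up:

def retrieve(question, documents):
    # Toy retrieval: pick the document sharing the most words with the question.
    score = lambda doc: len(set(question.lower().split()) & set(doc.lower().split()))
    return max(documents, key=score)

documents = [
    "Our return policy allows refunds within 30 days of purchase.",
    "Shipping to Canada usually takes 5 to 7 business days.",
]
question = "How many days do I have to return an item?"
context = retrieve(question, documents)

augmented_prompt = (
    "Answer using only the context below.\n"
    f"Context: {context}\n"
    f"Question: {question}"
)
print(augmented_prompt)  # this enhanced prompt is what gets sent to the LLM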

1.3.3 Fine-tuning

When custom data files are large (usually domain-specific knowledge such as medical notes, legal documents, or financial documents), they cannot simply be put into a prompt, and RAG is not the best way.

Or, when the task is hard to describe in a prompt, for example asking the LLM to mimic how Andrew Ng talks, we’ll want to fine-tune the LLM with custom data, like Andrew’s lesson transcripts!

Note: ChatGPT itself was fine-tuned from a base GPT model with instruction data (question/answer pairs) to mimic a chatting style!
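For reference, here is a hedged sketch of how a fine-tuning job can be started with the OpenAI API (it assumes an OPENAI_API_KEY and a train.jsonl file of chat-style question/answer examples; the model name and file path are placeholders):

from openai import OpenAI

client = OpenAI()

# Each line of train.jsonl holds one example, e.g.
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
training_file = client.files.create(file=open("train.jsonl", "rb"), purpose="fine-tune")

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",  # a fine-tunable base model
)
print(job.id, job.status)  # poll the job; once it finishes, use the returned fine-tuned model name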

1.3.4 Reinforcement Learning from Human Feedback (RLHF)

To be discussed in another post.

2 Practice env setup and key takeaways from some of the courses

2.1 Our coding env

We can just use the notebooks hosted by Deeplearning.ai, but we can also create a local Python venv set up for Jupyter notebooks, or use Colab.

2.1.1 Local env

PS C:\Users\feng\workspace> python3.10 -m venv openai-venv
PS C:\Users\feng\workspace> .\openai-venv\Scripts\activate
(openai-venv) PS C:\Users\feng\workspace> pip install openai python-dotenv ipykernel
(openai-venv) PS C:\Users\feng\workspace> python -m ipykernel install --user --name openai-venv
Then choose the openai-venv kernel for this notebook.

2.1.2 Colab notebook

Just log in to your Google Colab, start a new notebook, and install the dependent libraries.

!pip install openai
!pip install tiktoken
!pip install python-dotenv
!echo "OPENAI_API_KEY=xxxx" > .env

2.2 Prompt example to clarify the question

The following screenshot is from the course ChatGPT Prompt Engineering for Developers.

In the prompt, we tell the LLM that the text to be summarized is enclosed in triple backticks (```). That way the LLM knows exactly which text it needs to summarize and, for example, won’t accidentally treat that text as instructions in the prompt.

https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/2/guidelines
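A hedged reconstruction of that tactic in code (the text and model name are illustrative, not from the course):

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

text = (
    "Generative AI models learn patterns from large datasets and then "
    "generate new content such as text, images, or code in response to prompts."
)

# The triple backticks delimit exactly what should be summarized,
# so the model won't mistake that text for further instructions.
prompt = f"""
Summarize the text delimited by triple backticks into a single sentence.
```{text}```
"""

resp = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
    temperature=0,  # keep the summary close to deterministic
)
print(resp.choices[0].message.content)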

2.3 RAG explanation example

Screenshot from the course Generative AI for Everyone.

https://www.deeplearning.ai/courses/generative-ai-for-everyone/

2.4 Fine tuning

Screenshot from the course Generative AI for Everyone.

https://www.deeplearning.ai/courses/generative-ai-for-everyone/

3 Existing Tools/Platforms that support RAG, Fine-tuning, etc.

Major cloud platforms like AWS, Azure, and GCP, as well as Snowflake, have developed tools that handle prompting, RAG, fine-tuning, and more, so we don’t have to build these from scratch.

In the next posts, let’s look at the GenAI capabilities of these platforms.

References:

https://learn.deeplearning.ai/chatgpt-prompt-eng/lesson/2/guidelines

https://www.deeplearning.ai/courses/generative-ai-for-everyone/

Happy Reading!


Feng Li

Software Engineer, playing with Snowflake, AWS and Azure. Snowflake Data Superhero. Jogger, Hiker.