Intro to LLM and RAG

Mahmoud Kamal
8 min read · Jul 1, 2024


I’ve just started a new journey learning about a trending technology: LLMs. I’m studying this topic through the llm-zoomcamp course. In a series of articles I will try to write about what I learned each week, both to build the habit of writing about my daily learning and to serve as a reminder of the concepts I’ve covered so far.

Intro

In this module, I learned what LMs, LLMs, and RAG systems are. In the rest of this article I will describe what I practiced during this module and talk about the challenges I faced while preparing the environment.

Language Models (LMs)

To understand what an LLM is, let’s first understand what an LM is and where we see it in our daily life.

LMs are algorithms that predict the next word based on the words you’ve already written; they may not care much about the context of the sentence or the order of words, so compared with LLMs they are considered very simple. You can see this behavior in lots of devices, such as your phone: when chatting with a friend, you may notice that the keyboard recommends the next word so you can insert it automatically.

Language Model recommends three words
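Just to make the idea concrete, here is a minimal sketch of a toy next-word predictor built from bigram counts. The tiny corpus and the predict_next function are made up for illustration; real keyboard models are far more sophisticated.

from collections import Counter, defaultdict

# Tiny made-up corpus just for illustration.
corpus = "i am learning llm zoomcamp and i am writing about llm and rag".split()

# Count how often each word follows another one (bigram counts).
bigrams = defaultdict(Counter)
for prev_word, next_word in zip(corpus, corpus[1:]):
    bigrams[prev_word][next_word] += 1

def predict_next(word, k=3):
    # Suggest the k words that most often followed `word` in the corpus.
    return [w for w, _ in bigrams[word].most_common(k)]

print(predict_next("i"))   # ['am']
print(predict_next("am"))  # ['learning', 'writing']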

Large Language Models (LLMs)

They are more advanced versions of LMs, trained with far more parameters and on far more tokens (words), with much better ability to remember the order of words as well as the context in which a certain word appears.

Nowadays there are lots of models around us. Maybe you’ve used ChatGPT or LLaMA at some point, but there are many other models.

Example of different LLM models

Usually these models are used by sending a well-formed message, usually called a “prompt”, and they use the data they were trained on to produce an informative answer.

Asking ChatGPT on food recipe

These models are trained on public data that isn’t specific to a certain knowledge field like physics or physiology, and doesn’t contain information about, say, a corporation’s internal culture. So asking questions from specialized fields like law or chemistry may produce insufficient or inaccurate answers. What is the solution to such a problem? Adjusting the architecture: train a new model on data from that specific field, or fine-tune an existing model so it is more biased toward the information from that field. This may give a well-trained model that, within that narrow field, performs even better than GPT-4o.

RAG

RAG is a lightweight alternative to “adjusting the architecture”. Instead of training a new model, with the significant computational resources required for such an endeavor, what if we could simply provide the context from which we want the model to take its information?

RAG stands for Retrieval-Augmented Generation. Generation is probably the most intuitive word: LLMs generate content. But think about what happens if I’m specialized in some specific field: what if we let the LLM have more specific information about the content to be generated, strengthening its knowledge about something by augmenting its knowledge base with information retrieved from another knowledge source?

knowledge source: a database containing documents that represent our knowledge. The database can even be a search engine like Google, from which we get information to feed our LLM.
retrieved: when we ask the database/search engine a question, we get back the documents related to that question.
augmenting: giving those docs to the LLM and asking it the same question we asked the knowledge source, so we get an answer tailored to those docs.

Notice how this simple RAG system works: ask the question online (the database), retrieve the info, feed ChatGPT that retrieved info, and then answer our question based on it.

What if the question was more specific than that?

The answer is generic and simply says: which course? We need to augment the answer with information retrieved from some DB related to the specific course we asked about.

Now we should build a system tailored to our purpose, which is answering students’ questions about a specific course.

System Shape

Building a system that can answer students’ questions on a specific course.

The goal is to build a system that performs the following steps:

  1. Ask the question to the DB and get the relevant docs.
  2. Build a query (prompt) using those docs and send it to an LLM.
  3. Get the answer from the LLM.
def qa_rag_system(query):
    search_results = elastic_search(query)
    prompt = build_prompt(query, search_results)
    answer = llm(prompt)
    return answer

Environment preparation

  1. Installing Python
  2. Installing pipenv
# In project folder
pip install pipenv
  3. Choosing the DB that contains your knowledge
  4. Choosing an LLM provider.

In my case, I used GitHub Codespaces, which provides you with a VM containing all the tools you will need. Developing RAG applications locally is costly and requires lots of hardware resources.

Python and pipenv installation

If you’re using GitHub Codespaces, you will find Python pre-installed but not pipenv. Pipenv is a Python virtualenv management tool used to separate your project dependencies from the rest of the system. In GitHub Codespaces you will still find separate environments useful, even though you’re already working on a system completely separate from your computer.

Choosing DB

What we need is a database that stores our documents and, when asked a question, returns the most relevant documents it stores. (This is not the phase where we generate answers; it’s about returning the right docs to give to the LLM so it can generate the answer.)

The best choice for such a case is Elasticsearch, a NoSQL database used to run advanced searches over the documents it contains. It works like a search engine.

Or you can even build your own simple search engine. To do so, follow this lesson, which uses TF-IDF and cosine similarity to retrieve relevant info.
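As a rough illustration of that idea (not part of this module’s setup), here is a minimal sketch using scikit-learn’s TfidfVectorizer and cosine_similarity; the example docs and the query are made up:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Made-up documents standing in for our knowledge base.
docs = [
    "How do I install pipenv for the course?",
    "When does the course start?",
    "How can I run Elasticsearch with Docker?",
]

# Turn the docs and the query into TF-IDF vectors.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(docs)
query_vector = vectorizer.transform(["how to install pipenv"])

# Rank the documents by cosine similarity to the query and take the best match.
scores = cosine_similarity(query_vector, doc_vectors)[0]
print(docs[scores.argmax()])  # "How do I install pipenv for the course?"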

Choosing LLM provider

There are lots of LLMs. You may use OpenAI’s ChatGPT platform: all you need is to create an API key and use it to communicate with your preferred OpenAI model. I used a free open-source LLM called LLaMA 3 8B, developed by Meta and provided by a platform called Groq. There you also need to create an API key and then start communicating with the preferred model.

Many other free LLMs can be found on this document.

Establishing QA RAG System

Installing the packages we need

pip install requests groq tqdm elasticsearch tiktoken notebook==7.1.2

# groq is the package used to communicate with the (LLaMA 3 8B) model.
# tqdm is a progress-bar package, used to visualize looping progress.
# elasticsearch is the package used to communicate with the Elasticsearch DB.
# tiktoken is an open-source package used with OpenAI models to count the number of tokens in the input and the generated output.
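As a small aside, counting tokens with tiktoken looks roughly like this (the prompt string is just a made-up example):

import tiktoken

# Load a tokenizer used by OpenAI's GPT-4-family models.
encoding = tiktoken.get_encoding("cl100k_base")

prompt = "How do I run Kafka?"  # made-up example text
tokens = encoding.encode(prompt)
print(len(tokens))  # number of tokens the model would see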

Configuring the environment variable needed for Groq credentials

Add GROQ_API_KEY, which is used by the groq package to authenticate your connection to the Groq API.

export GROQ_API_KEY='{API_KEY}' # It's really secret and you need to hide it.

Preparing the docs

Now you should prepare the documents you want to store in Elasticsearch and use later as your knowledge base. After storing those docs, when you ask Elasticsearch a question you get back the documents most relevant to your query.

The docs used during this module are the FAQs of the Zoomcamp courses (DE Zoomcamp, ML Zoomcamp, MLOps Zoomcamp).

To download those docs and convert them to JSON format, use the following Python notebook.
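For orientation, each entry in the resulting JSON is a small record. Here is a hypothetical example (the values are made up, but the field names match the index mapping we create below):

# Hypothetical example of one converted document (values are made up).
doc = {
    "course": "data-engineering-zoomcamp",
    "section": "Module 1: Docker and Terraform",
    "question": "How do I run Elasticsearch with Docker?",
    "text": "Use the docker run command shown in the course notes ...",
}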

Start the Elasticsearch server

Using Docker, which is pre-installed on GitHub Codespaces, you can start an isolated Elasticsearch server that contains your docs.

docker run -it \
    --rm \
    --name elasticsearch \
    -p 9200:9200 \
    -p 9300:9300 \
    -e "discovery.type=single-node" \
    -e "xpack.security.enabled=false" \
    docker.elastic.co/elasticsearch/elasticsearch:8.4.3

# Testing connectivity
curl http://localhost:9200

You should get something like this

{
  "name" : "741ed89c1f9b",
  "cluster_name" : "docker-cluster",
  "cluster_uuid" : "7YWbsBlmSP6psPlfr_s-lw",
  "version" : {
    "number" : "8.4.3",
    "build_flavor" : "default",
    "build_type" : "docker",
    "build_hash" : "42f05b9372a9a4a470db3b52817899b99a76ee73",
    "build_date" : "2022-10-04T07:17:24.662462378Z",
    "build_snapshot" : false,
    "lucene_version" : "9.3.0",
    "minimum_wire_compatibility_version" : "7.17.0",
    "minimum_index_compatibility_version" : "7.0.0"
  },
  "tagline" : "You Know, for Search"
}

Loading DOCs to elasticsearch

  1. Start the Elasticsearch connection
from elasticsearch import Elasticsearch

es_client = Elasticsearch('http://localhost:9200')

2. Build an index, which is the equivalent of a relation (table) in a DBMS

# Think about it as a relation with 4 columns and a primary key called "course" by which you group the docs.
# When searching, Elasticsearch searches the other text columns for similarity and returns the most similar docs.
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"}  # the primary key, in RDBMS terms
        }
    }
}

index_name = "course-questions"

es_client.indices.create(index=index_name, body=index_settings)

3. Load the docs into the course-questions index

for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

Now all the setup is done and we have the knowledge base (DB), so let’s start with RAG.

Building R-A-G

R — Retrieving the documents most related to a certain query from Elasticsearch

def filter_search(query, size=5, search_words=[]):
    def filter_builder():
        return list(map(lambda word: {"term": {**word}}, search_words))

    search_query = {
        "size": size,  # number of docs to be returned
        "query": {
            "bool": {
                "must": {
                    "multi_match": {
                        "query": query,
                        # Give a document a higher score if its question is more similar
                        # to the query than its text is.
                        "fields": ["question^4", "text"],
                        "type": "best_fields"
                    }
                },
                "filter": filter_builder()  # filter by the keyword (primary key)
            }
        }
    }

    return es_client.search(index=index_name, body=search_query)
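For example, asking a question about one specific course could look like this (the question and course name are just examples); the returned hits are what we will use as returned_docs in the next step:

# Hypothetical question filtered to one course.
response = filter_search(
    "How do I run Kafka?",
    search_words=[{"course": "data-engineering-zoomcamp"}],
)

# Elasticsearch wraps the matching documents in hits.hits.
returned_docs = response["hits"]["hits"]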

Preparing the prompt

context_template = """
Q: {question}
A: {text}
""".strip()

context = ""
for doc in returned_docs:
    context += context_template.format(question=doc["_source"]["question"], text=doc["_source"]["text"]) + "\n\n"

print(context)

prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.
QUESTION: {question}
CONTEXT:
{context}
""".strip()

prompt = prompt_template.format(question=query, context=context).strip()

G — Send the prompt to the LLM, and the answer will be A — Augmented by the docs’ content.

import os

from groq import Groq

def ask_groq(prompt, model="mixtral-8x7b-32768"):
    client = Groq(
        api_key=os.environ.get("GROQ_API_KEY"),
    )

    chat_completion = client.chat.completions.create(
        messages=[
            {
                "role": "user",
                "content": prompt,
            }
        ],
        model=model,
    )

    return chat_completion.choices[0].message.content

answer = ask_groq(prompt)
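Putting the three pieces together, the qa_rag_system skeleton from the beginning of the article can be filled in roughly like this (build_prompt and the default course value here are just a sketch based on the code above):

def build_prompt(query, search_results):
    # Same logic as the "Preparing the prompt" section above.
    context = ""
    for doc in search_results:
        context += context_template.format(
            question=doc["_source"]["question"], text=doc["_source"]["text"]
        ) + "\n\n"
    return prompt_template.format(question=query, context=context).strip()

def qa_rag_system(query, course="data-engineering-zoomcamp"):
    # Retrieve, augment, generate.
    search_results = filter_search(query, search_words=[{"course": course}])["hits"]["hits"]
    prompt = build_prompt(query, search_results)
    return ask_groq(prompt)

print(qa_rag_system("How do I run Kafka?"))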

Please take a look at this module’s materials on GitHub to learn more about these concepts and how to build your own search engine instead of using Elasticsearch.

Finally

Thanks for reading, I hope this article is beneficial in your learning journey ❤❤
