Introduction to LLMs and RAG (Module 1)

Anudeep Reddy Katta
4 min read · Jun 14, 2024


These blogs are based on lessons learned in LLM Zoomcamp, a free online course about real-life applications of LLMs. Over 10 weeks, you will learn how to build an AI bot that can answer questions about your own knowledge base.

Thanks to Alexey Grigorev for creating this course.

Key concepts discussed

  1. What is an LLM?
  2. What is RAG?
  3. Configuring your environment
  4. Retrieval & Search
  5. Using Elasticsearch

LLMs

Large language models (LLMs) are very large deep learning models that are pre-trained on vast amounts of data. The underlying transformer is a set of neural networks consisting of an encoder and a decoder with self-attention capabilities. The encoder and decoder extract meaning from a sequence of text and capture the relationships between the words and phrases in it.

They can be trained on unlabeled data; more precisely, transformers perform self-supervised learning. It is through this process that they learn basic grammar, languages, and knowledge.

Unlike earlier recurrent neural networks (RNNs), which process inputs sequentially, transformers process entire sequences in parallel.

RAG

Retrieval-augmented generation (RAG) is a technique for enhancing the accuracy and reliability of generative AI models with facts fetched from external sources.

(Figure: the RAG framework)

Configuring your environment

Ensure that you have Python installed on your system. I am using a Mac with the VS Code editor.

Create a project folder and a virtual environment, activate it, and install the necessary libraries:

# create a virtual environment named "llm" and activate it
python3 -m venv llm
source llm/bin/activate
# install the libraries used in this module
pip install tqdm notebook==7.1.2 openai elasticsearch scikit-learn pandas python-dotenv wget

Creating a secret key for OpenAI API access:

Open https://platform.openai.com/docs/overview and sign in.

Steps to generate a secret key: Dashboard → API Keys → Create new secret key

Copy the secret key somewhere safe; it won't be accessible later. Save it in a new file named .env
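
The .env file needs just one line. The variable name must match the one the code reads later (this post uses OPEN_API_KEY); the value shown here is a placeholder:

OPEN_API_KEY=sk-...your-secret-key...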

# register the virtual environment as a Jupyter kernel, then start Jupyter
ipython kernel install --user --name=llm
jupyter notebook

Create a new notebook and select llm as the kernel from the dropdown.

import os
from dotenv import load_dotenv
from openai import OpenAI

# load the API key from .env
load_dotenv()
client = OpenAI(api_key=os.getenv('OPEN_API_KEY'))

# send a test prompt to the model
response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{"role": "user", "content": "is it too late to do an mba?"}]
)
response.choices[0].message.content

Yayy! You are done with the setup. Note that to get responses from the API you need OpenAI API credits, which are billed separately from a ChatGPT subscription.

Retrieval & Search

In this section we use minsearch, a simple in-memory search engine provided by the course (downloaded below).

# Download the necessary files using wget
import wget

url = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/01-intro/minsearch.py'
wget.download(url)
url = 'https://raw.githubusercontent.com/DataTalksClub/llm-zoomcamp/main/01-intro/documents.json'
wget.download(url)

import os
import json
from dotenv import load_dotenv
import minsearch
from openai import OpenAI

load_dotenv()

# load the FAQ documents and flatten them into a single list,
# tagging each document with its course name
with open("documents.json", 'rt') as f:
    docs_raw = json.load(f)

documents = []
for course_dict in docs_raw:
    for doc in course_dict['documents']:
        doc['course'] = course_dict['course']
        documents.append(doc)

# index question/text/section as full-text fields,
# and course as an exact-match keyword field for filtering
index = minsearch.Index(
    text_fields=["question", "text", "section"],
    keyword_fields=["course"]
)
index.fit(documents)

query = 'the course has already started, can I still enroll?'

# boost matches in the question field, downweight the section field
boost = {'question': 3.0, 'section': 0.5}

results = index.search(
    query=query,
    filter_dict={'course': 'data-engineering-zoomcamp'},
    boost_dict=boost,
    num_results=5
)
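
Each result is a plain dict with the original document fields, so you can sanity-check the top hit before sending anything to the LLM:

# peek at the best-scoring document
print(results[0]['question'])
print(results[0]['text'])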

def build_prompt(query, search_results):
    # the prompt instructs the model to answer only from the retrieved context
    prompt_template = """
You're a course teaching assistant. Answer the QUESTION based on the CONTEXT from the FAQ database.
Use only the facts from the CONTEXT when answering the QUESTION.

QUESTION: {question}

CONTEXT:
{context}
""".strip()

    # concatenate the retrieved documents into a single context string
    context = ""
    for doc in search_results:
        context = context + f"section: {doc['section']}\nquestion: {doc['question']}\nanswer: {doc['text']}\n\n"

    prompt = prompt_template.format(question=query, context=context).strip()
    return prompt

prompt = build_prompt(query, results)

client = OpenAI(api_key=os.getenv('OPEN_API_KEY'))

response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)

Clean and modular code can be found here: link. The gist of the refactor looks like the sketch below.
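
A sketch of that modular structure, reusing the index, client, and build_prompt defined above (the function names here are mine, chosen for illustration):

def search(query):
    # retrieval: find the top FAQ entries for the query
    boost = {'question': 3.0, 'section': 0.5}
    return index.search(
        query=query,
        filter_dict={'course': 'data-engineering-zoomcamp'},
        boost_dict=boost,
        num_results=5
    )

def llm(prompt):
    # generation: send the prompt to the model and return the answer text
    response = client.chat.completions.create(
        model='gpt-3.5-turbo',
        messages=[{"role": "user", "content": prompt}]
    )
    return response.choices[0].message.content

def rag(query):
    # retrieval -> prompt building -> generation
    return llm(build_prompt(query, search(query)))

print(rag('the course has already started, can I still enroll?'))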

Using Elasticsearch

Elasticsearch is a distributed search and analytics engine optimized for speed and relevance on production-scale workloads. It supports near-real-time search over massive datasets, vector search, integration with generative AI applications, and much more.

Install Docker on your system and start it. Then run the following command in a terminal (VS Code's built-in terminal works too) to set up Elasticsearch locally:

docker run -it \
--name elasticsearch \
-p 9200:9200 \
-p 9300:9300 \
-e "discovery.type=single-node" \
-e "xpack.security.enabled=false" \
docker.elastic.co/elasticsearch/elasticsearch:8.4.3

Ensure that http://localhost:9200/ is up.
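
A quick way to check is to curl the endpoint; if Elasticsearch is running, you should get back a JSON blob with cluster info:

curl http://localhost:9200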

from tqdm.auto import tqdm
from elasticsearch import Elasticsearch

# connect to the local Elasticsearch instance and verify the connection
es_client = Elasticsearch("http://localhost:9200/")
es_client.info()

# one shard with no replicas is fine for a local single-node setup;
# course is a keyword field so it can be used for exact filtering
index_settings = {
    "settings": {
        "number_of_shards": 1,
        "number_of_replicas": 0
    },
    "mappings": {
        "properties": {
            "text": {"type": "text"},
            "section": {"type": "text"},
            "question": {"type": "text"},
            "course": {"type": "keyword"}
        }
    }
}

index_name = "course-questions"
es_client.indices.create(index=index_name, body=index_settings)

# index the documents one by one, with a progress bar
for doc in tqdm(documents):
    es_client.index(index=index_name, document=doc)

# simple query-string search across the index
response = es_client.search(index=index_name, q=query)

result_docs = []
for hit in response['hits']['hits']:
    result_docs.append(hit['_source'])

prompt = build_prompt(query, result_docs)

client = OpenAI(api_key=os.getenv('OPEN_API_KEY'))

response = client.chat.completions.create(
    model='gpt-3.5-turbo',
    messages=[{"role": "user", "content": prompt}]
)
print(response.choices[0].message.content)
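
The q= parameter runs a query-string search across all fields. If you want the same field boosting and course filter as in the minsearch version, you can pass a structured query instead; a sketch using standard Elasticsearch query DSL (not from the code above):

# multi_match boosts the question field 3x; the term filter
# restricts results to one course without affecting scoring
search_query = {
    "bool": {
        "must": {
            "multi_match": {
                "query": query,
                "fields": ["question^3", "text", "section"],
                "type": "best_fields"
            }
        },
        "filter": {
            "term": {"course": "data-engineering-zoomcamp"}
        }
    }
}
response = es_client.search(index=index_name, query=search_query, size=5)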
