Tech in Flux: The Evolution from Software Engineer to Machine Learning Engineer in a Volatile Market, Exploring Retrieval Augmented Generation (RAG): A Fusion of Retrieval and Generation in Natural Language Processing (NLP), and Building RAG for a Question Answering System with Custom Retrieval and Generation Steps

The Journey
𝐀𝐈 𝐦𝐨𝐧𝐤𝐬.𝐢𝐨
4 min read · May 6, 2024


Graphics Credits: Notes from a curious mind

I took a pause from writing for a few days. Phew. But it didn’t help. The market is super volatile right now. All companies are pivoting. #Layoffs happened and keep happening. The tech world is virtually on fire. Software engineering roles are getting rebranded as machine learning engineer roles.

The tech stacks are the same ones we’ve been working with for the past decade; only the applications have changed.

Now we build APIs for LLMs, not REST APIs for websites.

Now we write amazing Python programs for ML models, not for app-framework microservices or automation.

Now we do research for ML and not for UX.

All this AI tech (GPT, BERT, LLaMA, Bing, etc.) forced the world to switch gears overnight.

Companies want the same people, just with “ML” rebranded onto the resume. Say all the ML buzzwords and get the job.

Compete with top talent laid off from big tech companies. Salary resets.

Hire top talent at cheap prices!

Thanks to the f**ked-up economy.

I have been seeing way too many posts, and all 1,000 new AI startups want to work on LLMs, RAG, GPTs, their own ML models, chatbots, and so on.

So today, let’s give them that!

Let’s learn all about RAG Systems.

Introduction:

Retrieval Augmented Generation (RAG) stands at the frontier of Natural Language Processing (NLP), seamlessly blending the power of retrieval-based methods with the creativity of generative models.

This innovative approach has garnered attention for its ability to generate more relevant and coherent responses in various NLP tasks.

Let’s delve deeper into what RAG entails, its applications, architecture, and even a coding example to understand its practical implementation.

Definition and Use:

RAG can be defined as a technique that integrates a retrieval mechanism with a generative model to produce responses in natural language tasks. Unlike traditional generative models like GPT, which generate responses solely based on learned patterns, RAG combines this with retrieving relevant information from a large corpus of text.

This fusion enhances the quality and relevance of the generated responses, making them more contextually accurate and coherent.

RAG finds applications in conversational agents, question answering systems, and content generation tasks where context plays a crucial role in generating appropriate responses.
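To make that fusion concrete, here is a minimal sketch of the step where retrieved passages and the user query are combined into one augmented prompt for the generative model. The prompt format and the example passage are assumptions for illustration, not any library’s API:

```python
def build_rag_prompt(query: str, passages: list[str]) -> str:
    """Combine retrieved passages and the user query into a single
    augmented prompt: the 'fusion' step that grounds generation."""
    context = "\n".join(f"- {p}" for p in passages)
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}\n"
        "Answer:"
    )

prompt = build_rag_prompt(
    "What are the symptoms of COVID-19?",
    ["Common COVID-19 symptoms include fever, dry cough, and tiredness."],
)
print(prompt)
```

A generative model fed this prompt answers from the retrieved context rather than only from its learned patterns, which is exactly the relevance gain described above.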

Architecture:

The architecture of RAG typically consists of two main components: the retrieval module and the generation module.

The retrieval module retrieves relevant passages or documents from a large text corpus based on the input query or context.

These retrieved passages serve as the input to the generation module, which then generates the final response incorporating both the retrieved information and learned patterns from the generative model.

This hybrid architecture enables RAG to leverage the strengths of both retrieval-based and generative models, resulting in more accurate and contextually relevant responses.
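The two modules can be sketched end to end in a few lines. The word-overlap retriever and the template “generator” below are toy stand-ins of my own (not part of any library) for a real BM25/dense retriever and a seq2seq model, used only to show the data flow between the modules:

```python
def retrieve(query: str, corpus: list[str]) -> str:
    # Retrieval module: score each passage by how many words it shares
    # with the query (a toy stand-in for BM25 or dense retrieval).
    q_words = set(query.lower().replace("?", "").split())
    return max(corpus, key=lambda p: len(q_words & set(p.lower().split())))

def generate(query: str, passage: str) -> str:
    # Generation module: a real system would feed both the query and the
    # passage to a seq2seq model; a template keeps the data flow visible.
    return f"According to the retrieved context, {passage}"

corpus = [
    "COVID-19 symptoms include fever and dry cough",
    "Stock markets fell sharply on Monday",
    "Python is a popular programming language",
]
query = "What are the symptoms of COVID-19?"
answer = generate(query, retrieve(query, corpus))
print(answer)
```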

Real-World Example:

One real-world example of RAG in action is its application in conversational agents or chatbots.

Consider a scenario where a user asks a question to a chatbot about a specific topic, such as “What are the symptoms of COVID-19?” Instead of solely relying on the generative model to formulate a response, the chatbot can use RAG to retrieve relevant information from authoritative sources like medical websites or research papers.

By integrating this retrieved information with the generative model, the chatbot can provide more accurate and trustworthy responses tailored to the user’s query, enhancing the overall user experience.
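One practical touch in such a chatbot is attaching the source of the retrieved passage to the answer so the user can verify it. A minimal sketch, where the passages, URLs, and scoring are made-up placeholders for a real retrieval backend:

```python
def answer_with_source(query: str, sources: list[tuple[str, str]]) -> str:
    # sources: (passage, url) pairs from vetted sites; the data below is
    # hypothetical placeholder content, not real medical guidance.
    q_words = set(query.lower().replace("?", "").split())
    passage, url = max(
        sources, key=lambda s: len(q_words & set(s[0].lower().split()))
    )
    return f"{passage} (source: {url})"

sources = [
    ("COVID-19 symptoms include fever and dry cough", "https://example.org/covid"),
    ("Influenza spreads mainly in winter", "https://example.org/flu"),
]
reply = answer_with_source("What are the symptoms of COVID-19?", sources)
print(reply)
```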

Code:

Here is a detailed example demonstrating the use of RAG in a question answering system with custom retrieval and generation steps:

from transformers import RagTokenizer, RagRetriever, RagTokenForGeneration
import torch

# Initialize the RAG tokenizer, retriever, and generation model
tokenizer = RagTokenizer.from_pretrained("facebook/rag-token-base")
retriever = RagRetriever.from_pretrained("facebook/rag-token-base", index_name="exact", use_dummy_dataset=True)
model = RagTokenForGeneration.from_pretrained("facebook/rag-token-base", retriever=retriever)

# Define a custom retrieval function to retrieve relevant passages
def custom_retrieval(query):
    # Your custom retrieval logic here; retrieve_passages_from_database
    # is a placeholder you must implement (e.g., a database lookup or web search)
    relevant_passages = retrieve_passages_from_database(query)
    return relevant_passages

# Define a custom generation function to generate the response
def custom_generation(query, retrieved_passages):
    # Encode the retrieved passages, joined into one string so the
    # batch size matches the query encoding below
    passages_input_dict = tokenizer(" ".join(retrieved_passages), return_tensors="pt", padding=True, truncation=True)

    # Encode the query
    query_input_dict = tokenizer(query, return_tensors="pt", padding=True, truncation=True)

    # Concatenate the passage and query input IDs along the sequence dimension
    input_ids = torch.cat((passages_input_dict["input_ids"], query_input_dict["input_ids"]), dim=1)

    # Generate a response using the RAG model
    generated_output = model.generate(input_ids=input_ids, num_return_sequences=1, max_length=50, early_stopping=True)

    # Decode the generated response
    response = tokenizer.batch_decode(generated_output, skip_special_tokens=True)[0]
    return response

# Input query
query = "What are the symptoms of COVID-19?"

# Perform custom retrieval to retrieve relevant passages
retrieved_passages = custom_retrieval(query)

# Perform custom generation using the retrieved passages
response = custom_generation(query, retrieved_passages)

print("Generated Response:", response)

In this example:

  • We import the necessary components from the transformers library.
  • We initialize the RAG tokenizer, retriever, and generation model from the pretrained facebook/rag-token-base checkpoint.
  • We define a custom retrieval function (custom_retrieval) to retrieve relevant passages based on the input query. This function may involve querying a database, searching the web, or any other custom logic.
  • We define a custom generation function (custom_generation) to generate the response using the retrieved passages and the input query. This function encodes the retrieved passages and the query, concatenates them, generates the response using the RAG model, and decodes the generated output.
  • Finally, we input a query, perform custom retrieval to retrieve relevant passages, perform custom generation using the retrieved passages, and print the generated response.

Follow for more things on AI! The Journey — AI By Jasmin Bharadiya
