RAG Explained: Showcasing Azure’s No-Code-Low-Code LLMOps alongside LangChain Expertise

Kamaljeet Kharbanda
10 min read · Apr 23, 2024


Retrieval-Augmented Generation (RAG) is transforming how enterprise data is processed and utilized, combining deep knowledge bases with the powerful analytical capabilities of Large Language Models (LLMs). This synergy allows for advanced interpretations of complex datasets, which is critical in today’s fast-paced digital ecosystem where efficient data utilization can drive significant competitive advantage.

Low-code/no-code platforms such as Azure AI Studio on Microsoft Azure, Amazon Bedrock on AWS, and Vertex AI on Google Cloud are at the forefront of this revolution, making advanced AI functionalities accessible to a broader audience. These platforms simplify large language model operations (LLMOps) with user-friendly interfaces that maintain rigorous security and compliance standards. Additionally, each platform offers a comprehensive SDK that supports traditional programming approaches, giving developers the flexibility to leverage RAG capabilities through custom code.

Alongside these cloud solutions, tools like LangChain and LlamaIndex offer robust environments for implementing RAG through code-intensive methods. These tools are essential for scenarios that demand customized or finely controlled AI functionality, complementing the streamlined capabilities of cloud platforms.

In this article, I will demonstrate RAG's practical application with a dual walkthrough using Amazon's Q4 financial report: one showcasing Azure AI Services' low-code/no-code approach and another using LangChain to illustrate a more code-centric implementation. This comparison will highlight how different technologies and platforms can be used to harness the power of LLMs effectively within an enterprise setting, providing insight into the diverse methodologies available for integrating advanced AI into business processes.

Understanding Retrieval-Augmented Generation (RAG)

Retrieval-Augmented Generation (RAG) is an innovative artificial intelligence approach designed to significantly enhance the functionality of large language models (LLMs). It achieves this by dynamically incorporating external data sources, thereby enriching the model’s output with timely and context-specific knowledge. The RAG process can be broken down into three fundamental phases:

  1. Data Preparation: This phase involves the meticulous organization and preparation of data. The aim is to make the data optimally accessible for subsequent retrieval operations.
  2. Retrieval: During this phase, the system retrieves information pertinent to the user’s inquiry. This step is critical as it determines the relevance of the information that will later be used in the response generation.
  3. Generation: In the final phase, the model employs the retrieved data to synthesize a detailed and informed response. This process is key to ensuring the output transcends the confines of the model’s inherent knowledge, incorporating specific insights drawn from the selected external documents.

Data Preparation

  • Segmentation/Chunking: This process involves dividing extensive texts into manageable segments, facilitating more efficient data processing (a minimal sketch follows this list).
  • Embedding Generation: Here, the segments are converted into vector embeddings. These embeddings are crucial as they enable the system to perform similarity comparisons during retrieval.
  • Index Building: The embeddings are organized systematically, allowing for swift retrieval when needed.
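To make the segmentation step concrete, here is a deliberately simplified sketch of fixed-size chunking with overlap in plain Python. Production splitters, such as the LangChain splitter used later in this article, also respect natural boundaries like paragraphs and sentences.

# Illustrative only: fixed-size chunking with a sliding, overlapping window
def chunk_text(text, chunk_size=500, overlap=50):
    chunks = []
    step = chunk_size - overlap  # advance less than chunk_size so chunks overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
    return chunks

# Example: chunk a long string and count the resulting segments
sample_text = "Amazon reported its Q4 financial results. " * 100
print(len(chunk_text(sample_text)))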

Retrieval

  • Query Processing: User queries are transformed into embeddings, aligning them for comparison against the indexed documents.
  • Similarity Search and Ranking: The system identifies and ranks segments by the closeness of their embeddings to the query, selecting the most relevant for response generation.
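The ranking step can be illustrated with a toy example. Real systems use high-dimensional embeddings produced by a model, but the cosine-similarity arithmetic sketched below is the same idea.

# Illustrative only: rank stored chunk embeddings against a query embedding
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy 3-dimensional embeddings standing in for real model output
chunk_embeddings = {
    "chunk-1": np.array([0.9, 0.1, 0.0]),
    "chunk-2": np.array([0.2, 0.8, 0.1]),
}
query_embedding = np.array([0.85, 0.15, 0.05])

# Sort chunks by similarity to the query; the top results feed generation
ranked = sorted(chunk_embeddings.items(),
                key=lambda item: cosine_similarity(query_embedding, item[1]),
                reverse=True)
print(ranked[0][0])  # chunk-1 is the closest match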

Generation

  • Context Aggregation & Response Synthesis: The system crafts a response that integrates the retrieved information, ensuring both relevance and precision.
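At its simplest, context aggregation is string assembly: the top-ranked chunks are concatenated and placed into a prompt template. The sketch below uses placeholder chunk text; the real APIs for this step appear later in the article.

# Illustrative only: assembling retrieved chunks into a single prompt
top_chunks = [
    "<text of the most relevant chunk>",
    "<text of the second most relevant chunk>",
]
context = "\n\n".join(top_chunks)
prompt = (
    "Answer the question using only the context below.\n\n"
    f"Context:\n{context}\n\n"
    "Question: How did Amazon perform in Q4?"
)
# The assembled prompt is then passed to an LLM for response synthesis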

Comparative Analysis: LangChain and Azure AI Services

To present a comparative analysis of LangChain and Azure AI Services in the implementation of RAG, it’s essential to explore each system’s approach to the three core stages: Data Preparation, Retrieval, and Generation. While LangChain offers a developer-centric environment demanding coding expertise, Azure AI Services propels us into the future with its low-code/no-code paradigm, making advanced LLMOps accessible to a broader audience. The following table encapsulates the methodologies and components each platform utilizes, drawing a clear distinction between the developer-driven and user-friendly approaches to handling enterprise data through RAG.

LLMOps Components- LangChain vs Azure AI Services

Implementation for RAG Processing: Using LangChain and Azure AI

In this detailed section, we examine how both LangChain and Azure AI can be utilized to process and analyze unstructured data through the Retrieval-Augmented Generation (RAG) framework. We present a side-by-side comparison, demonstrating the application of LangChain’s Python-based tools alongside Azure AI’s no-code/low-code capabilities in handling Amazon’s Q4 financial report. For each step in the RAG process — data preparation, information retrieval, and response generation — code snippets from LangChain are complemented by descriptions of Azure AI’s relevant components.

Data Preparation

LangChain Implementation for Data Preparation:

LangChain’s data preparation process involves two primary components: the Document Loader and the Text Splitter. These tools are crucial for managing and pre-processing the data efficiently, making it suitable for further analysis and retrieval operations.

  • Document Loader: LangChain’s Document Loader handles the ingestion of raw data files, such as PDFs, converting them into a structured format that can be easily manipulated and analyzed. For a deeper understanding of Document Loaders, visit LangChain’s Document Loaders Documentation.
  • Text Splitter: After the documents are loaded, the Text Splitter component divides the text into smaller, manageable chunks. This segmentation helps in simplifying the data and makes the retrieval process more efficient. Learn more about Text Splitters on LangChain’s Document Transformers Documentation.

Code Implementation:

# Import LangChain's document processing modules
from langchain_community.document_loaders import PyPDFLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter

# URL of the PDF file (Amazon's Q4 2023 earnings release)
pdf_file_path = 'https://s2.q4cdn.com/299287126/files/doc_financials/2023/q4/AMZN-Q4-2023-Earnings-Release.pdf'

# Initialize the PDF Loader to read and load the document
loader = PyPDFLoader(pdf_file_path)
documents = loader.load_and_split()

# The loader processes the Amazon Q4 financial report and splits it into manageable documents
# Each document is then prepared for further text splitting

# Initialize the Text Splitter to divide the document into smaller segments
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=5000,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False,
)
splitted_data = text_splitter.split_documents(documents)

# These segments are essential for efficient handling during the retrieval and response generation stages
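As a quick sanity check (not part of the original walkthrough), you can confirm how many chunks the splitter produced and preview one of them:

# Optional check: count the chunks and preview the first one
print(f"Number of chunks: {len(splitted_data)}")
print(splitted_data[0].page_content[:200])  # first 200 characters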

Azure AI Service for Data Preparation: Azure Document Intelligence (formerly Azure Form Recognizer)

  • Service Description: Azure’s Document Intelligence service automates the extraction and conversion of data from documents like PDFs into a structured JSON format that is easier to handle and analyze programmatically.
Azure Document Intelligence Service
  • Utility (Document Intelligence Studio): This service simplifies the initial stages of data processing by providing a clean, structured output from unstructured sources, which is crucial for efficient data handling in subsequent stages of the RAG process.
Azure Document Intelligence Studio with various options to analyze any document
Amazon Q4 earnings report for analysis using Azure Document Intelligence Studio
Content of the PDF after analysis
JSON output of the analysis
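While the screenshots above use Document Intelligence Studio, the same analysis can be scripted. Below is a minimal sketch using the azure-ai-formrecognizer Python SDK; the endpoint, key, and local file name are placeholders to replace with your own values.

# Minimal sketch, assuming the azure-ai-formrecognizer package is installed
# and <ENDPOINT>/<KEY> come from your own Document Intelligence resource
from azure.ai.formrecognizer import DocumentAnalysisClient
from azure.core.credentials import AzureKeyCredential

client = DocumentAnalysisClient(
    endpoint="https://<ENDPOINT>.cognitiveservices.azure.com/",
    credential=AzureKeyCredential("<KEY>"),
)

# Analyze the earnings report with the prebuilt layout model
with open("AMZN-Q4-2023-Earnings-Release.pdf", "rb") as f:
    poller = client.begin_analyze_document("prebuilt-layout", document=f)
result = poller.result()

# The result object mirrors the JSON output shown in the Studio screenshots
for page in result.pages:
    print(f"Page {page.page_number}: {len(page.lines)} lines extracted")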

Retrieval

LangChain Implementation for Retrieval:

The retrieval stage in the RAG framework with LangChain is focused on utilizing embedded data for quick information fetching. This process is crucial for ensuring that the generation phase has access to the most relevant information.

  • Vector Storage (Chroma): After segmenting the text data, LangChain uses vector embeddings to represent the semantic content of text chunks. These embeddings are stored in a vector database system called Chroma, which supports efficient similarity searches essential for rapid data retrieval. Learn more about vector storage in LangChain here.
  • Query Processing: Converts user queries into embeddings using a process that matches the method used for document chunks to ensure compatibility in vector format. This is vital for identifying the most relevant document segments during retrieval. Detailed information about embeddings and query processing can be found here.

Code Implementation:

# Import vector storage and embedding modules
from langchain_community.vectorstores import Chroma
from langchain_openai import OpenAIEmbeddings

# Generate and store embeddings for the split text data
embeddings = OpenAIEmbeddings()
store = Chroma.from_documents(
    splitted_data,
    embeddings,
    ids=[f"doc-{index}" for index, _ in enumerate(splitted_data)],
    collection_name='processed_pdf_documents',
    persist_directory='db',
)
store.persist() # Persist data to enable efficient retrieval
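Before wiring up the full chain, a standalone similarity search (not shown in the original flow) verifies that the store returns sensible chunks:

# Optional check: retrieve the chunks most similar to a sample query
docs = store.similarity_search("What was Amazon's net sales figure in Q4?", k=3)
for doc in docs:
    print(doc.page_content[:150])  # preview each retrieved chunk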

Azure AI Services for Retrieval: Azure AI Search (formerly Azure Cognitive Search), Azure OpenAI, and Azure AI Studio

  • Service Description: Azure uses Azure AI Search for indexing and Azure OpenAI for embedding generation, handling large-scale data inquiries efficiently. This integration is part of Azure AI Studio, which facilitates complex LLMOps pipelines, including prompt flows that streamline the execution of language model operations. For more details on Azure AI Search, visit this link. For more information on the OpenAI services used for embedding generation, visit this link.
  • Utility: Azure AI Studio integrates these services to provide a seamless workflow for data indexing and retrieval using Azure Prompt Flow. This setup helps automate and optimize the retrieval of information based on query relevancy, significantly simplifying the LLMOps process. Explore how Azure AI Studio supports LLMOps here.
Azure OpenAI Service- Deployment of a GPT-4 model
Azure OpenAI Service- Final list of deployed models
Azure AI Studio- Integration with various Azure AI services
Azure AI Studio- Creating an index based on the JSON previously extracted by Azure Document Intelligence
Azure AI Studio- Index creation pipeline status
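For readers who prefer code to the Studio, an index like the one created above can also be queried with the azure-search-documents SDK. This is a minimal sketch: the service endpoint, key, index name, and the "content" field are placeholder assumptions that depend on how your index was defined.

# Minimal sketch, assuming the azure-search-documents package and an index
# named "amzn-q4-index" built as in the Studio screenshots above
from azure.search.documents import SearchClient
from azure.core.credentials import AzureKeyCredential

search_client = SearchClient(
    endpoint="https://<SEARCH-SERVICE>.search.windows.net",
    index_name="amzn-q4-index",
    credential=AzureKeyCredential("<KEY>"),
)

# Simple keyword search; vector and hybrid queries are also supported
results = search_client.search(search_text="Q4 net sales", top=3)
for r in results:
    print(r["content"][:150])  # assumes the index exposes a "content" field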

Generation

LangChain Implementation for Generation:

In the generation phase of RAG using LangChain, the goal is to synthesize a coherent and contextually relevant response using the context aggregated from retrieved data segments.

  • Context Aggregation: This involves consolidating all the relevant information retrieved in the previous stage into a comprehensive context that can be utilized for generating responses.
  • Response Synthesis: Utilizes the aggregated context to generate an informed response that addresses the user’s query comprehensively. This process is facilitated by sophisticated language models like GPT (OpenAI GPT), capable of understanding complex contexts and generating appropriate responses.

Code Implementation:

# Set up the retrieval chain with a contextual template for response generation
from langchain_openai import ChatOpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import RetrievalQA

template = "Using the financial data provided from the document, summarize the key financial outcomes, trends observed, and any significant changes in financial position detailed in the report. Include relevant figures and statistics where applicable.\n\nContext: {context}\n\nQuestion: {question}"
prompt = PromptTemplate(template=template, input_variables=['context', 'question'])
llm = ChatOpenAI(temperature=0.7, model='gpt-4')

# Define the question to ask about the report
question = "What were Amazon's key financial outcomes in Q4?"

# Generate the response using the LangChain retrieval and synthesis setup
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type='stuff',
    retriever=store.as_retriever(),
    chain_type_kwargs={"prompt": prompt},
    return_source_documents=True,
)
result = qa.invoke(question)
print("Results:")
print(result['result'])  # Output the generated response
LangChain- Results from the OpenAI GPT model

Azure AI Services for Generation: Azure OpenAI and Azure AI Studio

  • Service Description: In Azure, the generation phase is powered by Azure OpenAI Service, which includes capabilities to utilize pre-trained models like GPT for generating responses. The integration of these services within Azure AI Studio ensures that the generation of responses is both efficient and contextually enriched.
  • Utility: The use of Azure AI Services allows for complex response-synthesis scenarios, in which the input context and queries are processed to produce outputs tailored to the user’s informational needs. For more details on the Azure OpenAI Service, visit this link.
Azure AI Studio- Playground to analyze output based on the created index
Azure Prompt Flow- Using an existing RAG template for Q&A on your data
Azure Prompt Flow- Dataflow graph alongside its parameters
Azure Prompt Flow- Selection of the previously created index
Azure Prompt Flow- Index parameters
Azure Prompt Flow- OpenAI parameters
Azure Prompt Flow- Running with various prompt variants
Azure Prompt Flow- Variants output
Azure Prompt Flow- Trace of tokens used in the analysis
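The playground interaction above can also be reproduced programmatically with the openai package's Azure client. This is a minimal sketch; the endpoint, key, API version, and deployment name are placeholders for values from your own Azure OpenAI resource, and the context string stands in for the chunks retrieved by Azure AI Search.

# Minimal sketch, assuming openai>=1.0 and a GPT-4 deployment in your resource
from openai import AzureOpenAI

client = AzureOpenAI(
    azure_endpoint="https://<RESOURCE>.openai.azure.com/",
    api_key="<KEY>",
    api_version="2024-02-01",
)

response = client.chat.completions.create(
    model="<GPT4-DEPLOYMENT-NAME>",  # the deployment name, not the base model
    messages=[
        {"role": "system", "content": "Answer using only the provided context."},
        {"role": "user", "content": "Context: <retrieved chunks>\n\nQuestion: Summarize Amazon's Q4 financial outcomes."},
    ],
    temperature=0.7,
)
print(response.choices[0].message.content)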

Conclusion

RAG technology, particularly when integrated with low-code/no-code solutions provided by cloud platforms like Azure AI, represents a significant advancement in how businesses can leverage their data. By automating the extraction and processing of information, RAG allows companies to quickly access precise insights from vast amounts of data. LangChain provides a robust framework for those who prefer a more hands-on, code-driven approach, while Azure AI Services shines in its ability to simplify complex processes with its intuitive, user-friendly interface.

For developers and data scientists interested in implementing RAG in their projects, both LangChain and Azure offer comprehensive solutions that cater to a wide range of technical skills — from seasoned coders to those just starting out with AI and machine learning.

  • LangChain excels with its detailed, customizable modules that require a deep understanding of Python and data science principles.
  • Azure AI Services facilitates a smoother transition into AI for non-specialists through its streamlined, graphical interfaces and extensive documentation and support.
  • Azure SDK for Python offers capabilities similar to LangChain’s, providing another powerful option for those familiar with Python (Azure Python SDK).

This story is published under Generative AI Publication.

Connect with us on Substack, LinkedIn, and Zeniteq to stay in the loop with the latest AI stories. Let’s shape the future of AI together!
