Fixing RAG with Reasoning Augmented Generation
🛠️ Why RAG is Broken — And How ReAG Fixes It
Retrieval-Augmented Generation (RAG) promised smarter AI, but its flaws are holding us back. Here’s why Reasoning-Augmented Generation (ReAG) is the upgrade we need.
❌ The Problem with Traditional RAG
Traditional RAG systems work like librarians with bad memories:
- Semantic Search Isn’t Smart 🤖: They retrieve documents based on surface-level similarity (e.g., matching “air pollution” to “car emissions”) but miss contextually relevant content (e.g., a study titled “Urban Lung Disease Trends”).
- Infrastructure Nightmares 🏗️: Chunking, embedding, and vector databases add layers of complexity. Each step risks errors like stale indexes or mismatched splits.
- Static Knowledge ⏳: Updating indexed documents is slow — a death sentence for fields like medicine or finance where data changes daily.
Imagine asking, “Why are polar bears declining?” and getting generic answers about “Arctic ice melt” while missing a key study on ice-free feeding disruptions. That’s RAG’s flaw.
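That blind spot is easy to reproduce. A toy lexical retriever (illustrative only; real RAG uses dense embeddings, but the surface-similarity failure mode is analogous) scores both documents at zero because neither text mentions the query's keywords:

```python
# Toy bag-of-words retriever: scores documents by word overlap with the query.
def overlap_score(query: str, doc: str) -> int:
    return len(set(query.lower().split()) & set(doc.lower().split()))

docs = {
    "Arctic Ice Melt Trends": "arctic sea ice is melting faster each decade",
    "Ice-Free Feeding Disruptions": "longer ice-free seasons disrupt seal hunting windows",
}
query = "why are polar bears declining"

scores = {title: overlap_score(query, text) for title, text in docs.items()}
print(scores)  # both score 0: neither text mentions "polar bears"
```

The key study scores no higher than the generic one; a reasoning pass over the full text is what recovers it.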
🚀 Enter ReAG: Let the Model Reason, Not Just Retrieve
ReAG skips the RAG pipeline entirely. Instead of preprocessing documents into searchable snippets, it feeds raw materials (text files, spreadsheets, URLs) directly to the language model. The LLM then:
- Reads Entire Documents 📖: No chunking or embeddings — full context preserved.
- Asks Two Questions ❓:
- “Is this document useful?” (Relevance check)
- “What specific parts matter?” (Content extraction)
- Synthesizes Answers 🧩: Combines insights like a human researcher, connecting dots even if keywords don’t match.
Example: For the polar bear question, ReAG might parse a climate report titled “Thermal Dynamics of Sea Ice” and extract a section linking ice loss to disrupted feeding patterns — even if “polar bears” never appears.
⚙️ How ReAG Works: A Technical Breakdown
📂 Raw Document Ingestion:
- No preprocessing — documents are ingested as-is (markdown, PDFs, URLs).
⚡ Parallel LLM Analysis:
- Each document undergoes simultaneous relevance checks and content extraction.
🌐 Dynamic Synthesis:
- Irrelevant documents are filtered; validated content fuels answer generation.
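The three steps above can be sketched as a loop. The `ask_llm` function below is a hypothetical stand-in for a real model call, stubbed so the control flow (per-document relevance check, extraction, then synthesis) is visible:

```python
import json
from typing import Callable

def reag(question: str, documents: list[str], ask_llm: Callable[[str], str]) -> str:
    """Minimal ReAG loop: per-document relevance + extraction, then synthesis."""
    extracts = []
    for doc in documents:
        # One call per document: "is it useful?" and "what parts matter?"
        verdict = json.loads(ask_llm(f"QUESTION: {question}\nDOCUMENT: {doc}"))
        if not verdict["is_irrelevant"]:
            extracts.append(verdict["content"])
    # Synthesis step: answer only from the validated extracts.
    return ask_llm("Answer using: " + " | ".join(extracts))

# Hypothetical stub LLM: flags documents mentioning "ice" as relevant and
# echoes extracts back, so the loop runs without any real model.
def fake_llm(prompt: str) -> str:
    if prompt.startswith("QUESTION:"):
        doc = prompt.split("DOCUMENT: ")[1]
        return json.dumps({"is_irrelevant": "ice" not in doc, "content": doc})
    return prompt.removeprefix("Answer using: ")

answer = reag(
    "why are polar bears declining",
    ["sea ice loss shortens feeding windows", "tourism revenue rose 4%"],
    fake_llm,
)
print(answer)  # only the ice-related extract survives filtering
```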
💡 Why ReAG Wins: Strengths & Trade-offs
✅ Strengths
- Handles Dynamic Data 📰: Real-time news, live market feeds, or evolving research? ReAG processes updates on the fly — no re-embedding needed.
- Solves Complex Queries 🧠: Questions like “How did post-2008 regulations affect community banks?” require piecing together disparate sources. ReAG infers indirect links better than RAG.
- Multimodal Mastery 📊: Analyzes charts, tables, and text together — no extra preprocessing.
⚠️ Trade-offs
- Higher Costs 💸: Processing 100 documents via ReAG means 100 LLM calls vs. RAG’s cheap vector searches.
- Slower at Scale 🐢: For million-document datasets, hybrid approaches (RAG + ReAG) may work better.
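A hybrid pipeline can be sketched like this: a cheap lexical pre-filter (standing in for vector search) shortlists candidates, and only the shortlist pays the per-document LLM cost (here stubbed with a keyword test):

```python
def cheap_prefilter(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Stage 1 (RAG-like): cheap lexical scoring shortlists candidates."""
    score = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(docs, key=score, reverse=True)[:k]

def reag_pass(docs: list[str]) -> list[str]:
    """Stage 2 (ReAG-like): per-document reasoning, stubbed as a keyword test."""
    return [d for d in docs if "regulation" in d]

corpus = [
    "post-2008 regulation raised compliance costs for community banks",
    "community banks saw deposit growth in rural areas",
    "a history of medieval banking guilds",
]
shortlist = cheap_prefilter("post-2008 regulations community banks", corpus)
hits = reag_pass(shortlist)  # only the shortlist pays the per-document cost
print(hits)
```

With a million-document corpus, stage 2 would see only the top-k survivors instead of every document.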
🛠️ The Tech Stack Powering ReAG
🌐 Component Breakdown
1. GROQ + Llama-3.3-70B-Versatile 🚀
Role: Relevancy Assessment (First-Stage Filtering)
Why It Shines:
- Blazing-fast inference (500+ tokens/sec) via GROQ’s LPU architecture 🔥
- 70B parameters enable nuanced document relevance scoring, even for indirect queries.
- Large context window (128K tokens).
Example: Flags a climate report titled “Thermal Dynamics of Sea Ice” as relevant to “polar bear decline” despite no keyword overlap.
2. Ollama + DeepSeek-R1:14B 🧠
Role: Response Synthesis (Second-Stage Reasoning)
Why It Shines:
- Lightweight, cost-efficient 14B model fine-tuned for extraction/summarization.
- Runs locally via Ollama, ensuring data privacy and reducing cloud costs.
- Large context window (128K tokens).
Example: Extracts “ice-free feeding windows reduced by 22% since 2010” from flagged documents.
3. LangChain 🎻
Role: Orchestration & Workflow Automation
Key Features:
- Parallelizes GROQ (relevance) and Ollama (synthesis) tasks.
- Manages document routing, error handling, and output aggregation.
⚡ Why This Stack Works
- Cost Efficiency 💸: Offloads heavy lifting to GROQ’s hardware-optimized API, while Ollama handles lightweight tasks locally.
- Scalability 📈: GROQ’s LPUs handle 1000s of concurrent document evaluations.
- Flexibility 🧩: Swap models (e.g., Mistral for Ollama) without rewriting pipelines.
Note (personal observation): to leverage ReAG effectively, prefer LLMs with large context windows whenever a document runs longer than ~50 pages.
🪀 Code Implementation of ReAG
Install required dependencies
!pip install langchain langchain_groq langchain_ollama langchain_community pymupdf pypdf
Download Data
!mkdir ./data
!mkdir ./chunk_caches
!wget "https://www.binasss.sa.cr/int23/8.pdf" -O "./data/fibromyalgia.pdf"
Setup LLM
from langchain_groq import ChatGroq
from langchain_ollama import ChatOllama
import os
os.environ["GROQ_API_KEY"] = "<your-groq-api-key>"  # replace with your key; never commit real keys
llm_relevancy = ChatGroq(
    model="llama-3.3-70b-versatile",
    temperature=0,
)

llm = ChatOllama(
    model="deepseek-r1:14b",
    temperature=0.6,
    max_tokens=3000,
)
Define System Prompt
REAG_SYSTEM_PROMPT = """
# Role and Objective
You are an intelligent knowledge retrieval assistant. Your task is to analyze provided documents or URLs to extract the most relevant information for user queries.
# Instructions
1. Analyze the user's query carefully to identify key concepts and requirements.
2. Search through the provided sources for relevant information and output the relevant parts in the 'content' field.
3. If you cannot find the necessary information in the documents, return 'isIrrelevant: true', otherwise return 'isIrrelevant: false'.
# Constraints
- Do not make assumptions beyond available data
- Clearly indicate if relevant information is not found
- Maintain objectivity in source selection
"""
Define RAG Prompt
rag_prompt = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
Question: {question}
Context: {context}
Answer:
"""
Define the Response Schema
from pydantic import BaseModel, Field
from langchain_core.output_parsers import JsonOutputParser

class ResponseSchema(BaseModel):
    content: str = Field(..., description="The page content of the document that is relevant or sufficient to answer the question asked")
    reasoning: str = Field(..., description="The reasoning for selecting the page content with respect to the question asked")
    is_irrelevant: bool = Field(..., description="'True' if the document content is not sufficient or relevant to answer the question asked; 'False' otherwise")

class RelevancySchemaMessage(BaseModel):
    source: ResponseSchema

relevancy_parser = JsonOutputParser(pydantic_object=RelevancySchemaMessage)
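For intuition, `JsonOutputParser` essentially strips an optional markdown fence from the model's reply and parses the JSON into a dict. A stdlib-only sketch of that behavior (the sample reply is invented for illustration; the fence string is built programmatically just to keep the snippet readable):

```python
import json
import re

FENCE = "`" * 3  # a literal ``` fence, as models often wrap JSON in one
raw_reply = (
    FENCE + "json\n"
    '{"is_irrelevant": false, '
    '"content": "Fibromyalgia is characterized by widespread pain.", '
    '"reasoning": "Page 0 defines the condition."}\n'
    + FENCE
)

# Strip the optional json fence, then parse: roughly what
# JsonOutputParser.parse() does for a well-formed reply.
body = re.sub(r"^`{3}(?:json)?\s*|\s*`{3}$", "", raw_reply.strip())
parsed = json.loads(body)
print(parsed["is_irrelevant"])  # False
```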
Load and process the input documents
from langchain_community.document_loaders import PyMuPDFLoader
file_path = "./data/fibromyalgia.pdf"
loader = PyMuPDFLoader(file_path)
#
docs = loader.load()
print(len(docs))
print(docs[0].metadata)
Response
8
{'producer': 'Acrobat Distiller 6.0 for Windows',
'creator': 'Elsevier',
'creationdate': '2023-01-20T09:25:19-06:00',
'source': './data/fibromyalgia.pdf',
'file_path': './data/fibromyalgia.pdf',
'total_pages': 8,
'format': 'PDF 1.7',
'title': 'Fibromyalgia: Diagnosis and Management',
'author': 'Bradford T. Winslow MD',
'subject': 'American Family Physician, 107 (2023) 137-144',
'keywords': '',
'moddate': '2023-02-27T15:02:12+05:30',
'trapped': '',
'modDate': "D:20230227150212+05'30'",
'creationDate': "D:20230120092519-06'00'",
'page': 0}
Helper function to format documents
from langchain.schema import Document
def format_doc(doc: Document) -> str:
    return f"Document_Title: {doc.metadata['title']}\nPage: {doc.metadata['page']}\nContent: {doc.page_content}"
Helper Function to extract relevant context
from langchain_core.prompts import PromptTemplate

def extract_relevant_context(question, documents):
    result = []
    for doc in documents:
        formatted_document = format_doc(doc)
        system = f"{REAG_SYSTEM_PROMPT}\n\n# Available source\n\n{formatted_document}"
        prompt = f"""Determine if the 'Available source' content supplied is sufficient and relevant to ANSWER the QUESTION asked.
QUESTION: {question}
# INSTRUCTIONS TO FOLLOW
1. Analyze the context provided thoroughly to check its relevance for formulating a response to the QUESTION asked.
2. STRICTLY PROVIDE THE RESPONSE IN A JSON STRUCTURE AS DESCRIBED BELOW:
```json
{{"content":<<The page content of the document that is relevant or sufficient to answer the question asked>>,
"reasoning":<<The reasoning for selecting the page content with respect to the question asked>>,
"is_irrelevant":<<Specify 'True' if the content in the document is not sufficient or relevant. Specify 'False' if the page content is sufficient to answer the QUESTION>>
}}
```
"""
        messages = [
            {"role": "system", "content": system},
            {"role": "user", "content": prompt},
        ]
        response = llm_relevancy.invoke(messages)
        formatted_response = relevancy_parser.parse(response.content)
        result.append(formatted_response)
    final_context = []
    for item in result:
        # The model may return a bool or a string; normalize before comparing.
        if str(item['is_irrelevant']).lower() == 'false':
            final_context.append(item['content'])
    return final_context
Invoke the function to retrieve relevant context
question = "What is Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
print(len(final_context))
Helper function to generate response
def generate_response(question, final_context):
    prompt = PromptTemplate(
        template=rag_prompt,
        input_variables=["question", "context"],
    )
    chain = prompt | llm
    response = chain.invoke({"question": question, "context": final_context})
    # deepseek-r1 emits its reasoning first; keep only the last paragraph (the answer)
    final_answer = response.content.split("\n\n")[-1]
    print(final_answer)
    return final_answer
Generate Response
final_response = generate_response(question,final_context)
final_response
#################### Response #################################
'Fibromyalgia is a chronic condition characterized by widespread musculoskeletal pain, fatigue, disrupted sleep, and cognitive difficulties like "fibrofog." It is often associated with heightened sensitivity to pain due to altered nervous system processing. Diagnosis considers symptoms such as long-term pain, fatigue, and sleep issues without underlying inflammation or injury.'
Question 2
question = "What are the causes of Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
##################################Response ############################
Fibromyalgia likely results from disordered central pain processing leading to heightened sensitivity (hyperalgesia and allodynia). Possible causes include dysfunction of the hypothalamic-pituitary-adrenal axis, inflammation, glial activation, small fiber neuropathy, infections like Epstein-Barr virus or Lyme disease, and a genetic component. Other conditions, such as infections or medication side effects, may also contribute to similar symptoms.
Question 3
question = "Do people suffering from rheumatologic conditions may have fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
############################Response################################
Yes, people with rheumatologic conditions, such as rheumatoid arthritis or psoriatic arthritis, may also have fibromyalgia. This is because they share overlapping symptoms, making diagnosis challenging.
Question 4
question = "Mention the nonpharmacologic treatment for fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
############################RESPONSE#########################
Nonpharmacologic treatments for fibromyalgia include patient education, exercise, and cognitive behavior therapy (CBT).
Question 5
question = "According to 2016 American College of Rheumatology Fibromyalgia what is the Diagnostic Criteria for Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
###############################RESPONSE#############################
The 2016 American College of Rheumatology diagnostic criteria for fibromyalgia require generalized pain in at least four of five body regions for at least three months. Additionally, patients must meet either a Widespread Pain Index (WPI) score of ≥7 with a Symptom Severity Scale (SSS) score of ≥5 or a WPI score of ≥4 with an SSS score of ≥9. Other disorders that could explain the symptoms must be ruled out.
Question 6
question = "What is the starting dosage of Amitriptyline?"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
#########################RESPONSE###########################
The starting dosage of Amitriptyline for adults is usually between 25 to 50 mg per day, often beginning with a lower dose of 5 to 10 mg at night to minimize side effects before gradually increasing.
Question 7
question = "What has been mentioned about AAPT 2019 Diagnostic Criteria for Fibromyalgia"
final_context = extract_relevant_context(question,docs)
final_response = generate_response(question,final_context)
#########################RESPONSE####################################
The AAPT 2019 criteria for fibromyalgia include multisite pain in at least six of nine specified areas, moderate to severe sleep problems or fatigue, and symptoms lasting three months or more.
Question 8
question = "What are the medications and doses for Fibromyalgia?"
final_context = extract_relevant_context(question,docs)
print(final_context)
final_response = generate_response(question,final_context)
#######################Response##################################
['Duloxetine, milnacipran, pregabalin, and amitriptyline are potentially effective medications for fibromyalgia. Nonsteroidal anti-inflammatory drugs and opioids have not demonstrated benefits for fibromyalgia and have significant limitations.',
'Amitriptyline, cyclobenzaprine, duloxetine (Cymbalta), milnacipran (Savella), and pregabalin (Lyrica) are effective for pain in fibromyalgia.43,46-48,50,52,54',
'Amitriptyline (tricyclic antidepressant) - 5 to 10 mg at night, 20 to 30 mg at night. Cyclobenzaprine (muscle relaxant; tricyclic derivative) - 5 to 10 mg at night, 10 to 40 mg daily in 1 to 3 divided doses. Duloxetine (Cymbalta; serotonin-norepinephrine reuptake inhibitor) - 20 to 30 mg every morning, 60 mg every morning. Milnacipran (Savella; serotonin-norepinephrine reuptake inhibitor) - 12.5 mg every morning, 50 mg twice daily. Pregabalin (Lyrica; gabapentinoid) - 25 to 50 mg at bedtime, 150 to 450 mg at bedtime.',
'Fibromyalgia is often treated with medications such as pregabalin (Lyrica) and duloxetine (Cymbalta). Pregabalin can be started at a dose of 75 mg twice daily, with a maximum dose of 450 mg/day. Duloxetine can be initiated at a dose of 30 mg once daily, with a target dose of 60 mg/day.',
'Fibromyalgia is often treated with medications such as pregabalin (Lyrica) and duloxetine (Cymbalta). Pregabalin can be started at a dose of 75 mg twice daily, with a maximum dose of 450 mg/day. Duloxetine can be initiated at a dose of 30 mg once daily, with a target dose of 60 mg/day.']
Final Response
print(final_response)
#############################Response############################
The medications commonly used to treat fibromyalgia include:
1. **Amitriptyline**: A tricyclic antidepressant typically taken at night in doses ranging from 5 to 30 mg.
2. **Cyclobenzaprine**: A muscle relaxant and tricyclic derivative, usually administered in doses up to 40 mg daily in divided doses.
3. **Duloxetine (Cymbalta)**: A serotonin-norepinephrine reuptake inhibitor taken in the morning, starting at 20-30 mg and increasing to 60 mg if needed.
4. **Milnacipran (Savella)**: Another serotonin-norepinephrine reuptake inhibitor, starting at 12.5 mg in the morning and potentially increased to 50 mg twice daily.
5. **Pregabalin (Lyrica)**: A gabapentinoid taken at bedtime, beginning with 75 mg twice daily and up to a maximum of 450 mg/day.
These medications are effective for managing pain associated with fibromyalgia. It's important to note that dosages should be adjusted under medical supervision, starting low and increasing as necessary. Additionally, NSAIDs and opioids are not recommended for treating fibromyalgia due to limited effectiveness and potential side effects.
🌍Real-World Use Cases
- Medical Research 🩺: Synthesize insights from raw clinical trial data and journals.
- Financial Markets 📉: Analyze live earnings reports and SEC filings for real-time investment strategies.
- Legal Analysis ⚖️: Parse dense case law to identify precedent connections.
🔮 The Future of ReAG
- Hybrid Systems 🤝: Use RAG for initial filtering, then ReAG for deep analysis.
- Cheaper Models 📉: Open-source LLMs (e.g., DeepSeek) and quantization will lower costs.
- Bigger Context Windows 🪟: Future models will process billion-token documents, making ReAG even more powerful.
🎯 Final Takeaway
ReAG isn’t about replacing RAG — it’s about rethinking how AI interacts with knowledge. By treating retrieval as a reasoning task, ReAG mirrors human research: holistic, nuanced, and context-driven.