How to Build an Interactive Personal AI Research Agent with Llama 3.2: A Step-by-Step Guide Using LangChain and Ollama

Sahin Ahmed, Data Scientist

Introduction

Building a research agent can be complex, but with LangChain and Ollama, it becomes a lot simpler and more modular. In this tutorial, we’ll show you how to create a research agent that can route queries, perform web searches, and generate detailed responses using a combination of workflows and LLMs. By the end, you’ll have a fully functional agent that can tackle a variety of information retrieval tasks!

Required Python libraries:

To build our research agent, we’ll use Ollama for LLM interactions, LangChain for workflow management, LangGraph for defining workflow nodes, and the LangChain Community libraries for extended functionality. For web search, we’ll use duckduckgo-search.

Start by installing and configuring these tools.

Ollama Installation Steps:

To use Ollama, install the Ollama application on your system and then download the Llama 3.2 model.

  • Download the installer from the Ollama official website.
  • Run the installer and follow the on-screen instructions to complete the setup. Ollama supports both macOS and Windows.
  • After installation, you can use the terminal to run models:
  • Open a terminal.
  • Navigate to the directory where Ollama is installed.
  • Run the following command to list available models: ollama list.
  • To download and run a model, use ollama pull <model_name> and ollama run <model_name>, as shown below.
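
For example, to pull and start the Llama 3.2 model used throughout this tutorial (llama3.2 is the model tag used later in the code):

ollama pull llama3.2
ollama run llama3.2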

You can install the other libraries using the following pip commands (the leading ! is for notebook cells; omit it when running in a terminal):

!pip install langchain==0.2.12
!pip install langgraph==0.2.2
!pip install langchain-ollama==0.1.1
!pip install langsmith==0.1.98
!pip install langchain_community==0.2.11
!pip install duckduckgo-search==6.2.13

Importing required libraries:

# Displaying final output format
from IPython.display import display, Markdown, Latex
# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph
from typing_extensions import TypedDict
import os

Define the LLM model to be used:

# Defining LLM
local_llm = 'llama3.2'
llama3 = ChatOllama(model=local_llm, temperature=0)
# JSON-mode instance used later by the router and query-transformation chains
llama3_json = ChatOllama(model=local_llm, format='json', temperature=0)
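
As a quick sanity check (assuming the llama3.2 model has already been pulled with Ollama), you can invoke the chat model directly; ChatOllama returns a message object whose content attribute holds the text:

# Sanity check: the model should reply with a short greeting
print(llama3.invoke("Say hello in one short sentence.").content)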

Define the Web Search tool

# Web Search Tool

wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)

You can test this tool using the following command:

# Test Run
resp = web_search_tool.invoke("current Weather in New York")
print(resp)

Example output:
"East wind around 8 mph. Partly sunny, with a high near 69. Northeast wind around 8 mph. Mostly cloudy, with a low around 59. East wind 3 to 6 mph. Mostly sunny, with a high near 73. Light and variable wind becoming south 5 to 7 mph in the afternoon. Mostly clear, with a low around 60

Define the response generation prompt for the LLM:

# Generation Prompt

generate_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an AI assistant for Research Question Tasks, that synthesizes web search results.
Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know.
keep the answer concise, but provide all of the details you can in the form of a research report.
Only make direct references to material if provided in the context.

<|eot_id|>

<|start_header_id|>user<|end_header_id|>

Question: {question}
Web Search Context: {context}
Answer:

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>""",
input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()

# Test Run
question = "who is Yan Lecun?"
context = ""
generation = generate_chain.invoke({"context": context, "question": question})
print(generation)

Generation Prompt Template (generate_prompt):

  • This prompt is used for the generation stage of the agent.
  • It creates a template where a question ({question}) and the context ({context}) from web search are dynamically inserted.
  • The template instructs the AI assistant to:
  • Synthesize information from the given web search context.
  • Keep the answer concise yet detailed, similar to a research report.
  • If the required information is not present in the provided context, simply say “I don’t know”.

Chain Configuration (generate_chain):

  • The generate_chain is formed by chaining the generate_prompt with a model (e.g., llama3) and a string output parser (StrOutputParser).
  • The string output parser ensures that the final response is returned in a plain text format.

The code defines the generation phase for the research agent, where it answers user queries using web search context or responds with "I don't know" when the context is empty. This structure ensures accurate, context-driven responses in a concise report format.

Define the Router Prompt:

# Router

router_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at routing a user question to either the generation stage or web search.
Use the web search for questions that require more context for a better answer, or recent events.
Otherwise, you can skip and go straight to the generation phase to respond.
You do not need to be stringent with the keywords in the question related to these topics.
Give a binary choice 'web_search' or 'generate' based on the question.
Return the JSON with a single key 'choice' with no preamble or explanation.

Question to route: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's up?"
print(question_router.invoke({"question": question}))

Router Prompt Template (router_prompt):

  • This section creates a prompt template using the PromptTemplate class.
  • The template is a formatted text that will be filled in dynamically with a specific question.
  • It instructs the assistant on when to use web search (for recent events or context-heavy questions) versus when to generate a response (for general or context-independent queries).
  • It emphasizes returning a binary choice of either web_search or generate in JSON format.

Chain Configuration (question_router):

  • The question_router uses the router_prompt as input and chains it with a llama3_json model and a JsonOutputParser.
  • The pipeline accepts a question, processes it using the specified LLM (e.g., LLaMA), and parses the result into a JSON output with a single key: "choice" with either web_search or generate as value.
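
To see the routing behavior on contrasting inputs, you can invoke the chain with a time-sensitive question and a general one. This is a sketch with hypothetical questions; the exact choice depends on the model's judgment for each query:

# A recent-events question should typically be routed to web search
print(question_router.invoke({"question": "What did OpenAI announce this week?"}))
# Expected shape: {'choice': 'web_search'}

# A general, context-independent question can usually go straight to generation
print(question_router.invoke({"question": "Explain what a neural network is."}))
# Expected shape: {'choice': 'generate'}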

Define the web search query transformation prompt:

# Query Transformation

query_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at crafting web search queries for research questions.
More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format.
Reword their query to be the most effective web search string possible.
Return the JSON with a single key 'query' with no preamble or explanation.

Question to transform: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's happened recently with Gaza?"
print(query_chain.invoke({"question": question}))

Query Transformation Prompt (query_prompt):

  • This prompt is specifically designed for optimizing web search queries.
  • It transforms a user’s casual or basic question into an effective web search string that can retrieve more relevant information.
  • The LLM is instructed to rephrase the question into a more search-engine-friendly format.
  • The response is returned as a JSON with a single key: query, without any additional text.

Chain Configuration (query_chain):

  • The query_chain combines the query_prompt with a language model (llama3_json) and a JsonOutputParser.
  • This setup allows the prompt to transform the query and return the output in JSON format.

Defining modular research agent workflow using a graph state

# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question: str
    generation: str
    search_query: str
    context: str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state["context"]

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    if output['choice'] == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    elif output['choice'] == 'generate':
        print("Step: Routing Query to Generation")
        return "generate"

GraphState Class:

  • GraphState is a TypedDict that defines the structure of the state dictionary used in the graph workflow.
  • It includes four keys:
  • question: Original user question.
  • generation: Final answer generated by the LLM.
  • search_query: Optimized question for web search.
  • context: Web search results used for generating responses.

Generate Node (generate):

  • This function generates the final response using the LLM based on the question and context.
  • It outputs the generation key in the graph state.
  • Prints the message: Step: Generating Final Response to indicate its operation.

Query Transformation Node (transform_query):

  • This node optimizes the user’s question for web search.
  • Calls the query_chain and outputs a search_query key in the state.
  • Prints the message: Step: Optimizing Query for Web Search.

Web Search Node (web_search):

  • This node performs a web search using the optimized query (search_query).
  • Uses a web search tool (web_search_tool.invoke) and outputs the context key in the state.
  • Prints the search query and message: Step: Searching the Web for: "<search_query>".

Conditional Routing (route_question):

  • This function routes the question to either web_search or generate based on the decision made by the question_router.
  • Prints the message: Step: Routing Query.
  • If the choice is web_search, it routes to the web_search node and prints Routing Query to Web Search.
  • If the choice is generate, it routes to the generate node and prints Routing Query to Generation.

The code snippet defines the core nodes and decision-making logic for a modular research agent workflow using a graph state. Each function represents a distinct step in the workflow, modifying the shared state and routing the question through appropriate nodes, depending on the requirements. The final output is a well-structured response generated using context, if available, or directly through LLM generation.
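
Each node returns only the keys it produces, and LangGraph merges those partial updates into the shared GraphState. As a rough illustration (a manual walk-through outside the graph, using a hypothetical question), the same merging can be mimicked with a plain dictionary:

# Manual walk-through of the node functions, mimicking how LangGraph
# merges each node's partial return value into the shared state.
state = {"question": "What are the latest developments in fusion energy?"}
state.update(transform_query(state))  # adds 'search_query'
state.update(web_search(state))       # adds 'context'
state.update(generate(state))         # adds 'generation'
print(state["generation"])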

Building the state-based workflow using StateGraph

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()

Create a StateGraph Instance (workflow):

  • StateGraph is used to represent the workflow for the research agent.
  • The workflow is initialized with the GraphState type, ensuring that all nodes adhere to the defined state structure.

Add Nodes to the Graph:

  • The following nodes are added to the workflow:
  • "websearch": Executes the web_search function to perform web searches.
  • "transform_query": Executes the transform_query function to optimize user questions for web search.
  • "generate": Executes the generate function to create the final response using LLM.

Set the Conditional Entry Point (workflow.set_conditional_entry_point):

  • Defines the entry point of the workflow using the route_question function.
  • It routes the question to one of two nodes based on its decision:
  • "websearch": If the query needs more context.
  • "generate": If no web search is necessary.

Define Workflow Edges:

  • The edges define the flow between nodes:
  • "transform_query" -> "websearch": After transforming the query, move to web search.
  • "websearch" -> "generate": Once the web search is done, pass the results to the generation node.
  • "generate" -> END: The workflow ends after generating the final response.

Compile the Workflow:

  • The workflow.compile() step creates a compiled agent (local_agent) that can be invoked to handle queries according to the defined workflow.
  • local_agent will execute the nodes and route queries automatically based on the defined logic.

The code builds a state-based workflow using StateGraph, with nodes for query transformation, web search, and response generation. The workflow automatically routes through these nodes based on predefined conditions, ensuring that each query is processed in the most efficient way to produce high-quality answers.
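
Once compiled, you can optionally inspect the graph structure. Recent LangGraph releases expose a get_graph() method on the compiled agent with Mermaid rendering helpers; availability may vary by version, so treat this as a sketch:

# Print a Mermaid diagram of the compiled workflow (requires a recent langgraph version)
print(local_agent.get_graph().draw_mermaid())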

Define the function to Run the Agent:

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    display(Markdown(output["generation"]))

Testing the Agent:

Example 1:


run_agent("What is Latest news About Open AI?")

Output:

Example 2:

run_agent("what is Transformer in AI?")

Output:

Building a Streamlit App to Run the Agent:

Steps to Run the Streamlit App:

1. Set Up the Environment

Before running the Streamlit app, make sure you have all the required libraries installed.

  • Open your terminal and install Streamlit:
pip install streamlit
  • If you have a requirements.txt file, you can also install dependencies using:
pip install -r requirements.txt
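
If you want to pin everything in a requirements.txt, one possible file (assembled from the versions used earlier in this tutorial; Streamlit is left unpinned since no version is specified) would look like this:

langchain==0.2.12
langgraph==0.2.2
langchain-ollama==0.1.1
langsmith==0.1.98
langchain_community==0.2.11
duckduckgo-search==6.2.13
streamlit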

2. Create a Python Script File

  • Create a new file named streamlit_app.py (or any other name you prefer) and paste the full Streamlit script below into it.

Use the following code:

# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph
from typing_extensions import TypedDict
import streamlit as st
import os
# Defining LLM
def configure_llm():
    st.sidebar.header("Configure LLM")

    # Model Selection
    model_options = ["llama3.2"]
    selected_model = st.sidebar.selectbox("Choose the LLM Model", options=model_options, index=0)

    # Temperature Setting
    temperature = st.sidebar.slider("Set the Temperature", min_value=0.0, max_value=1.0, value=0.5, step=0.1)

    # Create LLM Instances based on user selection
    llama_model = ChatOllama(model=selected_model, temperature=temperature)
    llama_model_json = ChatOllama(model=selected_model, format='json', temperature=temperature)

    return llama_model, llama_model_json

# Streamlit Application Interface
st.title("Personal Research Assistant powered By Llama3.2")
llama3, llama3_json = configure_llm()
wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)
generate_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an AI assistant for Research Question Tasks, that synthesizes web search results.
Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know.
keep the answer concise, but provide all of the details you can in the form of a research report.
Only make direct references to material if provided in the context.

<|eot_id|>

<|start_header_id|>user<|end_header_id|>

Question: {question}
Web Search Context: {context}
Answer:

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>""",
input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()
router_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at routing a user question to either the generation stage or web search.
Use the web search for questions that require more context for a better answer, or recent events.
Otherwise, you can skip and go straight to the generation phase to respond.
You do not need to be stringent with the keywords in the question related to these topics.
Give a binary choice 'web_search' or 'generate' based on the question.
Return the JSON with a single key 'choice' with no preamble or explanation.

Question to route: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

query_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at crafting web search queries for research questions.
More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format.
Reword their query to be the most effective web search string possible.
Return the JSON with a single key 'query' with no preamble or explanation.

Question to transform: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()
# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question: str
    generation: str
    search_query: str
    context: str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state["context"]

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    if output['choice'] == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    elif output['choice'] == 'generate':
        print("Step: Routing Query to Generation")
        return "generate"

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    return output["generation"]

user_query = st.text_input("Enter your research question:", "")

if st.button("Run Query"):
if user_query:
st.write(run_agent(user_query))

3. Save the Code

  • Save the script file in your working directory. Make sure it’s named correctly, for example: streamlit_app.py.

4. Run the Streamlit Application

  • Open your terminal in the same directory where you saved streamlit_app.py.
  • Run the Streamlit app with the following command:
streamlit run streamlit_app.py

5. Open the Streamlit Interface

  • Once the app starts running, a new tab will open in your default web browser, showing the Streamlit interface.
  • Alternatively, you can open your browser manually and go to the address displayed in the terminal, typically something like:
http://localhost:8501/

6. Configure the LLM Model

  • Use the sidebar options to configure the LLM model:
  • Select the model (remember, the model must first be downloaded to your system with Ollama, e.g. ollama pull llama3.2).
  • Set the temperature using the slider.

7. Enter Your Query

  • In the main interface, you will see a text input box labeled "Enter your research question:".
  • Type your question into the box.

8. Run the Query

  • Click on the “Run Query” button.
  • The app will process your query using the configured workflow and display the results.

9. View the Output

  • The final answer will be displayed automatically.

Conclusion

Creating a research agent using LangChain and Streamlit showcases the power of combining modular workflows with interactive applications. By leveraging state-based decision nodes and dynamic LLM configurations, we’ve built a tool capable of routing queries, performing web searches, and generating concise research reports. This setup not only simplifies the development of complex agents but also provides flexibility for future enhancements. As AI continues to evolve, integrating such frameworks will be essential for building more intuitive and context-aware applications. So, give it a try, experiment with different models, and see how your research agent can be customized for various use cases!
