How to Build an Interactive Personal AI Research Agent with Llama 3.2: A Step-by-Step Guide Using LangChain and Ollama

Sahin Ahmed, Data Scientist

Introduction

Building a research agent can be complex, but with LangChain and Ollama, it becomes a lot simpler and more modular. In this tutorial, we’ll show you how to create a research agent that can route queries, perform web searches, and generate detailed responses using a combination of workflows and LLMs. By the end, you’ll have a fully functional agent that can tackle a variety of information retrieval tasks!

Required Python libraries:

To build our research agent, we’ll use Ollama for LLM interactions, LangChain for workflow management, LangGraph for defining workflow nodes, and the LangChain Community libraries for extended functionality. For web search, we’ll use duckduckgo-search.

Start by installing and configuring these tools.

Ollama Installation Steps:

To use Ollama, install the Ollama application on your system and then download the Llama 3.2 model.

  • Download the installer from the Ollama official website.
  • Run the installer and follow the on-screen instructions to complete the setup. Ollama supports both macOS and Windows.
  • After installation, you can use the terminal to run models:
  • Open a terminal.
  • Navigate to the directory where Ollama is installed.
  • Run the following command to list available models: ollama list.
  • To download and run a model, use ollama pull <model_name> and ollama run <model_name>, as shown below.
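
For example, to pull and start the Llama 3.2 model used throughout this tutorial (llama3.2 is the model tag used later in the code):

ollama pull llama3.2
ollama run llama3.2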

You can install the other libraries using the following pip commands (the leading ! is for notebook cells; omit it when running in a terminal):

!pip install langchain==0.2.12
!pip install langgraph==0.2.2
!pip install langchain-ollama==0.1.1
!pip install langsmith==0.1.98
!pip install langchain_community==0.2.11
!pip install duckduckgo-search==6.2.13

Importing required libraries:

# Displaying final output format
from IPython.display import display, Markdown, Latex
# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph
from typing_extensions import TypedDict
import os

Define the LLM model to be used:

# Defining LLM
local_llm = 'llama3.2'
llama3 = ChatOllama(model=local_llm, temperature=0)
# JSON-mode instance used later by the router and query-transformation chains
llama3_json = ChatOllama(model=local_llm, format='json', temperature=0)
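
As a quick sanity check (assuming the llama3.2 model has already been pulled with Ollama), you can invoke the chat model directly; ChatOllama returns a message object whose content attribute holds the text:

# Sanity check: the model should reply with a short greeting
print(llama3.invoke("Say hello in one short sentence.").content)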

Define the Web Search tool

# Web Search Tool

wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)

You can test this tool using the following command:

# Test Run
resp = web_search_tool.invoke("current Weather in New York")
print(resp)

Example output:
"East wind around 8 mph. Partly sunny, with a high near 69. Northeast wind around 8 mph. Mostly cloudy, with a low around 59. East wind 3 to 6 mph. Mostly sunny, with a high near 73. Light and variable wind becoming south 5 to 7 mph in the afternoon. Mostly clear, with a low around 60

Define the response generation prompt for the LLM:

# Generation Prompt

generate_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an AI assistant for Research Question Tasks, that synthesizes web search results.
Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know.
keep the answer concise, but provide all of the details you can in the form of a research report.
Only make direct references to material if provided in the context.

<|eot_id|>

<|start_header_id|>user<|end_header_id|>

Question: {question}
Web Search Context: {context}
Answer:

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>""",
input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()

# Test Run
question = "who is Yan Lecun?"
context = ""
generation = generate_chain.invoke({"context": context, "question": question})
print(generation)

Generation Prompt Template (generate_prompt):

  • This prompt is used for the generation stage of the agent.
  • It creates a template where a question ({question}) and the context ({context}) from web search are dynamically inserted.
  • The template instructs the AI assistant to:
  • Synthesize information from the given web search context.
  • Keep the answer concise yet detailed, similar to a research report.
  • If the required information is not present in the provided context, simply say “I don’t know”.

Chain Configuration (generate_chain):

  • The generate_chain is formed by chaining the generate_prompt with a model (e.g., llama3) and a string output parser (StrOutputParser).
  • The string output parser ensures that the final response is returned in a plain text format.

The code defines the generation phase for the research agent, where it answers user queries using web search context or responds with "I don't know" when the context is empty. This structure ensures accurate, context-driven responses in a concise report format.

Define the Router Prompt:

# Router

router_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at routing a user question to either the generation stage or web search.
Use the web search for questions that require more context for a better answer, or recent events.
Otherwise, you can skip and go straight to the generation phase to respond.
You do not need to be stringent with the keywords in the question related to these topics.
Give a binary choice 'web_search' or 'generate' based on the question.
Return the JSON with a single key 'choice' with no preamble or explanation.

Question to route: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's up?"
print(question_router.invoke({"question": question}))

Router Prompt Template (router_prompt):

  • This section creates a prompt template using the PromptTemplate class.
  • The template is a formatted text that will be filled in dynamically with a specific question.
  • It instructs the assistant on when to use web search (for recent events or context-heavy questions) versus when to generate a response (for general or context-independent queries).
  • It emphasizes returning a binary choice of either web_search or generate in JSON format.

Chain Configuration (question_router):

  • The question_router uses the router_prompt as input and chains it with a llama3_json model and a JsonOutputParser.
  • The pipeline accepts a question, processes it using the specified LLM (e.g., LLaMA), and parses the result into a JSON output with a single key: "choice" with either web_search or generate as value.
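
To see the routing behavior on contrasting inputs, you can invoke the chain with a time-sensitive question and a general one. This is a sketch with hypothetical questions; the exact choice depends on the model's judgment for each query:

# A recent-events question should typically be routed to web search
print(question_router.invoke({"question": "What did OpenAI announce this week?"}))
# Expected shape: {'choice': 'web_search'}

# A general, context-independent question can usually go straight to generation
print(question_router.invoke({"question": "Explain what a neural network is."}))
# Expected shape: {'choice': 'generate'}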

Define the web search query transformation prompt:

# Query Transformation

query_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at crafting web search queries for research questions.
More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format.
Reword their query to be the most effective web search string possible.
Return the JSON with a single key 'query' with no preamble or explanation.

Question to transform: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()

# Test Run
question = "What's happened recently with Gaza?"
print(query_chain.invoke({"question": question}))

Query Transformation Prompt (query_prompt):

  • This prompt is specifically designed for optimizing web search queries.
  • It transforms a user’s casual or basic question into an effective web search string that can retrieve more relevant information.
  • The LLM is instructed to rephrase the question into a more search-engine-friendly format.
  • The response is returned as a JSON with a single key: query, without any additional text.

Chain Configuration (query_chain):

  • The query_chain combines the query_prompt with a language model (llama3_json) and a JsonOutputParser.
  • This setup allows the prompt to transform the query and return the output in JSON format.

Defining modular research agent workflow using a graph state

# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question: str
    generation: str
    search_query: str
    context: str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state["context"]

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    if output['choice'] == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    elif output['choice'] == 'generate':
        print("Step: Routing Query to Generation")
        return "generate"

GraphState Class:

  • GraphState is a TypedDict that defines the structure of the state dictionary used in the graph workflow.
  • It includes four keys:
  • question: Original user question.
  • generation: Final answer generated by the LLM.
  • search_query: Optimized question for web search.
  • context: Web search results used for generating responses.

Generate Node (generate):

  • This function generates the final response using the LLM based on the question and context.
  • It outputs the generation key in the graph state.
  • Prints the message: Step: Generating Final Response to indicate its operation.

Query Transformation Node (transform_query):

  • This node optimizes the user’s question for web search.
  • Calls the query_chain and outputs a search_query key in the state.
  • Prints the message: Step: Optimizing Query for Web Search.

Web Search Node (web_search):

  • This node performs a web search using the optimized query (search_query).
  • Uses a web search tool (web_search_tool.invoke) and outputs the context key in the state.
  • Prints the search query and message: Step: Searching the Web for: "<search_query>".

Conditional Routing (route_question):

  • This function routes the question to either web_search or generate based on the decision made by the question_router.
  • Prints the message: Step: Routing Query.
  • If the choice is web_search, it routes to the web_search node and prints Routing Query to Web Search.
  • If the choice is generate, it routes to the generate node and prints Routing Query to Generation.

The code snippet defines the core nodes and decision-making logic for a modular research agent workflow using a graph state. Each function represents a distinct step in the workflow, modifying the shared state and routing the question through appropriate nodes, depending on the requirements. The final output is a well-structured response generated using context, if available, or directly through LLM generation.
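
Each node returns only the keys it produces, and LangGraph merges those partial updates into the shared GraphState. As a rough illustration (a manual walk-through outside the graph, using a hypothetical question), the same merging can be mimicked with a plain dictionary:

# Manual walk-through of the node functions, mimicking how LangGraph
# merges each node's partial return value into the shared state.
state = {"question": "What are the latest developments in fusion energy?"}
state.update(transform_query(state))  # adds 'search_query'
state.update(web_search(state))       # adds 'context'
state.update(generate(state))         # adds 'generation'
print(state["generation"])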

Building the state-based workflow using StateGraph

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()

Create a StateGraph Instance (workflow):

  • StateGraph is used to represent the workflow for the research agent.
  • The workflow is initialized with the GraphState type, ensuring that all nodes adhere to the defined state structure.

Add Nodes to the Graph:

  • The following nodes are added to the workflow:
  • "websearch": Executes the web_search function to perform web searches.
  • "transform_query": Executes the transform_query function to optimize user questions for web search.
  • "generate": Executes the generate function to create the final response using LLM.

Set the Conditional Entry Point (workflow.set_conditional_entry_point):

  • Defines the entry point of the workflow using the route_question function.
  • It routes the question to one of two nodes based on its decision:
  • "websearch": If the query needs more context.
  • "generate": If no web search is necessary.

Define Workflow Edges:

  • The edges define the flow between nodes:
  • "transform_query" -> "websearch": After transforming the query, move to web search.
  • "websearch" -> "generate": Once the web search is done, pass the results to the generation node.
  • "generate" -> END: The workflow ends after generating the final response.

Compile the Workflow:

  • The workflow.compile() step creates a compiled agent (local_agent) that can be invoked to handle queries according to the defined workflow.
  • local_agent will execute the nodes and route queries automatically based on the defined logic.

The code builds a state-based workflow using StateGraph, with nodes for query transformation, web search, and response generation. The workflow automatically routes through these nodes based on predefined conditions, ensuring that each query is processed in the most efficient way to produce high-quality answers.
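
Once compiled, you can optionally inspect the graph structure. Recent LangGraph releases expose a get_graph() method on the compiled agent with Mermaid rendering helpers; availability may vary by version, so treat this as a sketch:

# Print a Mermaid diagram of the compiled workflow (requires a recent langgraph version)
print(local_agent.get_graph().draw_mermaid())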

Define the function to Run the Agent:

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    display(Markdown(output["generation"]))

Testing the Agent:

Example 1:


run_agent("What is Latest news About Open AI?")

Output:

Example 2:

run_agent("what is Transformer in AI?")

Output:

Building a Streamlit App to Run the Agent:

Steps to Run the Streamlit App:

1. Set Up the Environment

Before running the Streamlit app, make sure you have all the required libraries installed.

  • Open your terminal and install Streamlit:
pip install streamlit
  • If you have a requirements.txt file, you can also install dependencies using:
pip install -r requirements.txt
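
If you want to pin everything in a requirements.txt, one possible file (assembled from the versions used earlier in this tutorial; Streamlit is left unpinned since no version is specified) would look like this:

langchain==0.2.12
langgraph==0.2.2
langchain-ollama==0.1.1
langsmith==0.1.98
langchain_community==0.2.11
duckduckgo-search==6.2.13
streamlit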

2. Create a Python Script File

  • Create a new file named streamlit_app.py (or any other name you prefer) and paste the full Streamlit script below into it.

Use the following code:

# LangChain Dependencies
from langchain.prompts import PromptTemplate
from langchain_core.output_parsers import JsonOutputParser, StrOutputParser
from langchain_community.chat_models import ChatOllama
from langchain_community.tools import DuckDuckGoSearchRun
from langchain_community.utilities import DuckDuckGoSearchAPIWrapper
from langgraph.graph import END, StateGraph
# For State Graph
from typing_extensions import TypedDict
import streamlit as st
import os
# Defining LLM
def configure_llm():
    st.sidebar.header("Configure LLM")

    # Model Selection
    model_options = ["llama3.2"]
    selected_model = st.sidebar.selectbox("Choose the LLM Model", options=model_options, index=0)

    # Temperature Setting
    temperature = st.sidebar.slider("Set the Temperature", min_value=0.0, max_value=1.0, value=0.5, step=0.1)

    # Create LLM Instances based on user selection
    llama_model = ChatOllama(model=selected_model, temperature=temperature)
    llama_model_json = ChatOllama(model=selected_model, format='json', temperature=temperature)

    return llama_model, llama_model_json

# Streamlit Application Interface
st.title("Personal Research Assistant powered By Llama3.2")
llama3, llama3_json = configure_llm()
wrapper = DuckDuckGoSearchAPIWrapper(max_results=25)
web_search_tool = DuckDuckGoSearchRun(api_wrapper=wrapper)
generate_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an AI assistant for Research Question Tasks, that synthesizes web search results.
Strictly use the following pieces of web search context to answer the question. If you don't know the answer, just say that you don't know.
keep the answer concise, but provide all of the details you can in the form of a research report.
Only make direct references to material if provided in the context.

<|eot_id|>

<|start_header_id|>user<|end_header_id|>

Question: {question}
Web Search Context: {context}
Answer:

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>""",
input_variables=["question", "context"],
)

# Chain
generate_chain = generate_prompt | llama3 | StrOutputParser()
router_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at routing a user question to either the generation stage or web search.
Use the web search for questions that require more context for a better answer, or recent events.
Otherwise, you can skip and go straight to the generation phase to respond.
You do not need to be stringent with the keywords in the question related to these topics.
Give a binary choice 'web_search' or 'generate' based on the question.
Return the JSON with a single key 'choice' with no preamble or explanation.

Question to route: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
question_router = router_prompt | llama3_json | JsonOutputParser()

query_prompt = PromptTemplate(
template="""

<|begin_of_text|>

<|start_header_id|>system<|end_header_id|>

You are an expert at crafting web search queries for research questions.
More often than not, a user will ask a basic question that they wish to learn more about, however it might not be in the best format.
Reword their query to be the most effective web search string possible.
Return the JSON with a single key 'query' with no preamble or explanation.

Question to transform: {question}

<|eot_id|>

<|start_header_id|>assistant<|end_header_id|>

""",
input_variables=["question"],
)

# Chain
query_chain = query_prompt | llama3_json | JsonOutputParser()
# Graph State
class GraphState(TypedDict):
    """
    Represents the state of our graph.

    Attributes:
        question: question
        generation: LLM generation
        search_query: revised question for web search
        context: web_search result
    """
    question: str
    generation: str
    search_query: str
    context: str

# Node - Generate

def generate(state):
    """
    Generate answer

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): New key added to state, generation, that contains LLM generation
    """

    print("Step: Generating Final Response")
    question = state["question"]
    context = state["context"]

    # Answer Generation
    generation = generate_chain.invoke({"context": context, "question": question})
    return {"generation": generation}

# Node - Query Transformation

def transform_query(state):
    """
    Transform user question to web search

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended search query
    """

    print("Step: Optimizing Query for Web Search")
    question = state['question']
    gen_query = query_chain.invoke({"question": question})
    search_query = gen_query["query"]
    return {"search_query": search_query}


# Node - Web Search

def web_search(state):
    """
    Web search based on the question

    Args:
        state (dict): The current graph state

    Returns:
        state (dict): Appended web results to context
    """

    search_query = state['search_query']
    print(f'Step: Searching the Web for: "{search_query}"')

    # Web search tool call
    search_result = web_search_tool.invoke(search_query)
    return {"context": search_result}


# Conditional Edge, Routing

def route_question(state):
    """
    route question to web search or generation.

    Args:
        state (dict): The current graph state

    Returns:
        str: Next node to call
    """

    print("Step: Routing Query")
    question = state['question']
    output = question_router.invoke({"question": question})
    if output['choice'] == "web_search":
        print("Step: Routing Query to Web Search")
        return "websearch"
    elif output['choice'] == 'generate':
        print("Step: Routing Query to Generation")
        return "generate"

# Build the nodes
workflow = StateGraph(GraphState)
workflow.add_node("websearch", web_search)
workflow.add_node("transform_query", transform_query)
workflow.add_node("generate", generate)

# Build the edges
workflow.set_conditional_entry_point(
    route_question,
    {
        "websearch": "transform_query",
        "generate": "generate",
    },
)
workflow.add_edge("transform_query", "websearch")
workflow.add_edge("websearch", "generate")
workflow.add_edge("generate", END)

# Compile the workflow
local_agent = workflow.compile()

def run_agent(query):
    output = local_agent.invoke({"question": query})
    print("=======")
    return output["generation"]

user_query = st.text_input("Enter your research question:", "")

if st.button("Run Query"):
if user_query:
st.write(run_agent(user_query))

3. Save the Code

  • Save the script file in your working directory. Make sure it’s named correctly, for example: streamlit_app.py.

4. Run the Streamlit Application

  • Open your terminal in the same directory where you saved streamlit_app.py.
  • Run the Streamlit app with the following command:
streamlit run streamlit_app.py

5. Open the Streamlit Interface

  • Once the app starts running, a new tab will open in your default web browser, showing the Streamlit interface.
  • Alternatively, you can open your browser manually and go to the address displayed in the terminal, typically something like:
http://localhost:8501/

6. Configure the LLM Model

  • Use the sidebar options to configure the LLM model:
  • Select the model (remember, the model must first be downloaded to your system with Ollama, e.g. ollama pull llama3.2).
  • Set the temperature using the slider.

7. Enter Your Query

  • In the main interface, you will see a text input box labeled "Enter your research question:".
  • Type your question into the box.

8. Run the Query

  • Click on the “Run Query” button.
  • The app will process your query using the configured workflow and display the results.

9. View the Output

  • The final answer will be displayed automatically.

Conclusion

Creating a research agent using LangChain and Streamlit showcases the power of combining modular workflows with interactive applications. By leveraging state-based decision nodes and dynamic LLM configurations, we’ve built a tool capable of routing queries, performing web searches, and generating concise research reports. This setup not only simplifies the development of complex agents but also provides flexibility for future enhancements. As AI continues to evolve, integrating such frameworks will be essential for building more intuitive and context-aware applications. So, give it a try, experiment with different models, and see how your research agent can be customized for various use cases!
