Model Context Protocol (MCP) with Ollama and Llama 3: A Full Deep Dive + Working Code — Part 2
In my previous article, we explored the Model Context Protocol (MCP) — a standardized way for LLMs to invoke tools or APIs with structured inputs. MCP enables decoupled, modular interactions between LLMs and external services, acting as a bridge between reasoning engines (like LLMs) and function executors (like custom APIs).
In this article (Part 2), we dive into a fully working implementation using:
- Ollama + LLaMA 3 for local reasoning
- MCP tools exposed via a FastMCP server
- SerpAPI for fetching real-time flight data
- LlamaIndex ReActAgent for intelligent tool use
- An emoji-enhanced, natural language experience
Objective
To build a local AI-powered flight search assistant using MCP and ReAct agent that can:
- Understand user queries in natural language
- Dynamically call an MCP server
- Fetch live data from the web
- Respond in human-friendly text
While Part 1 focused on the “why” and “what” of MCP, this article is all about the “how” — with actual, working code. We will:
- Build a working flight search MCP server
- Run an LLM client that uses Ollama and LLaMA 3 for natural language interaction
- Invoke real-time flight queries using SerpAPI
- Let the LLM process SerpAPI's live flight results into human-friendly answers
Core Architecture Components
1. MCP Protocol Implementation
- FastMCP server (from mcp.server.fastmcp) forms the foundation
- Implements Anthropic’s Model Context Protocol specification
- Supports multiple connection types: HTTP/SSE and stdio
- Tool registration system with automatic schema generation
2. Service Layer
- Search Service (search_service.py):
- Handles request validation and normalization
- Coordinates external API calls
- Formats response data for consistent output
- SerpAPI Client (serpapi_client.py):
- Encapsulates all SerpAPI interaction
- Uses asyncio.to_thread for non-blocking API calls
3. Data Models
- Pydantic schemas in models/schemas.py for type safety
- Structured response formatting for flight data
- Input parameter validation and normalization
Integration Points
1. LLM Client Integration:
- BasicMCPClient and McpToolSpec adapters for LlamaIndex compatibility
- ReAct agent implementation using LLaMA 3 via Ollama
- Conversation handling and streaming support
2. External API:
- SerpAPI for Google Flights data
- Configurable for multiple MCP servers (extensible design)
Implementation
Let us dive deep into setting up the MCP server first.
Step 1: MCP Service Setup — Prerequisites
Install all prerequisites:
pip install nest_asyncio llama-index llama-index-tools-mcp llama-index-llms-ollama google-search-results
Initialize the FastMCP server and configure it to run on port 3001.
The FastMCP framework provides a standardized way to expose tools/endpoints that other services can easily consume, effectively giving us a microservice architecture for AI/ML tools.
from mcp.server.fastmcp import FastMCP
import asyncio
from serpapi import GoogleSearch
# Initialize MCP server
mcp = FastMCP("FlightSearchService", port=3001)
Step 2: Define Pydantic Model
For structured input and output:
# Pydantic model for structured flight data
from pydantic import BaseModel

class FlightInfo(BaseModel):
    airline: str
    price: str
    duration: str
    stops: str
    departure: str
    arrival: str
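As a quick illustration of the type safety this buys us (assuming Pydantic v2, where model_dump() replaces the older .dict()), the model validates each field and serializes cleanly:

# Hypothetical values, purely for illustration
flight = FlightInfo(
    airline="Delta",
    price="$320",
    duration="2h 15m",
    stops="Non-stop",
    departure="ATL 08:00",
    arrival="JFK 10:15",
)
print(flight.model_dump())  # dict ready for JSON serialization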
Step 3: SerpAPI Integration
The service function run_search(params) below handles the external API calls asynchronously. Key points:
- Use of asyncio.to_thread() to prevent blocking the event loop
- Error handling with structured responses
- Detailed logging for debugging
import json
import logging

logger = logging.getLogger(__name__)

async def run_search(params):
    """Run SerpAPI search asynchronously"""
    try:
        logger.debug(f"Sending SerpAPI request with params: {json.dumps(params, indent=2)}")
        # Run the blocking SerpAPI call in a worker thread so the event loop stays free
        result = await asyncio.to_thread(lambda: GoogleSearch(params).get_dict())
        logger.debug(f"SerpAPI response received, keys: {list(result.keys())}")
        return result
    except Exception as e:
        logger.exception(f"SerpAPI search error: {str(e)}")
        return {"error": str(e)}
Step 4: Defining the MCP Tool Using a Decorator
@mcp.tool() is a decorator provided by the MCP framework. It registers the function as an MCP tool, so the server knows it can expose it via /schema and /invoke.
The decorator automatically extracts the function signature (parameters, types, docstrings) and makes it available to the MCP client (the LLM agent). When the LLM sees this function in the schema, it knows it can call it with structured inputs.
In short: it makes the function callable by an AI agent over MCP.
import os
from typing import Optional

# SerpAPI key is read from the environment
SERP_API_KEY = os.getenv("SERP_API_KEY")

@mcp.tool()
async def search_flights(origin: str, destination: str, outbound_date: str, return_date: Optional[str] = None):
    """Search flights between two airports on the given dates."""
    # Prepare search parameters for the SerpAPI Google Flights engine
    params = {
        "api_key": SERP_API_KEY,
        "engine": "google_flights",
        "hl": "en",
        "gl": "us",
        "departure_id": origin.strip().upper(),
        "arrival_id": destination.strip().upper(),
        "outbound_date": outbound_date,
        "currency": "USD",
        "type": "2"  # 2 = one-way in the Google Flights engine
    }
    # If a return date is supplied, switch to a round trip (type 1)
    if return_date:
        params["return_date"] = return_date
        params["type"] = "1"
    search_results = await run_search(params)
    return search_results
The params dictionary defines the SerpAPI flight search parameters: origin/destination IATA codes, travel dates, currency, and trip type.
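In the full project, the raw SerpAPI payload is shaped into the FlightInfo model before being returned. Here is a minimal sketch of that step, assuming the response follows the documented Google Flights shape (a best_flights list whose entries hold a flights leg list, a price, and a total_duration in minutes):

def parse_flights(search_results: dict) -> list[FlightInfo]:
    """Convert a raw SerpAPI response into structured FlightInfo records."""
    flights = []
    for option in search_results.get("best_flights", []):
        legs = option.get("flights", [])
        if not legs:
            continue
        first, last = legs[0], legs[-1]
        flights.append(FlightInfo(
            airline=first.get("airline", "Unknown"),
            price=f"${option.get('price', 'N/A')}",
            duration=f"{option.get('total_duration', '?')} min",
            stops="Non-stop" if len(legs) == 1 else f"{len(legs) - 1} stop(s)",
            departure=first.get("departure_airport", {}).get("name", "?"),
            arrival=last.get("arrival_airport", {}).get("name", "?"),
        ))
    return flights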
Step 5: MCP Server Configuration and Execution
The section below shows how the MCP server supports different communication methods:
- HTTP/SSE for web-based interfaces
- STDIO for command-line or direct process communication
- Command-line argument parsing for flexible deployment
if __name__ == "__main__":
    import argparse
    parser = argparse.ArgumentParser(description="MCP Flight Search Service")
    parser.add_argument("--connection_type", type=str, default="http", choices=["http", "stdio"])
    args = parser.parse_args()
    server_type = "sse" if args.connection_type == "http" else "stdio"
    print(f"Starting Flight Search Service on port 3001 with {args.connection_type} connection")
    mcp.run(server_type)
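The same entry point therefore serves both transports:

# HTTP/SSE (default): for web clients and the MCP Inspector
python main.py
# STDIO: for clients that spawn the server as a child process
python main.py --connection_type stdio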
Step 6: Start the MCP Server (Two Ways)
1. Clone the GitHub repository https://github.com/arjunprabhulal/mcp-flight-search, install dependencies, and start the server:
# Clone from GitHub
git clone https://github.com/arjunprabhulal/mcp-flight-search
cd mcp-flight-search
# Install from the project directory (development mode)
pip install -e .
# Start the server
python main.py
2. Or install the package I published to PyPI (https://pypi.org/project/mcp-flight-search/), then start the service:
pip install mcp-flight-search
Step 7: MCP Inspector — Verify the MCP Server Visually
As mentioned in Part 1, the MCP Inspector is a developer tool that lets you interact with and debug MCP (Model Context Protocol) services like the flight search server above.
While not shown directly in the code, the MCP Inspector typically provides:
- A visual interface to view all available tools/endpoints exposed by your MCP server
- Documentation generated from your function docstrings and type hints
- The ability to execute tool calls directly and see responses
- Request/response inspection for debugging
- Testing capabilities for your MCP services
When we run an MCP server with the HTTP/SSE connection type, the Inspector connects to it in the browser via the server's address (e.g., http://localhost:3001/sse). It helps developers understand what tools are available and test them without needing to build a full client application.
The Inspector shows both the search_flights and server_status tools, along with their parameters and documentation.
Prerequisites
Install the required MCP CLI packages:
pip install 'mcp[cli]'
MCP Inspector — Debug MCP Server
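With the CLI extras installed, you can launch the Inspector against the server entry point. A minimal example, assuming the Python MCP SDK's mcp dev command (which starts the Inspector UI in your browser):

# Launch the MCP Inspector pointed at the server entry point
mcp dev main.py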
Now that our MCP server is up and ready, let us create the client using Ollama + Llama 3.2 and see how it calls the MCP server. Below is a step-by-step breakdown that integrates:
- Ollama with LLaMA 3
- MCP tools
- LlamaIndex ReActAgent
- Natural language interface for flight search
Step 1: Install Required Packages
pip install llama-index llama-index-llms-ollama llama-index-tools-mcp
- llama-index: Core framework for LLM orchestration
- llama-index-llms-ollama: Ollama integration for local LLMs
- llama-index-tools-mcp: Tool spec that connects to MCP servers
Step 2: Import All Dependencies
import asyncio
import nest_asyncio
import json
import sys
from datetime import datetime
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import ReActAgent
from llama_index.llms.ollama import Ollama
- BasicMCPClient: Connects to your MCP server
- McpToolSpec: Fetches available tools from the server
- ReActAgent: Orchestrates LLM + tool reasoning
- Ollama: Allows using LLaMA 3 models locally
Step 3: MCP Server URL
MCP_URL = "http://127.0.0.1:3001/sse"
MCP_URL points to the MCP Server running flight server that implements the MCP protocol. The /sse endpoint supports Server-Sent Events, enabling real-time updates.
Step 4: Setting Up the LLM Agent
async def setup_agent():
    # Set up MCP and fetch tools from the flight server
    mcp_client = BasicMCPClient(MCP_URL)
    tools = await McpToolSpec(client=mcp_client).to_tool_list_async()
    # Initialize the LLM with Ollama (the agent itself is constructed in Step 6)
    llm = Ollama(model="llama3.2", temperature=0.7)
This code connects to the flight MCP server we built earlier and automatically discovers the available tools (search_flights and server_status), using the Ollama LLM for natural language understanding. In particular, it:
- Talks to your mcp-flight-search server
- Pulls available tools (like search_flights)
- Converts them into callable objects for the agent
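To sanity-check the discovery step, you can print what came back. This assumes the returned tools expose their name and description through a metadata attribute, as LlamaIndex FunctionTool objects do:

# Inside setup_agent(), right after fetching the tool list
for tool in tools:
    print(f"{tool.metadata.name}: {tool.metadata.description}")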
Step 5: Install and Initialize Ollama LLM
Installing Ollama
This client uses Ollama to run Llama 3.2 locally. To install Ollama:
- Download Ollama from the official website
- Install and start the Ollama application
- Pull the Llama 3.2 model:
ollama pull llama3.2
llm = Ollama(model="llama3.2", temperature=0.7)
Uses the local LLaMA 3 model via Ollama as the reasoning engine.
Step 6: Construct the Agent
The ReActAgent from LlamaIndex lets the LLM:
- Reason about user intent
- Choose when to call tools (MCP functions)
- Respond naturally
    # Continuing inside setup_agent(), after the LLM is initialized
    # (sampling temperature is already set on the Ollama LLM in Step 5)
    agent = ReActAgent(
        name="FlightAgent",
        llm=llm,
        tools=tools,
        description="Agent using MCP flight search tools with natural language understanding",
        system_prompt=system_prompt,  # defined in Step 7 below
    )
    return agent
Step 7: Defining the Custom System Prompt
The system prompt below is where we guide the LLM: it converts a natural language request into the structured arguments the MCP server expects before sending the call across.
system_prompt = """
You are a helpful flight search assistant. Today is """ + datetime.now().strftime("%B %d, %Y") + """.
When searching for flights, please follow these guidelines:
1. Convert city or airport names to their standard 3-letter IATA airport codes
2. Common examples:
- Atlanta = ATL
- New York = JFK (or LGA/EWR depending on context)
...
"""
- Teaches the agent to convert city names to their 3-letter IATA airport codes (e.g., “New York” → “JFK”)
- Clarifies the expected format for one-way and round-trip searches
- Instructs it to resolve dates properly (e.g., “next week” → an actual date)
- Asks it to structure flight search results with emojis and consistent formatting
Step 8: Start the Client
python mcp_flight_client.py
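For completeness, here is a minimal sketch of what the client entry point might look like. It assumes setup_agent() returns the agent constructed in Step 6, and that the workflow ReActAgent's run() accepts the user message and can be awaited for the final response (the exact signature can vary across llama-index versions):

async def main():
    # Assumes setup_agent() returns the ReActAgent built in Step 6
    agent = await setup_agent()
    print("✈️ Flight assistant ready. Type 'quit' to exit.")
    while True:
        query = input("You: ").strip()
        if query.lower() in ("quit", "exit"):
            break
        # Run the ReAct loop: reason, call MCP tools, format the answer
        response = await agent.run(user_msg=query)
        print(f"Assistant: {response}")

if __name__ == "__main__":
    nest_asyncio.apply()  # allow nested event loops (e.g., in notebooks)
    asyncio.run(main())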
Overall User Flow
- User enters natural language flight query
- LLM interprets the query using the ReAct agent framework
- Agent calls the appropriate MCP tools from the server
- Server calls SerpAPI to fetch real flight data
- Data flows back to client through MCP
- LLM formats the response with emojis and clean structure
- User sees nicely formatted flight options
This creates a simple but powerful natural language interface to the flight search service without the client needing to understand the underlying API structure.
Final Demo: MCP Server + Ollama + LLaMA 3
GitHub Repository
The complete code used for the demo above:
- MCP Flight Search Server: https://github.com/arjunprabhulal/mcp-flight-search (also published on PyPI: https://pypi.org/project/mcp-flight-search/)
Conclusion
In this two-part series, we’ve gone from theory to real-world application — turning the Model Context Protocol (MCP) into a functional, intelligent flight search system using:
- A minimal MCP server that integrates SerpAPI for live flight data
- A conversational LLM agent built with Ollama + LLaMA 3
- The MCP client layer that acts as a bridge between language understanding and real-world action
- The ReActAgent from LlamaIndex to reason, invoke tools, and return helpful responses
- A rich, emoji-enhanced natural language chat experience for the end user
This project also highlights how we can build real, agentic applications using only open-source tools, local models, and lightweight protocols — no cloud LLMs or API lock-ins required.
By separating the reasoning layer (LLM) from the execution layer (MCP tools), we gain flexibility, debuggability, and modularity — a future-proof architecture for building intelligent agents that can safely call APIs, query systems, or act in the world.
Whether you’re building a travel assistant, a DevOps co-pilot, or a research bot, the MCP pattern gives you a clean, explainable interface between language models and real functionality.