
Model Context Protocol (MCP) with Ollama and Llama 3: A Full Deep Dive + Working Code — Part 2


In my previous article, we explored the Model Context Protocol (MCP) — a standardized way for LLMs to invoke tools or APIs with structured inputs. MCP also enables decoupled, modular interactions between LLMs and external services. We also learned how MCP acts as a bridge between reasoning engines (like LLMs) and function executors (like custom APIs).

In this article (Part 2), we dive into a fully working implementation using:

  • Ollama + LLaMA 3 for local reasoning
  • MCP tools exposed via a FastMCP server
  • SerpAPI for fetching real-time flight data
  • LlamaIndex ReActAgent for intelligent tool use
  • An emoji-enhanced, natural language experience

Objective

To build a local AI-powered flight search assistant using MCP and a ReAct agent that can:

  • Understand user queries in natural language
  • Dynamically call an MCP server
  • Fetch live data from the web
  • Respond in human-friendly text

While Part 1 focused on the “why” and “what” of MCP, this article is all about the “how” — with actual, working code. We will:

  1. Build a working flight search MCP server
  2. Run an LLM client using Ollama and LLaMA 3 with a natural language interface
  3. Invoke real-time flight queries using SerpAPI
  4. Leverage SerpAPI for actual flight search results and let the LLM process the response
[Image by Author]

Core Architecture Components

1. MCP Protocol Implementation

  • FastMCP server (from mcp.server.fastmcp) forms the foundation
  • Implements Anthropic’s Model Context Protocol specification
  • Supports multiple connection types: HTTP/SSE and stdio
  • Tool registration system with automatic schema generation

2. Service Layer

  • Search Service (search_service.py):
    - Handles request validation and normalization
    - Coordinates external API calls
    - Formats response data for consistent output
  • SerpAPI Client (serpapi_client.py):
    - Encapsulates all SerpAPI interaction
    - Uses asyncio.to_thread for non-blocking API calls
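
To make the division of responsibilities concrete, here is a hypothetical sketch of how search_service.py might be shaped. The helper names build_params and format_results are assumptions for illustration, not the repository's actual API:

# Hypothetical sketch of search_service.py; helper names are assumed
async def search_flights_service(origin: str, destination: str, outbound_date: str):
    # 1. Validate and normalize inputs (IATA codes, trimmed whitespace)
    origin, destination = origin.strip().upper(), destination.strip().upper()
    # 2. Coordinate the external API call (delegated to serpapi_client.py)
    raw = await run_search(build_params(origin, destination, outbound_date))
    # 3. Format response data for consistent output
    return format_results(raw)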

3. Data Models

  • Pydantic schemas in models/schemas.py for type safety
  • Structured response formatting for flight data
  • Input parameter validation and normalization

Integration Points

  1. LangChain Integration (an alternative client path; a rough sketch follows this list):
  • MultiServerMCPClient adapter for LangChain compatibility
  • ReAct agent implementation using the Gemini model
  • Conversation handling and streaming support

  2. External API:
  • SerpAPI for Google Flights data
  • Configurable for multiple MCP servers (extensible design)
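
For readers who prefer LangChain, a rough sketch of that path is below. It assumes the langchain-mcp-adapters and langchain-google-genai packages; exact APIs vary by version, so treat this as illustrative rather than the repository's code:

from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_google_genai import ChatGoogleGenerativeAI
from langgraph.prebuilt import create_react_agent

async def build_langchain_agent():
    # Connect to one or more MCP servers (add entries to extend)
    client = MultiServerMCPClient({
        "flights": {"url": "http://localhost:3001/sse", "transport": "sse"},
    })
    tools = await client.get_tools()  # discover the MCP tools over SSE
    llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")
    # ReAct-style agent that can call the discovered tools
    return create_react_agent(llm, tools)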

Implementation

Let us deep dive into setting up the MCP server first.

Step 1: MCP Service Setup — Pre-requisites

Install all pre-requisites

pip install nest_asyncio llama-index llama-index-tools-mcp llama-index-llms-ollama google-search-results

Initialize FastMCP Server and configure to run at Port 3001

The FastMCP framework provides a standardized way to expose tools/endpoints that other services can easily consume — in effect, a microservice architecture for AI/ML tools.

import asyncio
import json
import logging

from mcp.server.fastmcp import FastMCP
from serpapi import GoogleSearch

# Logger used by the snippets below
logger = logging.getLogger(__name__)

# Initialize MCP server
mcp = FastMCP("FlightSearchService", port=3001)

Step 2: Define Pydantic Model

For structured input and output.

from pydantic import BaseModel

# Pydantic model for structured flight data
class FlightInfo(BaseModel):
    airline: str
    price: str
    duration: str
    stops: str
    departure: str
    arrival: str
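
For example, a populated instance (illustrative values only) looks like this:

# Illustrative values, not real search output
flight = FlightInfo(
    airline="Delta",
    price="$320",
    duration="2h 15m",
    stops="Nonstop",
    departure="ATL 08:00",
    arrival="JFK 10:15",
)
print(flight.model_dump_json(indent=2))  # Pydantic v2 serialization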

Step 3: SerpAPI Integration

The run_search(params) service function below handles external API calls asynchronously. Note:

  • Use of asyncio.to_thread() to prevent blocking the event loop
  • Error handling with structured responses
  • Detailed logging for debugging

async def run_search(params):
    """Run SerpAPI search asynchronously"""
    try:
        logger.debug(f"Sending SerpAPI request with params: {json.dumps(params, indent=2)}")
        result = await asyncio.to_thread(lambda: GoogleSearch(params).get_dict())
        logger.debug(f"SerpAPI response received, keys: {list(result.keys())}")
        return result
    except Exception as e:
        logger.exception(f"SerpAPI search error: {str(e)}")
        return {"error": str(e)}

Step 4: Defining an MCP Tool Using a Decorator

The @mcp.tool() decorator is provided by the MCP framework. It registers the function as an MCP tool, so the server knows it can expose it via /schema and /invoke.

It automatically extracts the function signature (parameters, descriptions, etc.) and makes it available for the MCP client (the LLM agent) to understand. When the LLM sees this function in the schema, it knows it can call it with structured inputs. In short, it makes the function callable by an AI agent over MCP.

import os
from typing import Optional

# API key for SerpAPI (set SERP_API_KEY in the environment)
SERP_API_KEY = os.getenv("SERP_API_KEY")

@mcp.tool()
async def search_flights(origin: str, destination: str, outbound_date: str, return_date: Optional[str] = None):
    """Search for flights between two airports on the given date(s)."""
    # Prepare search parameters
    params = {
        "api_key": SERP_API_KEY,
        "engine": "google_flights",
        "hl": "en",
        "gl": "us",
        "departure_id": origin.strip().upper(),
        "arrival_id": destination.strip().upper(),
        "outbound_date": outbound_date,
        "currency": "USD",
        "type": "2",  # "2" = one-way in the google_flights engine
    }
    # Round trip: switch the search type and add the return date
    if return_date:
        params["type"] = "1"
        params["return_date"] = return_date

    search_results = await run_search(params)
    return search_results

The params dictionary defines the request parameters for the SerpAPI Google Flights search.
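
The raw SerpAPI payload is verbose; a small formatter can distill it into the FlightInfo model defined earlier. The field names below are assumptions based on the google_flights response shape (best_flights / other_flights entries with per-leg details), so treat this as a sketch rather than the repository's exact code:

def format_results(search_results: dict) -> list[FlightInfo]:
    """Distill raw SerpAPI output into FlightInfo entries (sketch)."""
    flights = []
    candidates = search_results.get("best_flights", []) + search_results.get("other_flights", [])
    for option in candidates:
        legs = option.get("flights", [])
        if not legs:
            continue
        flights.append(FlightInfo(
            airline=legs[0].get("airline", "Unknown"),
            price=str(option.get("price", "N/A")),
            duration=f"{option.get('total_duration', 'N/A')} min",
            stops="Nonstop" if len(legs) == 1 else f"{len(legs) - 1} stop(s)",
            departure=legs[0].get("departure_airport", {}).get("time", ""),
            arrival=legs[-1].get("arrival_airport", {}).get("time", ""),
        ))
    return flights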

Step 5: MCP Server Configuration and Execution

The section below shows how the MCP server supports different communication methods:

  • HTTP/SSE for web-based interfaces
  • STDIO for command-line or direct process communication
  • Command-line argument parsing for flexible deployment

if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="MCP Flight Search Service")
    parser.add_argument("--connection_type", type=str, default="http", choices=["http", "stdio"])
    args = parser.parse_args()

    server_type = "sse" if args.connection_type == "http" else "stdio"
    print(f"Starting Flight Search Service on port 3001 with {args.connection_type} connection")

    mcp.run(server_type)
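
By default the server starts with the HTTP/SSE transport; stdio is one flag away:

# HTTP/SSE (default)
python main.py

# stdio (for direct process communication)
python main.py --connection_type stdio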

Step 6: Start the MCP Server (Multiple Ways)

  1. Clone the GitHub repository https://github.com/arjunprabhulal/mcp-flight-search, install dependencies, and start the server:

# Clone from GitHub
git clone https://github.com/arjunprabhulal/mcp-flight-search
cd mcp-flight-search

# Install from the project directory (development mode)
pip install -e .

# Start the server
python main.py

2. Or install the package I published to PyPI (https://pypi.org/project/mcp-flight-search/):

pip install mcp-flight-search

Then start the service after installation.

Step 7: MCP Inspector — Verify the MCP Server Visually

As mentioned in Part 1, the MCP Inspector is a developer tool that allows you to interact with and debug MCP (Model Context Protocol) services like our flight search server.

While not shown directly in the code above, the MCP Inspector typically provides:

  1. A visual interface to view all available tools/endpoints exposed by your MCP server
  2. Documentation generated from your function docstrings and type hints
  3. The ability to execute tool calls directly and see responses
  4. Request/response inspection for debugging
  5. Testing capabilities for your MCP services

When we run an MCP server like ours with the HTTP/SSE connection type, the Inspector can connect to the server's address (e.g., http://localhost:3001/sse). It helps developers understand what tools are available and lets them test those tools without building a full client application.

The Inspector shows both the search_flights and server_status tools, along with their parameters and documentation.

Pre-requisites

Install the required MCP CLI package:

pip install 'mcp[cli]'
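
With the CLI installed, one convenient way to launch the Inspector is the Python SDK's dev command, pointed at the server entry file (assuming main.py, as in the repository):

mcp dev main.py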

[Screenshot: MCP Inspector — Debug MCP Server]

Now that our MCP server is up and ready, let us create the client using Ollama + Llama 3.2 and see how it calls the MCP server. Here is a step-by-step breakdown that integrates:

  • Ollama with LLaMA 3
  • MCP tools
  • LlamaIndex ReActAgent
  • Natural language interface for flight search

Step 1: Install Required Packages

pip install llama-index llama-index-llms-ollama llama-index-tools-mcp

  • llama-index: Core framework for LLM orchestration
  • llama-index-llms-ollama: Ollama integration for local LLMs
  • llama-index-tools-mcp: Tools for connecting to MCP servers

Step 2: Import All Dependencies

import asyncio
import nest_asyncio
import json
import sys
from datetime import datetime
from llama_index.tools.mcp import BasicMCPClient, McpToolSpec
from llama_index.core.agent.workflow import ReActAgent
from llama_index.llms.ollama import Ollama

  • BasicMCPClient: Connects to your MCP server
  • McpToolSpec: Fetches available tools from the server
  • ReActAgent: Orchestrates LLM + tool reasoning
  • Ollama: Allows using LLaMA 3 models locally

Step 3: MCP Server URL

MCP_URL = "http://127.0.0.1:3001/sse"

MCP_URL points to the MCP Server running flight server that implements the MCP protocol. The /sse endpoint supports Server-Sent Events, enabling real-time updates.
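
As a quick sanity check (assuming the server is already running), probing the endpoint from a shell should hold the connection open and stream events:

curl -N http://127.0.0.1:3001/sse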

Step 4: Setting up LLM Agent

async def setup_agent():
    # Setup MCP and fetch tools from the flight server
    mcp_client = BasicMCPClient(MCP_URL)
    tools = await McpToolSpec(client=mcp_client).to_tool_list_async()

    # Initialize the LLM with Ollama
    llm = Ollama(model="llama3.2", temperature=0.7)

This code connects to the flight MCP server we saw earlier and automatically discovers available tools (search_flights and server_status). It uses the Ollama LLM to provide natural language understanding.

  • Talks to our mcp-flight-search server
  • Pulls available tools (like search_flights)
  • Converts them into callable objects for the agent (see the quick check below)
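
As a quick check that discovery worked (a small addition, not part of the original client), each discovered tool arrives as a LlamaIndex FunctionTool whose metadata can be printed:

for tool in tools:
    # Each MCP tool is exposed with its name and docstring-derived description
    print(f"{tool.metadata.name}: {tool.metadata.description}")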

Step 5: Install and Initialize Ollama LLM

Installing Ollama

This client uses Ollama to run Llama 3.2 locally. To install Ollama:

  1. Download Ollama from the official website
  2. Install and start the Ollama application
  3. Pull the Llama 3.2 model:

ollama pull llama3.2

Then, in the client, the local model is initialized with:

llm = Ollama(model="llama3.2", temperature=0.7)

Uses the local LLaMA 3 model via Ollama as the reasoning engine.

Step 6: Construct the Agent

The ReActAgent from LlamaIndex lets the LLM:

  • Reason about user intent
  • Choose when to call tools (MCP functions)
  • Respond naturally

agent = ReActAgent(
    name="FlightAgent",
    llm=llm,
    tools=tools,
    description="Agent using MCP flight search tools with natural language understanding",
    system_prompt=system_prompt,
    temperature=0.2,
    verbose=False
)

Step 7: Define the Custom System Prompt

The system prompt below is where we teach the LLM to convert a natural language query into the structured format the MCP server expects before sending it across for a response.

system_prompt = """
You are a helpful flight search assistant. Today is """ + datetime.now().strftime("%B %d, %Y") + """.

When searching for flights, please follow these guidelines:

1. Convert city or airport names to their standard 3-letter IATA airport codes
2. Common examples:
- Atlanta = ATL
- New York = JFK (or LGA/EWR depending on context)
...
"""

The prompt:

  • Teaches the agent to convert city names to their 3-letter IATA airport codes (e.g., “New York” → “JFK”)
  • Clarifies the expected format for one-way and round-trip searches
  • Tells it to resolve relative dates (e.g., “next week” → an actual date)
  • Has it structure flight search results with emojis and consistent formatting
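
To tie the client steps together, here is a minimal chat loop sketch. It assumes setup_agent() returns the constructed agent (the truncated snippet in Step 4 omits the return); the repository's client is wired up along these lines:

async def main():
    agent = await setup_agent()  # assumes setup_agent() returns the ReActAgent
    while True:
        query = input("\n✈️  Ask about flights (or 'quit'): ")
        if query.strip().lower() in {"quit", "exit"}:
            break
        response = await agent.run(query)  # ReAct loop: reason, pick tools, answer
        print(str(response))

if __name__ == "__main__":
    nest_asyncio.apply()  # allow re-entrant event loops (imported in Step 2)
    asyncio.run(main())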

Step 8: Start the Client

python mcp_flight_client.py

Overall User Flow

  1. User enters natural language flight query
  2. LLM interprets the query using the ReAct agent framework
  3. Agent calls the appropriate MCP tools from the server
  4. Server calls SerpAPI to fetch real flight data
  5. Data flows back to client through MCP
  6. LLM formats the response with emojis and clean structure
  7. User sees nicely formatted flight options

This creates a simple but powerful natural language interface to the flight search service without the client needing to understand the underlying API structure.
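
For illustration, a query like “Find flights from Atlanta to New York on June 1” leads the agent to issue a tool call equivalent to the following (illustrative values):

# The ReAct agent fills these arguments from the user's natural language query
search_flights(origin="ATL", destination="JFK", outbound_date="2025-06-01")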

Final Demo: MCP Server + Ollama + LLaMA 3

GitHub Repositories

Below is the complete code used for the above demo:

  • MCP Flight Search Server: https://github.com/arjunprabhulal/mcp-flight-search (PyPI: https://pypi.org/project/mcp-flight-search/)

  • MCP Client with Ollama

Conclusion

In this two-part series, we’ve gone from theory to real-world application — turning the Model Context Protocol (MCP) into a functional, intelligent flight search system using:

  • A minimal MCP server that integrates SerpAPI for live flight data
  • A conversational LLM agent built with Ollama + LLaMA 3
  • The MCP client layer that acts as a bridge between language understanding and real-world action
  • The ReActAgent from LlamaIndex to reason, invoke tools, and return helpful responses
  • A rich, emoji-enhanced natural language chat experience for the end user

This project also highlights how we can build real, agentic applications using only open-source tools, local models, and lightweight protocols — no cloud LLMs or API lock-ins required.

By separating the reasoning layer (LLM) from the execution layer (MCP tools), we gain flexibility, debuggability, and modularity — a future-proof architecture for building intelligent agents that can safely call APIs, query systems, or act in the world.

Whether you’re building a travel assistant, a DevOps co-pilot, or a research bot, the MCP pattern gives you a clean, explainable interface between language models and real functionality.
