<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Maddula Sampath Kumar on Medium]]></title>
        <description><![CDATA[Stories by Maddula Sampath Kumar on Medium]]></description>
        <link>https://medium.com/@maddula?source=rss-d8cc05efb99------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*8mEaIyEBiJiih8E2XjJzTA.jpeg</url>
            <title>Stories by Maddula Sampath Kumar on Medium</title>
            <link>https://medium.com/@maddula?source=rss-d8cc05efb99------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Sat, 16 May 2026 10:21:00 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@maddula/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[Introducing the Whitepaper on Building a Secure Multi-Agent Framework on Google Cloud]]></title>
            <link>https://medium.com/@maddula/introducing-the-whitepaper-on-building-secure-multi-agent-framework-on-google-cloud-dd16f65360bb?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/dd16f65360bb</guid>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Thu, 23 Apr 2026 07:35:33 GMT</pubDate>
            <atom:updated>2026-04-23T07:35:33.465Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*HJv-9XCa8T7Rn7ipy9Sy8w.png" /></figure><p>Building agents is one thing. Building them to be enterprise-ready and secure is a completely different challenge. When agents start acting across systems and making decisions autonomously, your security strategy has to evolve from model-level guardrails to full-system defense-in-depth.</p><p>This <a href="https://services.google.com/fh/files/events/agent_security.pdf">new whitepaper</a> breaks down how to build a secure ‘Warranty Claim System’ (a practical usecase for your better understanding and) using the <a href="https://cloud.google.com/products/gemini-enterprise-agent-platform">Gemini Enterprise Agent Platform</a>.</p><p><a href="https://services.google.com/fh/files/events/agent_security.pdf">https://services.google.com/fh/files/events/agent_security.pdf</a></p><p>Here’s the blueprint for securing the Agentic future:</p><p>🏗️ Build with Intent<br>→ Agent Development Kit (ADK): Native support for session management and tool-level authentication.<br>→ Deterministic Callbacks: Using BeforeToolCallback to validate inputs (like serial numbers) before they ever hit your backend.<br>→ Hybrid Runtimes: Seamlessly bridging managed Agent Runtimes with custom Cloud Run environments.</p><p>🚀 Scale Safely<br>→ Identity-Centric Design: Every agent gets a unique, cryptographic Agent Identity (SPIFFE-backed). No more over-permissioned service accounts.<br>→ Human-in-the-Loop (HITL): Built-in confirmation primitives to pause high-stakes actions for explicit approval.<br>→ Memory Isolation: Ensuring long-term context is securely mapped and isolated per user session.</p><p>🛡️ Govern &amp; Protect<br>→ Agent Gateway: The central control plane for all ingress and egress. 
It intercepts every call to authenticate and authorize in real-time.<br>→ Model Armor Integration: Automatically scrubbing PII and neutralizing prompt injections/jailbreaks inline.<br>→ Dual Guardrails: Combining IAM boundaries (Access Control) with Semantic Governance (Intent Control) to prevent “Shadow AI.”</p><p>📊 Observe &amp; Defend<br>→ Chain-of-Thought Tracing: Using Cloud Trace to visualize why an agent made a decision, not just what it did.<br>→ Virtual Red-Teaming: Automated, AI-driven adversarial simulations to stress-test your boundaries before attackers do.</p><p>💬 Closing thought: It’s time to shift from building chatbots to building “Secure Autonomous Workers”.</p><p>📚 Here are some resources to get you started!<br>→ Agent Security PDF: <a href="https://services.google.com/fh/files/events/agent_security.pdf">https://services.google.com/fh/files/events/agent_security.pdf</a><br>→ Google SAIF (Secure AI Framework): <a href="https://saif.google/secure-ai-framework/saif-map">https://saif.google/secure-ai-framework/saif-map</a><br>→ Agent Identity &amp; SPIFFE: <a href="https://docs.cloud.google.com/iam/docs/agent-identity-overview#spiffe-identity">https://docs.cloud.google.com/iam/docs/agent-identity-overview#spiffe-identity</a><br>→ Cloud Trace: <a href="https://docs.cloud.google.com/trace/docs/overview">https://docs.cloud.google.com/trace/docs/overview</a><br>→ Agent Gateway: <a href="https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/gateways/agent-gateway-overview">https://docs.cloud.google.com/gemini-enterprise-agent-platform/govern/gateways/agent-gateway-overview</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=dd16f65360bb" width="1" height="1" alt="">]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[5 Science-Backed Takeaways to Improve Your Multi-Agentic Systems]]></title>
            <link>https://medium.com/google-cloud/5-science-backed-takeaways-to-improve-your-multi-agentic-systems-9a40ac5f7b06?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/9a40ac5f7b06</guid>
            <category><![CDATA[agents]]></category>
            <category><![CDATA[agentic-design]]></category>
            <category><![CDATA[multi-agent-systems]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Thu, 26 Feb 2026 05:35:48 GMT</pubDate>
            <atom:updated>2026-02-27T15:56:39.048Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*BTMyPa4z2u8BtZVVKFErTQ.jpeg" /><figcaption>Gemini Pro</figcaption></figure><blockquote>Repost from <a href="https://www.linkedin.com/pulse/5-science-backed-takeaways-improve-your-multi-agentic-sampath-kumar-iahtf/">my Linkedin</a> blog.</blockquote><p>The AI industry has been riding the hype train of “more agents is all you need” — the intuitive idea that a swarm of specialized AI agents working together will automatically outperform a single AI model.</p><p>However, a landmark study testing 180 different AI agent configurations proves this wrong. Adding more agents to a problem does not magically improve performance; in fact, it can completely derail your system if not designed correctly.</p><p>If you are building or deploying AI agents, here are 5 science-backed takeaways to scale your systems effectively. <em>(Alternative you can read the detailed 25-page report at </em><a href="https://arxiv.org/pdf/2512.08296"><em>https://arxiv.org/pdf/2512.08296</em></a><em>.)</em></p><h3>1. Beware the “45% Rule” (Capability Saturation)</h3><p>It is tempting to throw a team of AI agents at every problem, but the research reveals a hard “capability ceiling.” If a single, standalone AI agent can already solve a task with at least 45% accuracy, adding more agents will likely give you diminishing — or even negative — returns.</p><p>The Takeaway: Multi-agent systems shine primarily on problems where a single model fails completely. If your baseline AI model is already relatively smart and capable of doing the job decently well, the extra cost and complexity of managing a multi-agent team will outweigh the benefits.</p><h3>2. Mind the “Tool-Coordination” Trade-Off</h3><p>When AI agents communicate with one another, they consume “tokens” (the computational memory and processing budget they use to read and generate text). The study found a direct conflict between team coordination and complex tool usage. If a task is heavily reliant on using tools (for example, writing code, running Python scripts, searching databases, or navigating complex software), having multiple agents chat back and forth eats up the cognitive budget needed to actually operate those tools.</p><p>The Takeaway: If your workflow requires the heavy use of external tools, a single, powerful agent is usually better. A team of agents will spend too much time organizing and communicating, leaving insufficient computing power to actually execute the work.</p><h3>3. Never Let Agents Work in Isolation (The Error Cascade)</h3><p>Not all multi-agent architectures are created equal, and some are downright dangerous for production environments. The study tested an “Independent” architecture, where multiple agents work on sub-tasks at the same time without talking to each other, and their final answers are simply combined. This approach amplified errors by a staggering 17.2x. Without a mechanism to check each other’s work, a hallucination or mistake made by one agent cascaded into the final output.</p><p>The Takeaway: If you use multiple agents, you must build in validation. Use a “Centralized” system where a “manager” or orchestrator agent reviews and synthesizes the work of the sub-agents. The study found that adding this managerial bottleneck successfully caught mistakes and contained error amplification.</p><h3>4. 
Match Your Team Structure to the Task</h3><p>The success of your AI agents depends entirely on the nature of the task you are asking them to do. The research identified exactly which architectures work for which problems:</p><ul><li>Centralized Teams (Manager + Workers): Best for <em>parallel</em> tasks. If a task can be broken down into independent chunks (e.g., financial analysis where one agent checks revenue, another checks costs, and a third checks market trends), a centralized manager can synthesize the findings for massive performance gains (+80%).</li><li>Decentralized Teams (Peer-to-Peer): Best for <em>dynamic exploration</em>. If agents are navigating highly unpredictable environments (like browsing the open web for information), allowing them to talk directly to each other to reach a consensus works best.</li><li>Single Agents: Best for <em>strictly sequential</em> tasks. If Step B cannot happen until Step A is perfectly completed (like step-by-step logic planning), all multi-agent teams will fail and degrade performance. Stick to one agent.</li></ul><p>The Takeaway: Architecture-task alignment is more important than the number of agents you deploy. Don’t use a swarm for a step-by-step logic puzzle, and don’t use a single agent for a massive, multi-part research project.</p><h3>5. Watch Out for the “Coordination Tax”</h3><p>More agents mean more communication overhead. The researchers discovered that the cost of agents talking to one another scales <em>super-linearly</em>. This means that as you add agents, the number of messages they send back and forth explodes.</p><p>The Takeaway: Keep your agent teams small. The study found that beyond 3 to 4 agents, the system hits a hard resource ceiling. The AI collective will spend all of its processing budget on “chatty” internal dialogue, synchronization, and task allocation rather than actually solving the user’s problem.</p><p><strong><em>Conclusion</em></strong><em>: As foundational AI models get smarter, they don’t replace the need for multi-agent systems — but they do change how we should use them. By moving away from the “more is always better” guesswork and applying these scientific principles, you can build AI systems that are cheaper, safer, and significantly more reliable.</em></p><p>So tell me, which of these 5 takeaways surprised you the most? Please share your experience and ideas in the comments.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=9a40ac5f7b06" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/5-science-backed-takeaways-to-improve-your-multi-agentic-systems-9a40ac5f7b06">5 Science-Backed Takeaways to Improve Your Multi-Agentic Systems</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Beads Memory System: Technical Architecture and Integration with Gemini CLI for Agentic…]]></title>
            <link>https://medium.com/google-cloud/the-beads-memory-system-technical-architecture-and-integration-with-gemini-cli-for-agentic-c2aa36430802?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/c2aa36430802</guid>
            <category><![CDATA[beads]]></category>
            <category><![CDATA[gemini-cli]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Sat, 14 Feb 2026 09:26:33 GMT</pubDate>
            <atom:updated>2026-02-14T09:26:33.415Z</atom:updated>
            <cc:license>http://creativecommons.org/licenses/by/4.0/</cc:license>
            <content:encoded><![CDATA[<h3>📿 The Beads Memory System: Technical Architecture and Integration with Gemini CLI for Agentic Workflows</h3><blockquote>Audio Track(10m): <a href="https://www.youtube.com/watch?v=O_UrPKy3Xu8">https://www.youtube.com/watch?v=O_UrPKy3Xu8</a></blockquote><p>The landscape of artificial intelligence in software engineering is undergoing a fundamental shift from simple code assistance to autonomous agentic workflows. However, a persistent challenge remains in the inherent “amnesia” of large language models, frequently described as the “50 First Dates” problem. Agents often lose critical context between discrete sessions, resulting in redundant work or a total loss of project direction. <a href="https://github.com/steveyegge/beads">Beads</a>, a lightweight and git-backed issue tracker designed by <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">Steve Yegge</a>, provides a persistent and structured memory layer that travels with the source code, effectively upgrading the cognitive capacity of coding agents. For technical leadership and engineering teams utilizing the <a href="https://github.com/Piebald-AI/awesome-gemini-cli/blob/main/README.md">Gemini CLI</a>, Beads serves as the central nervous system for long-horizon planning and execution.</p><h3>Technical Philosophy and the Problem of Agentic Amnesia</h3><p>The primary motivation for the development of Beads stems from the limitations of current agentic workflows that rely heavily on unstructured markdown files for planning. While agents are proficient at generating initial six-phase plans, they struggle to maintain the “implicit stack” of their decision-making process once work begins. As sessions progress, agents encounter edge cases, sub-tasks, and bugs that were not part of the original spec. In traditional markdown-based workflows, these discoveries are often lost or create conflicting documents, leading to what is described as “agentic dementia”.</p><p>Beads addresses this by replacing verbose and ambiguous prose with a machine-readable <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">dependency-aware graph</a>. This shift from “parsing prose” to “querying structured data” reduces the cognitive load on the agent, allowing it to spend more tokens on implementation and less on orienting itself. The core philosophy is that an agent should never have to guess what is next; the system should provide a deterministic and unblocked queue of work.</p><h3>Comparisons of Issue Tracking Paradigms</h3><p>The following table highlights the distinctions between Beads and traditional human-centric trackers or markdown-based methods often used in early AI workflows.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*-J02Fj19aJssOGCTR5fiyA.png" /></figure><h3>The Architectural Blueprint: SQLite, JSONL, and Git</h3><p>The architecture of Beads is a distributed and serverless system that balances performance with portability. It avoids the latency of centralized cloud services by utilizing the local filesystem and the existing git infrastructure of the project. The system employs a three-tier storage strategy to ensure data integrity across multi-agent and multi-branch environments.</p><h3>Local Performance via SQLite</h3><p>At the execution layer, Beads utilizes a SQLite database located at .beads/beads.db. 
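For orientation, the two storage files discussed in this and the next section sit side by side under .beads/ (an illustrative sketch; only the paths named in this article are shown):</p><pre>.beads/<br>├── beads.db       # local SQLite cache (machine-specific, kept out of version control)<br>└── issues.jsonl   # line-delimited source of truth, committed with the code</pre><p>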
This database acts as a high-speed queryable cache, enabling the <a href="https://github.com/steveyegge/beads/blob/main/docs/QUICKSTART.md">bd CLI</a> to calculate complex dependency traversals and filter tasks quickly. Because the SQLite database file is machine-specific, it is typically excluded from version control via .gitignore, serving strictly as a local performance optimization.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*2l_uxm0Cn8RCnQP7" /></figure><h3>Portability and Source of Truth via JSONL</h3><p>The primary source of truth for synchronization is the .beads/issues.jsonl file. Storing issue data in a line-delimited JSON format ensures that each task is a single entry, making it exceptionally safe for <a href="https://github.com/steveyegge/beads/blob/main/docs/INSTALLING.md">git-based merging</a>. When an agent or human updates a task, the change is written to the SQLite database and then exported to the JSONL file. This format is both human-readable for debugging and machine-parseable for agents that may not have direct access to a database driver.</p><h3>Synchronization and the Background Daemon</h3><p>To manage the interaction between these layers, Beads runs a background daemon. This process uses Unix Socket RPC (or loopback TCP on Windows) to communicate with the CLI tool. The daemon provides several critical functions:</p><ol><li><strong>Automatic Import/Export</strong>: It watches the filesystem for changes to the JSONL file and automatically hydrates the local SQLite database.</li><li><strong>Debounced Writes</strong>: To reduce disk I/O and git noise, the daemon batches updates and exports to JSONL with a <a href="https://github.com/steveyegge/beads/blob/main/AGENT_INSTRUCTIONS.md">30-second debounce period</a>.</li><li><strong>Concurrency Management</strong>: It handles file locking and coordinates access in environments where <a href="https://classic.yarnpkg.com/en/package/beads-village">multiple agents</a> may be attempting to write to the database simultaneously.</li></ol><h3>Installation and Initialization for Development Environments</h3><p>Beads is a system-wide utility rather than a project-level dependency. It is installed once and utilized across all repositories on a developer’s machine.</p><h3>Installation Protocols</h3><p>The installation process varies slightly depending on the operating system but is generally streamlined via script or package manager:</p><ul><li><strong>macOS and Linux (Homebrew)</strong>: The command brew install beads is the recommended method.</li><li><strong>Cross-Platform (npm)</strong>: For Node.js environments, <a href="https://www.npmjs.com/package/opencode-beads">npm install -g @beads/bd</a> provides a global installation.</li><li><strong>Windows (PowerShell)</strong>: Installation is achieved via a dedicated <a href="https://github.com/steveyegge/beads/blob/main/docs/INSTALLING.md">PowerShell script</a>.</li><li><strong>Go-Native</strong>: Developers can use go install github.com/steveyegge/beads/cmd/bd@latest to compile the binary directly.</li></ul><h3>Project Bootstrapping</h3><p>Once the binary is available, a project is initialized by running bd init in the root directory (Gemini CLI can do it for you). This command creates the necessary .beads metadata folder and prompts for the user&#39;s role—either maintainer or contributor.</p><p>Technical leads should be aware of the <strong>“Stealth Mode”</strong> feature. 
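The two initialization paths look roughly like this (commands as referenced in this article; the comments are explanatory additions):</p><pre>$ bd init            # standard setup: creates the .beads/ folder and prompts for your role<br>$ bd init --stealth  # personal tracking only: keeps .beads/ files out of the shared repository</pre><p>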
Running bd init --stealth allows a developer to use Beads for personal task tracking on a shared project without committing any .beads files to the repository. This is particularly useful for individual contributors on legacy codebases.</p><h3>The Data Model: Hierarchy, Priority, and Graph Edges</h3><p>Beads is built on a <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">dependency-aware graph</a> that dictates the execution order of software development tasks.</p><h3>Issue Hierarchy and Naming Conventions</h3><p>The system supports hierarchical task management through dot-delimited IDs, allowing for clear parent-child relationships:</p><ul><li><strong>Epic</strong> (e.g., app-a1b2): A high-level feature or project milestone.</li><li><strong>Task</strong> (e.g., app-a1b2.1): A discrete piece of work required for the epic.</li><li><strong>Sub-task</strong> (e.g., app-a1b2.1.1): A granular technical step or fix.</li></ul><p>The use of hash-based IDs prevents collisions when multiple agents work in parallel on different git branches.</p><h3>Semantic Dependency Types</h3><p>The relationships between issues are the “edges” of the graph. Beads utilizes four specific types:</p><ol><li><strong>Blocks</strong>: A strict prerequisite where Task A must be closed before Task B is “ready”.</li><li><strong>Parent-Child</strong>: A hierarchical link where an epic blocks its children until initial planning is complete.</li><li><strong>Related</strong>: A context-only link providing relevant technical history.</li><li><strong>Discovered-from</strong>: A critical audit trail link. When an agent finds a bug while implementing a feature, it links the new issue to the current task, capturing the <a href="https://steve-yegge.medium.com/introducing-beads-a-coding-agent-memory-system-637d7d92514a">“provenance”</a> of the work.</li></ol><h3>The Ready Engine and Prioritization</h3><p>The <a href="https://dev.to/koustubh/building-apps-with-ai-how-beads-changed-my-development-workflow-2p7">bd ready</a> command is the core mechanism for agent orientation. It programmatically filters the database to return only the tasks that are open and have zero open blockers. Tasks are further sorted by a 5-level priority scale (critical through backlog), eliminating the need for constant human triage.</p><h3>Integration with Gemini CLI and MCP</h3><p>For users of the Gemini CLI, the integration of Beads is essential for transforming the tool into a stateful coding partner.</p><h3>MCP Server Implementation</h3><p>The beads-mcp server acts as the interface between the Gemini CLI and the local bd tools. Developers can configure this by adding the Beads server to their Gemini CLI configuration (usually in ~/.gemini/settings.json). A standard configuration entry involves defining the transport mechanism—typically <a href="https://google-gemini.github.io/gemini-cli/docs/tools/mcp-server.html">stdio</a>—and pointing to the beads-mcp executable. Once connected, the Gemini agent can invoke Beads tools natively, such as calling beads__ready() to find unblocked work.</p><h3>The Gemini Beads Extension</h3><p>Alternatively, the <a href="https://github.com/thoreinstein/gemini-beads">gemini-beads extension</a> can be installed to provide a tailored experience within the terminal. 
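Assuming a Gemini CLI build with extension management enabled, installation is typically a single command (the exact subcommand may vary by version):</p><pre>$ gemini extensions install https://github.com/thoreinstein/gemini-beads</pre><p>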
This extension wraps the bd binary and provides a GEMINI.md context file that instructs the agent on specific workflow requirements, such as how to claim tasks.</p><h3>Context Injection via AGENTS.md</h3><p>A key component of integration is briefing the agent on the tracking system. Adding a single line to an <a href="https://github.com/Dicklesworthstone/beads_viewer/blob/main/AGENTS.md">AGENTS.md</a> or CLAUDE.md file—such as &quot;Use &#39;bd&#39; for task tracking&quot;—is often the minimum requirement to activate planning capabilities. Advanced setups use bd prime to inject a state summary directly into the agent&#39;s initial context window.</p><h3>Strategic Workflows: From Design to “Landing the Plane”</h3><p>Effective use of Beads necessitates a disciplined workflow that separates planning from execution.</p><h3>Phased Implementation Strategy</h3><ol><li><strong>Iterative Planning</strong>: Developer and agent collaborate on a high-level design (e.g., feature_spec.md).</li><li><strong>Filing Issues</strong>: The agent translates the design into <a href="https://github.com/PaulDuvall/ai-development-patterns/blob/main/examples/issue-generation/beads-guide.md">Beads epics and issues</a>.</li><li><strong>Dependency Alignment</strong>: Agent creates graph edges (e.g., migrations blocking functional updates).</li><li><strong>Focused Execution</strong>: Agent tackles one bd ready task at a time. Between tasks, the agent process is restarted with Beads serving as the &quot;external RAM&quot;.</li></ol><h3>The “Landing the Plane” Hygiene Protocol</h3><p>At the end of every session, the developer must instruct the agent to <a href="https://github.com/steveyegge/beads/blob/main/AGENT_INSTRUCTIONS.md">“land the plane”</a>. This mandatory cleanup ensures the repository and task tracker remain synced:</p><ol><li><strong>File Follow-ups</strong>: Create issues for work discovered but not addressed.</li><li><strong>Quality Gates</strong>: Run all project tests and linters.</li><li><strong>Update States</strong>: Close completed issues and add progress notes.</li><li><strong>Synchronize</strong>: Export the database to JSONL and stage changes.</li><li><strong>Push to Remote</strong>: Perform a git pull --rebase followed by a git push.</li><li><strong>Handoff</strong>: Generate a prompt for the next session to minimize orientation time.</li></ol><h3>Advanced Maintenance and System Diagnostics</h3><p>As a project grows, technical leads must ensure the health of the Beads system using the <a href="https://github.com/steveyegge/beads/blob/main/docs/QUICKSTART.md">bd doctor</a> utility. 
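A health check is a single command; the output shown here is purely illustrative rather than captured from the tool:</p><pre>$ bd doctor<br>✓ git hooks installed<br>✓ schema migrations applied<br>✗ circular dependency: app-a1b2.1 → app-a1b2.1.1 → app-a1b2.1</pre><p>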
This tool identifies broken git hooks, schema migrations, and circular dependencies.</p><h3>Compaction and Semantic Memory Decay</h3><p>When history accumulates, the <a href="https://steve-yegge.medium.com/beads-best-practices-2db636b9760c">bd compact</a> command implements a graceful “memory decay.” Details of old closed tasks are summarized or removed from the active tracking file to improve agent speed, while the full history remains accessible in the git repository.</p><h3>Multi-Agent Coordination and Concurrency</h3><p>In “Agent Village” scenarios, Beads integrates with tools like <a href="https://classic.yarnpkg.com/en/package/beads-village">beads-village</a> to provide:</p><ul><li><strong>Atomic Claiming</strong>: Using bd update --claim to prevent multiple agents from starting the same task.</li><li><strong>File Locking</strong>: Reserving specific files before beginning an edit.</li><li><strong>Messaging</strong>: Using special issue types for asynchronous agent communication.</li></ul><h3>Community Tooling and Extensibility</h3><p>The Beads ecosystem includes several <a href="https://github.com/steveyegge/beads/blob/main/docs/COMMUNITY_TOOLS.md">community-built interfaces</a>:</p><ul><li><strong>beads_viewer (Go)</strong>: A high-performance terminal interface with tree navigation.</li><li><strong>bsv (Rust)</strong>: A two-panel TUI for rapid triage.</li><li><strong>beads-kanban-ui</strong>: A web-based Kanban board.</li><li><strong>jira-beads-sync</strong>: Middleware for institutional project management synchronization.</li></ul><h3>Conclusion</h3><p>The Beads memory system offers a robust, git-native solution for the persistent challenge of agentic amnesia. By utilizing a hybrid storage model of SQLite and JSONL, it provides a distributed framework for task management that aligns with modern version control. For teams using the <a href="https://blog.google/innovation-and-ai/technology/developers-tools/gemini-cli-extensions/">Gemini CLI</a>, this combination enables a new level of autonomous coding capability and project continuity.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=c2aa36430802" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/the-beads-memory-system-technical-architecture-and-integration-with-gemini-cli-for-agentic-c2aa36430802">📿 The Beads Memory System: Technical Architecture and Integration with Gemini CLI for Agentic…</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[The Six Essential Protocols Powering the AI Agent Ecosystem]]></title>
            <link>https://medium.com/google-cloud/the-six-essential-protocols-powering-the-ai-agent-ecosystem-8b752c74f2e0?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/8b752c74f2e0</guid>
            <category><![CDATA[ai-agent]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Tue, 27 Jan 2026 08:19:00 GMT</pubDate>
            <atom:updated>2026-01-27T08:19:00.600Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*oQXfcQ7U8eriYmdq" /></figure><p>#AIAgents #Protocols #AIDevelopment #MCP #A2A #AgentEconomy #AIIntegration</p><blockquote><strong><em>TL;DR</em></strong><em>: Six protocols are emerging as the foundation of AI agent architecture: MCP (tool access), A2A (agent collaboration), AG-UI (UI communication), A2UI (UI generation), AP2 (payments), and UCP (commerce). Together, they enable agents to interact seamlessly with tools, other agents, users, and commerce systems. Understanding these protocols today positions you at the forefront of the AI agent revolution.</em></blockquote><h3>Intro</h3><p>As AI agents become increasingly sophisticated, the need for standardized communication protocols has never been more critical. Just as the internet was built on protocols like HTTP and TCP/IP, the emerging AI agent ecosystem is being shaped by a new generation of protocols that enable seamless interaction between agents, tools, users, and commerce systems.</p><p>Having worked extensively with AI agents over the past year, I’ve identified six foundational protocols that are shaping how agents interact with the world around them. Each protocol addresses a specific challenge in the agent ecosystem, and together they form the backbone of what I believe will be the standard architecture for AI agent applications.</p><h3>The Protocol Stack: From Tools to Commerce</h3><h3>1. MCP: Model Context Protocol (Agent 🤖 to Tools ⚙️)</h3><p><strong>Purpose:</strong> Connects agents to external tools and data sources seamlessly.</p><p><strong>Key Concept:</strong> “One connector. Every tool.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*SBaRaJfJmiIbm8PO9mKDAg.png" /></figure><p>The <a href="https://modelcontextprotocol.io/docs/getting-started/intro">Model Context Protocol</a> solves a fundamental challenge: how do agents interact with the thousands of tools and data sources available today? Rather than building custom integrations for each tool, MCP provides a unified interface that allows agents to connect to databases, APIs, files, calendars, calculators, and virtually any other tool through a single protocol.</p><p><strong>Why it matters:</strong> Without MCP, every agent would need custom code for every tool integration. With MCP, agents can instantly access any MCP-compatible tool, dramatically reducing development time and expanding capabilities.</p><h3>2. A2A: Agent-to-Agent Protocol (Agent 🤖 to Agent 🤖)</h3><p><strong>Purpose:</strong> Enables agents to discover, coordinate with, and delegate tasks to other agents.</p><p><strong>Key Actions:</strong> Discover, Coordinate, Delegate.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Xfdomh0AbPn6y-ieZI1cdA.png" /></figure><p>The <a href="https://a2a-protocol.org/latest/">A2A Protocol</a> addresses the reality that no single agent can do everything. In complex workflows, agents need to collaborate. For example, an Inventory Agent might need to work with a Research Agent to understand market trends and a Pricing Agent to set optimal prices.</p><p><strong>Real-world scenario:</strong> Imagine a supply chain optimization task where multiple specialized agents (logistics, inventory, demand forecasting) need to coordinate to minimize costs while maintaining service levels.</p><h3>3. 
AG-UI: Agent-to-Frontend Protocol (Agent 🤖 to Frontend 📱)</h3><p><strong>Purpose:</strong> Manages the critical interaction layer between agents and user interfaces.</p><p><strong>Key Functions:</strong> Streaming state updates, handling approval workflows, maintaining synchronization.</p><p><a href="https://docs.ag-ui.com/introduction">AG-UI</a> is an open, lightweight, event-based protocol that standardizes how AI agents connect to user-facing applications. AG-UI is designed to be the general-purpose, bi-directional connection between a user-facing application and any agentic backend.</p><p><strong>Example:</strong> When an agent needs to access sensitive data or perform a high-stakes action, AG-UI manages the approval flow, presenting the request to the user and waiting for confirmation before proceeding.</p><h3>4. A2UI: Agent-to-User Interface (Agent 🤖 to User Interface 📱)</h3><p><strong>Purpose:</strong> Empowers agents to create dynamic, context-appropriate user interface elements.</p><p><strong>Key Concept:</strong> Moving beyond plain text to rich, interactive experiences.</p><p><a href="https://a2ui.org/">A2UI</a> is an open standard and set of libraries that allows agents to “speak UI.” Agents send a declarative JSON format describing the intent of the UI. The client application then renders this using its own native component library (Flutter, Angular, Lit, etc.).</p><p><strong>Benefit:</strong> This creates a more intuitive and efficient user experience. For instance, instead of asking users to type out complex configuration options, an agent can generate a form with appropriate validation and helpful tooltips.</p><blockquote><strong><em>AG-UI vs A2UI</em></strong><em>: Think of A2UI as the </em>blueprint for UI widgets<em> and AG-UI as the </em>secure communication channel moving them between the AI (backend) and your app (frontend)<em>.</em></blockquote><h3>5. AP2: Agent Payment Protocol (Agent 🤖 to Payment Interface 💳)</h3><p><strong>Purpose:</strong> Enables secure, auditable financial transactions for autonomous agents.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*Ayqhdn7Y9oqKCxEFwuc5gQ.png" /></figure><p><strong>Key Features:</strong> Cryptographic proof at every step, granular permission controls.</p><p>The <a href="https://github.com/google-agentic-commerce/AP2">AP2 Protocol</a> tackles one of the most challenging aspects of autonomous agents: handling money. It implements a three-stage mandate system:</p><ul><li>Intent Mandate: What the agent is authorized to do (e.g., “Buy shoes under $150”)</li><li>Cart Mandate: What the agent has selected</li><li>Payment Mandate: The actual transaction execution</li></ul><p><strong>Security focus:</strong> Every step is cryptographically signed and auditable, ensuring agents can only act within their authorized boundaries.</p><h3>6. UCP: Universal Commerce Protocol (Agent 🤖 to Commerce 🛍)</h3><p><strong>Purpose:</strong> Standardizes commerce interactions across different retailers and platforms.</p><p><strong>Key Concept:</strong> “One standard. Every retailer.”</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rhsGoN9shdUNV6cjePXOkg.png" /></figure><p><a href="https://ucp.dev/">UCP</a> solves the fragmentation problem in e-commerce. Instead of agents needing to understand dozens of different checkout flows, product schemas, and API formats, UCP provides a unified interface for product search, cart management, checkout, and delivery tracking across any compatible retailer. 
(UCP is built on top of AP2.)</p><p><strong>Impact:</strong> This enables true shopping agent experiences where a single agent can help users find and purchase products from multiple stores without custom integrations for each retailer.</p><h3>The Bigger Picture: An Integrated Ecosystem</h3><p>These six protocols aren’t just independent standards — they work together to create a complete ecosystem:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*iA48KWk891k_N97amF3nXw.png" /></figure><h3>Real-World Impact: How These Protocols Transform Daily Work</h3><p>Let me share a practical example. My friend runs a small marketing agency and recently started using AI agents. Here’s what changed:</p><p><strong>Before protocols</strong>: She had separate tools for project management (Asana), analytics (Google Analytics), client communication (Slack), and invoicing (QuickBooks). Her AI assistant could only help with basic tasks in isolation.</p><p><strong>After MCP adoption</strong>: Her AI assistant now connects to all these tools through a single protocol. When a client emails about campaign performance, the assistant automatically:</p><ul><li>Pulls the latest analytics data</li><li>Checks project status in Asana</li><li>Drafts a comprehensive update email with actual metrics</li><li>Suggests next steps based on the data</li></ul><p>The game-changer? She didn’t need to build custom integrations for each tool. The MCP-compatible assistant just worked with everything out of the box.</p><p><strong>The multiplier effect with A2A</strong>: When she needs deeper analysis, her main assistant delegates to specialized agents — one for competitive analysis, another for content optimization. They collaborate behind the scenes and return unified recommendations.</p><p>This isn’t some far-off vision. It’s happening now. The protocols are removing the friction that kept AI assistants from being truly useful in real business workflows.</p><h3>Looking Ahead: The Protocol-First Future</h3><p>As someone who’s been building with AI agents, I’m convinced that these protocols — or something very similar — will become as fundamental to AI applications as REST APIs are to web services today. The teams building on these protocols now are positioning themselves at the forefront of the AI agent revolution.</p><p><strong>A question for the community:</strong> Which of these protocols do you see as most critical for your use cases? Are there gaps that need addressing? I’d love to hear about your experiences building agent-based systems and the integration challenges you’ve faced.</p><p>#AIAgents #Protocols #AIDevelopment #MCP #A2A #AgentEconomy #AIIntegration</p><p><em>Originally published at </em><a href="https://www.linkedin.com/pulse/six-essential-protocols-powering-ai-agent-ecosystem-sampath-kumar-u9twc/"><em>https://www.linkedin.com</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=8b752c74f2e0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/the-six-essential-protocols-powering-the-ai-agent-ecosystem-8b752c74f2e0">The Six Essential Protocols Powering the AI Agent Ecosystem</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Master Your Workflow: Top Gemini CLI Commands You Should Know | Gemini CLI CheatSheet]]></title>
            <link>https://medium.com/google-cloud/master-your-workflow-top-gemini-cli-commands-you-should-know-gemini-cli-cheatsheet-7b1f5788e428?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/7b1f5788e428</guid>
            <category><![CDATA[cheatsheet]]></category>
            <category><![CDATA[gemini-cli]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Tue, 14 Oct 2025 10:06:00 GMT</pubDate>
            <atom:updated>2025-10-14T10:06:00.267Z</atom:updated>
            <cc:license>http://creativecommons.org/licenses/by/4.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*gDQGYb6Znp0o2KIMjEIjBA.png" /><figcaption>Gemini CLI</figcaption></figure><p><strong>Picture this</strong>: you’re deep in your coding workflow. You need to refactor a complex function, write a clear and concise commit message for your latest changes, and then figure out an obscure shell command to deploy your work. Each task requires switching context — moving from your editor to a browser, then to your Git client, and back again. It’s a constant juggling act that breaks your focus.</p><p>What if you could do all of this from one place, your terminal? The <a href="https://github.com/google-gemini/gemini-cli">Gemini CLI</a> brings the power of Google’s state-of-the-art AI directly to your command line, acting as a seamless, context-aware pair programmer. It’s designed to understand your project, streamline your tasks, and keep you in the flow.</p><h3>🚀 Getting Started: Your First Steps</h3><p>Before you can master your workflow, you need to get set up and oriented. These commands are the foundation.</p><ul><li><strong>/init</strong> 🎬: This is where your journey should begin. Running /init in your project directory allows Gemini to analyze your codebase and create a GEMINI.md file.</li><li><strong>/about</strong> ℹ️: Curious about your setup? This command quickly displays your Gemini CLI version, the underlying model being used, and your current authentication method.</li><li><strong>/auth</strong> 🔑: Security and access are paramount. The /auth command lets you configure how you authenticate with Google&#39;s AI services.</li><li><strong>/help &amp; /docs</strong> 📚: The /help command is your go-to for a quick overview. For more details, /docs opens the full documentation in your browser.</li><li><strong>/quit</strong> 👋: When you’re done for the day, simply use /quit to exit the Gemini CLI application.</li></ul><h3>🛠️ Core Workflow Commands</h3><p>These are the commands you’ll use day-to-day to interact with Gemini and your codebase.</p><ul><li>!<strong> (Shell Commands)</strong> 🐚: This is one of the most powerful features. Press SHIFT + 1 to enter &quot;shell mode,&quot; allowing you to execute shell commands or describe them in natural language.</li><li><strong>/tools</strong> 🧰: Ever wonder what capabilities Gemini has? The /tools command lists all available tools Gemini can use to help you.</li><li><strong>/editor</strong> ✍️: <strong>Pro-Tip:</strong> For complex, multi-line prompts, use /editor to set your preferred external editor (like Vim or VS Code), then use Ctrl+X to open it.</li></ul><h3>💬 Managing Your Conversations</h3><p>A conversation with Gemini is a valuable asset. Here’s how to manage it effectively.</p><ul><li><strong>/chat (save, resume, list, delete, share)</strong> 💾: The /chat command is a powerful tool for managing your session history. save a conversation, resume it later, or share it as a file.</li><li><strong>/clear</strong> 🧹: Need a fresh start? /clear will wipe the screen and your current conversation history.</li><li><strong>/compress</strong> 🧠: When a conversation gets long, /compress intelligently summarizes the context to keep things focused.</li></ul><h3>🎨 Customization and Stats</h3><p>Tailor the CLI to your liking and keep an eye on your usage.</p><ul><li><strong>/theme</strong> 🖌️: Customize your experience. 
The /theme command allows you to change the look and feel of the Gemini CLI.</li><li><strong>/stats</strong> 📊: Curious about your usage? /stats provides statistics for your current session, including model and tool usage.</li></ul><h3>⌨️ Gemini CLI Keyboard Shortcuts You Should Know</h3><p>To work even faster, it’s worth learning a few essential keyboard shortcuts.</p><ul><li>ESC: Cancel a long-running task or clear your input.</li><li>Ctrl+C: Quit the application (press twice).</li><li>Ctrl+X<strong> / </strong>Meta+Enter: Open your input in an external editor.</li><li>Enter: Send your query to Gemini.</li></ul><h3>✨ Common Use Cases to Try Today</h3><p>Here are a few practical ideas to get you started:</p><ul><li><strong>Quick Refactoring:</strong> Use @ to provide a file as context (e.g., @src/utils.py) and ask Gemini: &quot;Refactor the calculate_total function in this file to be more readable.&quot;</li><li><strong>Commit Message Generation:</strong> Run !git diff --staged and then ask Gemini: &quot;Based on the diff, write a conventional commit message.&quot;</li><li><strong>Command-Line Coach:</strong> Paste a complex shell command and ask: tar -xzvf archive.tar.gz &quot;What do each of these flags do?&quot;</li></ul><pre>╭───────────────────────────────────────────────────────────────────────────╮<br>│ &gt;   tar -xzvf archive.tar.gz what does each flag do                       │<br>╰───────────────────────────────────────────────────────────────────────────╯<br><br>✦ Of course. The command tar -xzvf archive.tar.gz is used to decompress and extract files from a gzipped tar archive.<br><br>  Here is a breakdown of what each flag does:<br><br>   * -x: eXtract. This tells tar to get files out of the archive.<br>   * -z: gZip. This tells tar to decompress the archive using gzip. This is necessary for files ending in .gz.<br>   * -v: Verbose. This makes tar list each file as it is being extracted, so you can see the progress.<br>   * -f: File. This flag must be followed by the name of the archive file you want to process (in this case, archive.tar.gz).<br><br>  You can think of the command as saying: &quot;eXtract the gZipped contents Verbosely from the File named archive.tar.gz.&quot;</pre><h3>🎉 Takeaway: You’re One Step Ahead!</h3><p>Congratulations on making it through this guide! By familiarizing yourself with these commands and workflows, you’re already one step ahead of many developers who might not be tapping into the full potential of having an AI assistant directly in their terminal.</p><p>Thank you for investing the time to level up your skills. We encourage you to start integrating these commands into your daily routine. You’ll be surprised at how much time you can save and how much more you can accomplish.</p><p>Now we’d love to hear from you! What new command did you learn that you’re most excited to try? Share your thoughts in the comments below!</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=7b1f5788e428" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/master-your-workflow-top-gemini-cli-commands-you-should-know-gemini-cli-cheatsheet-7b1f5788e428">Master Your Workflow: Top Gemini CLI Commands You Should Know | Gemini CLI CheatSheet</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Python in Google Cloud: More Than Just Examples for Gemini]]></title>
            <link>https://medium.com/google-cloud/python-in-google-cloud-more-than-just-examples-for-gemini-6be2975097f7?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/6be2975097f7</guid>
            <category><![CDATA[python-programming]]></category>
            <category><![CDATA[google-gemini]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Tue, 23 Sep 2025 14:04:27 GMT</pubDate>
            <atom:updated>2025-09-23T14:04:27.718Z</atom:updated>
            <cc:license>http://creativecommons.org/licenses/by/4.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*wPBsLul27AHDtJSm" /><figcaption>Generated using Gemini 2.5</figcaption></figure><p>Is Python the ultimate language for Google Cloud development? Is it Google’s official recommendation, or do alternatives like Java, Go, or Rust hold sway in specific scenarios? These are crucial questions for any developer building on Google Cloud, particularly when diving into Generative AI with Gemini. This article offers an opinionated, yet fact-driven, perspective on Python’s prominent role, clarifying why you often encounter Python examples in Google Cloud documentation, especially for Gemini features.</p><p><strong><em>Note:</em></strong><em> This is a </em><a href="https://www.linkedin.com/pulse/python-google-cloud-more-than-just-examples-gemini-sampath-kumar-hs11f/"><em>repost</em></a><em> from my </em><a href="https://www.linkedin.com/in/msampathkumar/"><em>Linkedin account</em></a><em>. Also if you are interested in Gemini or Google Cloud? You can connect with me for more information or a quick chat! </em>😊</p><h3>The Genesis of a Preference: 🐍 Pythons Rise in My Early Career</h3><p>My journey into advanced programming and data analysis began long before “Data Science” became a household term. Before an internship at Standard Chartered Bank in 2012, my programming toolkit was dominated by C and Java, with Python being more of a personal project companion.</p><p>The internship presented a unique challenge: sifting through mountains of Unix system health and performance data to predict maintenance needs and identify system improvements. My initial inclination was to leverage Java, given my familiarity. However, my mentor, Asbjorn, encouraged me to explore Python. This pivotal decision unveiled Python’s unexpected power and laid the groundwork for its consistent presence in my projects.</p><p>Over the years, Python became the language of choice for almost all data science initiatives I led, driven by several compelling advantages:</p><ol><li><strong>Code Readability and Maintainability</strong>: Python’s clean, intuitive syntax drastically lowered the learning curve, making code easier to understand, write, and debug. This is a benefit Google itself recognizes and <a href="https://techcrunch.com/2009/11/10/google-go-language/">inspired Go language</a> development in certain ways.</li><li><strong>Efficiency for Lean Teams:</strong> Python’s conciseness allowed smaller development teams to achieve significant outcomes, often comparable to what larger teams using more complex or “low-code” solutions might accomplish. Its extensive ecosystem contributes to rapid development.</li><li><strong>Prioritizing Availability Over Raw Performance:</strong> For many projects, the immediate availability of services, scripts, and tools proved more critical than bleeding-edge performance. With compute power becoming increasingly affordable, Python enabled rapid deployment and iteration. This is particularly true for scripting and prototyping.</li><li><strong>“Batteries Included” for Data-Intensive Tasks:</strong> Python boasts a rich standard library and a vast ecosystem of third-party packages, particularly beneficial for data manipulation and analysis. 
This extensive library support is a consistent benefit for cloud development.</li></ol><h3>Python’s Enduring Reign in Data Science 📈 and Cloud Computing ☁</h3><p>Python’s appeal in data science continued to grow for two additional reasons, deeply intertwined with cloud development:</p><ol><li><strong>Rapid Prototyping:</strong> The interactive interpreter and tools like <a href="https://colab.google.com/">Google Colab</a> make Python invaluable for agile data analytics workflows. This capability is essential for quickly testing ideas and developing proofs of concept.</li><li><strong>Robust Community and Specialized Libraries:</strong> Public libraries such as NumPy, Pandas, and Scikit-learn, despite often having performance-critical core logic written in lower-level languages like C or Fortran, offer user-friendly Python interfaces. Google taps into Python’s extensive library ecosystem, with machine learning and AI projects benefiting from libraries like <a href="https://en.wikipedia.org/wiki/TensorFlow">TensorFlow</a>.</li></ol><h3>Decoding Python’s Presence in Google Cloud’s Generative AI (Gemini)</h3><p>This brings us to Python’s specific role within <a href="https://github.com/GoogleCloudPlatform/python-docs-samples/">Google Cloud’s Generative AI</a>, particularly with Gemini. You’ll find a significant number of Python examples for Gemini features, and this isn’t by accident.</p><p><strong>The primary reason for the prevalence of Python samples is its readability and speed of prototyping</strong>. For quickly demonstrating capabilities, enabling developers to get started swiftly, and iterating on ideas, Python excels. The official <a href="https://cloud.google.com/vertex-ai/generative-ai/docs/sdks/overview">Google GenAI SDK is indeed available for Python, alongside JavaScript/TypeScript, Go, and Java</a>. The Python GenAI SDK provides an interface to integrate Google’s generative models into Python applications, supporting both the Gemini Developer API and Vertex AI APIs.</p><p>While Python is a <a href="https://survey.stackoverflow.co/2025/technology">widely-used, high-level programming language known for its simplicity and readability</a>, it’s crucial to understand the nuance: these Python examples are provided for ease of use and quick experimentation, <em>not as an exclusive recommendation for all production environments</em>. Google Cloud itself supports a variety of popular languages, each suited for different development needs.</p><p>Lastly, the absence of official GenAI SDKs for languages like Rust or C# is often a matter of <em>demand and ecosystem focus</em>, though Google Cloud generally supports a broad range of languages for various services. In fact, <a href="https://github.com/GoogleCloudPlatform/service-extensions">Google Cloud Service Extension</a> uses a very interesting mix of Go, Java, Python, and WebAssembly.</p><h3>Beyond Python: A Polyglot Approach to Cloud Development</h3><p>Google Cloud’s philosophy embraces a polyglot environment, recognizing that no single language fits every use case. 
While Python shines in data science, scripting, and rapid development, other languages are vital:</p><ul><li><strong>Go:</strong> Developed by Google, Go is a natural fit for GCP, known for performance, simplicity, and efficiency in cloud computing, distributed systems, and microservices.</li><li><strong>Java:</strong> A robust, object-oriented language, Java is a powerhouse for enterprise applications, offering strong performance and security features, integrating seamlessly with GCP services.</li><li><strong>Node.js</strong> (JavaScript): Excellent for web applications, real-time applications, and API services, Node.js allows JavaScript to run on the server side efficiently.</li></ul><p><a href="https://cloud.google.com/gemini/docs/codeassist/supported-languages#coding-languages">Google’s Gemini Code Assist</a> even supports a wide array of programming languages for AI-assisted development, including Python, Java, JavaScript, C++, Go, and Rust, highlighting Google’s commitment to diverse developer ecosystems.</p><h3>Choosing Your Cloud Language: A Strategic Decision</h3><p>Ultimately, the “best” programming language for Google Cloud — or for any cloud platform — depends on your specific use case, project requirements, performance needs, and team expertise. Python’s ease of use, extensive libraries, and strong community make it an excellent choice for (initial) development, data-intensive tasks, and rapid prototyping, especially within the Generative AI space. However, for highly performant, scalable, or enterprise-grade applications, Go, Java, or Node.js could be more suitable.</p><p>Google Cloud provides the tools and flexibility to choose the language that best empowers your team to build innovative solutions. So, while Python examples for Gemini are abundant and incredibly useful for getting started, remember to select the language that truly aligns with your project’s long-term goals.</p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=6be2975097f7" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/python-in-google-cloud-more-than-just-examples-for-gemini-6be2975097f7">Python in Google Cloud: More Than Just Examples for Gemini</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Beyond the Prompt: Why Your Next AI Agent Needs a Brain And How COALA (research paper) Provides an…]]></title>
            <link>https://medium.com/google-cloud/beyond-the-prompt-why-your-next-ai-agent-needs-a-brain-and-how-coala-research-paper-provides-an-ba187a906ea0?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/ba187a906ea0</guid>
            <category><![CDATA[google-adk]]></category>
            <category><![CDATA[google]]></category>
            <category><![CDATA[agent-development-kit]]></category>
            <category><![CDATA[ai-agent]]></category>
            <category><![CDATA[research]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Thu, 11 Sep 2025 05:54:45 GMT</pubDate>
            <atom:updated>2025-09-11T05:54:45.683Z</atom:updated>
            <content:encoded><![CDATA[<h3>Beyond the Prompt: Why Your Next AI Agent Needs a Brain And How COALA (research paper) Provides an interesting Blueprint</h3><figure><img alt="" src="https://cdn-images-1.medium.com/max/1006/0*mxEvgPCSposQs9O5" /></figure><p>If you’re a developer, tech lead, or business leader, your life has very likely been fundamentally altered by Large Language Models (LLMs). We’ve moved rapidly from simple text generation to building complex “agents” that can use tools, browse the web, and interact with the world.</p><p>But let’s be honest: right now, it’s the Wild West. We’re wiring together APIs and complex prompt chains, but we lack a standardized framework. How do we build agents that don’t just react, but actually learn, plan, and remember?</p><p>A paper from researchers at Princeton, “<a href="https://www.linkedin.com/redir/redirect?url=https%3A%2F%2Farxiv%2Eorg%2Fpdf%2F2309%2E02427&amp;urlhash=2PYY&amp;trk=article-ssr-frontend-pulse_little-text-block">Cognitive Architectures for Language Agents</a>,” tackles this head-on. It argues that to build truly robust agents, we need to stop reinventing the wheel and look at the decades of research in cognitive science.</p><p>The paper introduces COALA (Cognitive Architectures for Language Agents), a framework that treats the LLM not as the entire agent, but as a central component, like a CPU, within a larger, structured system inspired by the human brain.</p><p>This post distills the COALA framework into its three essential pillars and explains why this matters for anyone building the next generation of AI.</p><h4><strong>The Problem with Monolithic Agents</strong></h4><p>Today, most “agents” work by stuffing the LLM context window with instructions, history, and tool outputs. This forces the LLM to manage planning, memory, and execution all at once. The COALA framework argues this approach is brittle and fundamentally limited.</p><p>To build sophisticated agents, ones that can learn from experience or manage complex, long-running tasks, we need to separate these functions. COALA provides a blueprint for this separation, organizing an agent along three key dimensions: Modular Memory, a Structured Action Space, and a formal Decision-Making Cycle.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*f6hxMQ1G0tTOlVxh" /><figcaption>Standard design for an LLM or an LLM-based agent</figcaption></figure><h4>Pillar 1: Modular Memory (The Agent’s Mind)</h4><p>Language models are stateless; they forget everything the moment the API call ends. True agents need memory. COALA proposes a structured memory system that mimics human cognition:</p><ul><li>Working Memory: This is the agent’s “scratchpad” or RAM. It holds what the agent is actively thinking about: current perceptual inputs, active goals, and information retrieved from long-term memory. This persists between LLM calls.</li><li>Long-Term Memory: This is the agent’s hard drive, split into three types: Episodic, Semantic, and Procedural.</li></ul><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*lGq2HK6AQ2Z_OEQJ" /><figcaption>Modular Memory structure (proposal)</figcaption></figure><h4>Pillar 2 &amp; 3: Actions and Decisions (How the Agent Thinks)</h4><p>This is the most critical insight of the paper.
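</p><p>Before unpacking it, it helps to make Pillar 1 concrete. Below is a minimal, hypothetical Python sketch of the memory modules described above; the class and field names are illustrative and not taken from the paper. The internal actions discussed next are, in effect, reads and writes over structures like these.</p><pre># Hypothetical sketch of COALA-style memory modules (illustrative names, not from the paper).
from dataclasses import dataclass, field

@dataclass
class WorkingMemory:
    """The agent's 'RAM': what it is actively thinking about right now."""
    goal: str = ""
    observations: list = field(default_factory=list)     # current perceptual inputs
    retrieved: list = field(default_factory=list)        # facts pulled in from long-term memory
    reasoning_trace: list = field(default_factory=list)  # intermediate thoughts from the LLM

@dataclass
class LongTermMemory:
    """The agent's 'hard drive', split the way the paper proposes."""
    episodic: list = field(default_factory=list)    # past experiences and their outcomes
    semantic: list = field(default_factory=list)    # facts and learned insights
    procedural: list = field(default_factory=list)  # skills, e.g. reusable code or prompts</pre><p>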
In the COALA framework, an agent’s capabilities are split into two distinct types of actions:</p><ol><li>External Actions (Grounding): These are the actions that affect the outside world: talking to a human, calling an API, or controlling a robot. This is what most developers already think of as “tools.”</li><li>Internal Actions: These are actions the agent takes upon itself. This is the agent thinking.</li></ol><p>These internal actions are how the agent uses its memory. Instead of relying on one massive prompt, the agent uses specific internal actions:</p><ul><li>Reasoning: The agent calls the LLM to process information already in its working memory to generate new insights that stay in working memory. This is literally the agent thinking to itself.</li><li>Retrieval: The agent reads from its long-term memory (episodic or semantic) and pulls the data into its working memory to use it for planning.</li><li>Learning: The agent writes to its long-term memory. This is the game-changer. An agent can use reasoning to reflect on a past failure (from episodic memory), generate a new insight (a semantic fact), and use the “learning” action to save that insight for all future tasks.</li></ul><p>These actions come together in the Decision-Making Cycle. An agent doesn’t just run once; it operates in a continuous loop 🔁. In each cycle, it uses internal actions (retrieval and reasoning) to plan, proposing several candidate actions and evaluating their potential outcomes.</p><p>Only after this internal deliberation does it select the best action to execute (which could be an external action, or even an internal learning action).</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/812/0*TBO00G4s1ZQBZolr" /><figcaption>A simplified workflow of the proposed LLM agent design</figcaption></figure><h3>The Takeaway: Why This Matters, from Devs to Business Leads</h3><p>The COALA framework is an interesting academic theory; it is also a practical blueprint for building the next generation of AI.</p><ul><li><strong>For the Developer</strong>: This model explains why simple systems like ReAct (which synergizes reasoning and acting) work so well, and provides the <em>architecture for building far more advanced agents</em>. It gives you a vocabulary to build agents that don’t just run, but actively learn (like Voyager, which writes new code to its own procedural memory).</li><li><strong>For the Tech Lead:</strong> Stop building monolithic prompt chains. COALA provides the <em>separation of concerns</em> needed to build modular, testable, and scalable agents. You can upgrade the agent’s memory, add new procedural skills, or even swap the underlying LLM without rewriting the entire system.</li><li><strong>For the Business Lead</strong>: This is the strategic path from <em>simple tools to truly autonomous systems</em>. Agents that can reflect on their own performance and permanently update their semantic or procedural knowledge are the goal.
This architecture provides the structure to build robust, predictable agents that can safely learn and adapt over time.</li></ul><p><em>Originally published at </em><a href="https://www.linkedin.com/pulse/beyond-prompt-why-your-next-ai-agent-needs-brain-how-sampath-kumar-exfnf/"><em>https://www.linkedin.com</em></a><em>.</em></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=ba187a906ea0" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/beyond-the-prompt-why-your-next-ai-agent-needs-a-brain-and-how-coala-research-paper-provides-an-ba187a906ea0">Beyond the Prompt: Why Your Next AI Agent Needs a Brain And How COALA (research paper) Provides an…</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Gemini 2.5 Flash vs. Gemini 2.0 Flash: A New Era for AI Image Generation]]></title>
            <link>https://medium.com/google-cloud/gemini-2-5-flash-vs-gemini-2-0-flash-a-new-era-for-ai-image-generation-e2a08fc5d82b?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/e2a08fc5d82b</guid>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[gemini-flash-image]]></category>
            <category><![CDATA[vertex-ai]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Fri, 05 Sep 2025 01:06:42 GMT</pubDate>
            <atom:updated>2025-09-05T11:05:45.457Z</atom:updated>
            <cc:license>http://creativecommons.org/publicdomain/zero/1.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*hJgXWdsYMM7t-KchmcGAyw.png" /><figcaption>Generated &amp; Edited using Gemini 2.5 Flash Image (preview)</figcaption></figure><p>Google has once again pushed the boundaries of generative AI with the release of <strong>Gemini 2.5 Flash Image (Preview)</strong>, a significant upgrade to its predecessor, <strong>Gemini 2.0 Flash Image</strong>. While both models offer impressive image generation capabilities, Gemini 2.5 Flash introduces a suite of new features and enhancements that provide you with unprecedented creative control and higher-quality outputs. In this blog post, we will delve into the key differences between these two models, highlighting the advancements that make Gemini 2.5 Flash a game-changer for creators and developers.</p><p>(If you prefer, you can read this post on <a href="https://www.linkedin.com/pulse/gemini-25-flash-vs-20-new-era-ai-image-generation-sampath-kumar-g1s6f">LinkedIn</a> | <a href="https://dev.to/sampathm/gemini-25-flash-vs-gemini-20-flash-a-new-era-for-ai-image-generation-2n02">Dev.to</a> | <a href="https://msampathkumar.github.io/blog/2025/09/04/gemini-25-flash-vs-gemini-20-flash-a-new-era-for-ai-image-generation/">GitHub</a>)</p><h4>Key Advancements in Gemini 2.5 Flash</h4><p><strong>1. Conversational Editing and Iterative Refinement:</strong> One of the most significant improvements in Gemini 2.5 Flash is its ability to engage in <strong>conversational editing</strong>. This means you can now refine and edit images through a natural language dialogue. Instead of starting from scratch with a new prompt for every minor adjustment, you can provide iterative feedback. For instance, you can ask the model to “make the sky a deeper blue” or “add a dog in the background,” and it will make the changes while preserving the rest of the image. This feature was more limited in Gemini 2.0, making the creative process in 2.5 much more fluid and intuitive.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*6utojg5wVpCPXPIOmXTlHw.png" /></figure><p><strong>2. Multi-Image Fusion and Composition:</strong> Gemini 2.5 Flash introduces <strong>multi-image fusion</strong>, a powerful feature that allows you to combine up to three images to create a single, seamless visual. This opens up a world of creative possibilities, from placing a product in a new environment to creating surrealist art. While Gemini 2.0 had some capabilities in this area, 2.5 takes it a step further with more sophisticated blending and composition, resulting in more photorealistic and coherent images.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/712/1*3_dAytKZicM90GR1a7IFDw.png" /></figure><p><strong>3. Character and Style Consistency:</strong> Maintaining a consistent character or style across multiple images has always been a challenge for AI image generators. Gemini 2.5 Flash addresses this with enhanced <strong>character and style consistency</strong>. You can now generate images of the same character in different scenes, outfits, or poses, and the model will maintain their key features and appearance. This is a massive leap forward for storytelling, branding, and any application that requires a consistent visual identity.</p><p><strong>4. Improved Visual Quality (Aspect Ratio) and Text Rendering:</strong> Gemini 2.5 Flash boasts a noticeable improvement in overall <strong>visual quality</strong>.
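</p><p>If you want to experiment with these capabilities yourself, here is a minimal, illustrative sketch using the Python GenAI SDK. The model ID below is the preview name at the time of writing and may change, and the prompt and output filename are placeholders.</p><pre># Minimal sketch: generating an image with a Gemini Flash Image model via the GenAI SDK.
from google import genai
from google.genai import types

client = genai.Client()  # or genai.Client(vertexai=True, project="...", location="...")

response = client.models.generate_content(
    model="gemini-2.5-flash-image-preview",  # preview model name; may change
    contents="A watercolor poster of a lighthouse at sunrise with the caption 'New Era'",
    config=types.GenerateContentConfig(response_modalities=["TEXT", "IMAGE"]),
)

# The response interleaves text and image parts; save any image bytes returned.
for part in response.candidates[0].content.parts:
    if part.inline_data is not None:
        with open("generated.png", "wb") as f:
            f.write(part.inline_data.data)</pre><p>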
The images generated are more detailed, vibrant, and photorealistic than those from Gemini 2.0. Furthermore, the model’s ability to render <strong>text within images</strong> has been significantly enhanced. This means you can create posters, invitations, and other designs with clear, legible text, a task that was often a struggle for previous models.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*rXJlx-s5qeTgEt6Mg8e8ug.png" /><figcaption>My final image, just after 4 edits using Gemini 2.5 Flash Image (preview)</figcaption></figure><p>Here is the <a href="https://msampathkumar.github.io/blog/gemini-2.5-flash-image-preview-final-download.png">link to download the above image</a>.</p><h3>The Verdict: A Leap Forward in Creative Control</h3><p>While Gemini 2.0 Flash was a powerful tool in its own right, Gemini 2.5 Flash represents a significant evolution in AI image generation. The introduction of conversational editing, multi-image fusion, and enhanced consistency gives creators an unparalleled level of control over the final output. These advancements, combined with the improvements in visual quality and text rendering, make Gemini 2.5 Flash an indispensable tool for anyone looking to push the boundaries of their creativity. As AI-powered image generation continues to evolve, Gemini 2.5 Flash has set a new standard for what’s possible.</p><h3>Resources:</h3><ul><li><a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-5-flash#image">Gemini 2.5 Flash Image Preview</a></li><li><a href="https://cloud.google.com/vertex-ai/generative-ai/docs/models/gemini/2-0-flash#image-generation">Gemini 2.0 Flash Image</a></li></ul><p>Reposted from <a href="https://msampathkumar.github.io/blog/2025/09/04/gemini-25-flash-vs-gemini-20-flash-a-new-era-for-ai-image-generation/">https://msampathkumar.github.io/blog/2025/09/04/gemini-25-flash-vs-gemini-20-flash-a-new-era-for-ai-image-generation/</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=e2a08fc5d82b" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/gemini-2-5-flash-vs-gemini-2-0-flash-a-new-era-for-ai-image-generation-e2a08fc5d82b">Gemini 2.5 Flash vs. Gemini 2.0 Flash: A New Era for AI Image Generation</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cookbook Lesson 05: Review —  Five Takeaways to enhance your Gemini Apps]]></title>
            <link>https://medium.com/google-cloud/cookbook-lesson-05-review-five-takeaways-to-enhance-your-gemini-apps-aab317c081dd?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/aab317c081dd</guid>
            <category><![CDATA[streamlit]]></category>
            <category><![CDATA[generative-ai]]></category>
            <category><![CDATA[gemini]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Tue, 19 Aug 2025 04:47:47 GMT</pubDate>
            <atom:updated>2025-08-19T17:14:48.148Z</atom:updated>
            <cc:license>http://creativecommons.org/publicdomain/zero/1.0/</cc:license>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*PSuJJNDfAvzacOU6l9y_8w.png" /><figcaption>Generated with Gemini 2.5 Flash</figcaption></figure><h3>Cookbook Lesson 05: Review — Five Takeaways to enhance your Gemini Apps</h3><p>Welcome to Lesson 05 of my Gemini Cookbook series!</p><p>The potential of Gemini is limitless, but unlocking its true power requires more than just a ‘hello world’ app. It demands a strategic approach to performance, cost, and design. In this Lesson 05, we’ll see how to transform that potential into reality by exploring 5️⃣ key takeaways that will elevate your Gemini applications to the next level.</p><p>In Lessons 01 through 04, we covered essential topics, from building a “<em>hello world</em>” app to creating powerful, context-aware chatbots. In this Lesson 05, we’ll do a quick recap of those earlier lessons and then dive into some important tips and takeaways for building great Gemini features applicable to most users.</p><blockquote>This lesson is the last part of the 5-part series — <a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/">Google Cloud — Gemini Cookbook (GitHub link).</a></blockquote><h3>Takeaway 01: Multi-Layered Application</h3><p>Let me quote from a Google Developer Advocate.</p><p>“To effectively monitor your agent, it’s best to adopt a practical, <strong>layered approach</strong>. Here’s a guide to setting up observability at every stage of the development lifecycle.” — <a href="https://medium.com/google-cloud/prepping-your-agent-for-production-observability-for-agent-development-kit-adk-a6d74c744ef2">Alvin Prayuda Juniarta Dwiyantoro</a></p><p>Building an application and its features can become a cascading challenge, especially when you aim for an app that can be used, maintained, updated, and deployed to production environments. As mentioned in the quote, building applications with a layered approach can help streamline the overall development flow.</p><h4>Flow chart</h4><p><a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/lesson-04"><em>Lesson 04 GitHub Link</em></a></p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*pJ7ExH-IIaLbu55o" /><figcaption>Gemini App design from Lesson 04</figcaption></figure><p>This tutorial series built a sophisticated chatbot over four lessons.</p><ul><li><strong>Lesson 01:</strong> We started with streamlit_app.py to create a basic “hello world” page and learned deployment fundamentals.</li><li><strong>Lesson 02:</strong> We developed llm.py, the chatbot’s “brain,” using Google’s GenAI SDK, and designed the user interface for interactive chat sessions.</li><li><strong>Lesson 03:</strong> We added contextual awareness using System Instructions and a Context Cache to maintain coherent conversations.</li><li><strong>Lesson 04:</strong> We expanded the chatbot’s context to external resources like Google Search and external data stores via Retrieval Augmented Generation (RAG).</li></ul><p>To maintain simplicity, Lessons 03 and 04 utilize a layered architecture (for the code), adding new features as distinct modules. This differs from the initial approach, where a single file facilitated early development.
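</p><p>As a rough illustration of that layering, here is a hypothetical sketch of how the pieces can call each other. The module names mirror the series (streamlit_app.py, llm.py, rag.py, cache.py), but the helper functions shown are invented for illustration; the real code lives in the linked GitHub lessons.</p><pre># llm.py -- hypothetical sketch of the "brain" layer wiring (helper names are illustrative).
from google import genai
from google.genai import types

client = genai.Client(vertexai=True, project="your-project-id", location="us-central1")

def ask_gemini(prompt: str, use_rag: bool = False, use_cache: bool = False) -> str:
    """Called by streamlit_app.py; optionally pulls in the rag/cache 'battery' modules."""
    kwargs = {}
    if use_cache:
        from cache import get_or_create_cache  # hypothetical helper in cache.py
        kwargs["cached_content"] = get_or_create_cache()
    if use_rag:
        from rag import get_rag_retrieval_tool  # hypothetical helper in rag.py
        kwargs["tools"] = [get_rag_retrieval_tool()]
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents=prompt,
        config=types.GenerateContentConfig(**kwargs),
    )
    return response.text</pre><p>The point is the shape rather than the exact code: the UI layer never talks to the model directly, and each optional capability stays behind its own module boundary.</p><p>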
As features expanded, I realized the benefits of dedicated modules like rag.py and cache.py over modifying existing files, which significantly streamlined development, testing, and debugging.</p><p>This layered approach proved effective in our application. The following sections detail further takeaways from my experiments and explorations.</p><h3>Takeaway 02: Improve your speed</h3><p>Large Language Models (LLMs) are extensive software applications that utilize significant CPU, memory, and network resources. Much like services such as BigQuery or Cloud SQL, these models are globally hosted, serving users worldwide. In this section, we will look at how you can improve model response speed.</p><h4>021 Global Endpoints (GE):</h4><p>This section presents my personal explorations and hypotheses.</p><p>It’s common to assume that proximity to the LLM’s physical location ensures optimal performance. While reduced network latency is a factor, it doesn’t account for the entire picture. LLMs operate as shared services, inherently utilizing a first-in, first-out queuing system. Therefore, the ideal location selection hinges on the equation: <strong><em>min_response(Nearby locations + shorter queue)</em></strong>. When incorporating additional variables like time and potential datacenter disruptions (e.g., maintenance, power failures, or even submarine cable issues), the optimal solution evolves to <strong><em>min_response(Nearby *available* locations + shorter queue).</em></strong></p><p>However, forecasting all these conditions is challenging. Continuously querying every Gemini model globally to assess response times could paradoxically impede overall performance. This is precisely where <strong><em>Global Endpoints</em></strong> demonstrate their value! They intelligently determine which model will deliver the quickest response, thereby accelerating your interactions.</p><p>Back in March 2024, I whipped up a demo app that hit a global endpoint instead of a specific regional one, like us-central1 or europe-central2. Since I was working from my place in Warsaw, Poland, with an old router, I figured there might be some lag. But honestly, the huge performance difference between the global endpoint and my go-to europe-west1 was pretty wild — in a good way!</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*OOwEV8Y5OVXKZpuhj1cxgw.png" /><figcaption>Global Endpoint vs Other Locations</figcaption></figure><p>Based on my personal explorations and practical experience, employing Global Endpoints (GE) can significantly enhance the speed of your Gemini application. While <strong><em>GE</em></strong> is an excellent choice, it does have some limitations.</p><ol><li>Data Localization: If your app needs to keep data in a specific place (like for GDPR), <strong><em>GE</em></strong> might not be the right choice.</li><li>LLM Feature Limitations: Certain features, such as RAG (Vector Search) built in specific locations (e.g., Europe or US-Central), may not be available for use.</li></ol><h4>022 Using Gemini-Lite Models</h4><p>Upon the initial announcement of the Gemini Live API, I was quite enthusiastic, particularly regarding the integration of WebSockets, which I find to be remarkably efficient compared to standard HTTPS requests.
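</p><p>(Looping back to 021 for a concrete illustration: the sketch below points the GenAI SDK at a regional endpoint and at the global endpoint and compares response times. The project ID is a placeholder, and global-endpoint availability depends on the model you choose.)</p><pre># Minimal sketch: the same request against a regional location vs. the global endpoint.
import time
from google import genai

regional = genai.Client(vertexai=True, project="your-project-id", location="europe-west1")
global_ep = genai.Client(vertexai=True, project="your-project-id", location="global")

for name, client in [("europe-west1", regional), ("global", global_ep)]:
    start = time.perf_counter()
    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder model name
        contents="Reply with a single word: ping",
    )
    print(f"{name}: {time.perf_counter() - start:.2f}s -- {response.text.strip()}")</pre><p>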
Following some exploration, I determined that a text-based chat was the optimal choice for my needs, rather than audio or video conferencing.</p><p>Although the Live API offers robust transcription options for both input and output (<a href="https://cloud.google.com/vertex-ai/generative-ai/docs/live-api#transcribe_audio">https://cloud.google.com/vertex-ai/generative-ai/docs/live-api#transcribe_audio</a>), I found text-based interaction more effective (simple and fast enough), as it allowed me to review my queries before submission.</p><p>These two considerations led me to consistently utilize the gemini-flash-lite models for straightforward chatbot applications.</p><p>Recommendation: Use lite models whenever you get the opportunity.</p><h3>Takeaway 03: Cost Saving</h3><h4>031 Use Context Caching</h4><p>When leveraging Gemini for token usage tracking through observability, you might encounter a pleasant surprise: Gemini’s Implicit Context Caching (ICC) and Explicit Context Caching (ECC) features. By default, ICC is active but can be disabled if necessary.</p><p>As with any caching mechanism, the effectiveness is determined by the cache-hit to cache-miss ratio. With ICC, cost savings are not guaranteed. For instance, if you process 100–200 distinct queries across various languages, a common token cache is not feasible, and Gemini cannot provide cost savings in such scenarios.</p><p>In contrast to ICC, ECC offers a guaranteed cost reduction. While ECC is a powerful feature, its implementation is a design choice. Achieving cost-effectiveness with ECC, akin to reaching escape velocity, requires hitting a specific mathematical threshold. It’s not magic, just pure mathematics!</p><p>My recommendation is to consider using ECC when you have about 200+ queries and need a large context (dataset) in a short duration. For example: (1) you are building an Astro-Science Chatbot knowledgeable in a dozen physics, chemistry, and astrophysics books, or (2) you have to deal with really smart kids (I hope AI can handle those queries).</p><p>Here is a quick recap of takeaway 031.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*bGuTuErWDfcA2L_e" /></figure><h4>032 Provisioned Throughput (PT)</h4><p>This is a Vertex AI-only feature.</p><p>Provisioned Throughput is a fixed-cost, fixed-term subscription available in several term lengths that reserves throughput for supported generative AI models on Vertex AI. To reserve your throughput, you must specify the model and available locations in which the model runs.</p><p>It is important to note that exceeding your Provisioned Throughput limit will not halt your requests. While this ensures uninterrupted service by preventing errors, it also means you cannot cap your Gemini costs if budget constraints are a concern.</p><h3>Takeaway 04: Context Awareness</h3><p>Whether it’s Gemini or some other LLM, it’s always the same rule. People are calling context king, but when I was learning data science, I heard a similar saying about data/information.</p><p>Let me take a personal example here. Back in 2015, my colleagues had a simple classification challenge, but the difficult part was the data distribution. The client had provided 1 TB of positive data and only 10 MB of negative data (1,000,000 MB of positive examples versus 10 MB of negative examples). The substantial disparity in data distribution made all our initial models predominantly classify queries as positive, achieving approximately 99.999% accuracy.
However, it was not the solution we needed.</p><p>When I say context, I mean data relevancy. If your queries are not relevant to the context or theme you have set for the model, then you may be doing something fundamentally wrong. Imagine studying poetry to improve painting.</p><p>So, to keep your model’s responses relevant to what users need, I have tested the following approaches with Gemini:</p><h4><strong>041 Use System Instructions:</strong></h4><p>Define all the critical details that your model needs to remember. For example, “You are Bill Nye, the science guy. You are an American science communicator, television presenter, and former mechanical engineer. You are best known as the host of the science education television show Bill Nye the Science Guy and as a science educator in pop culture.” (Copied from Wikipedia)</p><h4><strong>042 Use Grounding with Google Search:</strong></h4><p>To keep their scope limited, LLM models are often not connected to search engines like Google (or DuckDuckGo or Yahoo). By enabling this, you can allow the Gemini model to browse the internet and get the latest information, such as the date, the weather, and so on.</p><h4><strong>043 Use RAG:</strong></h4><p>Similar to Grounding with Google Search, you may find yourself in a case where you want the model to automatically learn information about a certain project or certain dataset that is private to you, your team, or your organization. In such cases, using RAG has turned out to be an amazing investment.</p><p>Using Vertex AI’s RAG feature, all the provided data is converted into vectors and stored in a Vector database. When a user queries, relevant data is identified from this Vector database and included as part of the overall context for the Model. As you may expect, this allows the model to give relevant answers.</p><p>Here is a quick recap of takeaway 04.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/0*mtIdjDbhmBGBp8K4" /><figcaption>My suggested path to improve Gemini Apps data relevancy</figcaption></figure><h3>Takeaway 05: Simplicity</h3><p>Simplicity stands as a crucial principle. Throughout my development and refinement of applications, I’ve experimented with various LLM-based assistant tools. The <a href="https://github.com/google-gemini/gemini-cli">Gemini CLI</a> consistently proved the most effective. Yet, even with this preference, it occasionally exhibited hallucinations, while other selections yielded even less favorable results. Once, as an experiment, I allowed these changes to persist, meticulously reviewing each one. This process eventually led to a visually appealing application with a substantial amount of code. During testing, as errors emerged, the model would rectify them. After several iterations, I realized I no longer genuinely enjoyed the application I had built. While the app was aesthetically pleasing, I lacked the confidence to take ownership of future feature additions or bug fixes.
Gemini handled much of the work, but my engineering accountability kept me on edge.</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*wNsvxe3kk9TvoBJMuoVj9Q.png" /><figcaption>Simple designs are easy to build, ship, debug &amp; fix.</figcaption></figure><p>Consider this: if a bug exists within 10 lines of code, the probability of quickly identifying it is 10%; however, in 500 lines of code, that probability drops to 0.2%. (The layered approach in Lessons 03 and 04 simplified my code, resulting in faster debugging and error correction.)</p><blockquote>Simple designs are easier to understand, build, and maintain. As mentioned in Takeaway 01, a layered approach is helpful, but always prioritize simplicity in its execution. Ultimately, ensure your code is readable and simple for humans.</blockquote><h3>Conclusion:</h3><p>When building Gemini applications, a clear, layered, and straightforward design leads to robust, debuggable, and maintainable solutions, enabling powerful, efficient, and cost-effective Gemini-powered experiences.</p><p>As this series concludes, remember that simplicity is key to great Gemini apps. These five takeaways are your secret ingredients for success. The next adventure awaits: our next post will explore the Google Agent Development Kit (ADK), built on the Google Gen AI SDK, to enhance your Gemini development journey. Let’s go beyond the basics and build something outstanding in Gen AI.</p><p>Now, something for fun ☺︎</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/578/0*N-nwxghsqF16UWZw.png" /><figcaption>XKCD: Computers vs Humans</figcaption></figure><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=aab317c081dd" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/cookbook-lesson-05-review-five-takeaways-to-enhance-your-gemini-apps-aab317c081dd">Cookbook Lesson 05: Review —  Five Takeaways to enhance your Gemini Apps</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Cookbook Lesson 04:  Unlock Enterprise AI: Grounding Gemini with RAG and Google Cloud Search ]]></title>
            <link>https://medium.com/google-cloud/lesson-04-unlock-enterprise-ai-grounding-gemini-with-rag-and-google-cloud-search-4a3f44f6b93e?source=rss-d8cc05efb99------2</link>
            <guid isPermaLink="false">https://medium.com/p/4a3f44f6b93e</guid>
            <category><![CDATA[rags]]></category>
            <category><![CDATA[generative-ai]]></category>
            <category><![CDATA[gemini]]></category>
            <category><![CDATA[google-cloud-platform]]></category>
            <category><![CDATA[streamlit]]></category>
            <dc:creator><![CDATA[Maddula Sampath Kumar]]></dc:creator>
            <pubDate>Sat, 02 Aug 2025 04:53:00 GMT</pubDate>
            <atom:updated>2025-08-21T08:56:55.759Z</atom:updated>
            <cc:license>http://creativecommons.org/publicdomain/zero/1.0/</cc:license>
            <content:encoded><![CDATA[<iframe src="https://cdn.embedly.com/widgets/media.html?src=https%3A%2F%2Fwww.youtube.com%2Fembed%2FJIx4Fr4V6Mw%3Ffeature%3Doembed&amp;display_name=YouTube&amp;url=https%3A%2F%2Fwww.youtube.com%2Fwatch%3Fv%3DJIx4Fr4V6Mw&amp;image=https%3A%2F%2Fi.ytimg.com%2Fvi%2FJIx4Fr4V6Mw%2Fhqdefault.jpg&amp;type=text%2Fhtml&amp;schema=youtube" width="640" height="480" frameborder="0" scrolling="no"><a href="https://medium.com/media/df10473a783fc3f753d267b17068a90b/href">https://medium.com/media/df10473a783fc3f753d267b17068a90b/href</a></iframe><h3>Cookbook Lesson 04: 🔓 Unlock Enterprise AI: Grounding Gemini with RAG and Google Cloud Search 🚀</h3><p>Welcome back! 👋 In <a href="https://msampathkumar.github.io/google-cloud-gemini-cookbook/lesson-03/">Lesson 03</a>, we built fast, relevant chatbots using direct context methods like In-Context Learning ✨, System Instructions 📜, and Context Caching 📦. Those are fantastic for quick demos and consistent persona. But what if your chatbot needs to know the <em>latest</em> company sales figures, details from an obscure internal report, or specific news updates? That’s where Large Language Models (LLMs) hit their limit — they hallucinate! 😵‍💫</p><p>Today, we tackle the next frontier: <strong>Grounding</strong> your Gemini models in real-time, external, and even proprietary data using <strong>Retrieval Augmented Generation (RAG)</strong>, with Google Cloud Search as your powerful engine. 🔍</p><blockquote>This lesson is part of the 5 part series <a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/">Google Cloud — Gemini Cookbook (GitHub Link)</a>.</blockquote><h3>1. Grounding LLMs: Combating Hallucinations with External Knowledge 🛡️</h3><p>LLMs are brilliant pattern matchers, trained on vast amounts of internet data. However, their knowledge is frozen at their last training cut-off date. They don’t have real-time access to the internet, nor do they inherently know your company’s internal documents, recent sales data, or specific client interactions. When asked about such information, they might:</p><ul><li><strong>Refuse to answer:</strong> “I don’t have information on that.” 🤷‍♀️</li><li><strong>Generate generic or outdated information:</strong> “According to general industry trends…” 🗓️</li><li><strong>“Hallucinate”:</strong> Make up plausible-sounding but entirely false information. This is the most dangerous! 🚨</li></ul><p><strong>Example: A Chatbot Hallucinating on Project Status</strong></p><p>Let’s revisit our chatbot. If you ask about specific, latest information:</p><pre>User: What is the status of our CRM project - Alpha?</pre><pre>Model: I&#39;m sorry, but I don&#39;t have access to specific project details like the status of your CRM project Alpha. 😟 I can only provide general information.</pre><pre>User: What is today&#39;s date ?</pre><pre>Model: Today&#39;s date is June 16, 2024. 📅</pre><p>This is a problem. In enterprise settings, accuracy is paramount. We need a way to connect Gemini to our actual, verifiable knowledge. This is called <strong>grounding</strong>. 🌱</p><h3>2. 
Introduction to Retrieval Augmented Generation (RAG): Your LLM’s Research Assistant 🧑‍🎓</h3><p>Retrieval Augmented Generation (RAG) is an architectural pattern that solves the hallucination problem by giving LLMs access to external, up-to-date, and domain-specific information.</p><p>Think of RAG as giving your LLM a brilliant research assistant: 🧐</p><ol><li><strong>Retrieve:</strong> When you ask a question, the “research assistant” first searches a vast library (your knowledge base) for relevant documents or snippets. 📚</li><li><strong>Augment:</strong> It then takes the most relevant findings and gives them to the LLM. 🧩</li><li><strong>Generate:</strong> Finally, the LLM uses this specific, retrieved information (along with your original query) to formulate an accurate and grounded response. ✅</li></ol><p><strong>Key Components of a RAG System:</strong></p><ul><li><strong>Knowledge Base:</strong> Your source of truth — internal documents, databases, websites, etc. 📁</li><li><strong>Retriever:</strong> A system (like a vector database combined with an embedding model, or a powerful search engine like Google Cloud Search) that can quickly find the most relevant pieces of information from your knowledge base based on a query. 🔎</li><li><strong>Generator:</strong> The LLM (Gemini) that synthesizes the answer using the retrieved context. 🧠</li></ul><p><strong>Benefits of RAG:</strong></p><ul><li><strong>Factuality:</strong> Grounding responses in real data dramatically reduces hallucinations. ✅</li><li><strong>Currency:</strong> LLMs can answer questions about information that wasn’t in their training data or is constantly changing. ⏰</li><li><strong>Domain-Specific Knowledge:</strong> Access to proprietary or niche topics. 💼</li><li><strong>Attribution:</strong> Potential to show users <em>where</em> the information came from (e.g., “Source: Policy Manual v2.1”). 🔗</li></ul><p><strong>RAG vs. Context Cache: A Crucial Distinction</strong> 🚨</p><p>It’s vital to differentiate RAG from Context Caching (Lesson 03).</p><ul><li><strong>Context Cache:</strong> <em>Reuses</em> small, static pieces of pre-loaded or conversational context. It’s about efficiency for fixed data, avoiding redundant token usage. Think of it as a persistent “sticky note” or short-term memory for repeated instructions or small data blocks. 📝</li><li><strong>RAG:</strong> <em>Dynamically retrieves</em> specific, often large, and always relevant chunks of information from a vast, external knowledge base <em>on demand</em> for each query. It’s about expanding the LLM’s factual knowledge with new, current, or private data. 🌐</li></ul><h3>3. Application Architecture</h3><p>This application is designed to be a flexible and extensible chatbot that can leverage different grounding techniques. Here’s a breakdown of the core components:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*vCIUPGzMUQjk2cXH1e4-Lg.png" /></figure><p><strong>Core Application Logic:</strong></p><ul><li>streamlit_app.py<strong> (UI):</strong> This is the user-facing component of the application, built with Streamlit. It provides the chat interface, handles user input, and displays the LLM&#39;s responses. It&#39;s the &quot;skin&quot; of our application.</li><li>llm.py<strong> (The Brain):</strong> This module is the central nervous system of our chatbot. It&#39;s responsible for all interactions with the Gemini API. 
It takes the user&#39;s prompt, and based on the selected mode (Default, Context Cache, or RAG), it constructs the appropriate request to the Gemini model.</li><li>cache.py<strong> (Optional Battery):</strong> This module manages the Context Cache. When the &quot;Use Context Cache&quot; option is selected, llm.py uses this module to create and manage a cache of context, which can be reused across conversations to improve speed and reduce costs.</li><li>rag.py<strong> (Optional Battery):</strong> This module handles the Retrieval-Augmented Generation (RAG) functionality. When the &quot;Use RAG as Tool&quot; option is selected, llm.py uses this module to create and manage a RAG corpus. This allows the LLM to retrieve information from a knowledge base to answer questions.</li></ul><p><strong>Code Links:</strong></p><ul><li><a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/lesson-04/streamlit_app.py">streamlit_app.py</a></li><li><a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/lesson-04/llm.py">llm.py</a></li><li><a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/lesson-04/cache.py">cache.py</a></li><li><a href="https://github.com/msampathkumar/msampathkumar.github.io/tree/master/docs/google-cloud-gemini-cookbook/lesson-04/rag.py">rag.py</a></li></ul><h3>4. RAG Implementation Flow</h3><p>Here’s a more detailed look at how the RAG process works within our application when the “Use RAG as Tool” option is enabled:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*MC-MukFyCNDk67yJ-EUdUA.png" /></figure><h3>Conclusion</h3><p>This lesson demonstrated how to ground Gemini models with external knowledge like using Google Search or RAG. By leveraging RAG, we can build more accurate, factual, and useful AI applications that can reason about private and real-time data.</p><p>Update: Lesson 05 is at <a href="https://medium.com/google-cloud/cookbook-lesson-05-review-five-takeaways-to-enhance-your-gemini-apps-aab317c081dd">https://medium.com/google-cloud/cookbook-lesson-05-review-five-takeaways-to-enhance-your-gemini-apps-aab317c081dd</a></p><img src="https://medium.com/_/stat?event=post.clientViewed&referrerSource=full_rss&postId=4a3f44f6b93e" width="1" height="1" alt=""><hr><p><a href="https://medium.com/google-cloud/lesson-04-unlock-enterprise-ai-grounding-gemini-with-rag-and-google-cloud-search-4a3f44f6b93e">Cookbook Lesson 04: 🔓 Unlock Enterprise AI: Grounding Gemini with RAG and Google Cloud Search 🚀</a> was originally published in <a href="https://medium.com/google-cloud">Google Cloud - Community</a> on Medium, where people are continuing the conversation by highlighting and responding to this story.</p>]]></content:encoded>
        </item>
    </channel>
</rss>