Vertex AI Agent Engine
How to build AI Agents with Vertex AI Agent Engine (Reasoning Engine)
Agents are autonomous applications that use the reasoning capabilities of LLMs.
With Google Cloud, we get this as a service by using Vertex AI Agent Engine (previously known as Vertex AI Reasoning Engine), which makes building agents easier than ever. You can create agents that integrate Google's Gemini models, Python functions, and frameworks like LangChain for orchestrated reasoning. And all of that runs in a serverless, fully managed, and scalable environment.
Understanding Agents
An agent in Vertex AI Agent Engine consists of five primary components:
- Model: The core language model (Gemini, in our case) interprets queries and generates responses.
- Tools: Custom Python functions that extend the agent's capabilities.
- Orchestration: The orchestration layer guides the agent's reasoning, combining model outputs and tools. It manages multi-step workflows and helps the agent decide when to call tools for more accurate responses.
- Memory: Storing the conversation history allows for multi-turn conversations and supports the agent's reasoning.
- Deployment: This part makes the agent accessible as a cloud service on Vertex AI. Deployment handles scalability and infrastructure, allowing seamless integration and interaction via APIs without backend management.
Tools for Agent Interactions
Tools allow agents to perform actions.
These tools can range from custom Python functions you define to pre-built Google tools, such as Retrieval Augmented Generation (RAG), for fetching relevant information.
This enables our agents to interact with the outside world.
Custom Python Tools
A tool can be as simple as a Python function that provides real-time data. Here’s an example of a custom tool that retrieves currency exchange rates.
Remember that this could be anything you can pack into a Python function. This is ultimately what allows agents to interact with the outside world, and it is what makes them so powerful.
def get_exchange_rate(currency_from: str = "USD",
                      currency_to: str = "EUR",
                      currency_date: str = "latest"):
    """Retrieves the exchange rate between two currencies on a specified date."""
    import requests
    response = requests.get(f"https://api.frankfurter.app/{currency_date}",
                            params={"from": currency_from, "to": currency_to})
    return response.json()

def summarize_webpage(url: str):
    ...

def return_order(order_id: int):
    ...

def upload_to_cloudstorage(document: str):
    ...
Agents work internally with function calling. To learn more about it, watch this video.
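To make that concrete, here is a minimal, framework-free sketch of the function-calling loop: the model proposes a call as structured data, the runtime dispatches it to the matching Python function, and the result is fed back to the model for the final answer. The proposed_call dictionary below is a hypothetical stand-in for what the model would actually return.

# Map tool names to the Python functions defined above.
tools = {"get_exchange_rate": get_exchange_rate}

# Hypothetical stand-in for the model's structured function-call output
# to a question like "How many euros do I get for 100 USD?".
proposed_call = {"name": "get_exchange_rate",
                 "args": {"currency_from": "USD", "currency_to": "EUR"}}

# Dispatch the call and collect the result; the agent would pass this
# back to the model so it can formulate the final answer.
tool_result = tools[proposed_call["name"]](**proposed_call["args"])
print(tool_result)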
Pre-built Tools in Vertex AI
Vertex AI also offers built-in tools, like the RAG tool, which can pull relevant information from a defined corpus. For example, using Tool.from_retrieval, we can set up a retrieval tool that fetches high-similarity document chunks from a document corpus:
from vertexai.preview import rag
from vertexai.preview.generative_models import Tool

rag_retrieval_tool = Tool.from_retrieval(
    retrieval=rag.Retrieval(
        source=rag.VertexRagStore(
            rag_resources=[
                rag.RagResource(rag_corpus="projects/234439745674/locations/us-central1/ragCorpora/2842897264777625600")
            ],
            similarity_top_k=3,
            vector_distance_threshold=0.5,
        ),
    )
)
If you want to learn more about Google's RAG API, which we are using in this Agent example, check out this video.
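For completeness, here is a rough sketch of how such a corpus could be created and filled with documents beforehand. The helper names (create_corpus, import_files) and their parameters follow the preview rag module and may differ in newer SDK versions, and the Cloud Storage path is a placeholder, so check the RAG API documentation for the current signatures:

import vertexai
from vertexai.preview import rag

vertexai.init(project="sascha-playground-doit", location="us-central1")

# Create an empty corpus and import documents from Cloud Storage (placeholder path).
corpus = rag.create_corpus(display_name="product-manuals")
rag.import_files(
    corpus.name,
    paths=["gs://your-bucket/manuals/"],
    chunk_size=512,
    chunk_overlap=100,
)

print(corpus.name)  # use this resource name as rag_corpus in Tool.from_retrieval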
Build the Agent
Everyone has their own preference for which agent framework to use and which one is best.
Agent Engine provides several commonly used frameworks out of the box:
- Agent Development Kit
- LangChain
- LangGraph
- AG2
- LlamaIndex
Agent Engine also allows custom templates, so if there is no pre-built template for your agent framework of choice, no worries.
The agent combines the model's language understanding with specific tools. We do that by providing the agent with a list of all our tools/functions.
from vertexai.preview import reasoning_engines

agent = reasoning_engines.LangchainAgent(
    model="gemini-2.0-flash-exp",
    # both custom and pre-built tools
    tools=[return_order, rag_retrieval_tool],
)
With multiple tools defined, the agent is set up to integrate the tools and the Gemini model within the reasoning layer. This setup lets the agent choose when to leverage each tool based on the query type. It also allows for multi-step tasks that involve multiple tools and multiple iterations.
Deploy the Agent
Deploying the agent in a managed environment like Vertex AI brings multiple advantages, especially for production workflows where scalability, security, and ease of management are essential. You do not need to think about the infrastructure to run your agent.
remote_agent = reasoning_engines.ReasoningEngine.create(
    agent,
    requirements=["google-cloud-aiplatform[langchain,reasoningengine]", "langchain-google-firestore"],
)
Deploying an agent with Vertex AI Agent Engine takes approximately 3 minutes. After the deployment, Google will provide us with a Reasoning Engine endpoint that is ready for production.
projects/234439745674/locations/us-central1/reasoningEngines/4974032274384224256
Using the Agent
We can use the agent locally, which is helpful for development, or the remote agent created during deployment.
Local Agent
To use a local Agent, there is no need to deploy it. This makes iterating on your prompt easy, and you can quickly test changes.
response = agent.query(
    input="I want to return my order.",
    config={"configurable": {"session_id": "1010"}})
Remote Agent
After creating the agent, we can use it in production with a few lines of code by using the Reasoning Engine ID we got during creation. From there, we can send queries to the agent endpoint, which automatically scales with your requests.
LOCATION = "us-central1"
REASONING_ENGINE_ID = "4974032274384224256"
remote_agent = reasoning_engines.ReasoningEngine(
f"projects/{PROJECT_ID}/locations/{LOCATION}/reasoningEngines/{REASONING_ENGINE_ID}")
response = remote_agent.query(
input="I want to return my order.",
config={"configurable": {"session_id": "1010"}})
Integrating Memory for Contextual Responses
Memory enables an agent to retain information across multiple interactions, effectively handling multi-turn conversations or tasks that require contextual understanding over time.
Integrating memory allows the agent to:
- Handle multi-turn conversations: Maintain context for more natural and engaging interactions.
- Recall prior information: Refer back to previous answers, reducing redundancy.
- Personalize responses: Tailor answers based on the history of the ongoing conversation.
To add memory, you’ll define a function that loads and stores message history.
This history can be managed with Google’s Firestore, Bigtable, or Spanner, or you can use a custom storage solution.
def get_session_history(session_id: str):
    from langchain_google_firestore import FirestoreChatMessageHistory
    from google.cloud import firestore

    client = firestore.Client(project="sascha-playground-doit")
    return FirestoreChatMessageHistory(
        client=client,
        session_id=session_id,
        collection="history",
        encode_message=False,
    )
session_id identifies each conversation, allowing multiple sessions to be managed separately. FirestoreChatMessageHistory saves and retrieves messages from a Firestore collection, called history in this example.
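You can sanity-check the history function on its own before wiring it into the agent. This is a minimal sketch assuming the standard LangChain chat-history interface (add_user_message, add_ai_message, and the messages property):

history = get_session_history("1010")
history.add_user_message("I want to return my order.")
history.add_ai_message("Sure, can you share your order ID?")
print(history.messages)  # messages persisted in the Firestore "history" collection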
With the memory setup defined, you can now pass it into the agent configuration. Here’s how to set up the agent to use memory with the Gemini model and existing tools:
from vertexai.preview import reasoning_engines

agent = reasoning_engines.LangchainAgent(
    model="gemini-1.5-pro",
    chat_history=get_session_history,
    tools=[return_order, rag_retrieval_tool],
)
To maintain context across interactions, the agent needs to reference a unique session_id each time a query is made. Here's how to make queries within a session.
# Start a conversation
response = agent.query(
    input="What is the setup guide for the Nimbus Weather Station?",
    config={"configurable": {"session_id": "1010"}}
)
print(response)

# Follow-up question in the same session
response = agent.query(
    input="What other products can connect to it?",
    config={"configurable": {"session_id": "1010"}}
)
print(response)
In our case, we store the history in Firestore. Each session gets a dedicated document that contains all of its messages.
I planned to use Firestore's TTL feature to remove all messages older than X days. However, FirestoreChatMessageHistory does not store a timestamp for the messages. I have an open feature request on GitHub for that. If I find time, I will create a pull request. If you want to work on it together, let me know.
Orchestration Flexibility
We used LangChain as our agent orchestration framework. But you are not limited to LangChain at all. Google has an excellent example of how flexible Agent Engine (Reasoning Engine) can be.
This flexibility makes it easy to use other frameworks as long as you follow the required syntax: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/reasoning-engine#example_syntax
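As a rough sketch of that syntax: a custom template is essentially a plain Python class with a query method and an optional set_up method that runs once in the deployed environment. The class below is a hypothetical minimal example; everything inside it is up to you:

class SimpleAgent:
    def __init__(self, model_name: str):
        self.model_name = model_name

    def set_up(self):
        # Runs once after deployment, e.g. to create clients.
        from vertexai.generative_models import GenerativeModel
        self.model = GenerativeModel(self.model_name)

    def query(self, input: str):
        # Called for every request; orchestrate however you like here.
        return self.model.generate_content(input).text

remote_agent = reasoning_engines.ReasoningEngine.create(
    SimpleAgent(model_name="gemini-2.0-flash-exp"),
    requirements=["google-cloud-aiplatform[reasoningengine]"],
)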
Tracing
Agents don’t always get things right on the first try. They often reason through multiple steps and tool invocations before producing an answer. This is where Tracing and Logging become invaluable.
Logging is enabled by default with Vertex AI Agent Engine. However, enable tracing if you want deeper insights into your agent's reasoning.
Tracing provides a timeline of events, called spans, that shows you:
- Which tools were used
- How long each operation took
- The exact input and output of each step, including the number of tokens
To use it, add enable_tracing=True as an agent parameter.
agent = agent_engines.LangchainAgent(
    ...
    enable_tracing=True
)
Here’s what that looks like in practice:
Costs and how it compares to Vertex AI Agent Builder
The pricing is much like Google Cloud Run pricing and is based on vCPU hours and GiB (memory) hours used during request processing. Agent Engine integrates with Google Cloud Monitoring, allowing you to understand your usage.
What if the Agent deployment failed?
Check the logs. They usually contain enough information to debug the issue. For example, when I used Firestore to store the chat history, I had missed the required permissions. It is always worth reading the documentation first, but writing code is much more fun. Anyway, the agent log clearly indicated what I had missed.
DEFAULT 2024-11-06T10:45:53.757222Z return await dependant.call(**values)
....
DEFAULT 2024-11-06T10:45:53.757225Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
DEFAULT 2024-11-06T10:45:53.757384Z google.api_core.exceptions.PermissionDenied: 403 Missing or insufficient permissions.
{
"textPayload": "google.api_core.exceptions.PermissionDenied: 403 Missing or insufficient permissions.",
"insertId": "672b48e1000b8e88fd3da43c",
"resource": {
"type": "aiplatform.googleapis.com/ReasoningEngine",
"labels": {
"resource_container": "234439745674",
"reasoning_engine_id": "3495725696699858944",
"location": "us-central1"
}
},
"timestamp": "2024-11-06T10:45:53.757384Z",
"logName": "projects/sascha-playground-doit/logs/aiplatform.googleapis.com%2Freasoning_engine_stderr",
"receiveTimestamp": "2024-11-06T10:45:54.094422130Z"
}
Debugging Pickling Issues in Google Cloud’s Reasoning Engine
When deploying a Reasoning Engine with reasoning_engines.ReasoningEngine.create(), you might run into an error like the following:
TypeError: cannot pickle '_thread.RLock' object
This happens because Google Cloud’s Agent Engine serializes your agent before deployment, and some objects inside it cannot be pickled. The serialization process will fail if your Agent Engine includes functions that rely on threading locks, file handles, or other non-serializable objects.
Step 1: Identifying the Root Cause
Since the error message does not specify which object is unpicklable, debugging can be frustrating. To systematically isolate the issue, test each function in the agent by attempting to pickle it:
import cloudpickle

for tool in agent._tools:
    try:
        cloudpickle.dumps(tool)
        print(f"✅ {tool.__name__} is serializable")
    except Exception as e:
        print(f"❌ {tool.__name__} is not serializable: {e}")
This will print out which function is causing the pickling issue. It will not tell you why, but at least you know where to look. In my case, I was using requests and imported it outside of the function.
Step 2: Fixing Pickling Issues
Solution 1: Move Imports Inside the Function (Quick Fix)
If the function uses requests, for example, move the import inside the function so it is only loaded at runtime:
def fetch_data(url):
    import requests  # ✅ Moved inside function
    response = requests.get(url)
    return response.json()
This prevents the global state of requests from interfering with pickling. But that is only a quick fix. I recommend solution 2.
Solution 2: Use a Class to Encapsulate Tools (Best Practice)
Instead of defining standalone functions, wrap them in a class and only pass instance methods to the agent:
import requests

class MyTools:
    def __init__(self):
        self.token = "your_api_token"

    def fetch_data(self, url):
        headers = {"Authorization": f"Bearer {self.token}"}
        response = requests.get(url, headers=headers)
        return response.json()

tools = MyTools()

agent = reasoning_engines.LangchainAgent(
    model="gemini-2.0-pro-exp-02-05",
    tools=[tools.fetch_data]
)
This ensures that pickling only happens on the method references, not the global module state. Clean and modular code.
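You can rerun the pickling check from Step 1 against the bound method to confirm the fix:

import cloudpickle

# The bound method (and the instance state it carries) should now pickle cleanly.
cloudpickle.dumps(tools.fetch_data)
print("✅ tools.fetch_data is serializable")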
Limitations (as of January 2025)
- Reasoning Engine (LangChain on Vertex AI) was renamed to Vertex AI Agent Engine (March 2025)
- Only a limited number of regions are available in the US, Europe, and Asia.
- There are no officially documented quotas yet.
- Python is supported up to version 3.11.
Why Use Vertex AI’s Reasoning Engine for Agents?
The Reasoning Engine provides a flexible, managed framework that lets you balance AI-driven reasoning with custom code. It enables you to:
- Define tools in Python that the language model can call to extend functionality.
- Use Google’s Gemini models for advanced language understanding across multimodal tasks.
- Customize reasoning steps for complex workflows, tailoring the agent’s responses based on your needs.
- Focus on building logic instead of managing infrastructure, avoiding the need for self-hosting.
The key benefits of using the Reasoning Engine include:
- Modular flexibility: Build and customize agents to fit your development needs, whether through open-source frameworks or your code.
- Streamlined development: Reduce boilerplate code, integrating Python functions directly without excessive setup.
- Quick prototyping: Accelerate your development process with templates that enable rapid testing and iteration, swiftly moving from concept to production.
- Enterprise-grade deployment: Benefit from a managed, scalable environment in Vertex AI, which handles dependencies and operations, allowing you to focus on your agent’s functionality.
Feature wishlist
- Allow the usage of models hosted as Vertex AI Endpoints
- Allow the usage of Claude on-demand Vertex AI models
- Allow passing billing labels to track costs better, like we can do with Gemini: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/add-labels-to-api-calls
- Allow for resource monitoring to understand costs better (released at Google Next 2025)
- Allow for passing runtime parameters, which are natively supported by LangChain but not by Vertex AI Agent Engine: https://python.langchain.com/docs/how_to/tool_runtime (the Agent Development Kit supports this)
Want to learn more?
Google has an excellent repo with many examples. Check it out.
Thanks for reading and listening
I appreciate your feedback and questions. You can find me on LinkedIn. Even better, subscribe to my YouTube channel ❤️.