Evolving Cloud Operations Self-Service with AI Agents
In my previous article, Adding self-service capabilities to your Landing Zone, I evaluated how Application Integration helps you grow your Google Cloud Landing Zone while increasing efficiency through self-service. Now I will explore the potential of integrating a generative AI agent to streamline these operations. Many organizations I talk to raise the same question: how can you get started quickly while preparing to scale at the same time? The architecture I describe enables exactly this, going beyond a quick (and often ad hoc) generative AI proof of concept.
Disclaimer: I work at Google in the cloud team. Opinions are my own and not the views of my current employer.
Requesting a new project
We will continue with the use case of requesting a new project. Previously, a static intake form was used that integrates with Application Integration. We illustrated the potential of generative AI through the Vertex AI Integration Connector, which made it possible to parse natural language requests, automate the workflow, and generate personalized confirmations. That initial approach validated our hypothesis that an AI agent can significantly enhance self-service capabilities; this blog expands on the concept.
User story: as an IT Team Lead, I want to request a new GCP project using an AI agent, so that my team can start their development work
User journey:
- The user navigates to an AI agent and starts a conversation
- The AI agent gathers the necessary information from the user, retrieves information from backend systems and triggers an automated workflow to register the request and take any next action that you defined in this process.
- The request data is stored in the foundation’s data store so that it can be used for insights and other automation services.
- The user can ask the AI agent for status updates on the request.
The picture below shows how we implement a Conversational Agent for interacting with the user and Application Integration for fulfilling the request once all necessary information has been gathered.
To capture the request details, we will use a Conversational Agent (Dialogflow CX) with separate playbooks for the New Project and Request Status flows. There are several options, ranging from deterministic to fully generative AI agents.
- Create a new Dialogflow CX agent with three playbooks. The first playbook routes the request depending on the conversation (create a new project or get the request status) and uses the following instructions:
- Greet the user
- if the user wants to create a new project, go to ${PLAYBOOK:New project}.
- if the user wants to know the status of their request, go to ${PLAYBOOK:Request status}.
- When you return to this playbook, restart the conversation by asking whether you can help with anything else.
The next step is to create the playbook to facilitate the new project request.
- Create a new playbook ‘New Project’ and define the instructions to follow.
- Similarly to the previous blog, we will capture a couple of fields for illustration purposes only: Requestor, Technical Owner, Business Owner and Project Name.
- Gather necessary information to create a new project
- Step 1: Collect the technical owner for the project to be created.
- Step 2: Collect the business owner for the project to be created. The business owner is also the project requester.
- Step 3: Collect the name of the project to be created.
- Step 4: Get the request approver using ${TOOL:request-approver} and inform the user who will be notified for approval
- Step 5: Always confirm the project name, technical owner and business owner and, if correct, submit the creation request for processing using ${TOOL:request-submit}
- Step 6: End the conversation
- Illustrating a lookup from a backend system, the AI agent will fetch the request approver based on the business owner input. To achieve this, we have configured a tool called ‘request-approver’ that will call a Cloud Function. A sample of the endpoint’s API specification is provided below.
openapi: 3.0.0
info:
  title: Get a request approver
  version: 1.0.0
servers:
  - url: ''
paths:
  /request-approver:
    post:
      operationId: getRequestApprover
      description: Get a request approver
      requestBody:
        required: true
        content:
          application/json:
            schema:
              type: object
              required:
                - project_requester
              properties:
                project_requester:
                  type: string
      responses:
        '200':
          description: Request Approver
          content:
            application/json:
              schema:
                type: object
                required:
                  - request_approver
                properties:
                  request_approver:
                    type: string
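A handler behind this specification could look roughly like the sketch below. It is a minimal illustration, not the Cloud Function used in the demo: the `APPROVERS` table, the fallback address and the function name `get_request_approver` are hypothetical, and the HTTP wiring is left out so that the lookup logic can run on its own.

```python
from typing import Any

# Hypothetical lookup table; a real deployment would query a backend system.
APPROVERS: dict[str, str] = {
    "peter": "approver@acme.com",
}
# Assumed fallback approver, not taken from the article.
DEFAULT_APPROVER = "cloud-team@acme.com"


def get_request_approver(payload: dict[str, Any]) -> tuple[dict[str, str], int]:
    """Return (response body, HTTP status) for the getRequestApprover operation."""
    requester = payload.get("project_requester")
    if not isinstance(requester, str) or not requester.strip():
        # The specification marks project_requester as required.
        return {"error": "project_requester is required"}, 400
    approver = APPROVERS.get(requester.strip().lower(), DEFAULT_APPROVER)
    return {"request_approver": approver}, 200
```

Returning the body and status as a plain tuple keeps the sketch independent of any web framework; the response body carries the single `request_approver` field that the specification requires.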
- Similarly, a tool called ‘request-submit’ has been configured to submit the request when all inputs are collected by the AI agent. Below is a sample of the request that is submitted (as defined by the endpoint’s OpenAPI specification).
{
  "project_name": "AI Chatbot",
  "request_approver": "approver@acme.com",
  "business_owner": "Peter",
  "technical_owner": "Bob",
  "project_requester": "Peter"
}
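Before such a request is forwarded to the integration, the receiving endpoint could check that it is complete. The sketch below is one possible check, assuming all five fields are required strings; the function name `validate_project_request` is hypothetical. The rule that the requester equals the business owner follows Step 2 of the playbook.

```python
from typing import Any

REQUIRED_FIELDS = (
    "project_name",
    "request_approver",
    "business_owner",
    "technical_owner",
    "project_requester",
)


def validate_project_request(payload: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the request can be submitted."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = payload.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    # The playbook states that the business owner is also the project requester.
    if not problems and payload["business_owner"] != payload["project_requester"]:
        problems.append("project_requester must match business_owner")
    return problems


sample = {
    "project_name": "AI Chatbot",
    "request_approver": "approver@acme.com",
    "business_owner": "Peter",
    "technical_owner": "Bob",
    "project_requester": "Peter",
}
print(validate_project_request(sample))  # prints []
```

Collecting every problem instead of failing on the first one lets the agent ask the user for all missing details in a single turn.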
- Once the request has been submitted, the ‘project-request’ integration is triggered. We configured this integration in the blog Adding self-service capabilities to your Landing Zone and now also trigger it from the Conversational Agent. For our use case, the approval-based version is used: when control reaches the Approval task, execution is halted and all tasks after the Approval task are suspended. The integration resumes execution only when a user manually approves or rejects the approval request.
Get the request status
Expanding the Cloud Operations AI agent, we will now add the request status functionality.
- Create a new playbook ‘Request Status’ and define the instructions to follow.
- We will use the project name as the input for checking the status.
- Step 1: collect the project name for which to check the status and use ${TOOL:request-status} to get the current status
- Step 2: provide feedback on the request status using the output from the tool
- To perform the status lookup, we configure another tool called ‘request-status’ that returns the current status: ongoing, completed or unknown. The AI agent will provide this feedback to the user.
{
  "request_status": "completed"
}
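The lookup behind the ‘request-status’ tool can be sketched as follows. This is an illustration under assumptions rather than the demo's implementation: the in-memory `REQUESTS` dictionary stands in for the foundation's data store, and the names `RequestStatus` and `get_request_status` are hypothetical. Only the three status values and the response shape come from the text above.

```python
from enum import Enum


class RequestStatus(str, Enum):
    ONGOING = "ongoing"
    COMPLETED = "completed"
    UNKNOWN = "unknown"


# Hypothetical stand-in for the foundation's data store, keyed by project name.
REQUESTS: dict[str, RequestStatus] = {
    "ai chatbot": RequestStatus.COMPLETED,
    "data platform": RequestStatus.ONGOING,
}


def get_request_status(project_name: str) -> dict[str, str]:
    """Return the status payload for a project, or 'unknown' if no request is found."""
    key = project_name.strip().lower()
    status = REQUESTS.get(key, RequestStatus.UNKNOWN)
    return {"request_status": status.value}
```

Normalizing the project name before the lookup makes the tool tolerant of the casing and spacing variations that tend to appear in conversational input.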
Envisioning a future-ready foundation
What would a target state architecture look like that fully leverages the power of AI for an intelligent, self-service cloud operations platform? The architecture depicted in the image below represents an actionable blueprint to achieve this.
The foundation focuses on the following key requirements:
- Natural Language Interface: Users need to interact with the system using natural language, enabling intuitive requests like “Create a new project” or “Request a status update.” This is enabled by the conversational agent as the primary entry point.
- Backend Integration: During conversations, the solution will need to connect to various backends to gather additional information, facilitated by a data/state management layer and flexible connectors.
- Data Transformation: User and system interactions may need to be transformed for seamless processing. Generative capabilities will facilitate this and can be called upon when needed.
- Workflow Triggering: The system can trigger workflows that connect to new and existing pipelines to deliver the required functionalities. The control plane serves as the central hub, managing state and executing actions. It will integrate with Application Integration for easily and quickly implementing workflows.
Applying this to the demo above highlights the seamless transition from the conversational agent to Application Integration. Users can initiate requests like “Create Project” or “Get Status,” and the system intelligently routes these requests to the appropriate workflows.
- Natural Language Input: Users interact with the conversational agent using natural language.
- Intent Recognition: The agent recognizes the user’s intent (“Create Project” or “Get Status”).
- Workflow Triggering: The agent triggers the corresponding Application Integration workflow (“project-request”).
- Backend Interaction: The system interacts with backend systems to retrieve status updates or approver information.
- Workflow Execution: Application Integration executes the necessary workflows, including approvals and notifications.
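The routing part of this flow can be sketched as a small dispatch table. In the demo, intent recognition is performed by the agent's generative playbooks; the keyword matcher below is a deliberately naive placeholder, and the function names, intent labels and target strings are hypothetical. Only ‘project-request’ and ‘request-status’ are names taken from the demo.

```python
from typing import Optional

# Maps a recognized intent to the integration or tool that fulfils it.
TARGETS: dict[str, str] = {
    "create_project": "integration:project-request",
    "get_status": "tool:request-status",
}


def recognize_intent(utterance: str) -> Optional[str]:
    """Naive keyword matcher standing in for the agent's generative routing."""
    text = utterance.lower()
    if "status" in text:
        return "get_status"
    if "project" in text and any(word in text for word in ("create", "new", "request")):
        return "create_project"
    return None


def route(utterance: str) -> str:
    """Return the target for an utterance, or a fallback when no intent matches."""
    intent = recognize_intent(utterance)
    if intent is None:
        return "fallback:ask-how-to-help"
    return TARGETS[intent]
```

Keeping the intent-to-target mapping in one table means a new capability can be added by registering another entry rather than by changing the routing logic.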
Getting started
This architecture represents a vision for a future-ready, AI-driven cloud operations platform that will set a new standard for self-service in the cloud. Embracing intelligence and automation will empower teams to focus on innovation and drive business value rather than worrying about operations.
- Start Small and Iterate: Begin with a focused proof of concept and gradually expand the AI agent’s capabilities.
- Prioritize Data Management: High-quality metadata is essential for the AI agent’s performance.
- Focus on User Experience: A conversational interface and personalized assistance are crucial for user adoption. However, carefully consider complementary user interactions (forms, smart actions, visuals, etc.) to avoid turning everything into a chat.
- Embrace Continuous Learning and Improvement: Implement a feedback mechanism and continuously refine the AI agent.
Expanding the AI agent role
With the foundational capabilities in place, it is possible to start realizing the potential far beyond project requests. Some ideas to integrate the AI agent into other critical areas of your cloud operations include:
- Troubleshooting & Support: The AI can deliver a first line of support, answering common questions and guiding users through troubleshooting steps.
- Resource Management: The agent can manage resource provisioning and de-provisioning. Users can request new VMs or storage buckets through natural language commands.
- Cost Optimization (FinOps): The agent can be integrated with cost monitoring tools. Users could then ask questions like, “What are my top spending projects?” or “How can I reduce my BigQuery costs?” The agent can analyze data and provide actionable insights, or take action such as defining a project budget on the user’s behalf.
- Security Compliance: The agent can understand security policies and best practices. Users could ask, “Is my project compliant with the organizational policies?” or “How do I implement least privilege access?” The agent would provide relevant documentation and guidance.
- …