Stories by Elangovan Sivalingam on Medium

Building Agents for the Enterprise

Elangovan Sivalingam — Sat, 16 Aug 2025 23:51:48 GMT

Part 3: Agent Architecture

If you’ve been following along since Part 1, we started by making sense of what “agents” actually are, and we built a simple Web Automation Agent.

In Part 2, we rolled up our sleeves, compared some of the top frameworks, and even built a small example using Semantic Kernel to see things in action.

Now in Part 3, we’re taking a step back to look under the hood. How do these agents actually work? What’s their anatomy? And how does the design change depending on whether you’re using a ready-made declarative setup or building your own multi-agent system from scratch?

The Core Components of Any Agent

At the heart of any agent, whether it’s Microsoft Copilot or your own homegrown system, you’ll find a few core building blocks:

Orchestrator — Think of this as the Air Traffic Controller.
Just like controllers guide dozens of planes safely and efficiently, the orchestrator manages how your agent operates. It decides which knowledge to draw from, which action to take, and in what order, ensuring everything runs smoothly without conflicts or confusion.
Knowledge — This is the agent’s “brain library.” It’s where you load specialized data, instructions, or context so it responds in a way that fits your domain.
Actions — The “hands” of the agent. These are the triggers, workflows, or API calls it can execute to get stuff done.
Foundation Models — The actual intelligence. These models (like GPT-5, Claude, or Llama) handle reasoning, understanding, and generating responses.
User Experience Layer — The “face” of the agent. Whether it’s inside Microsoft Teams, Outlook, a web dashboard, or your custom UI, this is how users interact with it.

Everything else (memory systems, MCP tool integrations, APIs) plugs into these layers to make the agent more capable, reliable, and context-aware.
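
To make the division of labour concrete, here is a minimal Python sketch of how these building blocks might fit together. Everything in it is hypothetical: the `Orchestrator` class, the `fake_model` stub, and the `reset_password` action are placeholders I made up for illustration, not how Copilot or any specific framework is implemented.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


def fake_model(prompt: str) -> str:
    """Stand-in for a foundation model: returns the name of the step to take."""
    return "reset_password" if "password" in prompt.lower() else "answer_from_knowledge"


@dataclass
class Orchestrator:
    model: Callable[[str], str]                                  # foundation model
    knowledge: Dict[str, str] = field(default_factory=dict)     # domain context
    actions: Dict[str, Callable[[str], str]] = field(default_factory=dict)

    def handle(self, user_message: str) -> str:
        # 1. Ask the model which step to take.
        decision = self.model(user_message)
        # 2. Run the matching action if one is registered.
        if decision in self.actions:
            return self.actions[decision](user_message)
        # 3. Otherwise fall back to the knowledge layer (naive keyword match).
        for topic, text in self.knowledge.items():
            if topic in user_message.lower():
                return text
        return "Sorry, I don't have an answer for that yet."


agent = Orchestrator(
    model=fake_model,
    knowledge={"vpn": "Use the corporate VPN client and sign in with SSO."},
    actions={"reset_password": lambda msg: "Password reset link sent."},
)

# The UX layer is whatever calls handle(): Teams, a web UI, or a CLI.
print(agent.handle("I forgot my password"))
print(agent.handle("How do I connect to the VPN?"))
```

The point of the sketch is the shape: the orchestrator owns the routing, while the model, knowledge, and actions are swappable parts behind it.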

Architecture #1: Declarative Agents (Microsoft Copilot style)

This approach is like building with LEGO kits. You have predefined blocks, clear instructions, and the architecture is designed so you focus on what the agent should do, not how it’s coded.

You define the knowledge sources (SharePoint, internal DBs, APIs).
You set up actions (business process workflows, API connectors).
The orchestrator and foundation model layers are handled by Microsoft’s Copilot ecosystem.
You plug it into the UX layer (Teams, Word, Excel) and you’re ready to go.

It’s built for speed, security, and seamless integration. Ideal if your use case lives inside the Microsoft 365 ecosystem. For example, if you’re building an internal support bot for your organization, this architecture is a natural fit without the need to reinvent the wheel.

(A simplified architecture diagram inspired by the Microsoft AI Agents.)

Architecture #2: Custom / Multi-Agent Systems

This is where you go full DIY and build a custom agent. The diagram below is a good example: every piece is yours to choose, wire, and scale.

A typical setup might have:

User Input Layer — APIs, chat UIs, or voice input.
Input Processing — Natural language parsing, validation.
Reasoning Engine — LLM with planning loops (ReAct, AutoGen, etc.).
Memory Systems — Short-term (cache) + long-term (vector DB).
Tool/Skill Orchestration — APIs for weather, databases, search, communication tools.
Response Generation — Formatting answers for your output channels.
Infrastructure — Deployment, scaling, monitoring, and security.

Here you get total control. You can chain multiple agents together, integrate custom reasoning, or connect to any tool you like. The trade-off? More flexibility means more moving parts to manage.
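
As a rough illustration of how those layers chain together, here is a small Python sketch of a single agent turn. It is a toy under stated assumptions: the `Memory` class uses a plain dict where a real system would use a vector DB, `reason` is a placeholder for the LLM and planning loop, and all function names are mine.

```python
from collections import deque
from typing import Deque, Dict, List


class Memory:
    """Short-term: the last few turns. Long-term: a dict standing in for a vector DB."""

    def __init__(self, max_turns: int = 3) -> None:
        self.short_term: Deque[str] = deque(maxlen=max_turns)
        self.long_term: Dict[str, str] = {}

    def remember(self, text: str) -> None:
        self.short_term.append(text)

    def recall(self, query: str) -> List[str]:
        # Naive keyword match; a real system would use embedding similarity.
        return [value for key, value in self.long_term.items() if key in query.lower()]


def process_input(raw: str) -> str:
    """Input processing layer: trim and validate."""
    cleaned = raw.strip()
    if not cleaned:
        raise ValueError("empty input")
    return cleaned


def reason(query: str, context: List[str]) -> str:
    """Reasoning layer placeholder for the LLM / planning loop."""
    if context:
        return f"Using context: {context[0]}"
    return "No stored context; answering from the model alone."


def respond(answer: str) -> str:
    """Response generation layer: format for the output channel."""
    return f"[agent] {answer}"


def run_turn(raw: str, memory: Memory) -> str:
    query = process_input(raw)
    context = memory.recall(query)
    answer = reason(query, context)
    memory.remember(query)
    return respond(answer)


memory = Memory()
memory.long_term["refund"] = "Refunds are processed within 5 business days."
print(run_turn("  What is the refund policy?  ", memory))
```

Each function maps to one layer in the list above, which is also why a custom build has more moving parts: every one of these is yours to test and operate.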

Custom/Multi Agent Architecture

When to choose this approach

You need custom reasoning or multi‑step workflows beyond a single prompt (planning, tool use, delegation).
You must integrate deeply with proprietary data, tools, or on‑prem systems.
Hard requirements for security, compliance, data residency, or air‑gapped/offline modes.
You want to optimize latency, cost, or accuracy at a component level (e.g., hybrid models, caching, partial fine‑tunes).
You expect multiple specialized agents (retriever, planner, executor, reviewer) with clear handoffs.
You need strong observability, evals, and guardrails tailored to your domain.
Product roadmap demands flexibility: rapid tool additions, model swaps, or policy changes.

Key benefits

Full control: pick models, memory, tools, and governance; swap anything without vendor lock‑in.
Extensibility: compose multiple agents, custom planners/critics, and domain‑specific skills.
Performance and cost: aggressive caching, hybrid model routing, streaming, and resource isolation.
Reliability: explicit timeouts, retries, circuit breakers, and fallbacks per step.
Compliance and security: fine‑grained data flows, PII handling, audit trails, and private deployments.
Better fit for complex enterprise use cases: multi‑tenant, RBAC, approval flows, and SLAs.
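
The reliability point is easy to picture in code. Below is a minimal sketch of a retry-then-fallback wrapper around a single step. It covers only retries and a fallback (not timeouts or circuit breakers), and `flaky_tool` and `cached_answer` are made-up stand-ins for a real tool call and a degraded response.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retry(
    step: Callable[[], T],
    fallback: Callable[[], T],
    retries: int = 2,
    backoff_seconds: float = 0.0,
) -> T:
    """Try `step` up to `retries + 1` times, then use `fallback`."""
    for attempt in range(retries + 1):
        try:
            return step()
        except Exception:
            if attempt < retries:
                # Exponential backoff between attempts (0 by default for the demo).
                time.sleep(backoff_seconds * (2 ** attempt))
    return fallback()


calls = {"count": 0}


def flaky_tool() -> str:
    """Hypothetical tool that fails twice before succeeding."""
    calls["count"] += 1
    if calls["count"] < 3:
        raise TimeoutError("tool timed out")
    return "live result"


def cached_answer() -> str:
    """Hypothetical degraded response used when the tool keeps failing."""
    return "cached result"


print(call_with_retry(flaky_tool, cached_answer))
```

In a custom build you get to decide this policy per step, for example more retries for a cheap search call and an immediate fallback for a slow model call.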

Trade‑offs

Higher engineering lift: architecture, orchestration, testing, and MLOps/LLMOps.
Operational overhead: monitoring, drift/eval pipelines, incident response, and change control.
More moving parts: versioning prompts/tools, dependency management, and data contracts.

Choose a custom or multi‑agent setup when you need more control and flexibility than off‑the‑shelf tools. You’ll spend more time on engineering and operations, but the benefits can be worth it.

Responsible AI (RAI) Validation: Why it Matters?

Building agents is exciting, but there’s one thing to keep top of mind, especially in the enterprise: Responsible AI (RAI). RAI means making sure your agent behaves safely, ethically, and reliably with real users.

If you’re rolling out in Microsoft 365 or Copilot‑style environments, many safety checks are built in. According to Microsoft, agents are validated for areas like:

Prompt safety and potential “jailbreak” attempts
Offensive or harmful content generation
Bias, stereotypes, or manipulative behavior

If the agent fails these checks, whether during publishing or in-chat, it won’t get deployed. That means you need to think through safety and fairness before you go live.

With Copilot style agents, the framework does most of the safety heavy lifting for you: prompt checks, content filters, and policy enforcement are baked in. But if you’re building a custom agent, you have to handle this yourself: define your safety rules, test for attacks, add guardrails (like moderation, RBAC, rate limits), and monitor in production.

Short version: Copilot covers it; custom means you own it by design, in CI, and at runtime. I’ll dive into the full playbook in a future article.

Why This Matters in the Real World

A recent survey shows 78% of business leaders consider RAI a growth enabler, but only 2% of companies are actually meeting proper RAI standards. The failure to address this isn’t theoretical; it’s resulting in real financial loss and reputational harm (source: Newswire).
Especially with agents that act autonomously or touch sensitive business systems, you need to embed safety by design, not bolt it on at the end.

So, in the enterprise setting, RAI isn’t an optional mid-step. It’s a gatekeeper. It protects your users and your brand, and it helps you avoid compliance landmines.

Finally, Humans Still Matter (For Now): Why Responsible Agents Need Real People

Even the smartest agents aren’t ready to fly solo, at least not at first. Human oversight is essential for building trust, catching edge cases, and gradually handing off control.

Why Keep Humans in the Loop

Most folks are comfortable collaborating with AI, but far fewer want to be managed by it. In short: agents are teammates, not replacements, at least not yet.

A practical pattern:

* Pre‑processing: set limits and data context before the agent acts.
* Human‑in‑the‑loop: pause for confirmation on high‑risk steps.
* Post‑processing: humans review before final delivery.
* Return of Control (ROC): humans can edit or override actions pre‑execution.

HITL isn’t forever. Start with more checks, then scale back as confidence rises.
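
Here is one way the confirmation and Return of Control steps could look in code. This is a sketch under assumptions: the action names in `HIGH_RISK_ACTIONS`, the `ProposedAction` type, and the `cautious_reviewer` callback (which stands in for a real approval UI) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HIGH_RISK_ACTIONS = {"send_payment", "delete_records"}  # hypothetical action names


@dataclass
class ProposedAction:
    name: str
    payload: dict


def execute_with_oversight(
    action: ProposedAction,
    run: Callable[[ProposedAction], str],
    ask_human: Callable[[ProposedAction], Optional[ProposedAction]],
) -> str:
    """Pause on high-risk actions so a human can approve, edit, or reject them."""
    if action.name in HIGH_RISK_ACTIONS:
        reviewed = ask_human(action)   # returns the (possibly edited) action, or None
        if reviewed is None:
            return "rejected by reviewer"
        action = reviewed              # Return of Control: the human's version wins
    return run(action)


def run_action(action: ProposedAction) -> str:
    return f"executed {action.name} with {action.payload}"


def cautious_reviewer(action: ProposedAction) -> Optional[ProposedAction]:
    """Stand-in for a human: rejects any payment above 1000."""
    if action.payload.get("amount", 0) > 1000:
        return None
    return action


print(execute_with_oversight(ProposedAction("send_payment", {"amount": 250}), run_action, cautious_reviewer))
print(execute_with_oversight(ProposedAction("send_payment", {"amount": 5000}), run_action, cautious_reviewer))
```

Scaling back oversight later is then a matter of shrinking `HIGH_RISK_ACTIONS`, not rewriting the agent.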

A Few Simple HITL Patterns to Keep in Mind

Confirmation loops: agent flags decisions; humans approve.
Review queues: sample outputs for regular human audits.
Escalation paths: clear rules for when to hand off to a human.
Feedback loops: human corrections flow back to improve the system.

Why This Matters in Enterprise Settings

Builds user trust. Teams feel more confident when they know there’s a human safety net.
Prevents potential damage. In sensitive domains like finance, HR, health care, or customer support, bots must default to the safest options.
Smooth transition to autonomy. Start human heavy, then relax oversight as the system matures. That’s how autonomy evolves responsibly.

Phase             Oversight Level     Typical Flow
--------------------------------------------------------------------------------------
Initial Stage     100% human review   Agent suggests & human approves
Stabilization     50% human check     Human reviews edge cases or low confidence outputs
Mature Stage      Spot checks only    Agent runs autonomously with fallback escalation
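
The phases in the table can be expressed as a small routing rule. The sketch below is one possible reading, not a standard: the phase names, the 0.7 confidence threshold, and the 5% spot-check rate are assumptions I picked for illustration, and the stabilization phase is modelled as “review anything low-confidence” rather than a strict 50% sample.

```python
import random
from typing import Callable

LOW_CONFIDENCE = 0.7     # assumed threshold; tune per use case
SPOT_CHECK_RATE = 0.05   # assumed sampling rate for the mature stage


def needs_human_review(
    phase: str,
    confidence: float,
    rng: Callable[[], float] = random.random,
) -> bool:
    """Decide whether an agent output should be routed to a human reviewer."""
    if phase == "initial":
        return True                           # every output is reviewed
    if phase == "stabilization":
        return confidence < LOW_CONFIDENCE    # edge cases and low-confidence outputs
    if phase == "mature":
        return rng() < SPOT_CHECK_RATE        # random spot checks only
    raise ValueError(f"unknown phase: {phase}")


print(needs_human_review("initial", confidence=0.99))
print(needs_human_review("stabilization", confidence=0.55))
```

Passing `rng` in as a parameter keeps the spot-check branch testable with a fixed value.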

Final Thoughts:

When you’re designing your agent, whether it’s the declarative Copilot kind or the custom multi-agent system, make sure RAI considerations are part of your plan:

Think through harmful edge cases while designing prompts and actions.
Test with diverse inputs to check for bias or misuse.
Have a validation loop or checklist before deployment or updates.

In the next part of this series, we’ll go deeper into real-world architecture: how to structure MCP servers, build modular agents, and keep things robust and secure. RAI will feature there too, because safety doesn’t stop at deployment. Leave your thoughts in the comments below.

Building Agents for the Enterprise

Elangovan Sivalingam — Mon, 04 Aug 2025 12:44:25 GMT

Part 2: Agentic Frameworks

Agentic Frameworks

Introduction: Why Agentic Frameworks Matter?

In Part 1 of this series, we explored the shift from traditional automation to intelligent agents, systems that don’t just execute instructions, but reason, plan, and interact. These agents represent a new class of enterprise capability that can actively navigate APIs, orchestrate tools, and maintain context across conversations and tasks.

But building an agent isn’t as simple as calling an LLM. It requires a framework, a foundational system that handles planning, memory, orchestration, integration, and communication. This is where agentic frameworks come into play.

Whether you’re experimenting with copilots for internal teams or building robust multi agent systems that interact with other agents, tools and databases, choosing the right framework can be the difference between a prototype and a production-ready system.

In this article, we’ll break down:

What agentic frameworks are and why they’re needed
The difference between low-code and pro-code frameworks (and how to choose)
A curated list of top agentic frameworks to explore
A hands-on walkthrough using Microsoft’s Semantic Kernel
Enterprise architecture considerations such as memory, orchestration, RAI, and more

Let’s dive in and explore the architectural foundation behind truly useful, reliable, and scalable AI agents.

Agentic Frameworks — The Brains Behind the Agent

So what exactly is an agentic framework?

Think of it as the operating system for your AI agent. It brings order to the chaos, helping your agent plan, use tools, remember context, and talk to other systems. Without a framework, you’re either writing tons of glue code yourself or building fragile one-off bots that don’t scale.

The good ones take care of the heavy lifting:

Handling tool orchestration so your LLM can call real APIs
Managing short-term and long-term memory
Supporting multi-step reasoning and chaining
Providing interfaces for user input, context injection, and response formatting
Integrating with vector stores, databases, external APIs, or even other agents

In other words, agentic frameworks aren’t just wrappers around LLMs, they’re the engine under the hood that powers planning, memory, and tool use so your agent can actually do useful work.

Some frameworks are optimized for quick builds and business users (low-code), while others go deep into customizability and scale (pro-code). In the enterprise world, you’ll likely need to experiment with both especially if you’re building agents that move beyond one task or one department.

Before we dive into low-code vs pro-code, let me give you a lay of the land with some of the top frameworks out there today.

Agentic Frameworks Worth Exploring

Here’s a quick rundown of some of the top frameworks I explored. This space is growing fast, and each framework brings its own strengths, trade-offs, and sweet spots. Whether you’re building a chatbot, a research assistant, a task orchestrator, or something more complex, the framework you choose can make a big difference.

+-------------------------------+---------------+--------------------------------------------------------+-----------------------------------------------+
| Framework                     | Language(s)   | What It’s Good At                                      | Ideal Use Case                                |
+-------------------------------+---------------+--------------------------------------------------------+-----------------------------------------------+
| LangChain                     | Python        | Tool calling, memory, agents, chains, rich ecosystem   | Custom agent pipelines, integration-heavy apps|
| Semantic Kernel (SK)          | C#, Python    | Planning, skill plugins, memory, MS-native             | Copilots, Microsoft 365 EcoSystem             |
| OpenAI Agents / Assistants API| API / JSON    | Tool calling, function execution, managed context      | Fast, serverless agents via OpenAI platform   |
| Microsoft Copilot Extensibility| SK + Low-code| Graph-integrated, Teams-ready, RAI validation          | M365 Copilots, enterprise-ready assistants    |
| CrewAI                        | Python        | Role-based multi-agent collaboration                   | Team-style agents and workflows               |
+-------------------------------+---------------+--------------------------------------------------------+-----------------------------------------------+

Quick Note:

If you’re just starting out, LangChain or Semantic Kernel are great hands-on options with strong documentation and community support.
If you’re building something inside the Microsoft ecosystem, the Copilot extensibility model (Copilot Studio Agents) gives you native enterprise hooks right out of the box.

Low-Code vs Pro-Code — How to Choose the Right Track

Not all agent frameworks are built the same and that’s actually a good thing.

Some are designed for speed, simplicity, and business users. Others give engineers full control to customize logic, scale across systems, and deeply integrate into existing stacks.

You’ll often hear these categorized as low-code and pro-code frameworks. But this isn’t a binary decision, think of it more like a spectrum. And in many enterprise environments, you might end up using both depending on the team or use case.

Here’s how they generally stack up:

+-------------------------+-------------------------------------------+-----------------------------------------------+
| Criteria                | Low-Code Frameworks                       | Pro-Code Frameworks                           |
+-------------------------+-------------------------------------------+-----------------------------------------------+
| Who it’s for            | Business users, Analysts or Anyone        | Developers, Engineers, Architects             |
| Learning curve          | Easy, visual builders                     | Steeper, requires programming knowledge       |
| Speed to prototype      | Fast                                      | Moderate to slow (setup time needed)          |
| Custom logic            | Limited                                   | Full flexibility with code                    |
| Integration depth       | Plug-and-play connectors                  | API-level access, deep integration            |
| Observability & testing | Basic or missing                          | Debuggable, testable, traceable               |
| DevOps friendly         | Not really                                | CI/CD, GitOps, container-ready                |
| Scalability             | Good for teams or small use cases         | Designed for production-grade agents          |
| Best for                | Demos, simple bots, internal workflows    | Multi-agent systems, tools, APIs, pipelines   |
+-------------------------+-------------------------------------------+-----------------------------------------------+

How I Think About Choosing

Just getting started? Try a low-code tool like Copilot Studio or FlowiseAI to get a feel for how agents interact with tools and users.
Need control, versioning, multi-agent interaction, or custom APIs? Go with LangChain, Semantic Kernel, or CrewAI. These will give you the flexibility to go deep and build production-ready agents.
In a Microsoft shop? Use the Copilot extensibility model (AI Toolkit, M365 Agent Kit, Copilot Studio). Business users can build flows visually, and developers can extend them with Semantic Kernel skills behind the scenes.
Don’t overcommit early; experiment with both. Low-code tools are great for ideation, but most serious agent systems will eventually need pro-code logic under the hood.

Decision Tree

Start Here
   |
   +--> Are you technical or non-technical?
           |
           +--> Non-technical, prefer visual tools, or integrating with existing MCP servers?
           |       |
           |       +--> Need something fast for prototyping or internal use?
           |       |       |
           |       |       +--> Go with a Low-Code Framework (e.g., Copilot Studio, FlowiseAI)
           |       |
           |       +--> Need deeper logic or integrations, or need to build MCP servers?
           |               |
           |               +--> You’ll need support from developers using Pro-Code Frameworks
           |
           +--> Technical / comfortable writing code?
                   |
                   +--> Is your use case simple or experimental?
                   |       |
                   |       +--> You could try either — Low-Code for speed, Pro-Code for flexibility
                   |
                   +--> Need advanced planning, tool use, or integration with APIs, DBs, or memory? 
                           |
                           +--> Go with a Pro-Code Framework (e.g., LangChain, Semantic Kernel, CrewAI)
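
If it helps, the same tree collapses into a tiny function. This is only a restatement of the diagram in Python; the function name and the two boolean inputs are my own simplification of the questions in the tree.

```python
def recommend_track(technical: bool, needs_deep_integration: bool) -> str:
    """Mirror the decision tree: who is building, and how deep does it need to go?"""
    if not technical:
        if needs_deep_integration:
            return "Get developer support with a pro-code framework"
        return "Low-code framework (e.g., Copilot Studio, FlowiseAI)"
    if needs_deep_integration:
        return "Pro-code framework (e.g., LangChain, Semantic Kernel, CrewAI)"
    return "Either: low-code for speed, pro-code for flexibility"


print(recommend_track(technical=False, needs_deep_integration=False))
print(recommend_track(technical=True, needs_deep_integration=True))
```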

Hands-on: Build a Simple Agent Using Semantic Kernel

One of the cleanest frameworks I’ve used is Semantic Kernel (SK), Microsoft’s open-source library for building copilots and intelligent agents. It comes with key features like Planners, Memory, and Tool (Skill) integration, all set up to work together right from the start. SK works well with both C# and Python and fits perfectly into the Microsoft Copilot ecosystem. You can quickly create planners, use tools (called “skills”), and manage memory without a lot of setup. The best part? You don’t need to build a whole app just to test something simple.

Let’s build a simple agent that gets the weather and suggests what to wear, just enough to demonstrate tool calling + planning.

Step 1: Install Semantic Kernel (Python)

pip install semantic-kernel

Step 2: Sample Agent Code

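import asyncio

import semantic_kernel as sk
from semantic_kernel.connectors.ai.open_ai import OpenAIChatCompletion
from semantic_kernel.planning import SequentialPlanner
from semantic_kernel.skill_definition import sk_function

# NOTE: this sketch targets the older, skill-based Python API of Semantic Kernel.
# Later releases renamed skills to plugins, so adjust the imports on newer versions.

# Step 1: Create Kernel, register an LLM for the planner, and create the Planner
kernel = sk.Kernel()
api_key, org_id = sk.openai_settings_from_dot_env()  # expects OPENAI_API_KEY in a .env file
# The model name is an assumption; use whichever chat model you have access to.
kernel.add_chat_service("chat", OpenAIChatCompletion("gpt-3.5-turbo", api_key, org_id))
planner = SequentialPlanner(kernel)

# Step 2: Define some mock "skills" (a skill is a class with decorated methods)
class WeatherSkill:
    @sk_function(description="Get the current weather for a city", name="get_weather")
    def get_weather(self, city: str) -> str:
        return f"Simulated weather for {city}: 24°C, sunny"

class OutfitSkill:
    @sk_function(description="Suggest an outfit for the given weather", name="suggest_outfit")
    def suggest_outfit(self, weather: str) -> str:
        return f"Based on '{weather}', wear something light and breathable."

# Step 3: Register your skills (import_skill takes an instance, not a bare function)
kernel.import_skill(WeatherSkill(), "WeatherSkill")
kernel.import_skill(OutfitSkill(), "OutfitSkill")

async def main() -> None:
    # Step 4: Let the planner build a plan based on natural language
    plan = await planner.create_plan_async("Get weather in Toronto and suggest what to wear")
    result = await plan.invoke_async()

    # Step 5: Print the output
    print(result)

asyncio.run(main())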

So what happened here?

You’re registering functions as skills (this is how you plug in tools/APIs).
The planner takes your goal (“Get weather and suggest outfit”) and figures out what steps to run.
You get an autonomous agent loop. No hardcoding logic, just a plan based on intent.
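
Conceptually, the plan for this goal is just a two-step pipeline where the output of one skill feeds the next. The plain-Python sketch below shows that idea with no Semantic Kernel involved; `run_plan` is a hypothetical helper of mine, and a real planner picks the steps with an LLM instead of taking them as a list.

```python
from typing import Callable, List


def get_weather(city: str) -> str:
    return f"Simulated weather for {city}: 24°C, sunny"


def suggest_outfit(weather: str) -> str:
    return f"Based on '{weather}', wear something light and breathable."


def run_plan(steps: List[Callable[[str], str]], initial_input: str) -> str:
    """Run each step in order, feeding the previous output into the next step."""
    value = initial_input
    for step in steps:
        value = step(value)
    return value


# What a planner would choose for "Get weather in Toronto and suggest what to wear".
print(run_plan([get_weather, suggest_outfit], "Toronto"))
```

The value of the planner is that you never write that list yourself: it is derived from the goal and the descriptions of the registered skills.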

Try This Next

Replace the mock get_weather() with a real weather API (e.g., OpenWeatherMap).
Add memory using a vector DB like Pinecone or Redis.
Chain it with another skill like sending an email with the result.

Wrapping Up

Alright, that was a lot. We looked at what agentic frameworks really are, compared low-code vs pro-code options, walked through some of the top tools out there, and even built a small working agent with Semantic Kernel.

Hopefully you now have a better feel for what agentic frameworks are and when to use which, whether you’re hacking together a quick idea or building something enterprise-grade.

Before we dive into the next part, here’s a small challenge: pick one of the frameworks, build a simple agent, and see how it feels. It doesn’t have to be fancy; even something like a task planner or a weather bot will do. And if you do build one, I’d love to hear about it. Tag me or drop a link, I’m always excited to see what others are building.

In Part 3 of my series, we’ll go deeper into how these agents are wired under the hood. We’ll break down the architecture differences between low-code and pro-code setups, and explore something that’s becoming super important: Model Context Protocol (MCP).

MCP is how agents talk to the outside world in a structured, reliable way and we’ll even look at how to build your own MCP server to level up your agent’s skills.

See you in Part 3. It’s going to be fun.

Building Agents for the Enterprise

Elangovan Sivalingam — Thu, 24 Jul 2025 02:45:08 GMT

Part 1 — Build your first web/test automation Agent in under 10 minutes

Introduction

By now, most of us have either heard about or even used AI Agents and seen their potential to disrupt industries of all kinds. From simple automation to complex business workflows, agents are opening up entirely new ways to get work done autonomously, with less repetitive effort and more intelligent decision-making built in.

In this series, I want to share my learnings, walk you through the practical steps to build your own agents step by step, and share key principles for designing enterprise-grade agents that are secure, reliable, and actually make life easier for teams.

I promise to keep things as simple and hands-on as possible. So to kick things off, let’s build a quick web/test automation agent in under 10 minutes, all by using natural language instructions with Playwright and the GitHub AI Agent Toolkit. By the end, you’ll have a working example you can tweak, extend, or plug into bigger workflows.

And this is just the start. In the coming weeks, I’ll dive deeper into how to take these agents from side projects to production-ready tools that handle real workloads safely and efficiently. Here’s a glimpse of what’s coming up next in this series:

Building Enterprise-Grade Agents (What’s Next)

Upcoming Topics:

Agentic Framework — How to pick the right design for your use case
Architecture — Patterns for reliability and fault tolerance
Model Context Protocol — Build/Integrate with MCP servers
Writing Effective & Reliable Prompts
Single-Agent vs. Multi-Agent Orchestration
Adaptability — Making your agents easy to reuse, replace, and extend
Real-world Deployment — CI/CD for agents, versioning, and governance
Cost Optimization — Keeping your agent runs efficient and affordable
Governance & Security — Managing secrets, IAM, sandboxing, and safe execution
Performance — Avoiding costly calls, poor model performance, and sluggish workflows
Monitoring & Observability — Logging, tracing, and operational insights

Let’s Build Your First Agent

Alright, time to roll up our sleeves. To keep things real and practical, let’s build a simple web/test automation agent (Stock Analysis Agent) that does something useful:

Open a browser
Search for a stock price (like AAPL)
Read the stock price
Decide if it’s a Buy or Sell based on the price
Format the results and display it in a table

And the best part? We’ll do this with natural language instructions, no hardcoded click selectors, no coding. We’ll use the GitHub AI Toolkit + Playwright + a smart LLM (Claude, in my case) to run this as an autonomous agent.

You can try this with a different LLM like GPT-4.1 or Gemini 2.5 Flash.

Prerequisites

Before you run your first agent, make sure you have the following ready:

Install the latest version of Visual Studio Code
Access to GitHub Copilot. The Copilot Free plan works and comes with a monthly limit of completions and chat interactions.
VS Code Extensions — Install both the GitHub Copilot extension and the AI Toolkit extension.
Playwright MCP Server — We’ll use Microsoft’s Playwright MCP server, so you don’t need to install Playwright separately.
LLM & API Key — You’ll need access to a Large Language Model. For this example, I’m using Claude. Log in to the Anthropic Console, create an API key, and save it safely — you’ll need it later.

Agent Design

AI Agent Tool Kit

Launch VS Code and open the AI Toolkit; see the image below for reference.

AI Tool Kit

Model Catalog

The Model Catalog in the VS Code AI Agent Toolkit is a handy feature that helps you manage which AI models your agent can use right inside your IDE.

With the Model Catalog, you can:

Browse available models → see which providers and versions you can plug into your agent.
Configure API keys and endpoints → no need to hardcode them into your scripts.
Switch models easily → test your agent with GPT, then try Claude or another provider.
Manage credentials securely → VS Code handles the secret storage so you don’t leak keys in your repo.

In practice, it makes it super simple to point your agent to the right model. For this example, I’ve added Claude 3.7 Sonnet. You can choose any model you like, but to keep things simple, pick Claude 3.7.

Model Catalog

Agent Builder

The Agent Builder is one of the key features of the VS Code AI Toolkit. It’s like a visual workspace for designing, testing, and running your AI agents right inside VS Code.

In simple terms, the Agent Builder lets you:
• Create an agent project step by step, pick a model, add tools, and define instructions.
• Describe what your agent should do in natural language.
• Connect tools like Playwright or other MCP servers
• Run and test your agent in the same place you code, no need to switch to the terminal or browser.
• Iterate quickly, tweak prompts, add capabilities, and see the results instantly.

Think of it as a no-fuss control panel for turning your idea into a working AI-powered helper, whether that’s a test automation bot, a data scraper, or something more advanced.

Model → Under Model, select “Claude 3.7 Sonnet” or the model you added in the previous step.
Prompts → Use the attached system prompt and user prompt in the article

System Prompt

You are a web automation agent specializing in navigating web pages, scanning content, and taking action. Your task is to retrieve stock prices from Google search and provide buy/sell signals based on price thresholds.

# Steps
1. Open the default web browser and navigate to Google.com
2. Search for the stock price mentioned in the user's prompt
3. Extract the current stock price from the search results
4. Determine the buy/sell signal based on the price threshold ($200)
5. Present the results in a formatted table

# Tool Use Guidelines
To complete this task, I will use the following browser tools:
- Use browser_navigate to go to Google.com
- Use browser_type to enter the stock search query
- Use browser_click to submit the search
- Use browser_snapshot to capture the page content with stock price information
- Use browser_navigate or browser_close as needed to complete the task

# Output Format
I will provide:
1. A brief summary of the actions taken
2. The stock information in a markdown table with the following columns:
   - Stock Name
   - Stock Price
   - Signal (Buy if price < $200, Sell if price ≥ $200)

Example format:
```
| Stock Name | Stock Price | Signal |
|------------|-------------|--------|
| MSFT       | $230        | Sell   |
```

# Examples
Input: Check the stock price for AAPL
Output:
I searched for AAPL stock price on Google and found the current price is $198. Since this is less than $200, the signal is Buy.

| Stock Name | Stock Price | Signal |
|------------|-------------|--------|
| AAPL       | $198        | Buy    |

# Notes
- I will extract the stock price from Google search results without navigating to any financial websites
- The buy/sell signal is determined solely based on the $200 threshold (Buy if < $200, Sell if ≥ $200)
- Stock prices may fluctuate, so the information provided reflects the data available at the time of the search

User Prompt

NVDA stock analysis.

Tools → Tools are the actions or external capabilities your agent can call, like browsing a webpage, running a test, or fetching data. Let’s integrate the built-in Playwright MCP server with our agent.

Add Tools -> Select MCP Server

Add Server

Featured MCP Server

Select Playwright & enter the server name

Select all tools and click ok

Execution

Now that all the configuration is done, you’re ready to run your first agent and see it in action!

What to Expect

You built an agent that:

Uses Playwright via MCP (Model Context Protocol).
Connects to Anthropic’s Claude model.
Navigates the web (e.g., Google Finance).
Extracts stock price.
Makes a decision: “Buy or Sell.”

We didn’t write any programming logic, reference HTML elements, or manually handle page coordinates, so how did this work? Your agent uses the AI model (Claude 3.7) to plan the browsing steps, calls Playwright for actions like navigate, search, click, and extract text, then evaluates the threshold condition and returns the final decision. Check the Model Response section for the detailed output.
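
The decision rule itself is simple enough to write down, which is handy if you want to cross-check the model’s answer deterministically. The sketch below mirrors the rule and table layout from the system prompt (Buy below $200, Sell at or above $200); the function names are mine and the prices are example inputs, not live data.

```python
from typing import List, Tuple


def signal_for(price: float, threshold: float = 200.0) -> str:
    """Buy if the price is below the threshold, otherwise Sell."""
    return "Buy" if price < threshold else "Sell"


def format_table(rows: List[Tuple[str, float]]) -> str:
    """Render (stock name, price) pairs as the markdown table the prompt asks for."""
    lines = [
        "| Stock Name | Stock Price | Signal |",
        "|------------|-------------|--------|",
    ]
    for name, price in rows:
        lines.append(f"| {name} | ${price:,.2f} | {signal_for(price)} |")
    return "\n".join(lines)


# Example inputs only; in the agent, the price comes from the browser snapshot.
print(format_table([("AAPL", 198.0), ("MSFT", 230.0)]))
```

Comparing the agent’s table against a check like this is a cheap way to catch cases where the model extracts the right price but applies the threshold incorrectly.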

Model Response

Over to you

Try it out with your own use case:

Expand the idea to your imagination.
Try with different models
Add your own prompts, change the stock, the threshold, or the output format.
Use different tools (screenshots, web scrapers, etc.)
Run the script and watch your first autonomous test agent at work!

Next, I’ll break down how this ties into bigger agentic patterns from chaining tasks to connecting multiple tools. But for now, congrats! You just built your first tiny web/test automation agent with Playwright.

Ready to see what’s possible when you take this idea to production? Stay tuned for the next piece in the series and I’d love to hear about your experience in the comments below!

How a Simple Null Pointer Exception Bug Triggered the Biggest Cloud Crash of the Year

Elangovan Sivalingam — Wed, 18 Jun 2025 11:56:28 GMT

A Deep Dive into One of the Most Pervasive Cloud Failures and What It Teaches SREs and Architects

On June 12, 2025, a null pointer exception in Google Cloud’s Service Control system led to a cascading outage affecting 50+ services globally. The error, triggered by a malformed policy update, was rapidly replicated through Spanner, Google’s globally-distributed database. Key issues included lack of a feature flag, missing input validation, and poor isolation of observability tools. This incident offers critical lessons in fault isolation, graceful failure, and resilient architecture.

Introduction

On June 12, 2025, the internet faltered in a way that shook the very foundation of cloud reliability. A wave of outages began with intermittent issues on popular platforms like Gmail and Spotify. Within the hour, the problem had cascaded across continents. Developer tools stalled, authentication services failed, and even Google’s own observability infrastructure went dark.

The root cause? A hidden vulnerability inside Google Cloud’s Service Control system triggered by a single malformed policy update.

This article breaks down what happened, how it unfolded, and most importantly, what we as SREs and cloud architects must learn to avoid a similar fate.

What Failed?

At the heart of the outage was Google Cloud’s Service Control, a critical system that acts as the guardian of Google Cloud’s API infrastructure. Every API request that flows through the cloud passes through Service Control. Its job is to:

Verify authorization for API calls
Enforce quota and rate limits
Apply organization and project-level policies
Perform logging, auditing, and billing tasks

Service Control (Source: Google Cloud)

When Service Control fails, everything that depends on it grinds to a halt. That’s exactly what unfolded.

The trouble started with a new feature released into Service Control on May 29, 2025. The feature introduced a more advanced quota policy evaluation system. But it came with a dangerous flaw: it lacked a feature flag. The new code was live in production and quietly waiting for the right (or wrong) input to trigger it.

That moment arrived on June 12, when a malformed policy update entered the system. The policy included blank fields, an unexpected structure that wasn’t accounted for in the new logic path.

Here’s where the critical bug lay: there was no null check in the code path introduced by the new feature. As the system processed the malformed policy, it attempted to access a value from a field that didn’t exist. This caused a null pointer exception, a crash-level error that immediately terminated the Service Control process in that region.

Crucially, this wasn’t just an isolated crash. Because the policy metadata is stored and managed in Google’s globally-distributed Spanner database, the malformed policy was instantly replicated to all other regions. Each regional Service Control instance began to process the same faulty data and each one crashed in turn.

Within minutes, over 50 services across 40+ regions were returning HTTP 503 errors. The null pointer bug, tiny in code size but massive in impact, had cascaded into one of the largest global cloud outages of the year.
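To make the missing null check concrete, here is a minimal Python sketch. It is not Google's code: the QuotaPolicy class, its limit_per_minute field, and the fail-open default are all assumptions made purely for illustration. The unsafe version crashes on a blank field, while the safe version guards against it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QuotaPolicy:
    """Hypothetical policy record; the field names are illustrative only."""
    name: str
    limit_per_minute: Optional[int] = None  # blank in a malformed update


def is_allowed_unsafe(policy: QuotaPolicy, requests_this_minute: int) -> bool:
    # No guard: comparing against None raises TypeError, which is Python's
    # closest analogue to dereferencing a null pointer.
    return requests_this_minute < policy.limit_per_minute


def is_allowed_safe(policy: QuotaPolicy, requests_this_minute: int) -> bool:
    # Guard against the blank field and return a default decision
    # instead of taking the whole process down.
    if policy.limit_per_minute is None:
        return True  # fail open: an assumption, choose what suits your system
    return requests_this_minute < policy.limit_per_minute
```

Whether to fail open (allow the request) or fail closed (reject it) is a design decision that depends on what the check protects; the point of the sketch is only that the bad input yields a decision rather than a crash.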

A Closer Look at Spanner

Spanner is Google’s globally-distributed, horizontally scalable relational database. It’s one of the engineering marvels behind Google Cloud’s promise of high availability and consistency across regions.

Spanner ensures:

Synchronous replication of metadata and configuration across regions
Global consistency through a distributed TrueTime API
High fault tolerance and uptime

Spanner Server Organization (Source: Spanner Documentation)

Spanner did exactly what it was supposed to do, replicating policy updates globally, but it did so without any built-in safeguards to check for schema or structural validity in the new data. There was no circuit breaker or quarantine to isolate malformed updates before they spread.

In this incident, that lack of validation allowed a single corrupt policy object to poison Service Control instances worldwide. What made Spanner powerful also made it dangerous without additional safeguards in place.
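One way to picture the missing safeguard is a validation gate that sits in front of the replicated store. The sketch below is an assumption-heavy toy: the required fields, the publish function, and the plain Python lists standing in for the replicated store and a quarantine area are all hypothetical and have nothing to do with Spanner's actual API.

```python
from typing import Any, Dict, List, Tuple

REQUIRED_FIELDS = ("name", "limit_per_minute")  # hypothetical schema


def validate_policy(policy: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the policy looks valid."""
    problems = []
    for field in REQUIRED_FIELDS:
        value = policy.get(field)
        if value is None or value == "":
            problems.append(f"missing or blank field: {field}")
    return problems


def publish(
    policy: Dict[str, Any],
    replicated_store: List[Dict[str, Any]],
    quarantine: List[Tuple[Dict[str, Any], List[str]]],
) -> bool:
    """Write the policy to the replicated store only if it validates."""
    problems = validate_policy(policy)
    if problems:
        # Hold the update back for review instead of letting it spread.
        quarantine.append((policy, problems))
        return False
    replicated_store.append(policy)
    return True
```

With a gate like this, a policy with a blank limit_per_minute would land in quarantine in one place rather than reaching every region.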

The Domino Effect: Timeline of the Outage

[May 29, 2025]  New quota logic deployed without a feature flag
[Jun 12, 10:45 AM PDT]  Malformed policy update pushed to Spanner
[Jun 12, 10:51 AM PDT]  First Service Control crashes begin
[Jun 12, 10:52–11:00 AM]  Global crash cascade via replicated data
[Jun 12, 11:10 AM]  50+ services including Gmail and Vertex AI fail
[Jun 12, 11:30 AM]  Google SREs trigger red button bypass
[Jun 12, 12:30 PM]  Most regions stabilize
[Jun 12, 1:30 PM]  Final recovery in us-central1

Key Engineering Failures

No Feature Flag: The new quota logic was pushed live across all binaries without the ability to toggle it off regionally.
Insufficient Input Validation: A missing null check turned a malformed policy into a crash trigger.
Aggressive Global Replication: Spanner’s replication worked perfectly, which meant the bug was replicated perfectly.
No Backoff on Restart: In us-central1, the lack of exponential backoff caused a flood of retries that overwhelmed Spanner.
Monitoring Outage: Google’s own observability tools and health dashboards were hosted on the same infrastructure and failed along with everything else.
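The first item in the list above, the missing feature flag, can be sketched as a per-region toggle. This is a simplified illustration only: the FeatureFlags class, the "new_quota_logic" flag name, and the in-memory storage are assumptions, and a real system would load flag state from a configuration service.

```python
from typing import Dict, Set


class FeatureFlags:
    """In-memory flag store keyed by flag name, then by region."""

    def __init__(self) -> None:
        self._enabled: Dict[str, Set[str]] = {}

    def enable(self, flag: str, region: str) -> None:
        self._enabled.setdefault(flag, set()).add(region)

    def disable(self, flag: str, region: str) -> None:
        self._enabled.get(flag, set()).discard(region)

    def is_enabled(self, flag: str, region: str) -> bool:
        return region in self._enabled.get(flag, set())


def check_quota(flags: FeatureFlags, region: str) -> str:
    # New code path runs only where the flag is on; everywhere else the
    # legacy path keeps serving traffic.
    if flags.is_enabled("new_quota_logic", region):
        return "new"
    return "legacy"
```

Because the flag defaults to off, the new logic can be rolled out one region at a time and switched off in a single region without redeploying binaries.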

Lessons for SREs and Architects

Always Use Feature Flags for Critical Logic: Never release backend features without the ability to enable/disable them in isolation.
Validate Before Replication: Build gating mechanisms to prevent malformed configurations from spreading across environments.
Implement Fail-Safe Defaults: Code defensively. Assume bad input will happen.
Design Recovery Mechanisms: Use randomized backoff and load throttling during restarts to avoid secondary failures.
Isolate Observability Systems: Don’t host monitoring tools on the same infrastructure they are meant to observe.
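For the "Design Recovery Mechanisms" lesson, here is a small sketch of randomized exponential backoff, using the variant often called "full jitter". The function name and the default base and cap values are assumptions chosen for illustration, not values taken from the incident report.

```python
import random
from typing import List, Optional


def backoff_delays(
    max_attempts: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Return one randomized delay in seconds per retry attempt.

    Each delay is drawn uniformly from [0, min(cap, base * 2**attempt)],
    so restarting tasks spread out instead of retrying in lockstep.
    """
    rng = rng or random.Random()
    delays = []
    for attempt in range(max_attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng.uniform(0, ceiling))
    return delays
```

A restarting task would sleep for each delay in turn before retrying its dependency. The randomness is what matters here: without it, thousands of tasks that crashed together also retry together.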

Conclusion

The June 2025 Google Cloud outage reminds us that even the most sophisticated infrastructures can fail, and often the weakest links are the assumptions we make about safety. Spanner did its job. So did Service Control. But together, without the right safeguards, they created a perfect storm.

For SREs and architects, this is a case study not just in failure, but in foresight. Resilience isn’t just about uptime; it’s about limiting the blast radius when things do go wrong.

We build cloud-native systems to be fast, scalable, and elastic. But above all, we must build them to fail gracefully. I’d love to hear your thoughts and takeaways from your own perspective.

References

Overview | Service Infrastructure | Google Cloud

Spanner Distributed Database

Google Cloud Service Health — Incident Report

Buy vs. Build in Observability: Step 2 – Avoiding the Licensing Trap

Elangovan Sivalingam — Wed, 21 May 2025 22:39:36 GMT

Licensing Pitfalls

“It looked affordable… until our usage scaled.”

This is a sentence I’ve heard more than once from engineering leaders and platform teams burned by unexpected software costs, especially in the realm of observability.

In Step 1 of this series [Build Vs Buy In Observability], we explored the broader dilemma of buying versus building observability tools. Now, in Step 2, let’s dig into one of the most common (and costly) traps when buying: licensing.

The Hidden Costs Behind a Slick Demo

When we evaluate observability tools, we usually start with a demo, a trial, or a limited PoC. Pricing might look manageable at first, often even compelling. But as your usage grows, that seemingly fair pricing model can become an anchor on your budget and flexibility.

The book Observability Engineering puts it plainly:

“It’s not unfair to charge for software. What’s unfair is hiding what it’s going to cost when you actually use it for its intended purpose.”

That’s the core issue. Licensing models that obscure Total Cost of Ownership (TCO) or penalize successful adoption can turn an observability success story into a financial headache.

Common Licensing Pitfalls (and How to Dodge Them)

1. Pricing Models That Don’t Scale With You

Many vendors use pricing based on:

Per seat
Per host or container
Per query
Per service
Per GB of ingested data

Individually, these might seem reasonable. But together, they create a fog of unpredictability. If you're doing observability right (instrumenting broadly, querying often, and sharing access), your usage will grow. Fast.

2. The Trap of Long-Term Contracts

One of the most overlooked licensing pitfalls is signing multi-year contracts with the promise of cost savings — only to find yourself stuck with tools that can’t keep pace with evolving tech landscapes.

Take AppDynamics and Splunk, for example:

AppDynamics, once a leader in monitoring traditional applications, now struggles to support cloud-native environments. Cisco’s own pivot to different tools like Splunk Observability Cloud and the halt in development of Cisco Cloud Observability have left customers fragmented and forced to stitch together multiple tools to maintain visibility.
Splunk, primarily known for log management, has pushed into observability — but customers report it’s immature, hard to scale, and lacks topology awareness. Integration is often manual, and managing it becomes a burden as observability needs grow.

The result? Organizations locked into long-term deals are now forced to either double down on aging solutions or pay again to modernize. Always evaluate how well a tool scales with your future, not just your present.

3. The Pay-As-You-Go Mirage

Flexible pricing sounds nice. But if you can’t predict your future usage (and who can, really?), you risk locking yourself into an exponentially growing budget.

4. Vendor Lock-In via Proprietary Instrumentation

Some tools offer quick setup via proprietary agents. Fast? Yes. Future-proof? No.

Migrating later often means re-instrumenting from scratch, which is time-consuming and error-prone. That’s why many teams are moving toward OpenTelemetry, a vendor-agnostic standard that gives you flexibility without sacrificing speed.

5. Penalizing Curiosity

The more observability becomes a shared asset across your org, the more value you unlock, but also the more you pay if the pricing model isn’t aligned.

“You will want to slice and dice your observability data in many ways… you should see exponential growth in data queries as a culture of observability spreads.”

If your vendor charges per query or user, you might find yourself discouraging adoption to avoid cost spikes. That’s backwards.

Smart Strategies to Avoid Licensing Regret

Here’s how thoughtful engineering and business leaders approach licensing decisions in observability:

1. Demand Transparency

Ask vendors to model pricing across realistic future scenarios:

What happens when your team doubles?
What if usage increases by 10x?
How does pricing change if other teams start querying data?

If they hesitate, that’s a red flag.
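You can also run this modeling yourself before the vendor call. The sketch below is a rough starting point: every unit price and usage number in it is made up, and it covers only some of the pricing dimensions a real contract might include. Swap in the figures from your actual quote.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Usage:
    seats: int
    hosts: int
    queries_per_month: int
    ingest_gb_per_month: float


# Hypothetical unit prices in USD per month; replace with the vendor's quote.
PRICES = {"seat": 50.0, "host": 20.0, "query": 0.01, "ingest_gb": 0.25}


def monthly_cost(u: Usage) -> float:
    """Sum the cost of each pricing dimension for one month."""
    return (
        u.seats * PRICES["seat"]
        + u.hosts * PRICES["host"]
        + u.queries_per_month * PRICES["query"]
        + u.ingest_gb_per_month * PRICES["ingest_gb"]
    )


# Made-up baseline usage for illustration.
today = Usage(seats=20, hosts=100, queries_per_month=50_000, ingest_gb_per_month=2_000)

scenarios = {
    "today": today,
    "team doubles": replace(today, seats=today.seats * 2),
    "usage 10x": replace(
        today,
        queries_per_month=today.queries_per_month * 10,
        ingest_gb_per_month=today.ingest_gb_per_month * 10,
    ),
    "other teams start querying": replace(
        today,
        seats=today.seats + 30,
        queries_per_month=today.queries_per_month * 3,
    ),
}

for name, usage in scenarios.items():
    print(f"{name:28s} ${monthly_cost(usage):>12,.2f} / month")
```

If the vendor's numbers for the same scenarios differ wildly from your own estimate, that gap is exactly the conversation to have before signing.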

2. Start with OpenTelemetry

By defaulting to OpenTelemetry (OTel), you reduce long-term risk. OTel lets you switch backends without ripping out your instrumentation. Use vendor distros if you must, but instrument natively in OTel wherever possible.
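The idea behind that decoupling can be shown with a toy example. To be clear, this is not the OpenTelemetry API: the Tracer, SpanExporter, ConsoleExporter, and InMemoryExporter names below are hypothetical stand-ins, written with only the Python standard library, to show why instrumenting against a neutral interface keeps application code untouched when the backend changes.

```python
import time
from contextlib import contextmanager
from typing import Iterator, List, Protocol, Tuple


class SpanExporter(Protocol):
    """Anything that can receive a finished span (hypothetical interface)."""

    def export(self, name: str, duration_ms: float) -> None: ...


class ConsoleExporter:
    """Stand-in for 'vendor A': prints spans to stdout."""

    def export(self, name: str, duration_ms: float) -> None:
        print(f"span={name} duration_ms={duration_ms:.2f}")


class InMemoryExporter:
    """Stand-in for 'vendor B': keeps spans in a list."""

    def __init__(self) -> None:
        self.spans: List[Tuple[str, float]] = []

    def export(self, name: str, duration_ms: float) -> None:
        self.spans.append((name, duration_ms))


class Tracer:
    def __init__(self, exporter: SpanExporter) -> None:
        self._exporter = exporter

    @contextmanager
    def span(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            elapsed_ms = (time.perf_counter() - start) * 1000
            self._exporter.export(name, elapsed_ms)


def handle_checkout(tracer: Tracer) -> str:
    # Application code only knows about the neutral Tracer interface.
    with tracer.span("checkout"):
        return "ok"
```

Switching "vendors" here means passing Tracer(InMemoryExporter()) instead of Tracer(ConsoleExporter()); handle_checkout never changes. OTel applies the same principle with a standardized API and pluggable exporters.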

3. Avoid Licensing Models That Penalize Success

Curiosity should be rewarded, not punished. Prefer tools with pricing models that scale with business value, not technical adoption. If more usage = more insights, the pricing should reflect that benefit, not become a tax.

4. Align Licenses with Real Usage Patterns

Many teams overpay for seats or hosts they don’t use. On the flip side, under-licensing can expose you during an audit. Track usage actively with a software asset management (SAM) approach.

5. Use the Right Tools for the Right Purpose

Not all data tools are observability tools. A common (and costly) mistake is using log aggregation platforms like Splunk for infrastructure monitoring.

Log profiling tools are excellent for search and forensic analysis but not for real-time observability.

When organizations rely on tools like Splunk to monitor infrastructure by piping in logs, they often face two issues:

Skyrocketing costs due to log ingestion volume
Delayed insights, since logs weren’t designed for proactive monitoring or alerting

Instead, use telemetry-native observability platforms with proper support for metrics, traces, and real-time alerting. Let each tool do what it does best.

⦿ Final Thoughts

Licensing might seem like a finance or procurement concern, but it's deeply strategic, especially for observability. Your choices now affect how freely your engineers can ask questions of your systems, how many people can explore insights, and how adaptable your stack will be as your org evolves.

Don’t let licensing limit your culture of curiosity.

⦿ Observability Licensing Checklist for Leaders

Use this list before signing any contract:

🔲 Is the pricing model predictable and transparent?
🔲 Are costs tied to adoption metrics (e.g., per user, per query, per host)?
🔲 Has the vendor modeled cost scenarios based on future scale?
🔲 Is usage-based pricing easy to monitor and forecast?
🔲 Does the tool rely on proprietary agents or lock-in mechanisms?
🔲 Are you using OpenTelemetry as your instrumentation standard?
🔲 Are there true-up or overage clauses?
🔲 Can you switch vendors without re-instrumenting apps?
🔲 Are costs aligned with value created, not just technical usage?
🔲 Are license renewals and audits clearly defined?
🔲 Are you using the right tool for the right job?

If your observability tool penalizes growth, it’s not a platform, it’s a bottleneck.

Build vs Buy in Observability: A Pragmatic Look at ROI

Elangovan Sivalingam — Tue, 20 May 2025 22:28:01 GMT

Build vs Buy in Observability: Step 1 — A Pragmatic Look at ROI

Observability ROI

As organizations grow and their systems become increasingly complex, observability is no longer a luxury; it’s a necessity. But a key decision many engineering and platform teams face is whether to build an observability solution in-house or buy a commercial tool. This decision isn’t just technical; it’s fundamentally about ROI (return on investment).

🧠 The Strategic Decision: Build vs Buy

At first glance, building might seem attractive. It gives you full control, tailored integrations, and the potential for long-term cost savings. But it also demands significant engineering effort, deep domain expertise, and ongoing maintenance. Buying, on the other hand, offers speed, reliability, and vendor support, but at a recurring cost.

So how do you decide?

Let’s break it down.

🧩 The Hidden Costs of Building

Building an observability platform from scratch isn’t just about standing up a few open-source tools. You need:

Time: Expect quarters, not weeks, to build a production-ready solution.
Talent: Engineers with deep expertise in observability patterns, telemetry pipelines, distributed tracing, alerting logic, and UI/UX.
Maintenance: Tool upgrades, scaling, integrations, and support overhead.
Opportunity Cost: Every hour spent on platform work is time not spent delivering core product features.

While it may seem “free” to use open-source software, the real costs are often buried in engineering hours and operational complexity.
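A back-of-the-envelope calculation helps surface those buried costs. The sketch below is deliberately simple and every number in it is hypothetical: the team size, the loaded cost per engineer, the infrastructure spend, and the license price are placeholders, and the model ignores things like infrastructure cost during the build phase and opportunity cost.

```python
def build_tco(
    engineers: int,
    build_months: int,
    monthly_cost_per_engineer: float,
    maintainers: float,
    infra_per_month: float,
    horizon_months: int,
) -> float:
    """Rough total cost of building and then running an in-house platform."""
    build_phase = engineers * build_months * monthly_cost_per_engineer
    run_months = max(horizon_months - build_months, 0)
    run_phase = run_months * (maintainers * monthly_cost_per_engineer + infra_per_month)
    return build_phase + run_phase


def buy_tco(license_per_month: float, horizon_months: int) -> float:
    """Rough total cost of a commercial license over the same horizon."""
    return license_per_month * horizon_months


# Hypothetical inputs: 4 engineers building for 9 months, then 1.5 engineers
# maintaining, compared over a 36-month horizon.
build = build_tco(
    engineers=4,
    build_months=9,
    monthly_cost_per_engineer=15_000.0,
    maintainers=1.5,
    infra_per_month=8_000.0,
    horizon_months=36,
)
buy = buy_tco(license_per_month=30_000.0, horizon_months=36)

print(f"build: ${build:,.0f}   buy: ${buy:,.0f}")
```

The output depends entirely on the inputs you choose; the value of the exercise is that engineering hours show up as a line item instead of staying invisible.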

💳 The Case for Buying

Buying an observability solution like Dynatrace, AppDynamics or Datadog means:

Immediate value: Fast setup and out-of-the-box insights.
Reduced cognitive load: Engineers focus on interpreting signals, not managing pipelines.
Predictable costs: Especially helpful when forecasting budgets.
Product maturity: Commercial tools often come with battle-tested features like RBAC, anomaly detection, SLO tracking, and robust visualizations.

In many cases, buying offers faster time to insight, which is critical when every minute of downtime can cost thousands — or millions.

⚖️ Making the ROI Case

Whether you’re building or buying, the ROI should be front and center:

Are engineers resolving incidents faster?
Is on-call fatigue decreasing?
Are you reducing MTTR and improving customer experience?
Can you support more users with fewer production incidents?

The best observability investments free your teams to focus on what matters: shipping reliable features, not debugging pipelines.
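To answer the MTTR question with data rather than gut feel, you can compute it from your incident records. This is a minimal sketch: it reads MTTR as mean time to resolve, assumes each incident is a (detected_at, resolved_at) pair of datetimes, and uses made-up timestamps.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Incident = Tuple[datetime, datetime]  # (detected_at, resolved_at)


def mttr(incidents: List[Incident]) -> timedelta:
    """Mean time to resolve across a list of incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Made-up incidents before and after an observability investment.
t0 = datetime(2025, 1, 1, 12, 0)
before = [(t0, t0 + timedelta(minutes=90)), (t0, t0 + timedelta(minutes=150))]
after = [(t0, t0 + timedelta(minutes=30)), (t0, t0 + timedelta(minutes=50))]

print(f"MTTR before: {mttr(before)}   after: {mttr(after)}")
```

Tracking this number per quarter, alongside incident counts and on-call pages, gives you a trend line to put next to the cost side of the ROI equation.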

🧭 When Building Makes Sense

That said, there are legitimate cases for building:

You have unique internal needs commercial tools can’t address.
You’re at massive scale where vendor pricing becomes prohibitive.
You have a platform engineering team with capacity and expertise to own the stack.

In such scenarios, the build route may offer a better long-term ROI, if you’re ready to invest heavily.

🧠 Final Thought

Observability is about gaining confidence in the systems you operate. Whether you build or buy, the end goal is the same: actionable insight, not just data.

So before spinning up your next Jaeger or Prometheus cluster, ask yourself:
👉 “Are we solving the right problem, or just building another tool?”