AI as My DevOps Partner: This Changes Everything About Cloud Deployment

6 min read · Aug 12, 2025

The car finance industry has long been burdened by legacy systems. Our journey began with a clear challenge: our existing loan origination process was hampered by architectural limitations, manual touchpoints, and fragmented data silos, leading to slow processing and a poor customer experience. The mission was ambitious but necessary: to completely transform this landscape by replacing the old, manual processes with an AI-powered platform capable of delivering fast, seamless, and intelligent financing decisions.

The solution was a new, event-driven microservices platform built on the cloud. This wasn’t just about new infrastructure; it was about creating an “innovation lab” to enable the rapid, AI-powered development we envisioned. A key requirement was integrating this modern platform with the existing Dealers Touch Point (DTP) system via a SOAP API to receive finance applications, a challenge that made the scalability and flexibility of the cloud essential.

In an unprecedented four-week sprint, our three-person team used an AI-accelerated development process (an internal research approach from the BFSI group within Thoughtworks) to build this entire platform — five production-ready microservices and over 50,000 lines of code. With the application fully coded, my challenge was to take this brand-new system and bring it to life on AWS with limited cloud and platform-building expertise. This isn’t the story of how the application was coded; it’s the story of how I partnered with an AI co-pilot to deploy it, and the crucial lessons I learned bringing it live in just three days.

The Architecture: A Complex, Real-World System

First, let’s look at the components I had to deploy. This was far from a simple sample application:

  • Five Microservices: Each with unique, real-world requirements. One service required Amazon S3 for document storage, another integrated with a third-party credit bureau, and a third used an external OCR function for document verification, which meant handling multiple sets of external API credentials.
  • An Event-Driven Core: The services were designed to communicate asynchronously using Apache Kafka.
  • A React Frontend: Hosted on S3 and served globally via CloudFront.
  • Containerization: The entire system was designed to run on ECS Fargate for serverless container orchestration, with images stored in ECR.

A key challenge was our reliance on external, third-party systems for critical functions like credit bureau checks and OCR processing. To enable development without being dependent on these external APIs, we implemented a clever strategy: we built mock Lambda functions, fronted by an API Gateway. These mocks perfectly mimicked the behavior and responses of the real third-party systems, allowing the team to build and test our services in a fast, isolated, and cost-effective environment.
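To illustrate the mocking strategy, here is a minimal sketch of what one of those mock Lambda functions could look like — a stand-in for the credit bureau API. The field names, score values, and response shape are illustrative assumptions, not the actual contract we used:

```python
import json

# Hypothetical mock of a third-party credit bureau API, written as an AWS
# Lambda handler and fronted by API Gateway. All field names and values
# below are illustrative assumptions.
def lambda_handler(event, context):
    body = json.loads(event.get("body") or "{}")
    applicant_id = body.get("applicantId", "unknown")

    # Return a canned, deterministic response so the microservices can be
    # built and tested without ever calling the real bureau.
    response = {
        "applicantId": applicant_id,
        "creditScore": 742,
        "riskBand": "LOW",
        "source": "mock-bureau",
    }
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps(response),
    }
```

Because the mock is deterministic, integration tests against it are fast, free, and repeatable — exactly the isolated environment described above.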

AI as My Cloud Architect: Generating the Terraform Blueprint

My deployment strategy was to split the setup into two distinct parts: the foundational infrastructure (VPC, ALB, Kafka) and the microservices themselves. This approach would allow us to create a stable, reusable platform before deploying any application code.

To generate the Terraform code, I worked within my editor using the Cline VS Code plugin, which allowed me to efficiently send detailed prompts to my LLM of choice, Claude. First, I tasked the AI with scripting the foundational infrastructure the architecture required: the VPC, the Application Load Balancer, the Kafka cluster, and the supporting networking.

Once that core platform was live, I focused on deploying just a single microservice. I again used the Cline and Claude combination to generate the Terraform configuration for its specific needs — an RDS database, ECR repository, and ECS Fargate service definition. Before applying anything, I meticulously reviewed the terraform plan output. This human-in-the-loop step was critical; I would repeat the plan and review cycle until I was confident the changes were correct and cost-effective. This first microservice became our template.
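The per-service Terraform described above could be sketched roughly as follows. This is a heavily abbreviated illustration — the resource names, sizes, and referenced resources (`aws_ecs_cluster.main`, the subnets, the RDS credentials) are all assumptions, not our actual configuration:

```hcl
# Illustrative sketch of one microservice's infrastructure:
# an ECR repository, an ECS Fargate task definition, and the service itself.
resource "aws_ecr_repository" "loan_service" {
  name = "loan-service"
}

resource "aws_ecs_task_definition" "loan_service" {
  family                   = "loan-service"
  requires_compatibilities = ["FARGATE"]
  network_mode             = "awsvpc"
  cpu                      = 512
  memory                   = 1024

  container_definitions = jsonencode([{
    name         = "loan-service"
    image        = "${aws_ecr_repository.loan_service.repository_url}:latest"
    portMappings = [{ containerPort = 8080 }]
  }])
}

resource "aws_ecs_service" "loan_service" {
  name            = "loan-service"
  cluster         = aws_ecs_cluster.main.id # assumed to exist in the core platform
  task_definition = aws_ecs_task_definition.loan_service.arn
  desired_count   = 1
  launch_type     = "FARGATE"
}
```

Once reviewed and applied, a configuration like this becomes a copy-and-adapt template for the remaining services.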

The Research Analyst: Validating Solutions with Gemini

To complement Claude’s code generation, I used Gemini as a dedicated research analyst to validate key architectural choices. This allowed me to rapidly confirm that S3 with CloudFront was our most cost-effective UI hosting solution and that using Application Load Balancer listener rules was the standard, most secure pattern for our service-to-service communication needs. Gemini provided the data-driven confidence to move forward quickly on these critical decisions.

Key Lessons from the Trenches

Lesson #1: AI is a brilliant generator but needs a human cost accountant.

The initial AI-generated code was “enterprise-grade” by default: large Fargate tasks and Multi-AZ RDS instances. While robust, this was not cost-effective. My first and most critical job was to act as the senior architect, meticulously reviewing every terraform plan to right-size the infrastructure and slash unnecessary costs. This human oversight is crucial for managing cloud spend.
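Concretely, the right-sizing mostly meant dialing back values like these in the generated Terraform. The numbers here are illustrative, not our exact before-and-after:

```hcl
# Right-sizing sketch: trimming the AI's "enterprise-grade" defaults
# after reviewing `terraform plan`. Values are illustrative assumptions.
resource "aws_db_instance" "loan_service" {
  engine            = "postgres"
  instance_class    = "db.t3.micro" # generated code defaulted to a larger class
  multi_az          = false         # generated code defaulted to Multi-AZ
  allocated_storage = 20
  db_name           = "loans"
  username          = "app"
  password          = var.db_password
}
```

Each downgrade was a deliberate trade-off: acceptable for a new platform finding its footing, and easy to reverse in Terraform when traffic justifies it.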

Lesson #2: Agile infrastructure is a superpower for evolving requirements.

After deploying the first two services, a new requirement emerged: Service A needed to make a direct API call to Service B. Instead of a complex service discovery implementation, I simply updated our Terraform code to add new Application Load Balancer listener rules. By using path-based routing, I enabled secure service-to-service communication in minutes. This proved the incredible flexibility of an IaC setup to adapt on the fly.
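A path-based listener rule of the kind described above might look like this in Terraform. The listener and target group names are assumed for illustration:

```hcl
# Route requests matching /service-b/* on the shared ALB to Service B's
# target group, enabling Service A -> Service B calls through the ALB.
# Listener and target group names are illustrative assumptions.
resource "aws_lb_listener_rule" "service_b" {
  listener_arn = aws_lb_listener.http.arn
  priority     = 20

  condition {
    path_pattern {
      values = ["/service-b/*"]
    }
  }

  action {
    type             = "forward"
    target_group_arn = aws_lb_target_group.service_b.arn
  }
}
```

Adding a rule like this is a one-plan, one-apply change — which is why the new requirement took minutes rather than a service discovery project.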

Lesson #3: AI can’t predict every operational pitfall.

The system was live and stable, but a few days later, a familiar cloud horror story began: the exploding bill. My CloudWatch costs were skyrocketing. This real-world “gotcha” was a powerful reminder that AI doesn’t yet have operational experience. The culprits were infinite Kafka retries, verbose logging, and cluster-level Container Insights — all of which required a human expert to diagnose and fix.
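Two of those fixes are expressible directly in Terraform; the third (bounding Kafka retries) lives in the application's producer and consumer configuration. A sketch of the infrastructure side, with illustrative names and values:

```hcl
# Illustrative CloudWatch cost controls.
# 1. Cap log retention instead of the never-expire default.
resource "aws_cloudwatch_log_group" "loan_service" {
  name              = "/ecs/loan-service"
  retention_in_days = 7
}

# 2. Disable cluster-level Container Insights, which bills per metric.
resource "aws_ecs_cluster" "main" {
  name = "loan-platform"

  setting {
    name  = "containerInsights"
    value = "disabled"
  }
}
```

The broader lesson: the AI generated infrastructure that worked, but only operational experience flagged which defaults quietly accumulate cost.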

From Code to Cloud: The AI-Generated README

One of the most mind-blowing parts of this process was that the LLM didn’t just write the Terraform code. I prompted it: “Create a step-by-step README for a developer to deploy this service.”

It produced a perfect markdown file with the exact commands for our workflow, which used our Cline VS Code plugin — enabled with the AWS Terraform MCP server — to apply configurations. It also included instructions for building a Docker image and pushing it to ECR. This AI-generated documentation became our official playbook.
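The playbook's core build-and-deploy loop looked broadly like the following. These commands are a reconstruction with placeholder values (account ID, region, repository name), not the verbatim AI-generated README:

```shell
# Placeholder values — substitute your own account, region, and repo name.
AWS_ACCOUNT=123456789012
REGION=us-east-1
REPO=$AWS_ACCOUNT.dkr.ecr.$REGION.amazonaws.com/loan-service

# Authenticate Docker to ECR, then build, tag, and push the service image.
aws ecr get-login-password --region $REGION |
  docker login --username AWS --password-stdin $REPO
docker build -t loan-service .
docker tag loan-service:latest $REPO:latest
docker push $REPO:latest

# Review and apply the service's Terraform configuration.
terraform init
terraform plan -out=tfplan
terraform apply tfplan
```

Having this loop written down meant any developer on the team could deploy a service without tribal knowledge.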

The confidence this AI-assisted workflow gave me was immense. I could tear down the entire stack of services and infrastructure with terraform destroy and bring it all back online flawlessly minutes later. This wasn’t just a deployment; it was a truly ephemeral, repeatable, and resilient environment, built and documented with the help of AI in hours, not weeks.

My Final Takeaway: AI + Expert = Unprecedented Velocity

This experience was a profound look into the future of cloud engineering. AI didn’t replace me. It augmented me. It took on different roles — a code generator, a research analyst, a documentation writer — which allowed me to focus on the highest-value tasks: architecture, optimization, security, and cost control.

By pairing my expertise with AI’s speed, I was able to deploy a sophisticated cloud platform with complex dependencies at a pace that would have been pure science fiction just a few years ago. The future isn’t about developers being replaced by AI; it’s about developers who know how to wield AI becoming the most valuable players in the industry.
