Using GitOps to Manage the Lifecycle of Cloud Resources with Argo CD

Ravi Hari
Published in keikoproj
Jun 26, 2021

Written by Ravi Hari and Todd Ekenstam

At Intuit, we’ve been building our Kubernetes platform for several years now. When we started our Kubernetes journey, we used Kops to create clusters in AWS. We soon realized that we were spending a lot of time manually creating and maintaining AWS cloud resources such as VPCs, S3 buckets, NACLs, Security Groups, VPC Endpoints, VPC Peerings, etc. More recently, when we started creating clusters with Amazon EKS, we decided to deploy and manage AWS resources with GitOps using Argo CD. As a result, we can now move faster, with better compliance, security, and auditability for AWS cloud resources.

Since Argo CD only handles resources that are represented in Kubernetes, we decided to create a Cloud Resource Controller (cloudresource-manager) that represents CloudFormation stacks inside Kubernetes as Custom Resources.

Design

AWS resources are declaratively specified in a Git repo, and changes are submitted as PRs. Once a PR is reviewed, approved, and merged, Argo CD invokes Manny, a CLI tool that translates the declarative spec into a set of Custom Resources that the cloudresource-manager controller can consume.

Since the CloudFormation templates for AWS resources can be verbose, we specify the AWS resources in terms of reusable templates, which Manny can download from a remote repository or an S3 bucket when translating the specs in the Git repo into Custom Resources for the cloudresource-manager controller.
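For illustration, a Custom Resource emitted by Manny might look something like the manifest below. The API group, kind, and field names here are hypothetical, since the actual CRD schema is internal to the controller:

```yaml
# Hypothetical sketch of a cloudresource-manager Custom Resource;
# the apiVersion, kind, and spec fields are illustrative assumptions.
apiVersion: cloudresource.keikoproj.io/v1alpha1
kind: CloudResource
metadata:
  name: team-logs-bucket
  namespace: cloud-resources
spec:
  accountId: "123456789012"   # target AWS account for the stack
  region: us-west-2
  stackName: team-logs-bucket
  template: s3-bucket         # reusable template that Manny resolves
  parameters:
    BucketName: team-logs-bucket-usw2
```

The point is that the user-facing spec stays small: the reusable template carries the verbose CloudFormation body, while the CR carries only the per-instance parameters.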

Cloud resources represented as K8s Custom Resources in the Argo CD UI

Argo CD displays the Custom Resource in the Desired Manifest tab of the Argo CD UI.

Desired Manifest in the Argo CD UI

Once the user verifies that the changes are good, they sync the change, which submits the manifest to the Kubernetes cluster; the controller running in the cluster picks it up and starts to reconcile.

Synchronize the cloud resource

The controller pods run with a cross-account role whose policy allows them to assume a role in each target account, and that assumed role executes the CloudFormation stacks. See this doc on how IAM roles are used.
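The cross-account hop can be sketched with boto3 roughly as follows. This is a minimal illustration, not the controller's actual code; the session name and function names are assumptions:

```python
def client_kwargs_from_credentials(creds: dict, region: str) -> dict:
    """Map the Credentials block returned by STS AssumeRole onto the
    keyword arguments that boto3 clients accept."""
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretAccessKey"],
        "aws_session_token": creds["SessionToken"],
        "region_name": region,
    }


def cloudformation_client_for_account(role_arn: str, region: str):
    """Assume the execution role in the target account, then build a
    CloudFormation client from the temporary credentials it returns."""
    import boto3  # imported here so the pure helper above stays dependency-free

    sts = boto3.client("sts")
    creds = sts.assume_role(
        RoleArn=role_arn,
        RoleSessionName="cloudresource-manager",  # assumed session name
    )["Credentials"]
    return boto3.client(
        "cloudformation", **client_kwargs_from_credentials(creds, region)
    )
```

Because the credentials are temporary and scoped per target account, the controller itself never holds long-lived keys for any of the accounts it manages.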

With a CloudFormation session in the target account, the controller runs the CreateChangeSet operation. It then polls the change set to see whether it has reached the CREATE_COMPLETE terminal state, and proceeds to execute the change set to create the AWS cloud resource. As the stack executes, the Custom Resource is updated with the corresponding events, messages, and status.
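The create-poll-execute loop might look roughly like this sketch, assuming a boto3 CloudFormation client for the target account. The helper names, poll interval, and terminal-status set are assumptions; the controller's real state machine is internal to cloudresource-manager:

```python
import time

# Change set statuses after which polling should stop (an assumed subset).
TERMINAL_STATUSES = {"CREATE_COMPLETE", "FAILED", "DELETE_COMPLETE"}


def is_terminal(status: str) -> bool:
    """Return True once a change set has reached a terminal status."""
    return status in TERMINAL_STATUSES


def apply_change_set(cfn, stack_name: str, template_body: str,
                     change_set_type: str = "CREATE") -> None:
    """Create a change set, poll until it is terminal, then execute it.

    `cfn` is a boto3 CloudFormation client already scoped to the target account.
    """
    change_set_name = f"{stack_name}-changeset"
    cfn.create_change_set(
        StackName=stack_name,
        ChangeSetName=change_set_name,
        TemplateBody=template_body,
        ChangeSetType=change_set_type,  # "CREATE" for new stacks, "UPDATE" otherwise
    )
    while True:
        status = cfn.describe_change_set(
            StackName=stack_name, ChangeSetName=change_set_name
        )["Status"]
        if is_terminal(status):
            break
        time.sleep(5)  # simple poll; a real controller would requeue the CR instead
    if status != "CREATE_COMPLETE":
        raise RuntimeError(f"change set ended in {status}")
    cfn.execute_change_set(StackName=stack_name, ChangeSetName=change_set_name)
```

A controller would additionally fold the stack events and the final status back into the Custom Resource's status subresource, which is what surfaces in the Argo CD UI.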

AWS CloudFormation Stack Events

Once the stack is created successfully, the user can later update the cloud resource by raising a PR with the intended changes. Once the PR is reviewed, approved, and merged, the change appears in the Argo CD UI as a diff between the existing resource and the proposed change.

An update is desired on the cloud resource
Diff view showing exactly what changed in the Argo CD UI

The change can be synced, and the CreateChangeSet and ExecuteChangeSet operations are repeated in the controller.

Synchronize the update

The updated S3 bucket is created, and the older one is deleted.

S3 bucket stack gets updated and the desired S3 bucket is created

Finally, all the stacks are synced successfully in Argo CD.

All resources are successfully in a Synced state

With this work, we were able to create AWS cloud resources quickly across a fleet of clusters, saving us ~15k person-hours as we created ~200 EKS clusters. It has also helped us seamlessly manage 10k cloud resources, with security and auditability, across the AWS accounts used for EKS clusters.

Thanks to the Team: Edward Lee, Mukulika Kapas, Todd Ekenstam, Laks S, Manav Wadhwa, Pam Fong, Gangadhar Rayudu, Matt Ouille, Sai Vishwas Padigi, Aakash Chandrasekharan.

About the Authors

Ravi Hari is a Principal Software Engineer at Intuit working on Intuit’s Kubernetes platform. He worked on infrastructure for Kubernetes at Intuit, where he developed frameworks and tools for Kubernetes cluster creation on AWS and GitOps for AWS to manage the cloud resource lifecycle. He is also a maintainer of and core contributor to the open-source project keikoproj/active-monitor, which helps with monitoring and self-healing of Kubernetes clusters.

Todd Ekenstam is a Principal Engineer at Intuit building a platform for secure, multi-tenant Kubernetes infrastructure supporting applications serving Intuit’s ~50 million customers. Todd has worked on various large-scale distributed systems projects during his 25+ year career, including hierarchical storage management, peer-to-peer database replication, enterprise storage virtualization, two-factor authentication SaaS, and, most recently, Kubernetes clusters. He has presented at a variety of academic, government, and industry conferences. Todd is co-author of the book GitOps and Kubernetes: Continuous Deployment with Argo CD, Jenkins X, and Flux.
