He that will not apply new remedies must expect new evils; for time is the greatest innovator. -Francis Bacon
Over the last few years, I’ve had the pleasure of working with hundreds of forward-thinking executives who are driving large-scale cultural shifts in their organizations. Few have done this better than Capital One, which has built a culture that seems to attract executives who are not just great leaders; they’re also builders and innovators who will shape the future of digital banking. Today I’m lucky to host a guest post from Capital One’s Terren Peterson, who has taught me quite a bit about what it takes to lead a large-scale cloud migration successfully.
(Note: these best practices, and a number of others, are now available in my book Ahead in the Cloud: Best Practices for Navigating the Future of Enterprise IT).
When Stephen asked me to write about our Cloud journey, I saw it as a privilege, because our move to AWS has been a broad team effort in which thousands of Capital One engineers have played different roles.
In reviewing our journey over the past few years, I’m using the Stages of Adoption methodology that Stephen has outlined in earlier blog posts. It’s a great structure to organize a multi-year effort, providing milestones to track progress along the way.
For context, Capital One is one of the nation’s largest banks and offers credit cards, checking and savings accounts, auto loans, rewards, and online banking services for consumers and businesses. In 2016, we were ranked #1 on the InformationWeek Elite 100 list of the country’s most innovative users of business technology.
We are using or experimenting with nearly every AWS service, and we actively share our learnings at AWS re:Invent, as well as some of our tooling through open source projects like Cloud Custodian.
Stage 1 — Project
In 2013 and 2014, we began our Public Cloud journey with what we called our “Experimentation Phase”, using AWS in our innovation labs to test the technology and operating model. In this initial stage, a limited number of individuals touched the technology, which minimized the need to educate the broader organization. Those who did participate were highly motivated software engineers, some of whom were familiar with AWS before joining our company.
The Lab was a great place to start, given its focus on new application development and on creating small-scale learning environments to prove out new products and servicing tools. Having a small footprint enabled us to test different security tools, and to see how processes and methods from our Private Cloud environment would work externally.
After a successful trial in the Lab, we recommended continuing to use Public Cloud based on its security model, the ability to provision infrastructure on the fly, the elasticity to handle peak purchasing demand, its high availability, and its pace of innovation.
Stage 2 — Foundation
Moving into 2015, we added development and test environments to our AWS footprint and enabled our first production deployments. This sharply increased the number of technology associates who needed expertise in AWS services, which shaped our thinking on how to scale that expertise.
It also began a period of investment as we started using services like Direct Connect to extend our virtual network into AWS datacenters. We had to integrate our access management tools to make the experience seamless between our on-premises environment and the AWS US Regions. This reduced friction in our application delivery processes and supported the transition to a Cloud-First infrastructure approach for all new applications.
During this time, we worked closely with multiple groups inside of AWS to establish Cloud engineering patterns. This included Professional Services, Technical Account Managers, Solutions Architects, and AWS Product Teams.
As demand for Cloud-experienced associates grew, we saw a clear need to build a Cloud Center of Excellence (CCoE). This team was tasked with capturing best practices and lessons learned from internal project teams, as well as building an education curriculum. This included establishing metrics and goals to quantify how many of our associates had been trained, and how many had achieved a level of expertise through the formal AWS certification program.
Stage 3 — Migration
At re:Invent 2015, we publicly shared our target of using our AWS competency to reduce our number of datacenters from eight in 2014 to three in 2018. This broad objective rallied our organization around how we could use the Cloud to simplify our infrastructure and drive savings back into the business.
Accomplishing a task of this size requires a broad effort, one that continues to draw on the talent cultivated by our Cloud Center of Excellence. We have now trained thousands of engineers in how to use AWS, and our AWS Certified Architects and Developers number in the hundreds.
As part of application migration, we’ve continued to work with AWS and their partners to establish processes and patterns for handling migration at scale. We actively use the migration patterns described by AWS, allocating each application to one of the “6 R’s”. These range from “Rehosting”, a lift-and-shift strategy for when only minor changes are required, to “Replatforming” or “Rearchitecting” when more significant investment is needed. Common drivers for the larger investment include performing kernel and JVM upgrades as part of the move, or adopting more native AWS offerings within the applications, such as Amazon SQS or Amazon RDS.
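To make “allocating applications into the 6 R’s” a little more concrete, here is a minimal Python sketch of a portfolio triage rule. The `App` attributes, the `classify` function, the rule order, and the sample application names are all hypothetical illustrations, not Capital One’s or AWS’s actual assessment criteria; a real portfolio review weighs many more factors, such as cost, risk, licensing, and dependencies.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    """The "6 R's" migration strategies described by AWS."""
    REHOST = "Rehost"            # lift and shift with minimal changes
    REPLATFORM = "Replatform"    # targeted changes, e.g. adopt a managed service
    REPURCHASE = "Repurchase"    # replace with a commercial or SaaS product
    REARCHITECT = "Rearchitect"  # significant redesign (also called Refactor)
    RETIRE = "Retire"            # decommission the application
    RETAIN = "Retain"            # keep on premises for now


@dataclass
class App:
    # Hypothetical attributes for illustration only.
    name: str
    still_needed: bool = True
    cloud_ready: bool = True
    saas_alternative: bool = False
    needs_runtime_upgrade: bool = False   # e.g. kernel or JVM upgrade with the move
    wants_managed_services: bool = False  # e.g. Amazon SQS or Amazon RDS
    needs_redesign: bool = False


def classify(app: App) -> Strategy:
    """Assign one strategy using a hypothetical, illustrative rule order."""
    if not app.still_needed:
        return Strategy.RETIRE
    if not app.cloud_ready:
        return Strategy.RETAIN
    if app.saas_alternative:
        return Strategy.REPURCHASE
    if app.needs_redesign:
        return Strategy.REARCHITECT
    if app.needs_runtime_upgrade or app.wants_managed_services:
        return Strategy.REPLATFORM
    return Strategy.REHOST  # only minor changes needed: lift and shift


if __name__ == "__main__":
    portfolio = [
        App("batch-reports"),
        App("card-servicing", needs_runtime_upgrade=True),
        App("legacy-fax", still_needed=False),
    ]
    for app in portfolio:
        print(app.name, classify(app).value)
```

Checking the “no migration needed” outcomes (Retire, Retain, Repurchase) before the engineering-heavy ones is a design choice for this sketch, not a prescribed rule; teams will order and weight the criteria differently.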
Stage 4 — Optimization
As our AWS footprint grows, we continually look for ways to optimize cost and improve speed by automating recurring deployment activities. Some of these optimization efforts involve “tuning” the infrastructure allocated to each application. Gradually reducing EC2 instance sizes where unused capacity is detected, and changing Linux distribution versions, can yield major reductions in the compute portion of your bill. This strengthens the business case for moving to the Cloud and can help justify further investment in automation and tooling.
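The “gradual reduction” of instance sizes can be sketched as a one-step-at-a-time rightsizing rule. This minimal Python example assumes you already have CPU utilization samples for an instance (for example, exported from the Amazon CloudWatch `CPUUtilization` metric; the code itself does not call AWS), and that each step down within the family halves vCPU and memory, which holds for the m5 family from `large` through `8xlarge`. The `recommend_size` function and its 70% target-peak threshold are hypothetical choices for illustration.

```python
from typing import Sequence

# Sizes within one instance family, smallest to largest. For the m5 family,
# each step from large to 8xlarge doubles vCPU and memory (2 -> 32 vCPU).
SIZES = ["large", "xlarge", "2xlarge", "4xlarge", "8xlarge"]


def recommend_size(family: str, size: str, cpu_samples: Sequence[float],
                   target_peak: float = 70.0) -> str:
    """Suggest at most one size step down, based only on peak CPU percent.

    Halving capacity roughly doubles the utilization of the same workload, so
    step down only if twice the observed peak still fits under target_peak.
    Raises ValueError if size is not in SIZES.
    """
    idx = SIZES.index(size)
    if not cpu_samples:
        return f"{family}.{size}"  # no data: leave the instance unchanged
    peak = max(cpu_samples)
    if idx > 0 and peak * 2 <= target_peak:
        return f"{family}.{SIZES[idx - 1]}"
    return f"{family}.{size}"


if __name__ == "__main__":
    print(recommend_size("m5", "2xlarge", [12.0, 30.5, 22.1]))  # m5.xlarge
    print(recommend_size("m5", "xlarge", [40.0]))               # m5.xlarge
```

Stepping down only one size per review, and only when the doubled peak still fits under the target, keeps each change small and easy to roll back. Memory, network, and burst behavior are deliberately ignored in this sketch and would need their own checks.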
Other optimization efforts have been bolder, refactoring traditional platforms to use a Serverless model. Several key applications are currently converting to this pattern, and we have staffed an agile team to enable our software engineers to use these new services, similar to what we did initially with the CCoE.
Given the robust growth of AWS services, we expect optimization to be an ongoing effort, requiring engineering resources to validate new services as they are released and map them to our application portfolio. It’s also an area where we can allocate more resources once we have closed more datacenters and moved a greater footprint to AWS.