Cloud transformation journey of a financial enterprise
Introduction
The fintech ecosystem is being massively disrupted by a host of nimble, tech-savvy startups that introduce new financial products quickly and innovatively, with appealing user interfaces aimed at young, millennial customers who love to use mobile devices.
These companies can iterate on their products and add features quickly because they use modern platforms, stacks, and end-to-end processes that let their developers build new features with minimal friction, deploy them rapidly with sophisticated release/deployment strategies, and quickly figure out what works best for their products.
Bigger financial enterprises have everything they need, and more, to do the same and compete in the same space, but they are bogged down by bureaucracy, manual processes, legacy platforms and runtimes, legacy tech stacks, and legacy build, deploy, and release processes.
To compete effectively with the disruptors, you have to disrupt yourself. What follows is the story of how this was accomplished, end to end.
The following are common problems with big legacy enterprises:
- 10-plus business verticals, more than 100 legacy applications (mostly monolithic Java servers), 1,000+ developers, and tens of thousands of physical systems or VMs in a datacenter
- Legacy API gateways for all APIs, with complicated business flows in which applications from different verticals talk to each other to fulfill financial flows
- Stakeholders from Architecture, Engineering, Product, Operations and Infrastructure, Release Management, QA, Integration, and other groups
- Ad-hoc environment provisioning: Dev, QA, certification, and upper environments are provisioned manually through long processes that take days to weeks and are still error-prone, requiring multiple hands-on fixes
- About 70–80% of QA time and 50% of dev time during integration and release cycles goes toward ad-hoc deployments and fixing environments prior to integration/testing
- Constant issues and downtime in production due to configuration mismatches, manual modification of configs and keys, and applications being in an unknown state
- Massive manpower and expense spent keeping the release train/software delivery working through ad-hoc patches and fixes
- Time taken away from activities that generate revenue and solve business problems
- Monolithic apps that consumed a lot of resources and were difficult and slow to scale
High-level approach to disrupting a legacy enterprise
The following describes some ways this can be approached:
Strategy:
- “Lift and Shift” (for legacy applications), and
- New (for “greenfield” applications built as cloud-native microservices)
Common approach:
- Containerization of legacy applications
- Migration of legacy applications to a Kubernetes cloud, taking current security requirements into account
- Hybrid deployments of legacy applications to Kubernetes in production with traffic-shaping to transition
- CI/CD automation, along with PaaS APIs/UIs, frameworks, and tooling built in-house
- Self-service APIs/UIs for integrations, developers, release management, operations/infrastructure
- A PaaS (Platform as a Service) offered easy onboarding of new apps, self-service for developers, fast deployment across multiple environments, self-service QA/deployments, and deployment strategies like Blue/Green and Canary at the vertical level
- Use of common frameworks eliminates the need to change code or mechanisms for logging and monitoring (a minimal sketch of such a shared logging helper follows this list)
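One way such a common framework keeps existing logging and monitoring intact is to standardize on structured JSON logs written to stdout, which the platform's existing collectors can pick up without per-app code changes. Below is a minimal sketch of a shared logging helper in Python; the field names and the app/env values are purely illustrative, not the actual framework used.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON line.

    Field names (app, env, level, logger, msg) are illustrative; a real
    shared framework would match whatever the existing log pipeline expects.
    """

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "app": getattr(record, "app", "unknown"),
            "env": getattr(record, "env", "dev"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


def get_logger(app: str, env: str) -> logging.LoggerAdapter:
    """Return a logger that writes JSON lines to stdout, where the
    container platform's log collector already expects them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(app)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    # LoggerAdapter injects the shared fields into every record
    return logging.LoggerAdapter(logger, {"app": app, "env": env})


log = get_logger("payments-service", "qa")  # hypothetical app and environment
log.info("application started")
```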
Setting up a longer-term roadmap
- Goal: Migrate to a cloud platform — private or public — on a common infrastructure like Kubernetes so that the deployments are portable across clouds
- Goal: Use a service mesh for service-to-service (microservice) communication, security, tracing, observability, and easy connectivity of applications
- Goal: Possibly use service mesh technologies to deploy multiple versions of the same application in production and drive progressive, continuous delivery of applications without downtime (see the canary traffic-shifting sketch after this list)
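To make the last goal concrete, the sketch below shifts a chosen percentage of traffic to a canary version by patching route weights, assuming Istio as the service mesh and the official Kubernetes Python client. The VirtualService name, namespace, and subset names are hypothetical, not a prescribed setup.

```python
from kubernetes import client, config


def set_canary_weight(name: str, namespace: str, canary_percent: int) -> None:
    """Shift canary_percent of traffic to the "canary" subset of a service.

    Assumes an Istio VirtualService with "stable" and "canary" subsets;
    all names here are illustrative.
    """
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    api = client.CustomObjectsApi()
    patch = {
        "spec": {
            "http": [{
                "route": [
                    {"destination": {"host": name, "subset": "stable"},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": name, "subset": "canary"},
                     "weight": canary_percent},
                ]
            }]
        }
    }
    api.patch_namespaced_custom_object(
        group="networking.istio.io",
        version="v1beta1",
        namespace=namespace,
        plural="virtualservices",
        name=name,
        body=patch,
    )


# Send 10% of production traffic to the new version of a hypothetical
# payments service, then step it up as confidence grows.
set_canary_weight("payments", "payments-vertical", 10)
```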
Results of moving to Kubernetes
- Order-of-magnitude improvements in the speed of development, releases, self-service, environment provisioning, developer experience, integrations, release management, and production deployments
Migration to Cloud-Native Microservices
- This would need a simplified, grassroots approach to converting legacy applications to microservices
Secret Sauce
The following offers a recipe for achieving the needed disruption:
- Start small and be flexible: work with one team and one or two types of apps, develop best practices and standardization, and experiment to find the best approach
- Clearly defined approach: know when to “lift and shift” vs. when to build cloud-native microservices
- Automation — Strong DevOps/Automation team with good rapport with devs, ops, infra, release management
- Standardization via Automation: standardizing how things are built, configured, deployed, accessed makes things easier
- CI/CD — Build fully automated, fast CI/CD pipelines to package and deploy applications easily and as frequently as necessary
- KEY: Work with Operations/Infra to ensure the same logging and monitoring mechanisms continue to work
- Self-service PAAS API/UI — Ease of use by building “Platform as a service” capabilities for Kubernetes with very easy to use UIs and APIs
- DevEx: Make it so easy to get code from a developer's laptop, to source control, to a working environment that developers cannot work without it
- Onboarding ease: Onboard new applications easily using a UI
- Hide complexity: Make it so that developers do not have to understand the complexity of Docker, container networking, Kubernetes, etc.
- Hybrid deployments of existing components side-by-side, with a transition plan involving all parties
- Microservice app starter packs for quickly generating app skeletons and automatically onboarding them to CI/CD (a sketch of such a self-service onboarding endpoint follows this list)
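To make the onboarding and starter-pack ideas concrete, here is a minimal sketch of what a self-service onboarding endpoint behind such a UI might look like, written with Flask. The endpoint shape, starter-pack names, registry host, and CI URL are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Purely illustrative starter-pack templates, keyed by app type
STARTER_PACKS = {
    "java-api": {"base_image": "registry.internal/java17-base", "port": 8080},
    "node-ui": {"base_image": "registry.internal/node20-base", "port": 3000},
}


@app.route("/onboard", methods=["POST"])
def onboard():
    """Register a new app: pick a starter pack, derive a namespace, and
    report the CI/CD pipeline the app would be wired into."""
    spec = request.get_json()
    pack = STARTER_PACKS.get(spec["app_type"])
    if pack is None:
        return jsonify({"error": f"unknown app_type {spec['app_type']}"}), 400

    manifest = {
        "app": spec["app_name"],
        "vertical": spec["vertical"],
        "namespace": f"{spec['vertical']}-{spec['app_name']}",
        "image": pack["base_image"],
        "port": pack["port"],
        "pipeline": f"https://ci.internal/pipelines/{spec['app_name']}",
    }
    # A real implementation would create the namespace, push the skeleton
    # repo, and register the pipeline here before returning.
    return jsonify(manifest), 201


if __name__ == "__main__":
    app.run(port=8000)
```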
Overall Approach/Methodology
The following describes the methodology/approach needed to achieve this disruption.
- Discovery: discuss the vertical, its apps, users, and product
- Technology Assessments:
- Architecture, workloads, systems, users, usage details, runtimes and operating systems used, resources, networking, deployments, operational requirements, prioritization, and critical vs. non-critical applications
- Capture current state and evolvable future state
- Execution of migration to containers
- Containerize applications, standardize and automate everything including configurations, deployments
- Discuss people, org, role, and process changes
- Functional validation
- Work with QA to perform functional validation on both legacy and containers
- Focused knowledge sharing
- Operational Transfer
- Ongoing optimizations
- Self-service APIs/UIs, Integrations for Automation
- Release Train:
- Move Dev and QA primarily to containers
- A hybrid/simultaneous release train that can release on both legacy platforms and containers at the same time
- Create hybrid environments with help from networking, operations, and infrastructure teams
- With operational help, use gradual traffic shaping to shift traffic from the legacy infrastructure to Kubernetes, rolling back if needed
- Use mechanisms to enable Blue/Green and Canary deployments (a Blue/Green switchover sketch follows this list)
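As one example of such a mechanism, the sketch below performs a Blue/Green switchover by repointing a Kubernetes Service's label selector, using the official Kubernetes Python client. The service name, namespace, and color-label convention are assumptions for illustration.

```python
from kubernetes import client, config


def switch_service(name: str, namespace: str, target_color: str) -> None:
    """Point a Kubernetes Service at the "blue" or "green" Deployment by
    patching its label selector (label convention is hypothetical)."""
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    core = client.CoreV1Api()
    body = {"spec": {"selector": {"app": name, "color": target_color}}}
    core.patch_namespaced_service(name=name, namespace=namespace, body=body)


# Cut traffic over to the green stack; calling this again with "blue"
# rolls back instantly.
switch_service("orders", "orders-vertical", "green")
```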
Transition diagrams
Typical Enterprise App deployments
Transitioning to Kubernetes using Hybrid deployments
- Migrate one app using hybrid deployments, with traffic shaping handled by a load balancer such as Nginx or F5 (a minimal traffic-shaping sketch follows this list)
- Migrate an entire vertical's apps using the same hybrid deployment and traffic-shaping approach
- Migrate more
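A minimal sketch of the traffic-shaping step, assuming an Nginx load balancer doing weighted round-robin between the legacy pool and the Kubernetes ingress; the hostnames and percentages are placeholders. With F5, the same idea maps to pool-member ratios.

```python
NGINX_UPSTREAM_TEMPLATE = """
upstream payments_backend {{
    # Weighted round-robin: the weights control the legacy vs. Kubernetes split
    server legacy-vip.internal:8443 weight={legacy_weight};
    server k8s-ingress.internal:443 weight={k8s_weight};
}}
"""


def render_upstream(k8s_percent: int) -> str:
    """Render an Nginx upstream block that sends k8s_percent of traffic to
    Kubernetes and the rest to the legacy pool (hostnames are placeholders)."""
    return NGINX_UPSTREAM_TEMPLATE.format(
        legacy_weight=100 - k8s_percent,
        k8s_weight=k8s_percent,
    )


# Step the Kubernetes share up gradually, reloading Nginx after each change;
# dropping back to an earlier step is the rollback, and the final cutover
# removes the legacy server line entirely.
for pct in (10, 25, 50, 90):
    print(render_upstream(pct))
```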
Platform as a service capabilities
The following shows the capabilities a Platform as a Service might need to offer to make it easier for developers, release management, operations, and infrastructure to manage the entire lifecycle of an app from start to finish.