Cloud transformation journey of a financial enterprise
Introduction
The fintech ecosystem is being massively disrupted by a host of nimble, tech-savvy startups that introduce new financial products quickly and innovatively, with appealing user interfaces aimed at young, millennial customers who love to use mobile devices.
These companies can iterate on their products and add features quickly because they use modern platforms, stacks, and end-to-end processes that let their developers build new features with minimal friction, deploy them rapidly with sophisticated release/deployment strategies, and quickly figure out what works best for their products.
Bigger financial enterprises have everything they need, and more, to do the same and compete in the same space, but they are bogged down by bureaucracy, manual processes, legacy platforms and runtimes, legacy tech stacks, and legacy build, deploy, and release processes.
To compete effectively with the disruptors, you have to disrupt yourself. What follows is the story of how this was accomplished, end to end.
The following are common problems with big legacy enterprises:
- 10-plus business verticals, more than 100 legacy applications (mostly monolithic Java servers), 1,000+ developers, and tens of thousands of physical systems or VMs in a datacenter
- Legacy API gateways for all APIs, with complicated business flows in which applications from different verticals talk to each other to fulfill financial flows
- Stakeholders from Architecture, Engineering, Product, Operations and Infrastructure, Release Management, QA, Integration, and other groups
- Ad-hoc environment provisioning: Dev, QA, certification, and upper environments are provisioned manually through long processes that take days to weeks and are still error-prone, requiring multiple hands-on fixes
- About 70–80% of QA time and 50% of dev time during integration and release cycles goes toward ad-hoc deployments and fixing environments prior to integration/testing
- Constant issues and downtime in production due to configuration mismatches, manual modification of configs and keys, and applications being in an unknown state
- Massive manpower and expense spent keeping the release train/software delivery working through ad-hoc patches and fixes
- Time taken away from activities that generate revenue and solve business problems
- Monolithic apps that consumed a lot of resources and were difficult and slow to scale
High-level approach to disrupting a legacy enterprise
The following describes some ways this can be approached:
Strategy:
- “Lift and Shift” (for legacy applications), and
- New (for “greenfield” applications built as cloud-native microservices)
Common approach:
- Containerization of legacy applications
- Migration of legacy applications to a Kubernetes cloud, taking current security requirements into account
- Hybrid deployments of legacy applications to Kubernetes in production with traffic-shaping to transition
- CI/CD automation, along with PaaS APIs/UIs, frameworks, and tooling built in-house
- Self-service APIs/UIs for integrations, developers, release management, operations/infrastructure
- A PaaS (Platform as a Service) offered easy onboarding of new apps, self-service for developers, fast deployment across multiple environments, self-service QA/deployments, and deployment strategies like Blue/Green and Canary at the vertical level
- Use of common frameworks eliminates the need to change code or mechanisms for logging and monitoring (a minimal sketch of such a shared logging helper follows this list)
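One way such a common framework keeps existing logging and monitoring intact is to standardize on structured JSON logs written to stdout, which the platform's existing collectors can pick up without per-app code changes. Below is a minimal sketch of a shared logging helper in Python; the field names and the app/env values are purely illustrative, not the actual framework used.

```python
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render every log record as a single JSON line.

    Field names (app, env, level, logger, msg) are illustrative; a real
    shared framework would match whatever the existing log pipeline expects.
    """

    def format(self, record: logging.LogRecord) -> str:
        return json.dumps({
            "app": getattr(record, "app", "unknown"),
            "env": getattr(record, "env", "dev"),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        })


def get_logger(app: str, env: str) -> logging.LoggerAdapter:
    """Return a logger that writes JSON lines to stdout, where the
    container platform's log collector already expects them."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(app)
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    # LoggerAdapter injects the shared fields into every record
    return logging.LoggerAdapter(logger, {"app": app, "env": env})


log = get_logger("payments-service", "qa")  # hypothetical app and environment
log.info("application started")
```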
Setting up a longer-term roadmap
- Goal: Migrate to a cloud platform — private or public — on a common infrastructure like Kubernetes so that the deployments are portable across clouds
- Goal: Use a service mesh for service-to-service (microservice) communication, security, tracing, observability, and easy connectivity of applications
- Goal: Possibly use service mesh technologies to deploy multiple versions of the same application in production and drive progressive, continuous delivery of applications without downtime (see the canary traffic-shifting sketch after this list)
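To make the last goal concrete, the sketch below shifts a chosen percentage of traffic to a canary version by patching route weights, assuming Istio as the service mesh and the official Kubernetes Python client. The VirtualService name, namespace, and subset names are hypothetical, not a prescribed setup.

```python
from kubernetes import client, config


def set_canary_weight(name: str, namespace: str, canary_percent: int) -> None:
    """Shift canary_percent of traffic to the "canary" subset of a service.

    Assumes an Istio VirtualService with "stable" and "canary" subsets;
    all names here are illustrative.
    """
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    api = client.CustomObjectsApi()
    patch = {
        "spec": {
            "http": [{
                "route": [
                    {"destination": {"host": name, "subset": "stable"},
                     "weight": 100 - canary_percent},
                    {"destination": {"host": name, "subset": "canary"},
                     "weight": canary_percent},
                ]
            }]
        }
    }
    api.patch_namespaced_custom_object(
        group="networking.istio.io",
        version="v1beta1",
        namespace=namespace,
        plural="virtualservices",
        name=name,
        body=patch,
    )


# Send 10% of production traffic to the new version of a hypothetical
# payments service, then step it up as confidence grows.
set_canary_weight("payments", "payments-vertical", 10)
```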
Results of moving to Kubernetes
- Order-of-magnitude improvements in the speed of development, releases, self-service, environment provisioning, developer experience, integrations, release management, and production deployments
Migration to Cloud-Native Microservices
- This would need a simplified, grassroots approach to converting legacy applications to microservices
Secret Sauce
The following offers a recipe for achieving the needed disruption:
- Start small and be flexible: work with one team and one or two types of apps, develop best practices and standardization, and experiment to find the best approach
- Clearly defined approach: know when to “lift and shift” vs. when to build cloud-native microservices
- Automation — Strong DevOps/Automation team with good rapport with devs, ops, infra, release management
- Standardization via Automation: standardizing how things are built, configured, deployed, accessed makes things easier
- CI/CD — Build fully automated, fast CI/CD pipelines to package and deploy applications easily and as frequently as necessary
- KEY: Work with Operations/Infra to ensure the same logging and monitoring mechanisms continue to work
- Self-service PAAS API/UI — Ease of use by building “Platform as a service” capabilities for Kubernetes with very easy to use UIs and APIs
- DevEx: Make it so easy to get code from a developer's laptop, to source control, to a working environment that developers cannot work without it
- Onboarding ease: Onboard new applications easily using a UI
- Hide complexity: Make it so that developers do not have to understand the complexity of Docker, container networking, Kubernetes, etc.
- Hybrid deployments of existing components side-by-side, with a transition plan involving all parties
- Microservice app starter packs for quickly generating app skeletons and automatically onboarding them to CI/CD (a sketch of such a self-service onboarding endpoint follows this list)
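To make the onboarding and starter-pack ideas concrete, here is a minimal sketch of what a self-service onboarding endpoint behind such a UI might look like, written with Flask. The endpoint shape, starter-pack names, registry host, and CI URL are all hypothetical.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Purely illustrative starter-pack templates, keyed by app type
STARTER_PACKS = {
    "java-api": {"base_image": "registry.internal/java17-base", "port": 8080},
    "node-ui": {"base_image": "registry.internal/node20-base", "port": 3000},
}


@app.route("/onboard", methods=["POST"])
def onboard():
    """Register a new app: pick a starter pack, derive a namespace, and
    report the CI/CD pipeline the app would be wired into."""
    spec = request.get_json()
    pack = STARTER_PACKS.get(spec["app_type"])
    if pack is None:
        return jsonify({"error": f"unknown app_type {spec['app_type']}"}), 400

    manifest = {
        "app": spec["app_name"],
        "vertical": spec["vertical"],
        "namespace": f"{spec['vertical']}-{spec['app_name']}",
        "image": pack["base_image"],
        "port": pack["port"],
        "pipeline": f"https://ci.internal/pipelines/{spec['app_name']}",
    }
    # A real implementation would create the namespace, push the skeleton
    # repo, and register the pipeline here before returning.
    return jsonify(manifest), 201


if __name__ == "__main__":
    app.run(port=8000)
```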
Overall Approach/Methodology
The following describes the methodology/approach needed to achieve this disruption.
- Discovery: discuss the vertical, its apps, users, and product
- Technology Assessments:
- Architecture, workloads, systems, users, usage details, runtimes and operating systems used, resources, networking, deployments, operational requirements, prioritization, and critical vs. non-critical applications
- Capture current state and evolvable future state
- Execution of migration to containers
- Containerize applications, standardize and automate everything including configurations, deployments
- Discuss people, org, role, and process changes
- Functional validation
- Work with QA to perform functional validation on both legacy and containers
- Focused knowledge sharing
- Operational Transfer
- Ongoing optimizations
- Self-service APIs/UIs, Integrations for Automation
- Release Train:
- Move Dev and QA primarily to containers
- A hybrid/simultaneous release train that can release on both legacy platforms and containers at the same time
- Create hybrid environments with help from networking, operations, and infrastructure teams
- With operational help, use gradual traffic shaping to shift traffic from the legacy infrastructure to Kubernetes, rolling back if needed
- Use mechanisms to enable Blue/Green and Canary deployments (a Blue/Green switchover sketch follows this list)
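As one example of such a mechanism, the sketch below performs a Blue/Green switchover by repointing a Kubernetes Service's label selector, using the official Kubernetes Python client. The service name, namespace, and color-label convention are assumptions for illustration.

```python
from kubernetes import client, config


def switch_service(name: str, namespace: str, target_color: str) -> None:
    """Point a Kubernetes Service at the "blue" or "green" Deployment by
    patching its label selector (label convention is hypothetical)."""
    config.load_kube_config()  # or load_incluster_config() inside the cluster
    core = client.CoreV1Api()
    body = {"spec": {"selector": {"app": name, "color": target_color}}}
    core.patch_namespaced_service(name=name, namespace=namespace, body=body)


# Cut traffic over to the green stack; calling this again with "blue"
# rolls back instantly.
switch_service("orders", "orders-vertical", "green")
```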
Transition diagrams
Typical Enterprise App deployments
Transitioning to Kubernetes using Hybrid deployments
- Migrate one app using hybrid deployments, with traffic shaping handled by a load balancer such as Nginx or F5 (a minimal traffic-shaping sketch follows this list)
- Migrate an entire vertical's apps using the same hybrid deployment and traffic-shaping approach
- Migrate more
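A minimal sketch of the traffic-shaping step, assuming an Nginx load balancer doing weighted round-robin between the legacy pool and the Kubernetes ingress; the hostnames and percentages are placeholders. With F5, the same idea maps to pool-member ratios.

```python
NGINX_UPSTREAM_TEMPLATE = """
upstream payments_backend {{
    # Weighted round-robin: the weights control the legacy vs. Kubernetes split
    server legacy-vip.internal:8443 weight={legacy_weight};
    server k8s-ingress.internal:443 weight={k8s_weight};
}}
"""


def render_upstream(k8s_percent: int) -> str:
    """Render an Nginx upstream block that sends k8s_percent of traffic to
    Kubernetes and the rest to the legacy pool (hostnames are placeholders)."""
    return NGINX_UPSTREAM_TEMPLATE.format(
        legacy_weight=100 - k8s_percent,
        k8s_weight=k8s_percent,
    )


# Step the Kubernetes share up gradually, reloading Nginx after each change;
# dropping back to an earlier step is the rollback, and the final cutover
# removes the legacy server line entirely.
for pct in (10, 25, 50, 90):
    print(render_upstream(pct))
```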
Platform as a service capabilities
The following shows the capabilities a Platform as a Service might need to offer to make it easier for developers, release management, operations, and infrastructure to manage the entire lifecycle of an app from start to finish.