Cloud Adoption (Part II) — Challenges and Preventions

Sunil Rananavare

25 min read · Apr 7, 2024

In Part I, I briefly covered the survey results highlighting the following key cloud challenges.

· Managing cloud spend

· Security

· Lack of resources/expertise

· Managing software licenses

· Governance & Compliance

· Balancing central cloud team and business unit responsibilities

· Managing multi cloud

· Cloud migration

In this article I would like to share my experience about what usually goes wrong during a cloud adoption program and what can be done to minimize the impact of these challenges.

Over the last two decades, of course, cloud adoption has matured. Cloud migration has already reached the plateau of productivity on the hype curve. Many organizations have mastered the art of cloud adoption and can transition to the cloud with little to no friction. So, to be clear, it's not all ‘gloom and doom’.

That said, managing IT has always been a messy business. If an organization never managed its IT properly before, then adding cloud to the mix, yet another complex technology, just makes matters much worse.

All the above cloud challenges, fortunately or unfortunately, stem from the same old reasons that make any organization inefficient, but with a new cloud twist.

· Accidental (Cloud) strategy

· Impromptu (Cloud Center of Excellence, CCoE) operating model

· Ill-conceived (Cloud) migration plan

· and of course, poor execution of the migration plan

Let's double-click on each of the above reasons.

Accidental Cloud Strategy

If an organization is not accustomed to forming a well-planned business strategy, let alone an IT strategy, then it should be no surprise that its cloud initiative looks more like an imperative forced upon it than a strategically planned one.

Usually, one or more of the following accidental triggers are the root cause of a knee-jerk decision by the leaders to consider cloud.

· A datacenter hardware refresh is due shortly and consolidation is on the cards; someone at the top decides it is a good idea to close down the datacenter in the hope of reducing cost.

· Recently the organization had a big cyber incident and leadership decides to move into cloud with the hope that security will improve.

· A disaster strikes and datacenter running critical applications goes down and now the leadership is in panic mode trying to get operations back online as soon as possible with the hope that hosting in cloud may provide a quicker path forward for a DR site, preventing this from happening again.

· Business goes ahead and bypasses IT and decides to select a SaaS cloud solution anyway and now IT must come to terms with it and make it happen (Shadow IT).

· There are decades-old legacy systems / mainframes still in operation, but no one left in IT knows those systems well, and managing these old monoliths has become too risky and costly. To top it off, they now have a big outage. This incident triggers leadership to finally do away with the legacy system and look for a cloud alternative.

· An existing vendor who manages core critical business systems persistently entices and pushes the business to adopt its newer SaaS cloud solution; leadership caves in and now the IT organization is tasked to make it happen.

· Leadership finally comes around to adopting cloud because of perceived pressure from recent adoption of digitization strategy by the immediate competitor and so they decide it is time for them to follow and hopefully not lose much ground.

· Et cetera.

Cloud adoption is a complex, multi-year strategic program. It is difficult for any organization to implement and requires significant investment of time and money, plus commitment from senior leadership, including the board of directors (BoD), for its entire duration and beyond.

A program like cloud adoption should never be treated as a new fad or a tactical initiative accidentally started in isolation within the bowels of some business unit, especially the IT department. It must always be driven by a well-thought-out strategic objective led by the CEO of the organization, with a clear vision and realistic strategic goals specifying exactly why the organization is embarking on the cloud journey. The CEO must corral the executive team so that they understand its strategic significance, pros and cons, and long-term benefits, ensuring buy-in from all key stakeholders.

This is where CEO and executive teams can take help from expert management consultants who can help create justifiable business case with cost and benefit analysis that the executive team can then take forward to their BoD for approval. This also ensures a long-term commitment by the leadership and ensures continual funding for the multi-year program. It also helps set realistic expectations by all stakeholders, internal and external, about the timelines and outcomes.

The leadership team should also consider getting help from expert technology consultants to perform a thorough current state assessment of the IT landscape to determine workload dispositions, assess feasibility, and estimate the high-level cost of cloud migration. This early assessment, as a part of cloud strategic planning, is a crucial step: it contributes to the creation of an actionable program roadmap and helps set up proper program governance.

The consultants can also help with analysis and evaluation of strategic cloud objectives of the organization holistically and based on the technical assessments of IT landscape, they can perform feasibility assessment of the workloads and suggest suitable services available from various public and private cloud service providers (CSPs) and thus help narrow down selection of the right CSP(s). This is a very important step early in the planning stages, so organizations can initiate engagement with the chosen CSP(s) and discuss the best course to follow during adoption of their specific services and begin early negotiations with them on contracts, SLAs, discounts, and get awareness of other accelerators, practices, tools, and support CSPs can provide during the migration efforts.

Impromptu CCoE Operating Model

Whenever a new business capability is being established within an organization, one of the weakest links in IT strategy implementation is the development and implementation of the right IT operating model to support that capability. Very few IT organizations have the knowledge and ability to complete this important step properly on their own.

Usually, once the strategic decision to adopt cloud is made, CIOs/CTOs are tasked with running the cloud adoption program, and they invariably delegate this responsibility further down to their direct reports, typically the heads of IT infrastructure operations. Then it is their turn to figure out how to begin the cloud adoption program. Most IT management leaders spend over 80% of their time on day-to-day IT issues and in-flight projects. Very few manage to keep up with strategic planning updates. They have neither the time nor the necessary skills or experience to even know where to begin.

Most IT leaders seek advice from CSPs or their certified partners but unfortunately, these folks are great at providing guidance with their version of ‘common’ best practices for their specific cloud technologies but are not the experts in developing fit-for-purpose bespoke IT operating model that meets the organization’s needs.

Some CIOs/CTOs have the presence of mind to consider reaching out to external cloud experts specialized in developing the cloud operating model. However, these experts are never given sufficient time to do proper current state assessments. They are usually asked to come up with the cloud operating model and roadmap within weeks (typically 8–10 weeks). For a large organization this is not sufficient time to perform detailed value stream analysis of their complex business / IT ecosystem. Due to limited time, the consultants are unable to assess full impact and develop comprehensive operating model identifying all required interactions between all parties within business and IT.

With poor strategic planning and unrealistic roadmap in their hands the IT leaders begin the journey and hope to figure things out as they go.

Some IT leaders are prudent enough to admit their limitations and consider outsourcing cloud migration and operations work to any one of the many external cloud managed services providers (MSPs). Unfortunately, again these consultants are usually asked to operate on fixed budget and timelines. So, they spend very little time on setting up the right cloud operating model. The reality is these consulting teams are never sufficiently staffed with the people with right skills and so they fail to realize the cloud operating model properly. This never goes well; all sorts of issues arise daily due to unclear accountabilities, interactions, and division of responsibilities among various internal and external parties. The program is largely destined to get delayed with potential unplanned cost overruns during implementation and thereafter, during operational support.

The first and most important responsibility of any CIO/CTO tasked with running a cloud program is to get help from the right expert consultants and allocate sufficient time to develop the operating model for the Cloud Center of Excellence (CCoE), a new cloud organization within IT that handles everything related to planning, designing, building, delivering, and operationalizing cloud workloads from day one until all the workloads are successfully migrated into the cloud, ready for the business to use. The IT organization should consider owning and running the CCoE itself so that accountability for cloud migration remains within the organization's leadership. If the organization chooses to go with an MSP, then at the very least it must allocate sufficient time for the MSP to understand the organization's cloud objectives and goals so that the MSP can develop the right operating model. It should also consider having an external third-party consultant conduct a fitness assessment of the MSP's cloud operating model.

The CCoE is established to define best practices, patterns, and guidelines to enable the Cloud adoption journey. The CCoE is a functional team that brings together the Application delivery teams under CIO organizations and other enterprise support organizations such as, Platform Engineering, Enterprise Architecture, Third-Party Risk Management, Human Resources, Change Management, Operations, Finance, and Cybersecurity.

CCoE in turn consists of several subgroups with skilled people dedicated to performing the myriad functions required to operate all aspects of a large cloud migration program. Here are some key functions under CCoE:

· Cloud migration intake and program budgeting and planning

· Application assessment and onboarding to cloud

· Building and managing cloud hosting platforms

· Supporting cloud development, deployment, and operational capabilities

· Conducting cloud related training, change management and hiring new resources

· Cloud architecture design

· Cloud security design

· Vulnerability management

· Cloud observability and monitoring

· Cloud site reliability, availability, and performance

· Cloud development support

· Cloud governance

· Test environments management

· CSP specific services and resources provisioning

· Cloud data and analytics

· Infrastructure as code (IaC) automation

· Cloud DevOps CI/CD orchestration

· Cloud cost monitoring

· Knowledge base and documentation

Every subgroup must have well-defined service catalog that is agreed with their internal customers and corresponding service delivery processes must be published so the consumers of the services understand the intake steps and all interactions among various CCoE subgroups as well as other enterprise support groups throughout the delivery cycle to avoid any ambiguity about the roles and accountabilities of each group.

The principal function of CCoE is to establish the cloud capability within IT and therefore, a proper governance framework and structure must be established that works seamlessly with the existing business and IT governance frameworks. Decision authorities must be identified and assigned at all levels of IT and CCoE. High level RACI matrix must be created that informs all participating teams about various accountabilities and responsibilities of every stakeholder group participating in the program. Proper policies and principles must be defined to help with the decision making at all levels of cloud operations.
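A RACI matrix is easy to keep honest if it is held as data and checked automatically. The following is a minimal sketch of that idea, not a prescribed format: the activities, team names, and the `raci_gaps` helper are all hypothetical and only illustrate the common rule that every activity needs exactly one Accountable party and at least one Responsible party.

```python
from collections import Counter

# Hypothetical activities and stakeholder groups, for illustration only.
RACI = {
    "Approve target cloud architecture": {
        "Architecture Review Board": "A",
        "CCoE cloud architects": "R",
        "Cybersecurity": "C",
        "Delivery team": "I",
    },
    "Provision landing zone": {
        "CCoE lead": "A",
        "CCoE platform engineering": "R",
        "Cybersecurity": "C",
    },
    "Migrate workload": {
        "Delivery team": "R",
        "Operations": "I",
    },
}


def raci_gaps(matrix):
    """Return activities that lack exactly one Accountable or any Responsible party."""
    gaps = {}
    for activity, assignments in matrix.items():
        counts = Counter(assignments.values())
        problems = []
        if counts["A"] != 1:
            problems.append(f"expected exactly one Accountable, found {counts['A']}")
        if counts["R"] < 1:
            problems.append("no Responsible party")
        if problems:
            gaps[activity] = problems
    return gaps


# Only "Migrate workload" is flagged, because nobody is Accountable for it.
print(raci_gaps(RACI))
```

Running a check like this whenever the matrix changes surfaces unclear accountabilities before they turn into daily escalations.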

There must be routine cadences established at all levels of governance (from delivery teams, to architecture, to program management, all the way up to the steering leadership) for issue escalation and resolution, always creating a path forward to keep momentum going on all levels of migration activity. Leaders at all levels must prioritize attending these meetings so they stay in the loop on the program's progress and provide timely support and guidance. Without proper cloud governance, managing the cloud program becomes like herding cats.

One of the most difficult challenges IT teams face is adapting the end-to-end solution delivery process that the organization has perfected over the years for its on-premises solution deliveries. That process must now be customized to suit cloud solution delivery and deployment. Trust me when I say this: most organizations do not plan this in advance and attempt to fit a square peg into a round hole, only to realize that it does not work well.

Regardless of the established SDLC process (waterfall, a form of Agile, or some mix of the two), cloud delivery is a different beast and has its own method to the madness. If the existing delivery processes are not reviewed, customized for CCoE upfront, and socialized beforehand within all stakeholders, then you are guaranteed enormous pain and suffering during every step of the way, from onboarding your applications to cloud, to getting approvals from all the enterprise stakeholders (e.g., architecture, security, regulatory compliance, change advisory board, QA/UAT, operations, etc.). Customizing the SDLC for cloud migration is one of the most important activities that must be completed as a part of developing CCoE operating model.

One important point to note is that the CCoE must be staffed sufficiently to handle the workload demand. During the initial stages of the cloud program there is a steep spike in service demand, since most delivery teams are not fully prepared for the cloud migration work and need handholding. Unfortunately, most large organizations' CCoE teams use a ticketing system (such as ServiceNow) to manage intake. Managing the queue of open tickets can become a nightmare if the CCoE is not sufficiently staffed. It becomes the proverbial “death by a thousand tickets”, which helps neither the CCoE nor the CIO delivery teams.

Most of the delays in the program can be attributed to delays in CCoE staff responding to and approving these tickets. Each request has its own SLA period, and these periods quickly add up to further delays if the teams are short-staffed. It is of the utmost importance that these intake and interaction steps are explicitly designed to prioritize helping the requesting groups over the CCoE's own interest in easing internal pressures.
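To see how quickly per-ticket SLAs add up, here is a small back-of-the-envelope sketch. The ticket names, the SLA values, and the `backlog_factor` parameter are invented for illustration; the only point is the arithmetic of tickets raised one after another.

```python
# Hypothetical sequence of tickets one workload needs, with SLA in business days.
TICKETS = [
    ("Cloud onboarding request", 5),
    ("Landing zone access", 3),
    ("Architecture review slot", 10),
    ("Firewall rule change", 5),
    ("Production deployment approval", 5),
]


def elapsed_days(tickets, backlog_factor=1.0):
    """Total waiting time when each ticket can only be raised after the previous one closes.

    backlog_factor > 1 models a short-staffed team that overruns its SLA
    by that multiple (an assumption, not a measured figure).
    """
    return sum(sla * backlog_factor for _, sla in tickets)


on_time = elapsed_days(TICKETS)                              # every ticket closed exactly at SLA
short_staffed = elapsed_days(TICKETS, backlog_factor=1.5)    # every ticket overruns by 50%

print(f"Waiting on tickets, team on time: {on_time:.0f} business days")
print(f"Waiting on tickets, short-staffed: {short_staffed:.0f} business days")
```

Even with every ticket closed within its SLA, this hypothetical workload spends 28 business days waiting; multiplied across hundreds of workloads, that wait is what makes sufficient CCoE staffing so important.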

As noted, CCoE must be frontloaded with sufficient resource capacity to handle the initial surge, so the designers of its operating model should consider this very important factor and consider adopting ‘flexible resourcing models’ when staffing the teams. For instance, some organizations use flexible temporary contractors/ staff augmentation providers to meet the initial demand until their internal CCoE as well as delivery teams mature and become self-sufficient.

It is understood that all CCoE teams must be equipped with the necessary tools and technologies to perform their routine activities. So, identifying the necessary tools and technologies to be used by the CCoE is an important part of the CCoE operating model design.

And finally, the CCoE teams must define the KPIs to ensure they monitor and measure their service performance routinely and continually to improve and get better.

Ill-conceived Migration Plan

Let's recap the top challenges organizations face during workload migration into the cloud.

The above is just the list of top challenges reported by the survey participants; the full list of challenges is far too long to cover here. So we will touch on these top challenges but also discuss a few related challenges that deserve an honorable mention.

Just to be clear, a cloud migration program always means workload migration ‘en masse’, i.e. mass migration. Typically, for a medium to large-sized organization, the workloads to be migrated number in the hundreds. The scope of a cloud migration program is never a single large application migration in isolation. Yes, there are instances of single workload migration, albeit of a large application, e.g., a mainframe replacement or data warehouse rehosting, but these kinds of initiatives really fall under the category of application modernization. Here the initiative just happens to employ cloud hosting as a solution; it certainly does not qualify as a cloud migration program.

With that in mind, one of the main reasons cloud migration planning goes awry is that the planners lack an understanding of the objectives and goals of the migration program. Most cloud migration planners (even from some very renowned consulting organizations) follow a boilerplate approach to migration planning. They begin by taking an inventory of all the applications in the affected datacenters. Then they set up several high-level, hour-long interviews with IT application owners to understand the current state architectures, solution / tech stacks, build pipelines, support models, etc.

In a nutshell, based on this preliminary groundwork and very high-level information, they form a view of the application patterns: for example, whether it is a 2- or 3-tier application, a batch processing application, a streaming data / ETL pipeline, a data warehouse, or a legacy monolith. They then perform a high-level analysis of the collected information to assign dispositions that determine what kind of migration approach may be required. This is the classic process of assigning 6-R dispositions (Retire, Retain, Replace, Rehost, Re-platform, and Rearchitect) to the workloads. Based on the complexity score, effort required, and bandwidth / resource availability of the delivery teams, they group the applications into a few tranches or waves to be prioritized for migration. Thus, a plan is created and handed over to the program manager within the CIO organization running the overall migration program to figure out the rest.

This approach is riddled with all sorts of hidden problems that surface only during the execution stages. Let's unpack this obvious-looking and seemingly benign approach a bit more.

One key objective of moving datacenter workloads en masse is to ensure that, as workloads are released into production according to the defined wave plan, end-to-end business functionality remains available with no (or minimal) disruption throughout all migration stages. Essentially, it is the migration of “business functions”, not just the migration of individual application workloads per se. It has everything to do with ensuring that all upstream and downstream applications and data workloads that support the end-to-end business processes behind a business function are moved together as a ‘unit’ of migration. The reality is that most migration plans are not created with this very important viewpoint in mind.

The main reason for the above situation is the consequence of cloud migration programs being driven by IT and not by business. Everything in IT is seen through the “application” lens. The smallest unit of managed entity in IT organizations is the application and data workload. To be more precise, it is the managed “configuration item”, a smallest unit, a building block of a workload such as, software, infrastructure, or middleware component. And so naturally, migration planning undertaken by IT is heavily biased by this mindset. Every activity in the migration plan is prioritized and described keeping the interests of, and impact to IT organizations first.

Also, the cloud consultants are usually hired by the CIO/CTO organization, since IT sponsors the engagement, and so consultants have perfected an application/data-centric migration plan that makes natural sense to IT leadership. Nobody sees anything wrong with it, nobody complains about it, and so the ‘boilerplate’ approach continues to be regurgitated by the consultants from one organization to another. Until, of course, the plan does not work out very well during the execution stages. Business functionality is not made available holistically and on time for the business to use without disruption. This is the result of migration plans that do not take workload interdependencies into account. Workloads in transition that depend on other systems not yet released in the cloud get disrupted when those systems are migrated later. Then everyone realizes the problem, but by then the damage is already done.

The very top challenge highlighted in the survey, ‘understanding of application dependencies’, speaks to this exact problem. It is no surprise that it is so widespread in almost every migration program.

The right way to discover application dependencies is to perform assessment at the business capability/function level; first by identifying all related workloads supporting the associated business processes, then doing the workload level deep-dive assessment and prioritizing them based on other factors described previously into appropriate tranches or waves for migration. This approach can become feasible if and only if the business leadership sponsors the cloud program instead of CIO/CTOs. In this case business stakeholders will have a lot more say in the overall migration planning matters as they will be directly involved in the decision making ensuring minimal disruptions to operations of their specific business functions.
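As a rough illustration of planning by business function rather than by application, the sketch below starts from a function-to-workload mapping and merges any functions that share a workload into a single ‘unit’ of migration. The inventory and the `migration_units` helper are hypothetical; a real assessment would draw this mapping from value stream analysis and discovery tooling.

```python
# Hypothetical inventory: business function -> workloads that support it.
FUNCTION_WORKLOADS = {
    "Order to cash": {"web-store", "order-service", "billing"},
    "Customer support": {"crm", "ticketing"},
    "Financial reporting": {"billing", "ledger", "reporting-dw"},
}


def migration_units(function_workloads):
    """Merge business functions that share a workload into one migration unit.

    Returns a list of (functions, workloads) pairs whose workload sets are
    pairwise disjoint, so each unit can be scheduled as a whole.
    """
    units = []
    for function, workloads in function_workloads.items():
        merged_functions, merged_workloads = {function}, set(workloads)
        untouched = []
        for funcs, wls in units:
            if wls & merged_workloads:
                # Shared workload: these functions must move together.
                merged_functions |= funcs
                merged_workloads |= wls
            else:
                untouched.append((funcs, wls))
        units = untouched + [(merged_functions, merged_workloads)]
    return units


for funcs, wls in migration_units(FUNCTION_WORKLOADS):
    print(sorted(funcs), "->", sorted(wls))
```

In this made-up inventory, “Order to cash” and “Financial reporting” both rely on `billing`, so they end up in the same unit, while “Customer support” can be planned independently.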

Each application and data workload is unique and has its own challenges. Some applications are commercial off-the-shelf (COTS) products, some are homegrown, and some are a combination of both. Detailed technical assessments of individual workloads provide early knowledge about what kind of migration approach is required (the 6-R disposition), and based on that the migration effort can be estimated more accurately.

When moving the entire set of business functions out of the datacenter into a hybrid/multi cloud the migration planners and cloud architects must identify universally applicable and foundational architecture spikes and create reference architectures that can inform target state architecture designs.

In most organizations these foundational architecture spikes are not identified and reference architectures are not provided upfront. In most cases it's an afterthought. In such situations, each delivery team ends up creating its own version of the solution for these foundational elements, repeatedly reinventing the wheel. This creates confusion among delivery teams as to which approach to use, which delays deliveries, causes unnecessary duplication of effort, and consequently adds to the cost overruns.

Typically, large datacenters host not only application and data workloads but also many enterprise technologies (e.g., logging, monitoring, job schedulers, etc.), identity and access management, integration platforms (ESB, SSIS/SSRS, workflow systems, etc.), and data repositories (such as data warehouses). Planning and prioritization must consider migrating these enterprise systems first, as application/data workloads typically depend on one or more of them. Therefore, identifying workload dependencies on these platforms is a very important step in the migration planning process. The plan must also guide delivery teams whose workloads are interdependent to collaborate and coordinate their testing activities at the various stages, so that no team becomes a bottleneck for the teams that depend on it.

The following are some examples of foundational areas that cloud architects in the CCoE must plan ahead of time, creating the necessary reference architectures in the early stages of migration planning:
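One way to turn these dependencies into a wave order is a topological sort: anything with no unmigrated dependencies goes into the next wave. The sketch below uses Python's standard `graphlib` module; the workload names and dependency map are hypothetical, and a real plan would also weigh team capacity, complexity scores, and business calendars.

```python
from graphlib import TopologicalSorter

# Hypothetical dependencies: workload -> systems it depends on.
DEPENDS_ON = {
    "identity-platform": set(),
    "logging-platform": set(),
    "integration-bus": {"identity-platform"},
    "order-service": {"identity-platform", "integration-bus", "logging-platform"},
    "billing": {"order-service", "logging-platform"},
}


def plan_waves(depends_on):
    """Group workloads into waves so that every dependency lands in an earlier wave."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(plan_waves(DEPENDS_ON), start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

With this sample data the enterprise platforms (`identity-platform`, `logging-platform`) fall into the first wave on their own. A `CycleError` is a useful signal too: circular dependencies usually mean those workloads have to move together as one unit.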

· Cloud subscription/account model design

· Multi cloud landing zones and integration design (with external and on-premises systems)

· Test environment planning and design for all stages of workload delivery process (development, system integration testing, user acceptance, production with disaster recovery site)

· Identifying and developing ‘infrastructure as code’ (IaC) templates, patterns, and modules to be used to build solution stacks using CSP specific tools and technologies (Terraform, ARM templates, AWS CloudFormation, AWS CDK, and other similar technologies).

· Identity and access management for all cloud workloads (including management of service accounts and other privileged accounts)

· Secure vaults, secret stores, and certificate management

· Phased application and data migration/synchronization approaches, tools, and testing strategy, especially when moving old legacy workloads (e.g., mainframes and mid-range platforms) and large repositories in phases.

· Data protection (for PII/PHI) design for all data repositories (SQL/NoSQL databases, shared file stores, and other specialized data platforms)

· Centralized auditing, tracing, logging (for application, security, data, and infrastructure) and observability

· DevSecOps and CI/CD automation and orchestration pipeline design (continuous everything from integration, build, security scan, test, deploy and production release with the ‘shift left’ mindset)

· Design for patching and upgrade CI/CD pipelines (for VMs and containers)

· Standardizing on the baseline versions (supporting only up to ’n’ and ‘n-1’) for OS, Middleware, Databases, and other enterprise platforms and software components to be migrated into the cloud.

· Cost management and billing approach (for example, use of resource tags in the cloud)

· Considerations of implications of bringing your own licenses (BYOL) for third party applications

· Creating sharable knowledgebase covering various foundational topics and detailed end-to-end delivery process documentation and various playbooks to help delivery teams to operate efficiently.
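To make the cost management item above a little more concrete, here is a minimal sketch of how resource tags can drive cost allocation. The required tag names, the sample billing rows, and both helper functions are assumptions made for illustration; each CSP has its own tagging and billing export features, which are not modeled here.

```python
from collections import defaultdict

# Hypothetical tagging policy: tags every cloud resource must carry.
REQUIRED_TAGS = {"cost-center", "application", "environment"}

# Hypothetical rows from a monthly billing export.
RESOURCES = [
    {"id": "vm-001", "monthly_cost": 420.0,
     "tags": {"cost-center": "CC-100", "application": "billing", "environment": "prod"}},
    {"id": "db-002", "monthly_cost": 910.0,
     "tags": {"cost-center": "CC-100", "application": "billing", "environment": "prod"}},
    {"id": "vm-003", "monthly_cost": 130.0,
     "tags": {"application": "crm"}},
]


def missing_tags(resources):
    """Map each non-compliant resource id to the required tags it lacks."""
    report = {}
    for resource in resources:
        missing = REQUIRED_TAGS - set(resource["tags"])
        if missing:
            report[resource["id"]] = sorted(missing)
    return report


def cost_by_tag(resources, tag):
    """Sum monthly cost per tag value; untagged spend is reported as UNALLOCATED."""
    totals = defaultdict(float)
    for resource in resources:
        totals[resource["tags"].get(tag, "UNALLOCATED")] += resource["monthly_cost"]
    return dict(totals)


print(missing_tags(RESOURCES))
print(cost_by_tag(RESOURCES, "cost-center"))
```

The size of the UNALLOCATED bucket is a handy early indicator of how well the tagging standard is being followed by delivery teams.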

Poor Execution of The Migration Plan

Regardless of how skilled your delivery teams might be and how experienced your program / project managers / scrum-masters are, failure in the cloud migration planning will guarantee failure of its execution. However, even when the migration plan is great, the execution phase itself can have its own additional challenges when the plan itself is not understood and/or followed properly by the delivery teams.

The first important prerequisite is ensuring all CIO delivery teams are staffed with skilled resources to perform their assigned duties under the migration plan. Each delivery team must have a dedicated project manager/scrum master (depending on the SDLC process) and must always have dedicated tech leads who have technical oversight for delivery team’s activities and a dedicated technical architect who is proficient in creating detailed cloud deployment design based on the target state cloud architecture prepared by the CCoE cloud solutions architect. The tech lead works with project manager/scrum master to help create detailed work breakdown structure plan (WBS)/task backlog for the implementation of the given workload.

Inadequately staffed and ill-prepared delivery teams are bound to take longer to pick up speed. Therefore, it is absolutely necessary that, depending on the complexity of the given workload, the delivery teams are staffed with sufficient number of experienced engineers, and they must be provided with all the necessary training and tools in their respective field of work.

Secondly, lack of sufficient cloud solution architects assigned to the migration program can be a big bottleneck. Therefore, enough cloud solution architects must be available within CCoE to design target state cloud architecture for all planned workloads in the given tranche/wave to keep the momentum of migration work going. Lack of sufficient number of skilled cloud architects on program can slow down the progress significantly.

Creating a good target state cloud solution architecture is both an art and a science. The cloud architecture must consider all standard non-functional requirements: security, availability, scalability, observability, performance, maintenance, and cost. Yes, the target state cloud architecture must take cost into consideration. Most cloud architects forget to follow the advice from the CSPs (e.g., well-architected frameworks) on making the most of their cloud subscriptions, and forget to pay attention to software licensing costs after migration. Cost overruns are one of the most prevalent challenges of a cloud program.

Besides FinOps controls, the CCoE must ensure the cloud architects apply cloud cost optimization policies, such as those listed in the chart below, in their designs.

Ideally, the delivery teams must be able to perform all cloud migration activities in a “Factory mode”, where once they prepare the WBS/backlog, they can begin implementation following a scripted process without requiring much assistance from CCoE teams.

Usually, the next challenge for the delivery teams is getting timely access to the tools and source systems in the various environments, wherever they are located: in existing datacenters and in the landing zones of the target cloud. These include the DevSecOps CI/CD toolchains and various workload migration tools. One reason timely access is difficult is that access is controlled via Active Directory (AD) groups.

In any large organization there is a sprawl of AD groups. This not only poses security threats but also makes the groups insanely difficult to manage. Very soon the people responsible lose track of which permissions are granted under which AD group, who has already been added to the group, and so on. Once an AD group is created and users are added to it, the group is very seldom reviewed, updated, or deleted when it is no longer active. People are rarely removed from AD groups even when they move on to other roles. And in large organizations there can be hundreds of AD groups. Unfortunately, every new member of a delivery team must request to be added to a dozen or so different AD groups via a ticketing system; yes, those dreaded ServiceNow tickets again.

The CCoE should put AD group governance in place. It must create dedicated AD groups specifically for the cloud migration program, one for each key role within a delivery team that needs access to the various environments and tools, so that a single ticket can grant the appropriate level of access to a person playing that role (i.e., role-based access control). When people leave the migration program, they should be promptly removed from the group.
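The role-based idea can be sketched in a few lines. Everything below is hypothetical: the group names, the access lists, and the membership records are invented, and the code does not call Active Directory or any real directory API. It only shows the shape of "one role, one group, one ticket" plus a periodic review for leavers.

```python
from datetime import date

# Hypothetical mapping: one dedicated AD group per delivery-team role.
ROLE_GROUPS = {
    "iac-engineer": "CLOUD-MIG-IAC-ENGINEER",
    "tech-lead": "CLOUD-MIG-TECH-LEAD",
}

# Hypothetical tools and environments each group unlocks.
GROUP_ACCESS = {
    "CLOUD-MIG-IAC-ENGINEER": {"ci-cd-pipeline", "iac-repo", "dev-landing-zone"},
    "CLOUD-MIG-TECH-LEAD": {"ci-cd-pipeline", "iac-repo", "dev-landing-zone", "test-landing-zone"},
}

# Hypothetical membership records; left_program is None while the person is active.
MEMBERS = [
    {"user": "asha", "group": "CLOUD-MIG-IAC-ENGINEER", "left_program": None},
    {"user": "ben", "group": "CLOUD-MIG-TECH-LEAD", "left_program": date(2024, 3, 1)},
]


def access_for(role):
    """Everything a single ticket for this role should grant."""
    return GROUP_ACCESS[ROLE_GROUPS[role]]


def stale_members(members, today):
    """Users who have left the program but are still listed in a group."""
    return [
        (m["user"], m["group"])
        for m in members
        if m["left_program"] is not None and m["left_program"] <= today
    ]


print(sorted(access_for("iac-engineer")))
print(stale_members(MEMBERS, today=date(2024, 4, 7)))
```

A scheduled review along the lines of `stale_members` is what keeps the dedicated groups from drifting into the same sprawl they were meant to replace.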

Just because the cloud solution architecture has been completed and approved by architecture review board, that does not mean the detailed technical design is ready for consumption by the delivery teams and WBS/backlog can be created from it. Since large cloud migration programs expect automation with DeveOps CI/CD pipelines, infrastructure as a code (IaC) has become a prominent fixture in the cloud development technologies.

Few organizations are mature in developing infrastructure as code, and this is one of the biggest weaknesses exposed during large cloud migration programs. Codifying infrastructure configuration, instead of setting up infrastructure resources manually, makes deployment tasks far easier to automate, but only for teams that know how to write that code.

This is new territory for most delivery teams. They are familiar with traditional programming for application logic, but IaC coding? Not so much. The situation is not as bad in the few progressive organizations whose platform engineering teams already deployed on-premises infrastructure using IaC automation technologies (such as Terraform). However, with the surge in DevOps adoption, that responsibility is now shifting to delivery teams. This has created a huge IaC skills vacuum within delivery teams, which now need IaC technical architects and IaC engineers who specialize in the given cloud.

The CCoE must hire enough IaC architects and engineers to design and build the basic building blocks (modules or patterns) that delivery teams can then assemble into their own code to automatically deploy an instance of the solution stack for their workload in a test environment. These IaC patterns can also automate the configuration of the organization’s cloud security and compliance policies. Identifying the required patterns and modules before mass migration begins is a huge time-saver, because developing them is as time-consuming as traditional software development. All pattern implementations must be completed by the CCoE and published in the artifact repository catalog before delivery teams can complete their detailed technical designs. Failing to do so can create huge technical debt and eventually delay the migration significantly.

The technical architects in delivery teams must not only know the workload solution well but also understand the IaC patterns and be able to document a detailed technical design that the IaC developers can build from. They must be able to dig deeper and produce a design with a bill of materials that identifies the patterns to be used along with any additional workload-specific cloud services and compute, networking, and storage resources. This technical design document is an essential input for the IaC developers; without it they cannot start coding the automated deployment scripts for the workload. Most delivery teams do not begin work on this technical design soon enough and end up paying a huge price later in migration delays.
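One cheap check that follows from the two paragraphs above is to compare a workload's bill of materials against the patterns the CCoE has actually published, before the IaC developers start coding. The sketch below assumes a simple dictionary shape for both the catalog and the bill of materials; the pattern names and versions are invented for illustration.

```python
# Hypothetical catalog of CCoE-published patterns: name -> published versions.
PUBLISHED_PATTERNS = {
    "vpc-baseline": {"1.0", "1.1"},
    "k8s-cluster": {"2.0"},
    "managed-postgres": {"1.2"},
}


def check_bill_of_materials(bom: list[dict], catalog: dict[str, set[str]]) -> list[str]:
    """Return a list of problems; an empty list means every pattern is available."""
    problems = []
    for item in bom:
        name, version = item["pattern"], item["version"]
        if name not in catalog:
            problems.append(f"{name}: not published")
        elif version not in catalog[name]:
            problems.append(f"{name}: version {version} not published")
    return problems


# Example bill of materials taken from a (made-up) technical design document.
workload_bom = [
    {"pattern": "vpc-baseline", "version": "1.1"},
    {"pattern": "managed-postgres", "version": "2.0"},
    {"pattern": "object-store", "version": "1.0"},
]

for problem in check_bill_of_materials(workload_bom, PUBLISHED_PATTERNS):
    print(problem)
```

Any problem reported here is a dependency on the CCoE that should be raised before the workload is scheduled into a migration wave, not discovered during the build.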

Most organizations do not consider providing a generic sandbox environment to their delivery teams ahead of time. Most CCoEs grant access to the development environment only after formally onboarding the delivery teams to the landing zone, which in turn is allowed only after the target-state cloud architecture is approved. This is very problematic, because teams need somewhere to experiment before they can finalize that architecture. It is a bit like putting the cart before the horse.

Sandboxes can be huge time-savers when it comes to ironing out technical wrinkles while finalizing the detailed target architecture through research, tinkering, prototyping, and building proof-of-concept pilots. The excuses given for not allowing sandboxes are typically cloud security concerns or cost. But come to think of it, the sandbox is the lowest of all environments, and provided it is isolated from the others there should be very little to worry about from a security standpoint. As for who picks up the cost of experimentation, there should be a common sandbox account set up and monitored for the migration program, funded by the program rather than by individual application workload budgets. The benefits of a sandbox outweigh its negatives many times over. The time saved by shifting left and building early confidence in the target-state cloud architecture can alone prevent a ton of future rework, cost overruns, and yes, grief.

Another major capability the CCoE must establish beforehand is an end-to-end DevSecOps toolchain supporting “continuous everything”: builds, integration, security, compliance, test, deployment, and release activities. The DevSecOps pipelines must already be in place as part of the landing zone deployment so that delivery teams can use them. The workload-specific orchestration scripts, however, are prepared by the delivery teams, and they must be ready for all environments in a timely manner or, again, implementation will slow down drastically. In most organizations delivery teams are given access only to the development environments in the cloud, so without ready CI/CD pipelines and automation scripts they cannot deploy into the higher environments.

Many workloads involve large amounts of data. The delivery teams must plan their data migration in advance and, following the guidance of the CCoE architects, test the migration approach at every stage from the lower to the upper environments. Systems with large data repositories (tens of terabytes or more) require a special approach, which may involve migrating the data in stages and synchronizing frequently between the on-premises and cloud instances to keep the data in the cloud current.
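A quick back-of-the-envelope calculation shows why large repositories force a staged approach. The sketch below assumes decimal units (1 TB = 10^12 bytes), a dedicated network link, and a sustained utilization factor that you would need to measure for your own environment; the function names and default values are illustrative only.

```python
import math


def transfer_days(data_tb: float, link_gbps: float, utilization: float = 0.7) -> float:
    """Days needed to copy data_tb terabytes over a link of link_gbps gigabits/s.

    utilization is the assumed fraction of the link actually sustained.
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    bits = data_tb * 1e12 * 8
    seconds = bits / (link_gbps * 1e9 * utilization)
    return seconds / 86_400


def windows_needed(data_tb: float, link_gbps: float, window_hours: float,
                   utilization: float = 0.7) -> int:
    """Number of transfer windows needed when copying is only allowed
    for window_hours at a time (for example, overnight)."""
    bytes_per_window = link_gbps * 1e9 * utilization * window_hours * 3600 / 8
    return math.ceil(data_tb * 1e12 / bytes_per_window)


# 50 TB over a fully utilized 1 Gbps link takes roughly 4.6 days of continuous copying.
print(round(transfer_days(50, 1, utilization=1.0), 1))
```

Even under these optimistic assumptions the copy does not fit into a single cutover weekend once only nightly windows are available, which is why an initial bulk load followed by incremental synchronization is usually planned.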

If the organization’s data contains personally identifiable information or protected health information (PII/PHI), additional care must be taken to protect it both in transit and at rest. If the CCoE has not planned, designed, and tested the data protection solutions beforehand through architecture spikes, doing so during implementation can drastically slow down the migration.

The test strategy for the migrated workload should also be planned ahead of time by the delivery/QA teams, taking into account interdependencies with other teams that arise from shared enterprise platforms and repositories. Writing automated test scripts and ensuring they can be invoked from the CI/CD pipelines can also save tremendous time during testing.

The last important activity before commencing the migration in earnest is to create a detailed WBS/backlog with full coverage based on the technical design. The tech leads must complete this activity as soon as the technical design becomes available. Most delivery teams do not even have tech leads and expect the project manager or scrum master to play that role, which is insane. Unfortunately, I have witnessed this firsthand, and it is quite common.

The code-build-deploy-test cycles can be complicated for a large cloud migration program; however, they are not too different from the conventional SDLC, so I will not spend time on the itsy-bitsy challenges here. By the time the workload is declared ready for production release, it must be truly ready. The actual release into production is a ritualistic process: it must go smoothly and according to plan, or else the release must be rolled back, which is never a good thing. Therefore, all steps in the release plan must be prepared in advance and approved by the change advisory board (preferably via automated workflows) so that the production deployment can occur without a hitch.
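The "go smoothly or roll back" rule above is easier to honor when every release step is written down together with its undo action. The following is a minimal sketch of that idea, not a real deployment tool: the step names are invented, and each step is assumed to be a pair of plain Python callables.

```python
from typing import Callable

# A release step: (name, apply action, undo action). All names are hypothetical.
Step = tuple[str, Callable[[], None], Callable[[], None]]


def run_release(steps: list[Step]) -> tuple[bool, str]:
    """Run steps in order; if one fails, undo the completed steps in reverse order."""
    completed: list[tuple[str, Callable[[], None]]] = []
    for name, apply, undo in steps:
        try:
            apply()
        except Exception as exc:
            for _done_name, done_undo in reversed(completed):
                done_undo()
            return False, f"rolled back after '{name}' failed: {exc}"
        completed.append((name, undo))
    return True, "released"


# Usage: record what happens so the order of actions is visible.
log: list[str] = []


def failing_smoke_test() -> None:
    raise RuntimeError("smoke test failed")


plan: list[Step] = [
    ("migrate-db", lambda: log.append("apply db"), lambda: log.append("undo db")),
    ("deploy-app", lambda: log.append("apply app"), lambda: log.append("undo app")),
    ("smoke-test", failing_smoke_test, lambda: None),
]

ok, message = run_release(plan)
print(ok, message, log)
```

Writing the plan in this shape forces the team to decide the rollback action for each step before the change advisory board reviews it, rather than improvising it during a failed release.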

Once the workload is in production and the business begins to consume it, it enters the warranty period (typically 60–90 days). During this period the delivery teams, alongside the CCoE, must continue to provide post-migration support for their respective applications. They must keep sufficient resources allocated just for post-migration support. Failing to do so slows down the other ongoing migration work, because delivery teams tied up with post-release support cannot be repurposed for the next tranche/wave of workload migrations.

As you can see, there are plenty of things that can go wrong during the cloud migration process. With proper strategic planning and governance of the cloud migration program, and with due diligence in every aspect of planning and execution, most of the above challenges can be prevented or at least mitigated.

Author: Sunil Rananavare, IT Strategy Planning and Architecture (CIO Advisory)

If you like the article, then do follow me to stay informed. Share it if you found it useful.
