The Fallacy of Multi- and Hybrid-Cloud
There is no doubt that cloud adoption has been increasing rapidly over the past few years. Almost every company at this point has, or is crafting, a cloud strategy, and most have started moving at least some workloads to the cloud. For larger companies, enterprises with big legacy applications and a long and storied internal IT history, this transition is slow, careful, and typically very cautious. They have large investments in depreciating hardware, and a sizable staff of employees who are well-versed in managing physical servers and storage and in running virtual machines internally.
Because of these factors, there is often internal resistance to moving too much, too fast into the cloud. The cloud providers realize this, and have started introducing solutions to ease the minds of enterprise customers. To hear some managed service and (smaller) cloud platform companies tell it, the future of cloud computing is almost entirely in hybrid- and multi-cloud solutions. A number of products released over the past couple of years tout the ability to run your workload in any cloud and, more specifically, in multiple clouds at once, including so-called “private clouds”. On the surface, this approach seems to offer a number of very real benefits. But is it really the best approach for a company’s cloud strategy?
The Benefits
The biggest benefit usually cited for a hybrid/multi-cloud design is the reduction, or elimination, of vendor lock-in. This is a fair point. Being able to move between, and run your workloads in, any environment certainly mitigates fears of being beholden to a single cloud provider. If the cost of your vendor of choice unexpectedly rises, or the relationship with that vendor sours, it’s nice to be able to move your entire footprint elsewhere with minimal or no rework.
The second most common reason for this approach is failure mitigation. Outages do happen. While these are usually limited to a single region, meaning a true multi-region design within a single cloud can mitigate them, global outages do occasionally occur. Azure has had multiple global DNS and sign-on issues that have disrupted service, including one last year that lasted 17 hours. Just this week GCP had a major global outage affecting several services at once. AWS experienced a prolonged DDoS attack on Route 53 that caused outages. Spreading your workloads across multiple providers can help ease the pain when a single vendor has issues.
The Downsides
In order to achieve a true multi-cloud design, the capabilities of these clouds have to be reduced to a lowest common denominator: a set of features that is common, in both design and execution, across them all. In almost every scenario, this means little more than virtual machines and load balancers. While every provider, by and large, offers a similar set of services, how those services are designed, implemented and consumed can be very different. Amazon DynamoDB and GCP Datastore both provide the same basic function—fully-managed, auto-scaling NoSQL databases—but how they function and how they are accessed are very different.
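To make that concrete, here is a minimal sketch of the same single-item write against each service, using the boto3 and google-cloud-datastore Python SDKs. The table name, kind, and field names are hypothetical, and the snippet assumes credentials are configured and the table/kind already exist:

```python
import boto3
from google.cloud import datastore

# DynamoDB: table-oriented API; items are plain dicts keyed by a declared schema.
dynamo_table = boto3.resource("dynamodb").Table("users")  # hypothetical table
dynamo_table.put_item(Item={"user_id": "u-123", "plan": "pro"})

# Datastore: entity-oriented API; keys are built from a "kind" plus an id/name,
# and the entity object is constructed and populated before being written.
ds_client = datastore.Client()
entity = datastore.Entity(key=ds_client.key("User", "u-123"))  # hypothetical kind
entity.update({"plan": "pro"})
ds_client.put(entity)
```

Same conceptual operation, but different client objects, different key models, and different consistency semantics. That divergence is exactly what a lowest-common-denominator design has to paper over.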
However, a server is a server, and every vendor can run a virtual machine that you’re familiar with, in largely the same way. The method of managing and spinning up these virtual servers may differ, but those differences can be abstracted away through various tools, such as HashiCorp’s Terraform. Some solutions, like VMware’s recent cloud partnerships with AWS, Azure and GCP, let you extend your VMware environment to compute resources in those other clouds. Once the VM is running, it operates basically the same way whether it lives on-prem or in any cloud provider.
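Terraform itself expresses this abstraction in its own configuration language (HCL), so as a Python stand-in for the same idea, here is a sketch using Apache Libcloud, which wraps many providers behind one driver interface. The provider constants are real Libcloud values, but the credentials, names, and size/image IDs below are placeholders:

```python
from libcloud.compute.types import Provider
from libcloud.compute.providers import get_driver

def boot_vm(provider, auth_args, name, size_id, image_id):
    """Create one VM through Libcloud's provider-neutral driver API."""
    driver = get_driver(provider)(*auth_args)
    # Each driver exposes the same catalogue methods, whatever the backend.
    size = next(s for s in driver.list_sizes() if s.id == size_id)
    image = next(i for i in driver.list_images() if i.id == image_id)
    return driver.create_node(name=name, size=size, image=image)

# The same call shape targets different clouds (all arguments are placeholders):
# boot_vm(Provider.EC2, ("ACCESS_KEY", "SECRET"), "web-1", "t3.micro", "ami-...")
# boot_vm(Provider.GCE, ("svc-account@...", "key.json"), "web-1", "n1-standard-1", "...")
```

Note what this buys you: portable provisioning of plain VMs, and nothing more. It says nothing about the managed services discussed below, which is precisely the limitation.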
Unfortunately, just running virtual machines in the cloud is not necessarily as cheap as it might seem. Even if you are using containers, in a multi-cloud environment those containers will be running on basic virtual machines. It can be difficult to make the case that running VMs with a cloud provider has any major cost benefit versus running the same size and number of VMs on-prem in VMware or KVM — on hardware that you’re also depreciating. There are, of course, lots of “soft” savings (data center power costs, management overhead, speed and flexibility, easier DR/HA, etc.), and if you take everything into account it is possible to come out ahead on price, but this can be difficult to quantify for the accountants. Reserved Instances in EC2 can save considerable money over on-demand pricing, but they require a minimum one-year lock-in, and a three-year lock-in to get maximum value.
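As a rough illustration of the reserved-versus-on-demand trade-off, the back-of-the-envelope math below uses made-up hourly rates; actual AWS pricing is tiered, region-specific, and changes over time, so treat every number here as an assumption:

```python
# Back-of-the-envelope Reserved Instance math.
# All rates are illustrative assumptions, not current AWS pricing.
HOURS_PER_YEAR = 24 * 365

on_demand_hourly = 0.0832      # assumed on-demand rate for a mid-size instance
reserved_1yr_hourly = 0.0520   # assumed effective rate with a 1-year commitment
reserved_3yr_hourly = 0.0360   # assumed effective rate with a 3-year commitment

for label, rate in [("on-demand", on_demand_hourly),
                    ("1-year RI", reserved_1yr_hourly),
                    ("3-year RI", reserved_3yr_hourly)]:
    annual = rate * HOURS_PER_YEAR
    saved = 1 - rate / on_demand_hourly
    print(f"{label:>10}: ${annual:7,.0f}/yr  ({saved:.0%} vs on-demand)")
```

The longer commitments are where the real discounts live, which is exactly the lock-in the multi-cloud camp is trying to avoid.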
A number of companies have taken the approach of a “lift-and-shift” of their applications into their cloud of choice, doing one-for-one, like-for-like VM migrations. For most newcomers to the cloud, this is the easiest path to get there. However, they often experience sticker shock when the bills start coming in and the cloud turns out to be far more expensive than on-prem was. This often triggers a backlash against moving anything else into the cloud, and the migration effort is paused until the costs can be justified.
The real “secret sauce” of getting value out of the cloud is in all the other services providers offer. Serverless platforms, database services, container orchestration, machine learning products, object storage — this is where the real power of the cloud comes in, where the major vendors can provide services you could only dream of running yourself at any kind of scale. And because these services all operate differently, for almost all of them it’s not feasible to abstract the differences away enough to run truly active/active or active/passive across clouds, or to migrate seamlessly between them.
Embracing the Lock-In
Ultimately, the best way to get maximum value out of the cloud is to pick one provider and refactor your applications to be “cloud native” and consume that provider’s PaaS, FaaS and SaaS offerings — to really “go deep” into the cloud. I know this statement is probably causing someone’s eye to twitch somewhere, but if you’re not ready to commit to the cloud, you’re not ready to be in the cloud. There was an interesting presentation at AWS re:Invent 2016 on how to think about lock-in and how to plan for it. While the talk was focused on AWS, the principles in it are applicable to any provider.
There are ways to minimize lock-in, or at least reduce the amount of rework required to migrate to another solution, and so tip the cost/benefit formula presented in that talk further in your favor. For example, applications should abstract the database layer away from the application logic, so that the database module is the only component that needs refactoring if you ever have to move off DynamoDB to another technology in another cloud, as sketched below. While you can never completely eliminate lock-in, some common-sense approaches make it easier to swallow.
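Here is one minimal way that abstraction might look in Python, with a hypothetical UserStore interface standing between the application and the storage backend. The class, table, and field names are my own illustration, not a prescribed pattern from the talk:

```python
from abc import ABC, abstractmethod
from typing import Optional

import boto3

class UserStore(ABC):
    """The only storage interface the rest of the application sees."""

    @abstractmethod
    def get(self, user_id: str) -> Optional[dict]: ...

    @abstractmethod
    def put(self, user: dict) -> None: ...

class DynamoUserStore(UserStore):
    """DynamoDB-backed implementation; the only module that touches boto3."""

    def __init__(self, table_name: str):
        self._table = boto3.resource("dynamodb").Table(table_name)

    def get(self, user_id: str) -> Optional[dict]:
        return self._table.get_item(Key={"user_id": user_id}).get("Item")

    def put(self, user: dict) -> None:
        self._table.put_item(Item=user)

# Moving off DynamoDB means writing one new UserStore subclass
# (Datastore-, Cosmos-, or Postgres-backed); the callers never change.
```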
Pick-And-Choose Approach
Another multi-cloud approach that I have seen be somewhat successful is to consume services from multiple clouds where it makes sense. Instead of trying to run your application stack across multiple clouds, you might choose to, say, run all your compute in AWS and run your data warehousing and BI in GCP. This “best-in-breed” style of design has some benefits — namely, instead of waiting for your preferred provider to catch up on a particular feature, you can consume the superior product now — but it is good to be aware of the added costs. Those costs are almost exclusively around data transfer.
All cloud providers charge a premium for data transferred out of their cloud. By spreading components of your business across multiple clouds, you are going to increase your data transfer costs, sometimes doubling or even tripling them! In the scenario above, transfer fees are incurred when your applications serve data out of AWS, again when data is replicated from those applications to the data warehouse in GCP, and potentially a third time when clients pull data from the warehouse to run reports and BI jobs against it. Also consider that you may need to maintain multiple dedicated connections, one to each cloud provider you use.
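To put rough numbers on the split-stack scenario above, here is a toy egress calculation; both the per-GB rates and the monthly volumes are invented for illustration, since real transfer pricing is tiered and provider-specific:

```python
# Toy egress math for the AWS-compute + GCP-warehouse split.
# Every rate and volume below is an assumption, not published pricing.
aws_egress_per_gb = 0.09      # assumed AWS egress rate for cross-cloud replication
gcp_egress_per_gb = 0.12      # assumed GCP egress rate for BI client pulls

gb_replicated_to_gcp = 5_000  # app data copied from AWS to the warehouse, per month
gb_pulled_by_clients = 1_000  # report/BI traffic leaving GCP, per month

monthly_cost = (gb_replicated_to_gcp * aws_egress_per_gb
                + gb_pulled_by_clients * gcp_egress_per_gb)
print(f"Extra transfer cost: ${monthly_cost:,.0f}/month")  # $570/month here
```

In a single-cloud design, the replication leg would typically stay on the provider’s internal network at a much lower (often zero) rate.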
You need to weigh all these factors against the benefit you are getting from consuming services in other clouds.
Conclusion
While there can be some benefits to a hybrid or multi-cloud approach, companies should think seriously before adopting this design. The real power of the cloud is not in virtual machines, but in all the services that make up a cloud provider’s portfolio, and in the real synergies that arise as you start to tie those services together. So, if a fear of lock-in is driving your decision, make sure that fear isn’t costing you considerably more than accepting that there will be lock-in, preparing for it, and taking full advantage of the cloud.