Comparing the Top Eight Managed Kubernetes Providers

My experience deploying Helm charts on eight cloud providers.

Elliot Graebert
13 min read · Feb 1, 2023

I recently wrote a blog post about how the future of software development is likely to involve remote development. Remote development is when you shift compute-heavy tasks (compiling, testing, and local process execution) from your laptop to the cloud. The post sparked a lot of debate, but one complaint in particular was that my solution only worked on Google Cloud. Fair point; I took that as a challenge.

Can I use Terraform to deploy a couple of Helm charts on every major cloud?

GCloud GKE: easy
Azure AKS: easy
AWS EKS: awful

Frustrated with AWS, I went down a rabbit hole of finding more managed k8s alternatives: DigitalOcean, IBMCloud, OVHCloud, Scaleway, Oracle Cloud, Rackspace, Linode, etc. In the discovery phase, I googled several articles looking for the top Kubernetes providers. These articles listed providers like Rackspace and Oracle Cloud as contenders with Azure and Google. After getting personal experience, I can say: they are not.

This lack of authenticity compelled me to write this blog post. Unlike these other blog posts, I wanted to provide you (the reader) with strong opinions instead of a watered-down, feel-good article. I also only spent 2–4 days per provider, so I’m very open to changing my opinion or hearing that my math was wrong. I’d rather have a hearty debate in the comments than an article that says that DigitalOcean and IBMCloud are comparable.

For each platform, I will be going over:

  • Platform experience — creating an account, reading the documentation, and generally navigating the cloud platform.
  • Kubernetes experience — how difficult it was to get the two Helm charts fully deployed, and how feature-rich the offering was.
  • Cost — mostly from a small business perspective.

I do not cover security, scalability, or availability, as these topics would have required too much time or money to properly vet.

If you like this post, please 👏 , subscribe, follow me, or connect on LinkedIn to let me know I’m on the right track!

· Scope
· Summary of results
· The Signup and Console Experience
· Deploying the Helm Charts
· Cost Analysis

What was the scope of this experiment?

The deployment consists of a Helm chart for PostgreSQL and a Helm chart for Coder. The PostgreSQL chart requires a persistent volume, and the Coder chart requires a load balancer for ingress. The Coder app will spin up additional pods on the cluster as part of the developer workstation feature, so a dedicated k8s namespace is a good idea. This only scratches the surface of k8s, but it’s a more practical choice than doing some type of “hello-world” pod.
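A minimal sketch of that stack in Terraform gives a sense of the scope. The chart repository URLs are the public Bitnami and Coder ones; the namespace name, release names, and `var.db_password` are illustrative, not the exact values from my repo:

```hcl
# Dedicated namespace so Coder's workspace pods stay isolated.
resource "kubernetes_namespace" "coder" {
  metadata {
    name = "coder"
  }
}

# Bitnami PostgreSQL chart; requires a persistent volume.
resource "helm_release" "postgres" {
  name       = "coder-db"
  namespace  = kubernetes_namespace.coder.metadata[0].name
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "postgresql"

  set {
    name  = "auth.password"
    value = var.db_password # illustrative; supply your own secret
  }
}

# Coder chart; requires a LoadBalancer service for ingress.
resource "helm_release" "coder" {
  name       = "coder"
  namespace  = kubernetes_namespace.coder.metadata[0].name
  repository = "https://helm.coder.com/v2"
  chart      = "coder"

  depends_on = [helm_release.postgres]
}
```

The interesting part is that this block is identical across providers; only the cluster and node-pool resources feeding the `kubernetes` and `helm` providers change.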

I wanted to do this entirely from Terraform simply because it sounded fun (OVHCloud made me regret that). I challenged myself to create a single TF file per provider that bootstraps the platform, creates a k8s cluster, and deploys both Helm charts. I also tried hard to minimize the code, using as many defaults as possible from the provider. The results are all in github.com/elliotg/coder-oss-tf.

I want to give a shoutout to Spacelift. I used Spacelift’s free offering for all of my work which was an excellent experience. They had some great features around file mounting that helped me get unblocked with OVHCloud. If you are looking for a Terraform executor, you should give them a look. In addition, I stumbled across k8slens.io, which was vastly superior to any other dashboard I’ve seen.

Enough already, what is the high-level summary?

Best overall: Azure
Best for startups: Linode

If you are a startup that isn’t messing with hardcore data, I can see a rationale for Linode, Scaleway, or DigitalOcean. Due to their complexity, I struggle to recommend AWS, Azure, and GCloud; the simpler providers make it so easy to get back to writing code.

Azure is the best at Kubernetes. There are great tools in GCloud and AWS, and maybe you need to build your business on top of them (like AWS S3 or Google’s BigQuery). However, if speed to deploy and respond to user traffic is the most significant decision, then AKS is the winner (specifically with k8s). They had the best dashboard experience and the least number of hiccups. The biggest downside of AKS is Azure, as I find the rest of the Azure Cloud experience to be sub-par (to put it nicely).

The Signup and Console Experience from the 8 Cloud Providers

My feedback is focused on the signup process and the general experience of navigating their console. You’d think the signup process wouldn’t matter, but I would have had 10 cloud providers (instead of 8) if the signup process was easy.

Signup isn’t easy (for some clouds)

This is more of a pass/fail category. Either the hosting provider made it easy for me to give them my money, or they made it difficult. There’s no point in talking about the hosting providers with easy signups, so let’s jump straight to the providers whose signup process was so bad that I never got to see their product.

  • Rackspace — To get access to an account, you need to contact their customer support, schedule a phone call, and pitch them on your project. They won’t work with you if they don’t find your project interesting. This was a huge red warning sign, and I do not recommend them.
  • Oracle Cloud — For each other cloud provider, I used my personal credit card and spent actual money for this blog post. Even the Europe-based clouds didn’t reject me. Oracle Cloud rejected all three of my cards. I filed a customer support ticket, and they refused to help: “Unfortunately, we are unable to resolve this or process the transaction. This is all the information we can provide.” So that was the end of my experience with Oracle Cloud. B-, 0/5 stars, would not recommend.

While I successfully deployed on OVHCloud and IBMCloud, they still had hiccups that the other cloud providers did not have.

  • IBMCloud — I had the same issue as Oracle Cloud, where my billing was initially rejected. Unlike Oracle, IBM’s customer support team was actually helpful and fixed the issue on their side.
  • OVHCloud — While I was able to get past the initial stage, I will say that OVHCloud’s signup process is also overly difficult. They require photo evidence of your legal ID. It’s through a very janky upload portal that I feel confident means my identity has been stolen. On top of that, they charge a flat $30 / month fee just to have access to their platform.

General Platform Experience

Let’s start with the winners: Digital Ocean, Linode, and Scaleway.

The interfaces look fairly identical, making it impossible to declare one as better than the others. Their design is crisp and clean, without the feature bloat prevalent in AWS and Azure. In my opinion, this simplicity goes a long way to helping you get in, deploy your app, and move back to writing code.

Bottom Line: if you are a startup with fewer than three DevOps engineers, you should stick with one of these simple clouds. They just work.

After using the three major providers back-to-back, I can’t say anything good about these three.

  • AWS Console — I remember first using the console 10 years ago, and I felt it was reasonably straightforward. Since then, AWS has become a bloated monstrosity. I can only imagine the trauma of experiencing the AWS Console for the first time.
  • GCloud Console — Many of their pages look like a quickly rendered wireframe without any thought toward presentation. While finding the services I cared about was relatively straightforward, the information on these pages was poorly visualized. Put Linode and GCloud side by side, and tell me which one you prefer.
  • Azure Console — No one in this industry was surprised when Microsoft created this user experience nightmare. For example, when deploying k8s, I had to pick from over 1,000 instance types with helpful names like “Standard_N48plds_v5,” where the “p” stands for ARM. Crystal clear; thanks, Microsoft.

My Experience Deploying Helm Charts Across All Eight Providers

For my k8s experiment, I aimed to deploy the cluster, nodes, and pods through Terraform. The pods would consist of a Coder Helm chart and a Bitnami PostgreSQL Helm chart. In 2023, I expected this project to be straightforward, well-documented, and executed without hiccups. Disappointment became a close companion as I struggled with some hosting providers. Though I had my doubts, I succeeded, and you can check out the code here.

I’m going to break the experience down into the following three parts:

  • Provisioning speed — The speed at which the platform can spin up a fresh Kubernetes cluster, add nodes to its pool, deploy pods, and provision storage and ingress load balancers.
  • Console experience — The experience of navigating the console from the perspective of a non-expert.
  • Sane defaults — This is a minor point, but I very much wanted to call out where cloud providers (like AWS) dropped the ball in terms of picking sane defaults.

K8s Provisioning Experience

Overall: Azure was the fastest for my workload, with Linode as a close second

Nodes were each 2 vCPUs with 16 GB of memory. The first measurement was how long it took to deploy the k8s cluster, including at least one node in the node pool. The second was the time it took to add a second worker node (i.e. scaling up). The final metric was the time to deploy the two Helm charts serially. The PostgreSQL chart needs a persistent volume claim, and the Coder chart needs an ingress load balancer. I ran the test 3 times per hosting provider. The table below shows the averages:

In terms of speed to get your app deployed, Azure is the winner. Google is fast at adding new nodes to your cluster, but what’s the point of a node if it doesn’t have workloads running on it? If you add the new node time and the Helm chart time together, Azure is about 20% faster than Google and 35% faster than AWS.

Linode’s incredible speed for booting up new k8s clusters will appeal to some audiences, and their overall node deployment time was solid.

I didn’t spend a significant amount of time root-causing why the Helm deploys took so long for some providers. There are a multitude of factors, and digging into each felt like information overload. This is a good reminder that if speed is paramount, nothing beats testing your actual product. Depending on the cause of the slowdown, you might be able to optimize the underlying image.

K8s Console Experience

Overall: Azure wins with having the best UI experience

A standard Kubernetes stack is complicated, and a good dashboard goes a long way. Each hosting provider needed to make a decision on what type of experience to provide their users.

GCloud GKE’s custom UI worked reliably (unlike AWS EKS), and I had no issues viewing the details of my cluster. Unfortunately, the UI was very poorly designed, and I found myself struggling to do basic steps like adding nodes to the node pool. Did I figure it out? Yes. Was it worse than the OSS dashboard? Yes.
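For what it’s worth, Terraform sidesteps the node-pool UI entirely. A sketch of what scaling a GKE pool looks like in HCL (the cluster resource name and machine type are illustrative):

```hcl
# Worker pool attached to an existing GKE cluster.
# Scaling up is a one-line change to node_count.
resource "google_container_node_pool" "workers" {
  name       = "workers"
  cluster    = google_container_cluster.primary.id
  node_count = 2 # bump this to add nodes to the pool

  node_config {
    machine_type = "e2-standard-2"
  }
}
```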

Azure AKS’ custom UI was surprisingly good. This might be a controversial opinion, and I only used the dashboard for a couple of days. But I liked it. I found that they compartmentalized the information well, and I was able to navigate and do the basic operations easily. Kudos to the Azure team.

AWS EKS’ custom UI is (unsurprisingly) the worst of the three. Getting the page to render for anyone besides the cluster’s creator is obnoxious (see the next section). Even once I got this working, I ran into multiple bugs where refreshing the page would produce different results (nodes would appear and disappear). A reliable dashboard was especially necessary because of the random errors I got from nodes failing to join the pool.

DigitalOcean, Linode, Scaleway, and IBMCloud all deploy github.com/kubernetes/dashboard. This made for a good experience because it was (a) a dashboard I was familiar with and (b) it worked out of the gate without any extra configuration. Kudos for a creative solution that just works.
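If your provider doesn’t bundle the OSS dashboard, it’s one more chart away. A hedged sketch (repository URL from the dashboard project’s own docs; not production config, since you’d still want RBAC and an access token story):

```hcl
# Deploy the upstream kubernetes/dashboard via its official chart.
resource "helm_release" "dashboard" {
  name             = "kubernetes-dashboard"
  repository       = "https://kubernetes.github.io/dashboard/"
  chart            = "kubernetes-dashboard"
  namespace        = "kubernetes-dashboard"
  create_namespace = true
}
```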

OVHCloud gave up on trying to provide their users with a good experience, and they provide nothing. I gave them the worst rating because doing nothing was glaringly bad compared to the other cloud providers.

K8s Sane Defaults

Be warned: this section is mostly me complaining.

Getting everything stood up was a breeze for most of the hosting providers. They included sane defaults for many values so that you could get something working quickly. Kubernetes is infamous for its complexity, so anything that simplifies this path is a relief. However, two cloud providers had such poor defaults that I need to call them out.

Why AWS wins the disappointment award.

My first complaint is that AWS EKS is not compatible with default VPCs (incompatible subnet layout). This means you must deploy your own VPC, subnets, internet gateways, security groups, NAT gateways, routing tables, routing rules, and IAM profiles. No other cloud provider needed more than 8 objects, but the AWS implementation required 64.
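To give a flavor of the extra work, here is a heavily abbreviated sketch of the networking EKS forces on you (CIDRs are placeholders; the real version fans out into subnets, routes, and NAT gateways per availability zone):

```hcl
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"
}

resource "aws_internet_gateway" "gw" {
  vpc_id = aws_vpc.main.id
}

# One public + one private subnet shown; EKS wants them per AZ.
resource "aws_subnet" "public" {
  vpc_id                  = aws_vpc.main.id
  cidr_block              = "10.0.1.0/24"
  map_public_ip_on_launch = true
}

resource "aws_subnet" "private" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.2.0/24"
}

# Private subnets need a NAT gateway to pull images.
resource "aws_eip" "nat" {
  domain = "vpc"
}

resource "aws_nat_gateway" "nat" {
  allocation_id = aws_eip.nat.id
  subnet_id     = aws_subnet.public.id
}

# ...plus route tables, route associations, security groups,
# and IAM roles/instance profiles for the cluster and nodes.
```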

Even once I got the networking stack set up, I ran into an issue where the node pool failed to create properly. The resolution required deleting the entire cluster and trying again. Don’t forget AWS has a 15-minute bootup time.

After that, I ran into an issue where the PostgreSQL chart was unschedulable due to AWS EKS not including the EBS CSI driver by default. All of the other seven cloud providers install a compatible CSI driver by default, so I didn’t even consider that AWS never installed one.
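On EKS you have to opt in to the EBS CSI driver yourself, for example as a managed add-on. A sketch (the IAM role the driver needs is omitted here, and is its own adventure):

```hcl
# Without this add-on, PersistentVolumeClaims on EKS sit
# unbound and pods like PostgreSQL stay unschedulable.
resource "aws_eks_addon" "ebs_csi" {
  cluster_name = aws_eks_cluster.main.name
  addon_name   = "aws-ebs-csi-driver"
}
```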

When trying to investigate the above issue, I learned that AWS EKS does not expose Kubernetes details to its dashboard (by default). Only the user that creates the cluster can view it (not even the root user can bypass this). Is this resolvable? Yes. Is AWS the only one that makes you do this? Yes.
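The fix, at the time, was patching the `aws-auth` ConfigMap in `kube-system` to map additional IAM principals into the cluster. A hedged sketch (the ARN and username are placeholders):

```hcl
# Grant an extra IAM user visibility into the cluster.
# "force" lets Terraform take over fields EKS created.
resource "kubernetes_config_map_v1_data" "aws_auth" {
  metadata {
    name      = "aws-auth"
    namespace = "kube-system"
  }

  force = true

  data = {
    mapUsers = yamlencode([
      {
        userarn  = "arn:aws:iam::123456789012:user/teammate" # placeholder
        username = "teammate"
        groups   = ["system:masters"]
      }
    ])
  }
}
```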

Can you see why I’m a bit bitter?

IBMCloud suffers from split-brain syndrome.

IBMCloud has been migrating from an old classic architecture to a new virtual networking (VPC) architecture. In the interim, there are unfortunately many like-sounding objects that differ in functionality (classic vs. VPC). Figuring out which settings I needed was much more difficult since there were two of most things.

Another example: IBM has multiple block storage types that differ in functionality in infuriating ways, such as the default owner of the drive’s contents. This breaks the Bitnami PostgreSQL chart, and the chart’s GitHub issues are littered with people stumbling over it. Out of all 8 cloud providers, only IBMCloud had an issue of this nature.
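The usual workaround for ownership mismatches is the Bitnami chart’s `volumePermissions` option, which runs an init container to chown the volume before PostgreSQL starts. A sketch of the override (release name is illustrative):

```hcl
resource "helm_release" "postgres" {
  name       = "coder-db"
  repository = "https://charts.bitnami.com/bitnami"
  chart      = "postgresql"

  # Runs a privileged init container that fixes the
  # ownership of the persistent volume's contents.
  set {
    name  = "volumePermissions.enabled"
    value = "true"
  }
}
```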

Cost Analysis Across All 8 Providers

Cloud cost is, unfortunately, a complex beast that is only getting more complex with time. The cloud platforms (yes, all of them) are trying to charge you for the max they can get away with. Some try to lure you in with promises of no hidden fees but then trick you during the launch process. Others are intentionally vague about what type of hardware you will be running in, likely because they plan on using the worst hardware they have.

I kept it simple for the cost comparison: a generic 2 vCPU, 8 GB RAM node with 100GB of attached block storage. I ignored networking and account costs (boo OVHCloud, boo your flat $30 / month charge). Below is the final result:

Caveat with control plane management fee: Several cloud providers advertise no fees for the management plane, but the fine details mention that they run it in a non-HA capacity or offer no guarantees around uptime. Nice try, Azure! I double-checked each cloud provider and made sure that I was selecting an HA offering.
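On AKS, for example, the SLA-backed control plane is an explicit tier in Terraform. A sketch, assuming current `azurerm` provider naming (older versions called the paid tier "Paid"; the VM size and names are illustrative):

```hcl
resource "azurerm_kubernetes_cluster" "main" {
  name                = "coder"
  location            = azurerm_resource_group.main.location
  resource_group_name = azurerm_resource_group.main.name
  dns_prefix          = "coder"

  # "Free" (the default) has no uptime SLA;
  # "Standard" is the paid, SLA-backed control plane.
  sku_tier = "Standard"

  default_node_pool {
    name       = "default"
    node_count = 1
    vm_size    = "Standard_D2s_v5"
  }

  identity {
    type = "SystemAssigned"
  }
}
```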

Caveat about vCPU speed: Unfortunately, not all cloud providers are very clear about what your vCPU is going to be. Even Google’s GKE Autopilot simply says they will use general-purpose compute (which could be one of three families). Obviously, the hosting provider is going to optimize for their own costs behind the scenes, but that can make comparing across cloud providers difficult. I did the best I could.

Caveat about 12-month commitment billing: You’ll notice that the major cloud providers offer significant discounts if you sign up for 1 year pre-paid. So the game is: can you predict a year’s worth of usage?

In contrast, OVHCloud only requires a 1-month commitment. Their prices are ridiculously cheap compared to everyone else. Maybe the money they save on data center fire safety is passed back to the end user.

Wrap-Up

This blog post was a ton of fun to research and write. Getting hands-on experience with 8 different managed Kubernetes providers was quite the adventure. All of the work is documented in this repo. Here are the high-level takeaways:

Best overall: Azure
Best for startups: Linode
Best for cost: OVHCloud

For the three major clouds:

While Azure is technically the best Kubernetes provider, I still find the rest of their cloud platform infuriating. If your team is already familiar with Azure then it’s an obvious choice. However, I’m not sure it’s worth it just for the Kubernetes experience. I would not use AWS EKS, even though I’m deeply familiar with the platform. Google is the most expensive, but it might be the best balance between Azure and AWS.

For startups:

If you are a small startup without a lot of infrastructure experience, I do not recommend AWS, Azure, or GCloud; their user experience is terrible enough to be a deterrent. Be careful with the smaller clouds (like DigitalOcean) when they try to convince you that they are the cheapest solution, and be cautious whenever anyone claims “no hidden fees.” That said, the smaller clouds are simple, and they just work.

A good growth path for a startup would be to start on a platform like Vercel, and then migrate to Linode when you have more need for optimizing your compute and storage layer. Move to a major cloud provider once you can fully dedicate a team and have a security professional to help you secure it.


Elliot Graebert

Director of Engineering at Skydio, Ex-Palantir, Infrastructure and Security Nerd, Gamer, Dad