Why does Cloud UX lag behind other software products?

Brian Grant
Jun 10, 2024

There’s actually both a what and a why to this story. I’ll start with the what. What do I mean?

Think about software products that enable users to produce some kind of content: blog posts, documents, presentations, spreadsheets, images, videos, websites, and so on. They have WYSIWYG editing, autosave, versioning, undo, export, import, copying, sharing, commenting, collaboration notifications, and more. In tools where I might create many similar pieces of content, such as presentations, there are mechanisms to create templates from existing content.

There are lots of UX ideas from other domains that would be applicable to Cloud. For instance, I mentioned the retail shopping-cart analogy in my template catalog post and ChatOps in my post about email-based approval requests. Stock trading sites allow transactions to be modeled before they are placed. When I mark a bill as being paid online, I can add a note to explain how. Voice assistants shortcut the steps of taking out and unlocking a device, opening the relevant application, navigating to the right place, and manually performing the desired action. Email and document products support mail merge to generate variants from a spreadsheet. I could go on.
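
To make the mail-merge idea concrete for configuration, here is a minimal Python sketch. It is only an illustration: the template fields, the sample rows, and the `merge` function are hypothetical and not taken from any particular product. Each spreadsheet row produces one variant of the same settings document.

```python
import csv
import io
from string import Template

# Hypothetical settings template; the field names are invented for this example.
TEMPLATE = Template("name: $app-$env\nregion: $region\nreplicas: $replicas\n")

# Stand-in for a spreadsheet exported as CSV: one row per variant.
SHEET = """app,env,region,replicas
shop,dev,us-east1,1
shop,prod,europe-west1,3
"""

def merge(template, sheet_csv):
    """Render one document per spreadsheet row, the way mail merge does."""
    rows = csv.DictReader(io.StringIO(sheet_csv))
    # substitute() raises KeyError if a row lacks a field the template needs.
    return [template.substitute(row) for row in rows]

variants = merge(TEMPLATE, SHEET)
```

Here `variants` holds two rendered documents, one for `dev` and one for `prod`. Using `substitute` rather than `safe_substitute` is a deliberate choice in this sketch, so that a missing column fails loudly instead of leaving a placeholder in the output.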

Now, Cloud products are not entirely freeform content. They have specific services, operations, resources, attributes, and features. They are extremely complicated. There are other domains with similar complexity. The example I like to use is tax software. I don’t know about other countries, but US tax forms can be very complicated and require many, many details. Thankfully, there are UX techniques for navigating complex domains and reducing cognitive load, such as progressive disclosure. There are also approaches to accelerate data entry, like importing data and propagating values used in multiple entries. Tax software simplifies and accelerates the process of correctly entering the necessary information.

Ok, so why hasn’t Cloud UX evolved as much as these other kinds of software over the years? I have some observations and some hypotheses.

Priorities and the hierarchy of needs for the cloud platform as a whole could be one factor. The range of services needed for a wide variety of workloads is fairly broad: compute, storage, networking, databases, caches, messaging, monitoring, logging, secret management, artifact management, etc. Enterprise requirements are also fairly steep: identity, privacy, security, reliability, availability, scalability, data residency, compliance, etc. If any critical customer requirement is missing, it can block a sale. Additionally, the data planes of individual services are probably perceived to be more critical than ease of management: for example, improving VM or database size or performance, or supporting the latest GPUs. And if large customers buy based on capabilities, features, and cost, then that’s what providers will focus on.

Even once table stakes are covered, the business cases for building new products can be clearer than those for incremental UX improvements. And there appears to be no end of products that could be built, now even more so with AI. In general, initiatives with clearer revenue potential are easier to evaluate and prioritize.

Infrastructure as a Service (IaaS) products are also fairly unopinionated. They typically are not oriented towards what users are trying to do, like deploying applications or troubleshooting problems. Instead, they help users create networks, virtual machines, disks, and so on. This generality enables the products to be widely applicable. But a consequence is that Cloud is more like Home Depot than like Ikea — it’s DIY.

More opinionated platforms are more narrowly applicable to specific use cases and requirements. For a company with broad-based IaaS products, use cases large enough to be worth targeting can be elusive. The number of IaaS customers is already likely to be smaller than for the consumer products that have more investment in UX.

Conway’s Law may play a role — shipping the org chart. A team that owns a single infrastructure component, like virtual machines or a database, naturally may just focus on making that piece simpler rather than addressing the broader customer Jobs To Be Done (JTBD), like provisioning the whole infrastructure stack for an application. That would probably have to be a dedicated product.

At a technical level, a Cloud service is a suite of products that has been developed over time, often including acquisitions. Streamlining a JTBD can require integrating with several distinct products, which may have some aspects in common, such as authentication mechanisms, but also a number of differences. This can make integrations fairly expensive (unlike for Kubernetes).

Infrastructure as Code (IaC) tools like Terraform are designed to integrate with a very wide range of services and resources. As a result, they are also unopinionated. Their concepts are also fairly low level, such as resource types and attributes. And they tend to be text-based, so that they can be somewhat scriptable and programmable.

Solutions implemented using IaC address specific scenarios. They have restrictions similar to those of opinionated platforms, but are much cheaper to implement. I think of such tools and frameworks as a “kit” approach: they make DIY easier.

Unfortunately, the usage patterns are not as standard as tax forms. The interfaces to IaC modules are input variables defined by the modules. Typically the UX provided for input variables is just a data-entry form: no progressive disclosure, no step-by-step guidance, etc.
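
For contrast with a flat data-entry form, here is a minimal Python sketch of what progressive disclosure over module input variables could look like. Everything in it is an assumption made for illustration: the `InputVariable` class, the `show_if` predicate, and the three example variables are hypothetical and do not correspond to any IaC tool's actual schema.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

@dataclass
class InputVariable:
    name: str
    prompt: str
    default: Any = None
    # Predicate over the answers so far; None means "always relevant".
    show_if: Optional[Callable[[dict], bool]] = None

# Hypothetical module inputs, invented for this example.
VARIABLES = [
    InputVariable("region", "Which region should this run in?"),
    InputVariable("enable_database", "Does the application need a database?"),
    InputVariable(
        "database_tier",
        "Which database tier?",
        default="small",
        show_if=lambda answers: answers.get("enable_database") is True,
    ),
]

def next_questions(variables, answers):
    """Return the names of unanswered variables that are relevant right now."""
    return [
        v.name
        for v in variables
        if v.name not in answers and (v.show_if is None or v.show_if(answers))
    ]
```

With no answers yet, `next_questions(VARIABLES, {})` returns only `region` and `enable_database`. The `database_tier` question appears only after the user says a database is needed, so details stay hidden until they matter.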

Also, the tools are bounded in scope, so several tools, each with their own mechanisms, configuration formats, and ecosystems, typically need to be used in combination to address a single user JTBD.

Platform engineering and developer portal products commonly leverage IaC as well. In some cases these portals complement cloud provider UIs and in some cases they replace them, such as in the case of exclusive actuation through template catalogs, especially if a policy enforcement point needs to be interposed between the users and the providers. They may also integrate best-of-breed products from several vendors, such as observability dashboards. The need for multi-cloud, multi-vendor support pushes the problem into the fragmented market of cloud management tools and platforms.

As one would expect, there’s interest in applying AI to this problem. LLMs seem good at interpolation and extrapolation (and hallucination), but they require large numbers of examples for training, and such examples are not as plentiful for IaC as they are for, say, code written in popular programming languages. We’re just at the early stages of addressing the many challenges for this domain.

Can anything be done to make Cloud UX economically compelling and feasible to improve? That’s the puzzle the industry needs to solve.

If you found this interesting, you may be interested in other posts in my Infrastructure as Code and Declarative Configuration series.

Brian Grant

Original lead architect of Kubernetes and its declarative model. Former Uber Tech Lead of Google Cloud's API standards, SDK, CLI, and Infrastructure as Code.