The Kubernetes Conundrum
The whale in the room
Kubernetes is hot right now. AWS has it, Google Cloud has it, Azure has it, DigitalOcean is doing it, heck even IBM is throwing its (red) hat in the ring. I’m still trying to answer a nagging question: what problem does it really solve, and who actually has that problem? In theory I get it. In practice YMMV.
I saw The Information Superhighway come and go. It was true, but useless. It’s now so part of the fabric of life that hyping it would at best gain you kindly condescending looks. I saw the long, slow insistence that was J2EE. Again, half true, mostly just too complicated to be worth it. A decade or more of death-march projects and billions spent (most publicly in government); a handful of surviving systems remain, encrusting organisations’ legacy IT estates. Taking the name of Best Practice in vain didn’t get us much value for money.
Take your pick from the last decade: big data, blockchain, AI; the hype cycle is rarely wanting for a parade of colourful characters. Expensive, colourful characters. Peddled by the serious, earnest tailors that have made emperors’ clothes for over a century.
Occasionally these characters have substance. More usually the early corporate adopters in the sales cycle marry in haste and repent at leisure. Shiny new things open up to reveal a couple of years of painful implementation detail. Eventually they’re quietly dropped and replaced by commoditised, boring solutions because a new rising star now twinkles over the “Peak of Inflated Expectations”.
The real trick is spotting the black sheep in the enterprise herd: the one that’s going in the right direction. Fifteen years ago VMware revolutionised datacentres with virtualisation. I was there, it was brilliant. The legacy of that quantum leap lives on in EC2 and Compute Engine instances. Provisioning a server used to take 3–6 months, with a mosaic of logistics and people. Now servers appear and disappear by the thousands, second by second, with little more than the click of a mouse or the flicker of an API call. That’s hard value.
Virtualisation may be commoditised now, but it’s the foundation of our cloud world.
Five years ago, Docker opened a door to the next level, bringing containers to the mainstream. If virtualisation was able to multiply the efficiency of hardware, saving a ton of energy in the process, containers are now able to multiply the effectiveness of each virtual machine. No longer do you have to operate one server per task, or die trying to contort those tasks into a single monolithic, monolingual J2EE server. The right tool for the job is no longer a top-down least-worst compromise decision to be enforced across the board. Each container can now make the right decisions to meet its task.
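To make that concrete: a container image declares everything one task needs, and nothing more. Here’s a minimal sketch (the service, file names and versions are purely illustrative) of what “the right tool for this job” looks like in practice:

```dockerfile
# One task, one image: this service happens to want Python.
# Its neighbour can ship Go or Node in its own image,
# with no monolithic app server to compromise around.
FROM python:3.12-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY app.py .
CMD ["python", "app.py"]
```

Each image is a self-contained decision; the per-task choice of language and runtime never leaks into anyone else’s deployment.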
Containers were so successful, so useful, so valuable that, almost as soon as the world started using them, it became clear there was a need to automate deploying, monitoring, scaling and updating them. If containers are the ingredients, Kubernetes is the chef. To serve a tasty meal, you’re going to need some skills to put those flavours to work.
The story goes that Google had been using containers back when the rest of the IT world was still pedalling around the back yard with its training wheels on, busily writing Gantt charts and convening Change Advisory Boards in the hope of progress, while Google, as part of a new generation of companies, was quietly racing ahead, putting aside traditional best practice and instead powerfully combining deep technical expertise with smart business thinking.
Legend has it that Google developed Borg, an automated system that could take a containerised task (or workload) and schedule it to run somewhere across their vast estate of computing power. It’s this idea, that an organisation could run a single homogenous farm of computing resource and use an automated system to sow workloads neatly across that wide field, that gave rise to Kubernetes — and Kubernetes is currently riding the hype cycle.
From a distance the world looks blue and green, and the snow-capped mountains white. The 10,000ft view is beautiful. It makes perfect sense to distribute all your organisation’s workloads across a standardised pool of compute resource. At least, it does if you’re Google. The truism is you’re not.
Technology is honest. In hubris, in theory and in meetings, it’s possible to fudge things a little, bend the rules, fluff up inconvenient truths and paint rosy cheeks with a dash of buzzy makeup. Technology, however, is the five year old in the room saying “but that man smells and he has no hair”. Perfectly innocent, technically correct, highly embarrassing, painfully honest. Your technology won’t bend to your strategy.
Here’s the root problem: Kubernetes isn’t massively good at isolating one workload from another.
If one task decides to take all the compute duvet in the middle of the night, the others are left shivering in the cold and the system goes down. You have to think about partitioning networking, avoiding privilege escalations, keeping rogue workloads from roaming around looking for ways to subvert the platform. These turn out to be more difficult, less obvious and murkier than you’d like. To get to one platform that’s fit to run them all, you have to start yak-shaving. That may be OK, but bear in mind it could well take a dedicated team many months to not quite get there. That’s a lot of time-to-market and budget spent. You’re not Google.
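Kubernetes does offer levers to stop a workload hogging the duvet, but you have to pull them explicitly, per container. A sketch of resource requests and limits (the names and numbers here are mine, purely illustrative):

```yaml
# Hypothetical pod spec: requests reserve capacity for scheduling,
# limits cap what the container can actually consume.
apiVersion: v1
kind: Pod
metadata:
  name: well-behaved-task
spec:
  containers:
    - name: worker
      image: example/worker:1.0   # illustrative image
      resources:
        requests:
          cpu: "250m"        # scheduler sets aside a quarter of a core
          memory: "256Mi"
        limits:
          cpu: "500m"        # CPU is throttled beyond half a core
          memory: "512Mi"    # exceed this and this pod is killed, not its neighbours
```

Multiply this by every workload, add NetworkPolicies for the network partitioning and PodSecurity settings for the privilege escalations, and the yak-shaving bill starts to come into focus.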
There’s an easy answer though. If you’re trying to separate workloads and you want to “limit the blast radius” in case one of them blows up, you could “just” build multiple Kubernetes platforms. Each product or service gets its own platform, neatly partitioned away from the others. Using auto-scaling cloud infrastructure, each individual Kubernetes cluster can scale up and down to meet demand.
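Within each of those per-product clusters, the scaling half of the story is handled by autoscalers. A sketch, assuming a hypothetical `checkout` Deployment, of a HorizontalPodAutoscaler that grows and shrinks one service with demand (the cluster autoscaler then adds or removes nodes underneath):

```yaml
# Illustrative HPA: scales the checkout pods on CPU utilisation.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: checkout-hpa          # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: checkout            # hypothetical deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # aim to keep pods ~70% busy
```

Note what this buys you: elasticity per cluster, which is exactly what undermines the “one big fixed pool” rationale discussed next.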
There are flies in this ointment (e.g. bin-packing efficiency), but they don’t necessarily spoil the soup if you can swallow them. The real killer is “incidental complexity”. When you consider that Kubernetes is a behemoth of a platform, running a single platform for a single behemoth of an organisation, say Google, makes sense. One massive container ship can carry a lot of cargo. But now we’re looking at a fleet of container ships and only lightly loading them with a few containers. There’s a massive operational overhead in running a fleet like that, so perhaps, in fact, the better option is to simply run your workloads on a few dedicated servers.
Which makes this a double-circular argument. The reason to use Kubernetes is to efficiently utilise a large, fixed pool of resource. But cloud compute is not a fixed pool of resource: autoscaling is the key to using cloud efficiently. The original reason to use Kubernetes was efficiency, and the cloud already delivers it. Checkmate 1.
Building one large platform for your entire organisation’s IT estate is risky and hard to do well if you don’t have massive resources to invest. So you divide it into smaller platforms, which makes each platform untenably large and complex relative to the workload running on it. It’s like bringing an aircraft carrier to a boating lake. A lot of bread to chow for just a lick of jam. Checkmate 2.
It’s a catch-22: if efficiency is catered for by autoscaling, and limiting blast radius by running multiple platforms means you actually need a simpler platform, then what is Kubernetes good for?
The right tool for the job
There is a problem that Kubernetes solves well. The questions you need to ask yourself are whether you have that problem, and whether that’s a good problem to have.
I’ve worked on a handful of projects where the product or service consisted of a few microservices. At this scale it’s actually better to go with a more basic approach. Kubernetes is unlikely to be a boost to productivity, at least not in the short term. It’s more likely to swallow a disproportionate amount of your team’s attention and leave you with little actual progress towards what could have been a relatively quick win.
So, what’s the problem that Kubernetes solves well? If you’re building a single product or service with tens or hundreds of microservices, then you probably do have a Kubernetes-sized problem. The investment may be worth it. On the other hand you may also have a design problem: a single system with that many moving parts could be doing too much or may simply be over-engineered.
Imagining an alternative
Don’t get me wrong, I like Kubernetes more than I’d like to. I’ve spent time with it and got to know it. What it does, in theory at least, is smart and needed. Who it works for, and at what scale, is a question I’m still trying to find a good answer to. At least, one I can believe in.
Realistically, it’s not a bad least-worst solution, but be under no illusions: it’s slow to learn, complicated to work with, and the scale of it means your team will never fully understand it or get their arms around it. There are so many moving parts that, even statistically speaking, there will constantly be bugs and security vulnerabilities. It’s the J2EE of the container world and I don’t see that it’s got meaningful competition yet.
I’ll offer a closing idea. Introducing “name-driven development”, where there’s nothing more than a name for an imagined product, but that name paints a picture. It’s called Coracle. The container world, from Docker to Kubernetes, is marine-themed so, sticking with that, a coracle is a minimum viable boat. Its simple construction is lightweight, portable and will carry one person. It’s intentionally not designed for scale, it’s designed to carry one load, conveniently and with a minimum of fuss. I think it speaks of a genuinely useful alternative.
Kubernetes will I’m sure continue as the enterprise darling. The hard question is whether it answers the right question. I’d like to know if it does.