When you say you operate “at scale”, what do you mean exactly?

A recent Hacker News post led me to an article on using Kubernetes to host gaming servers “at scale.” At what scale, exactly?

A trend I’ve noticed popping up is people who believe containers, and orchestration engines like Kubernetes, grant an automatic “infinite scale” pass. Sadly, this isn’t the case. A lot more than simply adding more servers or containers goes into large-scale operations.

For one, adding more servers adds more monitoring that has to be done. The servers, virtual or otherwise, have to be monitored for performance and reliability metrics. Are they handling the load OK? Are they the correct (virtual) size? Do they need to be vertically scaled? The more servers and containers you add, the more metrics you have to juggle, monitor, and act upon.
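To put a rough number on that juggling, here is a minimal back-of-the-envelope sketch. The per-host and per-container metric counts are assumptions I picked for illustration, not measurements from any real cluster, and `metric_series` is a made-up helper name.

```python
def metric_series(hosts: int, containers_per_host: int,
                  host_metrics: int = 50, container_metrics: int = 20) -> int:
    """Rough count of time series you have to collect, store, and alert on.

    host_metrics and container_metrics are illustrative assumptions only.
    """
    per_host = host_metrics + containers_per_host * container_metrics
    return hosts * per_host


# Same container density, growing host count.
for hosts in (10, 100, 1000):
    print(hosts, "hosts ->", metric_series(hosts, containers_per_host=30), "series")
```

The growth is only linear, but every one of those series needs a threshold, an owner, and someone to act on it when it fires.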

What about backups? If you’re using AWS like a lot of us are, how often are you backing up EBS volumes? What about disks that hold persistent container volumes? You always have to back them up, especially if a Pod is moved onto that host, no? Hopefully I’m not missing a trick here.
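One way to keep the "how often" question honest is to check snapshot age automatically. The sketch below is plain Python over an in-memory dictionary. The volume IDs, timestamps, and the `stale_volumes` helper are all hypothetical, and in practice the timestamps would come from whatever your cloud provider or backup tool reports.

```python
from datetime import datetime, timedelta, timezone


def stale_volumes(last_snapshot, max_age, now=None):
    """Return the volume IDs whose newest snapshot is missing or older than max_age.

    last_snapshot maps a volume ID to the time of its newest snapshot,
    or to None if the volume has never been snapshotted.
    """
    now = now or datetime.now(timezone.utc)
    return sorted(
        volume
        for volume, taken in last_snapshot.items()
        if taken is None or now - taken > max_age
    )


# Hypothetical inventory: one fresh snapshot, one old, one never taken.
now = datetime(2024, 1, 10, tzinfo=timezone.utc)
inventory = {
    "vol-fresh": now - timedelta(hours=6),
    "vol-old": now - timedelta(days=3),
    "vol-never": None,
}
print(stale_volumes(inventory, max_age=timedelta(days=1), now=now))
```

Run on a schedule, a check like this turns "we back things up" into something you can actually alert on.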

How far does GlusterFS or NFS (via AWS’ EFS) stretch across your Kubernetes hosts before performance drops? Shared network storage is very useful for distributing static configuration across hosts and only having to update one place, but it only goes so far.

Now for automation. Are you using Ansible or something similar to automate deployments? Every time you add a host to your cluster, is it being locked down and secured automatically? If not, how long can you hold up this “at scale” banner while doing everything manually?

One final note: how are you handling data replication across continents to ensure your customers in Asia are having the same experience as those in Italy?

Operating “at scale” and thinking Kubernetes is all you need to achieve that scale is a quick way of drowning yourself. A lot more goes into the operations of that cluster in a busy production environment. A hell of a lot more.

What are your thoughts?