How 10X Organizations use the cloud to focus on customers — and nothing else
Put as few people as possible between the development of software and its deployment. Preferably zero.
The 10X organization is highly focused, with appropriately empowered and responsible executives, and devotes almost all of its activity to work that impacts its customers. The 10X organization expends as little focus and effort as possible on the back office — preferring to outsource to services and other third parties.
And this is all well and good to state at a high level. But details matter, and many people get the details wrong. This post delves into specific and technical details around the right ways to use the cloud if you aspire to be a 10X Organization.
Why 10X Organizations use the cloud
In general, there are two reasons why the 10X Organization uses the cloud for application infrastructure — both revolving around software development.
- We are selling Software-as-a-Service (SaaS), and so we are constantly developing software and deploying to customers
- We are providing goods, or the ability to order and understand goods, through a web or app interface, and so we are developing software and deploying it to customers — although perhaps with lower velocity than the SaaS vendors.
Everything is about software development. For more on this topic, please check out my piece on why the rise of Amazon Web Services is all about software development.
I have written previously on the organizational, product process, and software development lifecycle practices that are associated with 10X Organizations. But none of these address what should happen to deploy software effectively and at a high, consistent velocity.
The core principle is really quite simple: put as few people as possible between the development of software and its deployment. Preferably zero.
We want fewer people because people are slow and people are inertia. Human review and action requires a healthy, capable, present human — and there is no way to guarantee that you will have a healthy, capable, present human at all the times you need one. And once you insert a person into a process, that person almost inevitably ties part of their identity to being in the process.
Just look at DBAs — most startups today don’t have dedicated DBAs, given that we have cloud-run databases and object-relational mapping within our development frameworks. There’s no need for a traditional DBA. But look at organizations that have been developing software for 15+ years, and most of them still have DBAs, and those DBAs are making development and deployment slower and less efficient — and they’re not going anywhere.
This is why organizations are using infrastructure automation (IA). The “fewer people rule” provides an excellent guide to understanding, between two different ways of using IA, which is better.
For Instagram, IA meant being able to scale to tens of millions of users with fewer than 15 employees total. If Instagram had had the typical production operations team, it would have died on the vine; the hundreds of releases that were necessary to get to product-market fit and to drive user adoption would have taken millennia in a traditional enterprise release environment.
How 10X Organizations use Infrastructure Automation
At a high level, 10X Organizations use IA to develop and deploy software as efficiently as possible — and with only product acceptance/quality assurance personnel between developers and deployment. This means no DBAs, sysadmins, network admins, or other IT operations staff.
Before the riots begin, let me clarify that what some people call “Site Reliability Engineers,” “Application Operations,” or “Architects,” who work to design how the application should be deployed and define that as code or configuration, would be “developers” in my above description.
In practice, this usually means that a continuous integration system like CircleCI or CodeShip or Jenkins can automatically do 100% of software deployment after acceptance and QA are done — usually in response to a single action. For example, the product owner tags a branch as production, and webhooks kick off the process.
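As a rough sketch, that single trigger might look like the following. The payload shape (a GitHub-style `ref` field) and the `deploy.sh` hand-off are illustrative assumptions, not any specific CI product's API:

```python
# Hypothetical webhook receiver: when a tag push arrives, kick off an
# automated deploy only if the tag marks a production release.
import json
import subprocess

def is_production_tag(payload: dict) -> bool:
    """True when the pushed ref is a production release tag."""
    ref = payload.get("ref", "")  # e.g. "refs/tags/production-2017-03-01"
    return ref.startswith("refs/tags/production")

def handle_webhook(body: str) -> str:
    payload = json.loads(body)
    if not is_production_tag(payload):
        return "ignored"
    # Hand off to the automated pipeline; no human in the loop.
    subprocess.run(["./deploy.sh", payload["ref"]], check=True)
    return "deploying"
```

The point is the shape of the flow: one human action (tagging), then machines all the way down.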
It also means that there aren’t “ordinary” operational tasks done by humans (e.g., SRE) to keep the production application running. This implies there is no need for manual backups, manual purging of cached files, manual patching, manual scaling, etc. Emergency operations may well require humans — but they will be few and far between.
It’s worth noting that I’m talking about the 95+% of applications that aren’t at what we might call “web scale” — applications where an occasional additional 100ms of latency on requests doesn’t have a substantial organizational impact, and where one wouldn’t expect to see more than thousands of simultaneous users needing to write data and then read it back immediately, all at exactly the same time.
The technical approaches to IA
The basic high-level requirement is that we should have code + configuration in a repository so that our continuous integration services or build + deploy system can:
- automatically deploy our code
- verify that it has been deployed correctly
- direct production traffic to the new code/environment
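Those three steps can be sketched as a tiny pipeline. This is a toy with in-memory stand-ins (a real system would call your IaaS provider's API at each step), but it shows the essential control flow: traffic is only redirected after verification passes.

```python
# Toy release pipeline: deploy, verify, then direct traffic.
# ENVIRONMENTS and LIVE are in-memory stand-ins for real infrastructure.
ENVIRONMENTS = {}  # environment name -> healthy?
LIVE = []          # the environment currently receiving traffic

def deploy(version):
    """Stand up a new environment for this version (fake: mark healthy)."""
    env = f"app-{version}"
    ENVIRONMENTS[env] = True
    return env

def verify(env):
    """Check the new environment before it sees any traffic."""
    return ENVIRONMENTS.get(env, False)

def direct_traffic(env):
    """Point the load balancer at the verified environment."""
    LIVE.clear()
    LIVE.append(env)

def release(version):
    env = deploy(version)
    if not verify(env):
        raise RuntimeError(f"{env} failed verification; traffic unchanged")
    direct_traffic(env)
    return env
```

Note that a failed verification leaves the old environment serving traffic — that property is what makes fully automated deployment safe enough to remove humans from the loop.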
There are a variety of ways we can accomplish this — and honestly, we’ve been able to do it for well over five years at this point.
Option #1: The Boring
One of my favorite articles on picking stacks is Dan McKinley’s Choose Boring Technology:
What counts as boring? That’s a little tricky. “Boring” should not be conflated with “bad.” There is technology out there that is both boring and bad. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough. MySQL is boring. Postgres is boring. PHP is boring. Python is boring. Memcached is boring. Squid is boring. Cron is boring.
The nice thing about boringness (so constrained) is that the capabilities of these things are well understood.
So what’s the boring option for deploy+verify+direct traffic for most applications? I would start by picking some well-used way of bootstrapping virtual machines and running a test suite on them — for example, CloudFormation, Elastic Beanstalk, or HashiCorp Terraform (although BuildFax still uses RightScale for this).
Then, add some well-used quickly-deployed virtual machines (e.g., EC2) to run your stateless application servers, plus a relational database service (e.g., RDS) that handles failover for you, plus a load balancer (e.g., ELB) that adds/removes/health checks your application servers.
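To make concrete what “adds/removes/health checks” means, here is a toy model of that load-balancer behavior. This is not the ELB API — just a sketch of the rotation logic a managed load balancer runs for you:

```python
# Toy model of managed load-balancer behavior: instances that fail their
# health probe are removed from rotation; recovered ones are added back.
class ToyLoadBalancer:
    def __init__(self):
        self.probes = {}         # instance id -> health-probe callable
        self.in_service = set()  # instances currently receiving traffic

    def register(self, instance_id, probe):
        self.probes[instance_id] = probe

    def run_health_checks(self):
        for iid, probe in self.probes.items():
            if probe():
                self.in_service.add(iid)
            else:
                self.in_service.discard(iid)

    def route(self):
        """Pick a healthy backend (a real ELB balances across all of them)."""
        if not self.in_service:
            raise RuntimeError("no healthy instances")
        return sorted(self.in_service)[0]
```

Because the service does this continuously, “a server died” stops being an operational event that needs a human.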
This boring option gets you the vast majority of benefits of even the most bleeding edge version of cloud application deployment today — more on that shortly. It’s also so well known, so boring, that it’s hard to screw up. It does not require top talent to execute; average developers with the right plan should execute properly 10 out of 10 times.
Option #2: The Bleeding Edge
There is a better way than the boring option. This bleeding-edge choice is to use Functions-as-a-Service (FaaS) such as AWS Lambda, together with multi-tenant Databases-as-a-Service (DBaaS) like AWS DynamoDB. We’ll call this a “serverless application” for short.
While serverless applications are now far enough along to show noticeable benefits, it’s still early days — so you shouldn’t necessarily rearchitect existing applications that are working well enough the boring way.
So what’s better with a serverless application that you don’t get with the boring way? There are two core benefits: (1) you don’t have to bootstrap virtual machines, and (2) your emergency ops consist entirely of automated redeployments — or waiting for repair from your provider.
In other words, you don’t have to worry about anything other than your code and the architecture of your application within the IaaS/FaaS/DBaaS ecosystem you’re using. Both of these advantages serve to further reduce the amount of systems administration knowledge and staff required to build and operate your applications.
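For flavor, a minimal serverless request path might look like the following. The `handler(event, context)` signature and DynamoDB’s `put_item(Item=...)` call are real; the table object is injected so the logic runs without AWS (in Lambda you would construct it once at module load with `boto3.resource("dynamodb").Table(...)`), and the API-Gateway-style event body and `pk` key name are illustrative assumptions.

```python
# Sketch of a Lambda function that saves a note to a DynamoDB table.
# The table object is injected so this can be exercised without AWS.
import json
import time

def make_handler(table):
    def handler(event, context):
        body = json.loads(event.get("body") or "{}")
        item = {
            "pk": body["user_id"],   # assumed partition key name
            "ts": int(time.time()),
            "note": body.get("note", ""),
        }
        table.put_item(Item=item)    # DynamoDB Table's real put_item call
        return {"statusCode": 200, "body": json.dumps({"saved": item["pk"]})}
    return handler
```

Notice what’s absent: no server to bootstrap, patch, scale, or monitor — exactly the two benefits above.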
However, on the con side, building serverless applications requires new knowledge and skills — especially around using rudimentary and constantly-evolving tools — which increases the likelihood of making poor choices.
This is one of the reasons McKinley recommends boring technology, and this problem is not unique to serverless applications — it applies to all non-boring technology choices. It’s hard to know if you have the right plan — and even with the right plan, average developers may only execute properly 5 out of 10 times.
Deciding which way to go
McKinley gives the general advice that when you’re building anything, you should use at most three non-boring technologies; I think it would be more conservative to allow at most one. The best determining factors are (a) how many developers are on the project, and (b) whether the lead developer(s) have direct experience with the non-boring technology.
As the number of developers goes up, and as the lead developers’ experience with the non-boring technologies goes down, you should avoid the non-boring technologies. For example, if there’s only one lead developer, who has some experience with building serverless applications, and four or fewer other developers, then I would feel comfortable going the serverless route. But if that experience were lower, or the number of developers higher, I would not.
TL;DR concluding thoughts
People bring inertia to organizations. 10X Organizations prevent people in back office functions from slowing down releases, and redirect employee involvement to those actions that provide value and differentiation to customers. Today’s IaaS/FaaS/DBaaS capabilities make this focus on solving customer problems — and nothing else — so much easier than it has ever been.