Chaos vs Conformity

Andy Walker
Nov 6, 2020

Navigating the tricky path between autonomy and standardisation

Photo by Hans-Peter Gauster on Unsplash

Recently I was talking to the CEO and CTO of a company and they asked me whether they should standardise on a single tech stack or allow diverse stacks. It was pretty clear their preference was to standardise and minimise duplication of effort. On the surface this seems like a good idea. I’ve also had very heated conversations with engineering teams who will relinquish their favourite language or stack only when you pry it from their cold dead hands.

This is a common disconnect between leadership and engineering teams. It comes up again and again disguised as an argument about autonomy, particularly as a company scales beyond a certain point. The reality is that neither extreme is healthy, and it’s easy enough to demonstrate this.

Single stacks kill progress by stopping innovation

  • If you have a single stack then you are locked into it: changing the stack means committing to change everything at once, which effectively commits you to a monolith and limits your future flexibility.
  • A single stack prevents you from choosing the right tool for the job. Python is great for data scientists, but you may want to productionise your model to run at scale. (Hopefully) no one is going to write a high-volume backend in JavaScript, but you’d be crazy to write your browser-based user interface in C++.
  • Engineers don’t like having their design choices taken away. Given the choice between a highly motivated team that owns the problem and one which hates the environment it’s building in, I would always tend towards the former.

Multiple stacks kill progress by generating tech debt and taking you down the path of Conway’s Law (shipping your organisation chart)

  • It’s hard to run multiple stacks in production. To test, monitor and deploy them you push the complexity onto your DevOps and SRE teams, who have to build custom tooling to paper over the cracks.
  • Multiple stacks struggle to talk to each other, and in a diverse system interoperability of data is king.
  • It’s hard for people to work in unfamiliar codebases, so internal open source dies off, which in turn means duplication of effort.
  • The maintenance cost scales with the number of tech stacks you support, and your organisation becomes fragile wherever a stack depends on skills that are scarce within the company.

Both of these extremes are bad. After all, progress is the name of the game. Instead of thinking about the stacks themselves it’s useful to think about the joins between them: there are some things which need to be common within your company. To help understand this I’m going to talk about how Google does it. It’s easy to think of Google as a fairly homogeneous set-up. The reality is engineers have a lot of freedom, within certain constraints. If you want to go your own way then you are responsible for making sure your system joins up with everything else. So if you are willing to invest that effort you can go whichever way you want. This doesn’t always lead to great design choices: I remember watching some incredibly smart people trying to get the LAMP stack working on Google’s equivalent of Kubernetes. It sort of worked, but it was impossible to iron out the flakiness.

The key part of this balancing act, then, is understanding what constraints exist on freedom. This empowers engineers to make choices they believe in whilst also ensuring they can be held to account for playing nicely with the rest of the company. It all comes down to interfaces and data structures: you want to minimise the number of each of these (ideally as close to one as possible) whilst accepting you can never account for all future problems. Here’s a (probably incomplete) list of the things you want consensus on.

  • Interoperability of data. Unless you’re building a monolith, things need to talk to each other. You want common formats for passing data around, and you want to be able to update one service independently of another. You do not want to build a multitude of factories for serialising and deserialising objects. Trust me, you really don’t. (A sketch of this tolerant, versioned approach follows the list.)
  • Discovery of services. Discovery is often overlooked, which leads to endpoints being hardcoded all over the place and, in turn, to fragile production infrastructure and code. The endpoint needs to be abstracted so a service can move around without affecting its clients (see the discovery sketch after the list).
  • Making requests between services. What protocol do services use to talk to each other? What tools are you providing to observe the end-to-end flow of requests? At some point something will break, and debugging must not be a black art known only partially in different places.
  • Routing of requests between services. What happens when services move? I can tell you from experience that trying to code this yourself is a rabbit hole that sucks up a lot of time you could be spending on user and business problems.
  • Exposing operational metrics. You need to run the thing in production, and you want a view of how everything is running so you know when things are getting unhealthy. From a business perspective, availability equates to trust. (A metrics sketch follows the list.)
  • Logging of requests. You want to know who is doing what in your system. This allows you to build better systems, and to experiment and explore your problem space. A multitude of mechanisms for logging prevents this and reduces you to operating on gut instinct rather than being data driven.
  • Management of change of interfaces and data. When services change they can break other services. Your build system needs some way of identifying these breakages before they hit production, so you can fix them cheaply (a sketch of such a check follows the list).
  • Common services around account management and authentication. In today’s age of enhanced privacy laws you cannot afford to get this wrong. Multiple account management systems are one of the worst smells you can have from an infrastructure perspective. Doing this well is hard; doing it more than once is suicidal.
  • Common deployment model. Deploying safely to production is hard, and you can’t afford to build n versions of it. It also points you towards some commonality in where and how you deploy (e.g. cloud, packaging and containers).
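To make the data-interoperability point concrete, here’s a minimal Python sketch of the tolerant-reader pattern: readers ignore fields they don’t understand and default fields that haven’t arrived yet, so services can be updated independently. The `UserEvent` type and its fields are hypothetical; in practice a schema system like Protocol Buffers or Avro gives you this behaviour for free.

```python
import json
from dataclasses import dataclass

# Hypothetical shared message type. The key property is that readers
# tolerate fields they don't know about, so one service can be updated
# independently of another.
@dataclass
class UserEvent:
    user_id: str
    action: str
    # Added in a later version; old writers won't send it, so it gets
    # a safe default rather than being required.
    region: str = "unknown"

def decode_user_event(raw: bytes) -> UserEvent:
    data = json.loads(raw)
    # Tolerant reader: take only the fields we understand and ignore
    # the rest, instead of failing on unknown keys.
    return UserEvent(
        user_id=data["user_id"],
        action=data["action"],
        region=data.get("region", "unknown"),
    )

# A payload from an old writer (no `region`, plus an extra field we
# don't know about) still decodes cleanly.
legacy = json.dumps({"user_id": "u1", "action": "login",
                     "client": "ios"}).encode()
print(decode_user_event(legacy))
```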
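For discovery, the core abstraction is small: clients resolve a logical service name instead of hardcoding an address. The registry below is purely illustrative (the names and endpoints are made up); real systems back this lookup with DNS, a registry like Consul or etcd, or the orchestrator’s built-in discovery.

```python
import random

# Toy registry mapping logical service names to current endpoints.
# In production this lookup is backed by DNS, Consul, etcd, or your
# orchestrator's service discovery.
SERVICE_REGISTRY = {
    "billing": ["10.0.1.12:8443", "10.0.1.13:8443"],
    "accounts": ["10.0.2.7:8443"],
}

def resolve(service_name: str) -> str:
    """Return one endpoint for a logical service name."""
    endpoints = SERVICE_REGISTRY.get(service_name)
    if not endpoints:
        raise LookupError(f"no endpoints registered for {service_name!r}")
    # Trivial client-side load balancing; real resolvers also track health.
    return random.choice(endpoints)

# Clients never hardcode an address, so the service is free to move.
print(resolve("billing"))
```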
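For operational metrics, one widely used option in Python is the Prometheus client library. The metric names here are illustrative; the point is that when every service exposes metrics the same way, a single monitoring stack can scrape and alert on all of them.

```python
import time
from prometheus_client import Counter, Histogram, start_http_server

# Illustrative metric names; the convention matters more than the names.
REQUESTS = Counter("requests_total", "Total requests served", ["endpoint"])
LATENCY = Histogram("request_latency_seconds", "Request latency in seconds")

def handle_request(endpoint: str) -> None:
    start = time.monotonic()
    # ... real request handling would happen here ...
    REQUESTS.labels(endpoint=endpoint).inc()
    LATENCY.observe(time.monotonic() - start)

if __name__ == "__main__":
    start_http_server(8000)  # metrics exposed at :8000/metrics
    while True:
        handle_request("/ping")
        time.sleep(1)
```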
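And for managing change of interfaces, a build-time compatibility check can be surprisingly simple. This is a sketch under the assumption that schemas are available as plain field-to-type mappings; real systems derive this from protobuf, Avro or OpenAPI definitions, but the rule is the same: removing or retyping a field that consumers rely on should fail the build, not production.

```python
# Illustrative schemas as plain field-to-type mappings; real systems
# would derive these from protobuf/Avro/OpenAPI definitions.
OLD_SCHEMA = {"user_id": "string", "action": "string"}
NEW_SCHEMA = {"user_id": "string", "action": "string", "region": "string"}

def breaking_changes(old: dict, new: dict) -> list:
    """List schema changes that would break existing consumers."""
    problems = []
    for name, type_ in old.items():
        if name not in new:
            problems.append(f"field removed: {name}")
        elif new[name] != type_:
            problems.append(f"field retyped: {name} ({type_} -> {new[name]})")
    # Purely additive changes produce no problems; removals and
    # retypings fail the build here rather than breaking production.
    return problems

assert breaking_changes(OLD_SCHEMA, NEW_SCHEMA) == []  # additive: safe
```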

If you can get to consensus on how to do these things you can tell any engineering team that it has choices as long as it can join up with the rest of your infrastructure. It then becomes simple to determine whether a design decision is heading into crazy town. For example, if a team building a system decides to do its own logging and operational metrics, it is then accountable for the corresponding infrastructure: collecting and analysing that log information, monitoring its own services and alerting on outages. That is a non-trivial set of things to build. Likewise, any team that decides to go its own way with account management and users is storing up future privacy hell for everyone. A team which wants to go its own way without owning the resultant requirements is probably lacking in experience.

It also tells you which infrastructure teams you need to be building out. The goal here is to build something where the easiest thing to do is the right thing. Don’t force people to adopt common infra (but do reward them when they do). The people in these infrastructure teams need strong soft skills, since they are going to be doing a lot of influencing without authority. If they can’t work with people and solve their problems, you get a stand-off where there’s either open rebellion or the infrastructure teams tyrannise everyone else. Possibly an article for another day there.

The truth is there is no one answer to this question. Both extremes are bad, and you have to find the middle ground which works for your organisation, knowing it may change in the future. Interfaces and loose coupling give you the flexibility to grow and adapt. They give you the tools to give engineering teams both autonomy and accountability for their design decisions. They tell you when you need to get teams talking to each other to reach consensus, and which common services teams you need to build out. And, more importantly, they give you a way of viewing designs which shows whether you’re storing up trouble for later.

Written by Andy Walker

Ex-Google, ex-Netscape, ex-Skyscanner. Interested in solving complex problems without complexity and self-sustaining, self-improving organisations.
