Scaling held knowledge to unblock teams and untangle software complexity
This article was originally published on LeadDev.
Policygenius is America’s leading online insurance marketplace. Our mission is to help people get insurance right by making it easy for them to understand their options, compare quotes, and buy a policy, all in one place.
Complexity is software’s bogeyman. It’s often blamed as the source of issues: something to fear or avoid. It’s a problem we hope engineers will tackle and tear down, and that expectation is embedded in career ladders. Career rubrics often expect experienced engineers to build ‘simple’ solutions to increasingly ‘complex’ problems.
But in these circumstances, complexity can feel like a footnote. We often praise engineers on the impact of successful projects, but less frequently for the simplicity of the solutions.
To make matters worse, it can be hard to feel well-equipped to coach others on problems rooted in complexity. One either has to hop into a toga to begin a philosophical conversation about the nature of complex systems, which doesn’t necessarily help anybody understand the problem any better, or speak to the problem in the narrow context of current project work.
In this article, I want to start to uncover the fuzzy story of complexity by looking at it through one possible concrete lens. A lens that’s easy for us to meaningfully define, act on, and collaborate on in organizations.
The complexity this article refers to is the amount of specialized knowledge you need to acquire to understand and make changes to a system, whether that system is a piece of software (like a microservice or library) or something else entirely (like a business process). If it takes you significantly longer to understand a system than it does to make a change to it, complexity is your main blocker.
Unlike the concrete complexity metrics we sometimes unleash static analysis tools on, such as cyclomatic or branching complexity, this isn’t the sort of complexity that computers are well-equipped to spit out a score for. A compiler will happily compile programs that are entirely inscrutable to humans.
While we can’t easily measure this type of complexity, complex systems create measurable problems. Frequently missed estimates, high MTTD (mean time to detect) and high MTTR (mean time to restore) during production incidents, and slow onboarding for new engineers can all be symptoms of complex systems.
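As a rough illustration of how two of these symptoms could be tracked, here is a minimal Python sketch. The `Incident` record and its field names are hypothetical, and definitions vary between organizations: this version assumes detection time is measured from the start of the fault and restore time is measured from detection.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import mean


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; the field names are assumptions."""
    started: datetime   # when the fault began
    detected: datetime  # when a person or an alert noticed it
    restored: datetime  # when service was back to normal


def mean_duration(durations: list[timedelta]) -> timedelta:
    """Average a non-empty list of durations."""
    return timedelta(seconds=mean([d.total_seconds() for d in durations]))


def mttd(incidents: list[Incident]) -> timedelta:
    """Mean time from the start of a fault to its detection."""
    return mean_duration([i.detected - i.started for i in incidents])


def mttr(incidents: list[Incident]) -> timedelta:
    """Mean time from detection to restoration (an assumed definition)."""
    return mean_duration([i.restored - i.detected for i in incidents])
```

A rising trend in either number does not prove a system is complex, but it is the kind of signal that can prompt the conversation.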
But if we can’t measure it, how can we meaningfully impact the amount of complexity we have to deal with?
The community knowledge pool
If I ask ten Ruby on Rails developers where I can find the code that handles /api/users, they’ll all point me straight to the app/controllers/api directory of the project, probably in a file named users_controller.rb.
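The guess those developers make can be written down as a tiny rule. The Python sketch below is only an illustration of the naming convention, not of how Rails resolves routes: `controller_path` is a hypothetical helper, and it ignores route parameters and any custom routing an application may define.

```python
def controller_path(route: str) -> str:
    """Return the file a Rails developer would conventionally expect to
    handle a route such as '/api/users' (simplified illustration)."""
    segments = [s for s in route.strip("/").split("/") if s]
    if not segments:
        raise ValueError("route must name a resource")
    # Everything before the last segment is treated as a namespace directory.
    *namespaces, resource = segments
    return "/".join(["app", "controllers", *namespaces, f"{resource}_controller.rb"])
```

Calling `controller_path("/api/users")` returns `app/controllers/api/users_controller.rb`. The point is that the rule is short enough to carry in your head from one codebase to the next.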
Ruby on Rails takes a strong stance on both broad concepts, like the file structure for API endpoint definitions or how to launch asynchronous jobs, and narrower ones, like how to send an email. In this way, the Rails community is deeply rooted in convention.
By establishing strong conventions throughout the stack, the framework created a superpower. It reduced the amount of specialized knowledge developers needed to acquire before working at a new company that also uses Rails. By doing this, it demolished the barrier to entry for a significant portion of the entire software industry.
Rails is a shared mental model that allows hundreds of thousands of developers, working in millions of applications, to dive into new codebases industry-wide and be productive immediately.
Bringing it local
Convention is the mass-produced, workhorse knowledge that counters the bespoke mental models for systems in our organization.
Convention is tribal knowledge at scale.
In order to tear apart knowledge silos and create a company-wide knowledge pool, establishing conventions should be a direct goal of engineering organizations. By doing so, we create an environment that allows engineers to collaborate more widely and effectively, and reduce the amount of time we spend discussing the same problems and creating the same solutions.
Convention that scales
To let developers draw on the shared knowledge of the surrounding community, internal conventions should complement, rather than clash with, those of the larger communities we’re a part of. Compared with the conventions of the Rails community, the ones we establish in organizations can be more prescriptive, tailored to the business, the people, and the ecosystem of tools we work with.
Additionally, good convention is established on purpose. Many engineers have worked in codebases where a sort of amorphous convention has been created largely by the copy-paste command. This allows for too many exceptions to the rule, and leads to convention that nobody understands and nobody really wanted to begin with.
Writing documentation on conventions is a better starting point, but discovering documentation is a hard problem which is frequently worsened by imperfect search capabilities. Documentation plays an excellent support role for understanding why conventions are what they are, but rarely allows us to widely enforce convention.
When the Go community introduced gofmt, they stopped the decades-long debate of tabs vs. spaces dead in its tracks. By making a choice and enforcing it with the tool, they both established a strong convention and made it so developers followed that convention incidentally. Developers in Go no longer had to adjust formatting options across their wide array of editors. Compared with documentation alone, tooling turns convention into the path of least resistance for developers, and removes the need to intentionally search out existing patterns.
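To make the idea concrete, here is a toy Python sketch of a formatter that makes one choice (tabs for indentation, as gofmt does) and applies it mechanically. It is not how gofmt works, since gofmt operates on parsed Go source rather than raw text. The function names and the width of four spaces per level are assumptions made for illustration.

```python
def tabify(source: str, width: int = 4) -> str:
    """Rewrite leading runs of spaces as tabs, one tab per `width` spaces.
    Leftover spaces that do not fill a whole level are kept as-is."""
    lines = []
    for line in source.split("\n"):
        stripped = line.lstrip(" ")
        leading = len(line) - len(stripped)
        tabs, rest = divmod(leading, width)
        lines.append("\t" * tabs + " " * rest + stripped)
    return "\n".join(lines)


def is_formatted(source: str, width: int = 4) -> bool:
    """A check suitable for CI: formatted code is unchanged by the formatter."""
    return tabify(source, width) == source
```

Because the tool owns the decision, a team could later change `width`, or the rule itself, and developers would keep running the same command.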
Just as importantly, with tooling you can change existing conventions as new problems are identified, without developers having to change how they work with them. As long as usage of the tools remains the same, they can be blissfully unaware of the difference.
Tooling is the most effective driver of convention. In one organization I worked in, for example, `make test` could be used to spin up test dependencies and run tests in each of the hundreds of different code bases. By maintaining this single command in each codebase, developers were empowered to make broad changes to dependencies or technologies for testing a codebase in the context they cared about — without affecting the workflow of others.
Beyond helping developers, it allowed the tools themselves to get continuously better. It’s easier to develop tools for systems that follow the same conventions. Tooling-driven convention is a delightful feedback loop with compounding productivity benefits.
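As a sketch of the kind of tool a shared `make test` entry point makes easy to build, the Python below scans a directory of checked-out codebases and reports which ones lack a `test` target. The directory layout, the `Makefile` filename, and the helper names are assumptions, and the pattern is a deliberately simple text match rather than a real Makefile parser.

```python
import re
from pathlib import Path

# Matches a line that starts a `test` rule; `test := value` is an assignment
# and is excluded by the negative lookahead.
TEST_TARGET = re.compile(r"^test\s*:(?!=)", re.MULTILINE)


def has_test_target(repo: Path) -> bool:
    """True if the codebase has a Makefile that defines a `test` rule."""
    makefile = repo / "Makefile"
    return makefile.is_file() and bool(TEST_TARGET.search(makefile.read_text()))


def missing_test_target(root: Path) -> list[str]:
    """Names of the codebases under `root` that do not follow the convention."""
    return sorted(
        repo.name
        for repo in root.iterdir()
        if repo.is_dir() and not has_test_target(repo)
    )
```

A check like this only works because every codebase is expected to answer to the same command; without the convention, each repository would need its own special case.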
As your organization grows, and the amount of code or codebases your organization has multiplies, building tools that allow engineers to easily share convention acts as the great normalizer. It allows developers to work in different regions of the business without overhead, and allows teams to effectively collaborate or support the efforts of others.
Skip the mandate
While the easiest choice is the one developers don’t have to make, the goal of convention shouldn’t be to remove decision-making from developers in your organization. We still need to leave opportunity for people to establish new options, or go against the grain when the business case calls for it. Ingenuity is part of what enables organizations to thrive.
But deciding when to lean on existing convention versus exploring new territory is a skill we can help engineers develop. We can foster and support good ideas, or play the devil’s advocate when we think caution is necessary.
Rubrics built with clarity
Asking engineers to take part in enriching an organization’s conventions is more concrete than any expectation around complexity. Instead of encouraging engineers to reduce complex problems to simple solutions, which leaves ample room for bias, we can ask engineers at various points in their careers to:
- Effectively use established conventions to tackle new problems;
- Build and effectively communicate arguments for building things in an unconventional way, when needed;
- Establish new conventions where solutions are missing;
- Coach other engineers to identify and use appropriate conventions;
- Proactively identify gaps in existing convention and help teams get ahead of problems they may cause;
- Establish conventions in industry that help the industry tackle wide-scale problems.
By doing so, we can hold engineers to a standard that’s easier to discuss, easier to coach through, and easier to point to. Compared with standards grounded in complexity, these standards are less ambiguous and easier to aspire to.
One of the biggest drivers of complexity is how hard it is to measure or have effective conversations about. One way forward is to break it into components that are easier to pin down.
Convention is one of these components. It’s easy for us to reason about, share knowledge about, and make better.
It’s not the only piece of the equation. It’s one factor among many that can turn us from building complex systems toward building systems that are easy to sustain. By breaking complexity into concepts that are easier to discuss and easier to tackle, we can make real, incremental progress in reducing it, and create technology that’s sustainable for a long time to come.