Technical Decision Making

There’s absolutely no poverty of technical advice to be found these days, be it on social media, in blog posts, at technical conferences or in publications. With the abundance of tooling of both the SaaS and open source persuasions, most vendors and open source communities are more incentivized than ever to influence developers and drum up support for their products. A challenge I generally see people on the other side of the equation (the consumers of the latest and greatest technologies) face is manoeuvring through the cacophony of hype, thought leadership and equivocation to pick the right tool for the problems they actually have at hand.

When it comes to separating the wheat from the chaff, what many teams find particularly challenging is navigating past the fuzzy borders and blurred lines between the features, functionality and tradeoffs that every tool and vendor offers, especially when time is at a premium.

Technical decision making (or technical thinking, for that matter) cannot be outsourced.

I don’t claim to have easy answers to these problems, but there are a few patterns I’ve seen repeat often enough in the last couple of years that I feel are worth writing down. A lot of this is easier said than done, and some of these missteps are only obvious in retrospect.

Solving the wrong problem

Technical decision making isn’t so much about which tool to pick as about what problem to solve. Once the problem space has been well defined, evaluating the possible solutions becomes a lot easier.

Having a solid grasp of the problem that actually needs solving can be a lot harder than one might acknowledge, especially when a large number of well-evangelized “solutions” are up for grabs.

It’s easier than ever to put the cart before the horse. At places where I’ve worked, and from speaking with friends, I’ve seen that the temptation to conflate the problem space with the solution space offered by popular tools can be very strong.

The perils of starting with a solution first, only to later find a convincing argument to somehow retrofit our problem space into the solution space, are myriad: wasted engineering resources and cycles, added complexity to a tech stack when it isn’t strictly warranted, the accrual of technical debt, and the operational and maintenance costs that follow. And yet, this happens more frequently than many would like to acknowledge. It’s certainly happened in my own experience.

Prioritization of tip of the iceberg problems

What’s particularly challenging about deciding what problem to solve is prioritization. At any given time, there is probably a laundry list of things that are “broken” or suboptimal. Prioritization is only easy when we have a full understanding of all the problems we’re trying to rank.

As often as not, things that are “visibly” broken or suboptimal might be either a red herring or a symptom of deeper underlying problems. One of the ill-fated projects I worked on a few years ago strived to “fix” (rewrite) an API that was deemed “inconsistent”, whereas the real problem (which only became obvious once the rewrite was well underway) was that some of the core underlying functionality was irredeemably broken. Fixing the “inconsistencies” of the request/response structure was clearly the wrong problem to solve here, since it did little to improve the reliability or uptime of the API. The API was still as “broken” as before, even if its request/response structure was now more consistent.

The easy thing to do is to solve the most “visible” problem, even when the real problem is lurking beneath the surface. I call this the “tip of the iceberg” approach.


It becomes important to prioritize solving the “real problems”, not just what might appear broken on the surface. Identifying the “real problems” requires a deeper understanding of the system (as well as the systems it interacts with) and, more importantly, better visibility into its inner workings.

Yet increasing visibility into our systems can only alleviate this problem to an extent, since it’s impossible to divine every problem our system might encounter over the course of its evolution. Data-driven decision making works best when buttressed with good intuition and instincts, particularly when decisions must be made while important data is missing.

Pattern Matching

Companies hire candidates with X years of experience with technology Y in the hope that the candidate can use that experience to better solve the problems the company faces. This usually works out well enough, except when it doesn’t. We’ve all worked with (or have ourselves been) that person who (often annoyingly) suggests “At [previous company] we …”

Oftentimes the easy thing to do is to pattern match and convince oneself that a solution that worked for another organization facing similar problems is the best solution for the problem at hand. I’ve noticed that people who are new to a company or organization often fall into this habit, since it allows them to assert their knowledge of the problem domain or their previous experience or expertise in the area. People making these suggestions often have their hearts in the right place, and yet it’s unlikely that two problems (especially in two different companies, or even in two different organizations within the same company) are entirely identical.

Even if there is a vast overlap between the problem at hand and the problem previously solved by a team member, the devil is often in the details. Failing to understand these nuances is what keeps us from appreciating that the problem at hand is, in fact, different from the one previously solved, which in turn makes the previous solution the wrong one for the current problem.

The Swiss-Army-Knife Conundrum

What I often struggle with is what I call the “Swiss-Army-Knife Conundrum”: failing to pick a solution with a narrow scope.

Good solutions, for the most part, are optimized for solving only the problem that really needs solving. Any additional “benefit” that might “come for free” can become an incentive to invent yet another problem for that “benefit” to solve.

This can lead to what I describe as “problem creep”.

For instance, if I want to solve problems A, B and C, one option is to pick a tool that solves problems A, B, C, D, E, F, G and H. This goes against my preference for a solution with a narrow scope. However, if the alternative is to pick three separate tools, each solving one of problems A, B and C, at the cost of increasing the overall complexity of the system, then picking the tool that does too much is probably the better choice.

Picking the right tool for the job often involves walking the tightrope between picking the tool that actually solves the problem and minimizing the increase in complexity of the system as a whole.

Collateral increase in overall complexity

Any technology added to one’s stack increases the complexity of the stack by default. Many of the tools being built today, whether open source or commercial, draw inspiration from extremely large-scale systems built at companies like Google.

While these tools might incorporate several great ideas of universal utility, for the most part they also come bundled with a fair bit of unwarranted complexity.

And sadly, the current state of this tooling is such that it offers an all-or-nothing proposition: we as consumers aren’t in a position to pick and choose only what we truly require. The “sane default” offered by many of these tools is now an order of magnitude more elaborate than what’s truly warranted for most use cases.

While solving problems, one of the most difficult challenges lies in choosing solutions that reduce the overall complexity. Simplicity, especially in complex systems, often involves making tradeoffs and putting pragmatism above everything else. This becomes even more difficult with the normalization of “big-company complexity” currently underway, at least in the infrastructure space.

The upshot of this normalization is that the systems we build with these tools will themselves be more complex from the outset than they really need to be, which in turn warrants investment in more tooling to make sense of the added complexity, perpetuating the vicious cycle.

Passion projects

An unrelated conversation I had with a friend earlier this week was about finding the right balance between useful and enjoyable work. Not all useful work is going to be enjoyable to an individual contributor (IC), and not all enjoyable work is necessarily useful to the company. It thus becomes important for an IC and their manager to find the right balance between work that’s useful and work that’s personally enjoyable to the IC, and to fail fast in cases where the work being done proves not to be useful to the organization.

The most fascinating part of this conversation was discussing how ICs end up in a situation where they are doing work that’s enjoyable to them but not useful to the company.

I’ve had first-hand experience with this fallacy, which again boils down to people conflating the problem space with the solution space of their favorite technology, or pitching their passion project as the solution to a “tip of the iceberg” problem.

In this case, carefully weighing the opportunity cost can sometimes prove to be an eye-opener. Another helpful deterrent to passion projects taking flight is requiring the engineer making the proposal to come up with a detailed plan covering the maintenance costs, the maturity of the tooling, the ease of debugging, how well the new technology fits in with the existing stack, and actual benchmarks (in cases when a new technology is proposed for “performance reasons”). Introducing the necessary speed bumps early in the decision making process can save a lot of time and effort in the long run. This is something we’ve internalized fairly well where I work: to justify a new project, the engineer proposing it is required to furnish enough evidence of how the project will be impactful.

In addition to the aforementioned considerations, knowing how feasible it’d be to fail fast and fail gracefully is also pretty valuable.

Not failing fast enough or failing well

Despite best efforts, sometimes mistakes are inevitable. In such cases it becomes important to be able to fail fast and move on.

I’ve been a part of projects that clearly should never have been undertaken to begin with. What was worse was that we weren’t able to fail fast enough out of one of these projects, leaving the team no option but to soldier on and see the project through to completion, even when that meant living with that decision for the foreseeable future.

It’s important for projects, especially large ones, to be composed of partial subproblems that can individually succeed, so that failing fast out of any ill-conceived subproblem is both feasible and doesn’t derail the project as a whole. In other words, projects should be planned so that no subproblem can become a single point of failure. Again, this is easier said than done.

Conclusion

Technical decision making is easily one of the most important aspects of engineering. It becomes particularly challenging when presented with an embarrassment of riches in the form of “solutions”. While this post doesn’t offer answers on how best to make the right technical decisions, I hope it provides a list of things to think about or be wary of.