Ask the Tech Lead: I have to make a technical decision but I can’t know the right answer
I’m hoping to make this a series of posts, discussing the unwritten advice for excelling in highly technical leadership. In the spirit of Camille Fournier’s excellent series, I’ll tackle some of the hardest questions I’ve heard from coworkers and mentees in my time as a lead engineer for teams and groups at Yelp. Huge credit to Jonathan Maltz for help refining this particular post and the overall format.
The question: How do I make an impossible technical decision wisely?
I’m struggling with making a choice about the technical direction of an upcoming project. This choice will involve significant expense: multiple engineer-months worth of work either way. I need to make a call, but I don’t feel like I have enough information to guarantee what the right direction is.
What’s the right approach when I need to make a big bet but have little guarantee of the right call in the end?
The Solution: Think like a scientist
Major decisions aren’t easy. There are many reasons a technical choice might feel impossible:
- The decision is often between the status quo and some option we have little or no experience with.
- Sometimes the best and worst case outcomes are either unknown or have a scary amount of variance.
- Sufficiently new or different ideas often imply unknown challenges lurking. What if those unknown challenges are really bad and make the choice unappealing in hindsight?
- New ideas might imply major architecture changes that are outright dangerous. If we fail to control risk, it may not matter how good a choice we make.
The reality is you still have to make impossible decisions, even if the right choice can’t be known until later.
My advice is to work your decision like you would a scientific experiment: deeply invest in learning about the problem, build a hypothesis about the right call, and then, most importantly, propose a series of small, incremental experiments to build confidence that your choice is correct.
As a whole, my approach to these big questions looks like:
- Frame the big question and take an opinionated stance on the answer based on whatever data is currently available.
- Come up with an initial experiment to partially vet that stance. It should be quick (~1 quarter) to accomplish, give meaningful directional feedback on whether the opinionated stance is still correct, and hopefully provide engineering leverage to test the next experiment more easily/quickly/safely.
- Evaluate the experiment’s result. Does it suggest your opinionated answer to the original big question was right? Wrong? Have you learned something that needs to alter your answer to the big question?
- If needed, update your best current proposal based on this feedback.
- Rinse and repeat steps 2–4 until you’ve answered the big question empirically.
Applying this to a concrete problem
In my own work, I’ve been digging into whether we should invest in using AWS Lambda more within Step Functions. Unfortunately, we don’t yet have much experience using Lambdas, and this would imply a pretty big technical effort to make the switch. Is it worth it? This is a big, uncertain question, so let’s see the framework in action.
Frame the big question, take a stance
We’ve been using an API-driven pull architecture up until now, but nearly every company in the industry uses Lambdas. Let’s pick the largest change for our framed question: “Should we use Lambdas for all of our Step Functions tasks?”
After a little research and my anecdotal survey of the industry, I’m sufficiently curious about the alleged development velocity and ease of use of Lambdas to take an opinionated stance: “we should use Lambdas by default with Step Functions, with the pull-based architecture used only as a fallback in rare cases”.
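To make the stance concrete, here’s a sketch of what the two architectures look like at the Step Functions level, written as Amazon States Language fragments (Python dicts for readability). All ARNs, names, and account IDs are hypothetical placeholders, not our real setup:

```python
# Pull-based: the task state points at an activity. Our own worker fleet
# must poll GetActivityTask, do the work, and report success/failure back.
pull_task_state = {
    "ReformatRecord": {
        "Type": "Task",
        "Resource": "arn:aws:states:us-west-2:123456789012:activity:reformat-record",
        "End": True,
    }
}

# Lambda-based: Step Functions invokes the function directly, so there is
# no polling fleet for us to run, deploy, or scale.
lambda_task_state = {
    "ReformatRecord": {
        "Type": "Task",
        "Resource": "arn:aws:states:::lambda:invoke",
        "Parameters": {
            "FunctionName": "reformat-record",  # hypothetical function name
            "Payload.$": "$",
        },
        "End": True,
    }
}
```

The stance, in other words, is that the second shape should be our default and the first a rare fallback.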
Come up with an initial experiment
Our initial experiment should have a few traits: it should be quite small, give us meaningful feedback, and be technically feasible. In particular, the size and scope of our experiments should start very small and grow larger as we gain confidence in our overall hypothesis.
So for our first experiment, I aimed small and replaced a trivial task with a Lambda: all it did was reformat some JSON and log the result. This is incredibly “boring” from a technical perspective, but still required making some important directional choices. Namely:
- How do we deploy and monitor Lambdas?
- Should we use any frameworks?
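As a sketch, that deliberately boring first Lambda might look something like this. The handler name and event shape are hypothetical, not our actual task’s contract:

```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)


def handler(event, context):
    """Reformat the incoming payload and log the result.

    Deliberately trivial: the experiment is about the deploy, monitoring,
    and tooling around this function, not the function itself.
    """
    reformatted = {
        "id": event.get("task_id"),
        # Normalize the nested payload into a stable, sorted JSON string.
        "payload": json.dumps(event.get("data", {}), sort_keys=True),
    }
    logger.info("reformatted task: %s", json.dumps(reformatted))
    return reformatted
```

Shipping even this much end-to-end forces real answers to the deploy and monitoring questions above.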
This line of thought led us to discover a few important hypotheses I wanted to prove/disprove with our first experiment:
- Is the Serverless Framework a good tool for building Lambdas?
- Can we build a CI/CD pipeline that feels like best-in-class service tooling that actually deploys Lambdas instead of our normal SOA setup?
- Are Lambdas performant in production?
- In the end, will this change let us get a functional change through the coding lifecycle faster, without sacrificing safety or architectural sanity?
Evaluate the experiment’s result
For my particular experiment, production experimentation suggests the answer to all of these questions is “yes, Lambdas seem to do as well or better than the status quo”.
Our hypotheses happened to mostly be answering yes/no questions, but this isn’t the only way to measure success. Depending on your particular experiment you may want to check business metrics or even squishier human ideas like “oncall happiness”. The important thing is that you have a clear idea of what you want to learn from your experiment and a clear sense of how you plan to measure that learning once the experiment is live.
Update our overall proposal and iterate
After our first Lambda experiment, the result reinforced the direction of the overall plan, and no major updates were required. Excitingly, we did learn enough to reprioritize the next steps: several problems we thought would require entire future experiments to vet were actually completely solved by Serverless Framework plugins. This feedback made us more confident betting on the Serverless Framework as a technology and let us tweak our roadmap to actually be more aggressive on what we tried next.
This feedback loop (experiment, see results, modify hypothesis, experiment again) is crucial to noticing problems and course correcting. Make sure that even your big nebulous bets have a clear way to learn + iterate and you’re comfortable with the cost of the worst-case outcome. This helps protect your project from losing momentum halfway due to an unsuccessful experiment.
As the problems get messier and more complex, it can help to centralize this process in a single long-running document. It’s a great scratch space for thinking through future experiments and making sure others can follow along with your thought process while you’re at it. My rough format:
- What’s the big idea? (add historical context, frame the opportunity)
- Who owns this project? (clear ownership helps avoid analysis paralysis and offers clear points of contact)
- What’s my best guess at the right answer? (doesn’t have to be correct in hindsight, just has to be a useful, opinionated stance that implies action)
- What experiments have we already done in this effort? What did we learn from them?
- What is the next, most valuable experiment we should try?
The ability to take on a vague project of potentially massive scope without succumbing to blind guesswork is quite challenging. But on the upside, it’s also a rare skill that you’ll use more and more as you become a more senior technical leader.
Luckily for all of us, the key isn’t to magically know the right answer up front. Instead you need to build deep context in the problem space — really live and breathe it — until an initial educated guess forms itself. Your first experiment should provide directional feedback and force you to take theoretical ideas and build them for real. Then each ensuing bet shapes your hypotheses about the overall question, and guides you down the rest of the decision tree for your project.