How to Solve Software Problems

Jeremy Song
Published in Stochastic Stories
4 min read · Feb 1, 2021

As software engineers, a big part of our job is to solve software problems, big or small. Big software problems are ambiguous and require lots of thinking, discussion, and debate. Typical examples are improving architecture efficiency or proposing technical solutions to ambiguous business problems. Small software problems have smaller scopes and are less ambiguous. Typical examples are fixing bugs and resolving technical difficulties.

No matter how big or small the software problems are, they can be solved in similar ways. Over the years, I have developed my own (not necessarily unique) methods for tackling them.

Understand the problem

Before solving a problem, you need to have a deep understanding of the problem. When encountering a problem, ask the following questions, in this specific order:

  1. Why is this even a problem? There are always some problems in the world for you to solve. If it’s a problem that hurts no one, it’s not worth solving.
  2. What will happen if we don’t solve this problem? This again questions the necessity of solving this problem, even after you confirm this is a real problem. Use your SLA, estimated customer impact, your product, and business directions to help you find the answer.
  3. What’s the root cause of this problem? Lots of the problems we encounter are, on the surface, symptoms rather than root causes. You need to dive deep and understand the root cause. If you receive lots of tickets because the error rate breaches the threshold, find the root cause first; don’t just adjust the threshold.
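To illustrate the third question, here is a minimal sketch of breaking an error rate down by cause before touching the alarm threshold. The record format, client names, and error types are all hypothetical; a real service would pull this data from its own logs or metrics.

```python
from collections import Counter

# Hypothetical sample of request outcomes: (client_id, error_type),
# where error_type is None when the request succeeded.
records = [
    ("client-a", None),
    ("client-a", "InvalidRequest"),
    ("client-b", "Throttled"),
    ("client-b", "Throttled"),
    ("client-b", "Throttled"),
    ("client-c", None),
    ("client-c", None),
    ("client-c", "Timeout"),
]


def error_rate(records):
    """Fraction of requests that failed: the symptom the alarm reports."""
    errors = sum(1 for _, err in records if err is not None)
    return errors / len(records)


def errors_by(records, key_index):
    """Count failed requests grouped by one field (0 = client, 1 = error type)."""
    return Counter(r[key_index] for r in records if r[1] is not None)


print(f"error rate: {error_rate(records):.1%}")
print("by error type:", errors_by(records, 1).most_common())
print("by client:", errors_by(records, 0).most_common())
```

In this made-up sample, most errors are throttling errors from a single client, which points to a very different fix than raising the threshold.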

Understand how the system works

You cannot possibly identify the correct root cause if you don’t understand how the system works (unless you’re extremely lucky). Before you even start root-causing a problem, try to build a basic understanding of how the system works. If your requests are randomly dropped or rejected by the service, try to understand how the VIP routes your requests to the service hosts. If you don’t see the metrics you expect, understand how those metrics are sent to the dashboards. Only then will you be able to form a hypothesis.

Write down the problem and solutions

Root causing a problem can be difficult and tricky because there could be lots of possibilities. Even if you are convinced that a root cause has been found, applying a proposed solution may still not solve the problem.

In this case, the best way to move forward is to write down the problem and the solutions you have tried. Writing is a magical process because it allows you to think critically and logically. When you go back and read what you have just written, you may realize that you have identified the wrong root cause, or the solutions you just tried would have never worked in the first place.
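One way to make this habit concrete is to keep the write-up in a fixed structure. The sketch below is only an illustration: the `ProblemLog` and `Attempt` names and their fields are my own assumptions, not a standard format.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Attempt:
    hypothesis: str  # what you believed the root cause was
    action: str      # what you changed or checked
    outcome: str     # what actually happened


@dataclass
class ProblemLog:
    problem: str
    impact: str
    attempts: List[Attempt] = field(default_factory=list)

    def render(self) -> str:
        """Render the log as plain text that can be pasted into a document."""
        lines = [f"Problem: {self.problem}", f"Impact: {self.impact}", "Attempts:"]
        for i, a in enumerate(self.attempts, start=1):
            lines.append(f"  {i}. Hypothesis: {a.hypothesis}")
            lines.append(f"     Action: {a.action}")
            lines.append(f"     Outcome: {a.outcome}")
        return "\n".join(lines)


# Hypothetical example entry.
log = ProblemLog(
    problem="Error rate breaches the alarm threshold",
    impact="Frequent tickets for the on-call engineer",
)
log.attempts.append(Attempt(
    hypothesis="Clients are sending invalid requests",
    action="Sampled failed requests and validated their payloads",
    outcome="Payloads were valid, so this hypothesis is ruled out",
))
print(log.render())
```

Recording the hypothesis next to the outcome is the useful part: rereading the list makes it easier to spot a wrong root cause or a fix that could never have worked.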

Use Occam’s Razor

When root-causing a problem, try to propose multiple hypotheses. For example, when you observe a high error rate in the service metrics, you could have several hypotheses:

  1. Those errors are caused by clients sending invalid requests.
  2. Those errors are caused by the service throttling client requests.
  3. The service is emitting the wrong metrics.
  4. The dashboard is broken and displays the wrong data.

Note that those hypotheses are ranked by likelihood, guided by Occam’s razor: the simplest explanation is usually the correct one, unless proven otherwise.
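The ranking above can be sketched as a small routine that checks hypotheses from most to least likely and stops at the first one the evidence confirms. The likelihood values and the `check` functions below are placeholders I made up for illustration; in practice each check would query logs, metrics, or the dashboard configuration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Hypothesis:
    description: str
    likelihood: float          # rough, subjective estimate between 0 and 1
    check: Callable[[], bool]  # returns True if the evidence confirms it


def first_confirmed(hypotheses: List[Hypothesis]) -> Optional[Hypothesis]:
    """Check hypotheses from most to least likely; return the first confirmed one."""
    for h in sorted(hypotheses, key=lambda h: h.likelihood, reverse=True):
        if h.check():
            return h
    return None


# Illustrative values only: the lambdas stand in for real investigations.
hypotheses = [
    Hypothesis("Dashboard displays the wrong data", 0.05, lambda: False),
    Hypothesis("Clients send invalid requests", 0.50, lambda: False),
    Hypothesis("Service emits the wrong metrics", 0.15, lambda: False),
    Hypothesis("Service throttles requests", 0.30, lambda: True),
]

result = first_confirmed(hypotheses)
print(result.description if result else "No hypothesis confirmed")
```

Checking in likelihood order means the less likely explanations, such as a broken dashboard, are only investigated after the simpler ones have been ruled out.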

Use the 50/50 rule

Use at least 50% of the time to understand the problem, propose hypotheses, research the solutions, write them down, and pick the best or most likely solution. I call this the “preparation step”. Then use the rest of the time to implement the solution (the “implementation step”).

If you spend less than 50% of the time on the preparation step, you may think you’re saving total problem-solving time, but that is unlikely to be the case. The time you save by skipping work in the preparation step will cost you more in the implementation step. If you don’t fully understand the problem, you will implement the wrong solution, which may cause more problems than it solves.

Ask for help, appropriately

Don’t be afraid of asking for help. Most of the problems you encounter are not new and most of the solutions are already there. Sometimes you just need a simple pointer to correctly diagnose the problem and reuse an existing solution.

When asking for help, do it appropriately. Your colleagues are there to help you, but their time is valuable. The way to show respect for their time is to do as much of the work in the “preparation step” as possible. Writing down your thoughts on the problem and potential solutions is a concrete way to show that. When someone reaches out to me for help, I typically ask for a short document describing the problem and the solutions they have tried. This encourages them to write down the problem and makes them go through a critical thinking process.


I am currently a Principal Software Development Engineer at Amazon. All opinions are my own.