The Tyranny of the Middling Solution

Michelle Brush
Jul 30, 2017 · 5 min read

I never really got Goldilocks. Was it a lesson saying you should settle so you can eat and sleep before the bears come? Was the lesson to not steal from bears? Most people take away the lesson against seeking perfection.

“Perfect is the enemy of good.” -people paraphrasing Voltaire

I was working at an embedded systems company when my manager asked that I speed up an underperforming legacy algorithm. He set a performance target and asked for a plan to meet it. Since the algorithm’s author had long left the company and no one had profiled the thing in years, I proposed essentially the following plan:

  1. profile it
  2. fix the obvious bottlenecks
  3. profile it
  4. if no remaining obvious bottlenecks exist, alter algorithm
  5. otherwise, GOTO 2

I estimated we’d need a month of profiling and fixing bottlenecks before we’d know if we’d need to revisit the essence of the algorithm. If we did have to revisit it, I estimated 3 months of research and development would be required.
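Steps 1 and 3 of the plan can be sketched roughly as follows. This is a minimal illustration in Python using the standard library's `cProfile` and `pstats`, not the tooling used on the embedded system in the story. The functions `legacy_algorithm` and `slow_helper` are hypothetical stand-ins for the code under investigation.

```python
import cProfile
import io
import pstats


def slow_helper(n):
    # Hypothetical hotspot: builds a string by repeated concatenation.
    out = ""
    for i in range(n):
        out += str(i)
    return out


def legacy_algorithm(n=2000):
    # Hypothetical stand-in for the algorithm being tuned.
    return [len(slow_helper(i % 50)) for i in range(n)]


def profile_top(func, *args, limit=5):
    """Run func under cProfile; return its result and a text report
    of the `limit` entries with the highest cumulative time."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats("cumulative").print_stats(limit)
    return result, buffer.getvalue()


if __name__ == "__main__":
    # Profile, fix whatever dominates the report, then profile again.
    _, report = profile_top(legacy_algorithm)
    print(report)
```

The point of the loop is that the report, not intuition, decides what gets fixed next. When nothing obvious remains at the top, that is the signal to revisit the algorithm itself.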

My manager didn’t like my plan. He thought the profiling was a waste of time because he felt we already knew what the problem was: we were I/O-bound reading the data needed to inform the algorithm. He wanted to just cache a subset of that data in memory and move on. He felt the cache work would take a couple of months, beating my plan. I disagreed.

For one, I felt he was wrong about which subset of data to cache, and I needed to profile the algorithm to show him this was the case. Second, given the way our users used the algorithm, I didn’t expect the cache to have a high hit rate. I held my ground.

The great thing about this manager is that even when we disagreed technically, he still gave me complete autonomy. He didn’t take a command-and-control approach and order me to follow his direction. That isn’t to say he bought into my plan. He just let me run with it. To make himself feel better, he would stop by my office more frequently to hear how I was progressing. He wanted to make sure he could steer me away from my plan if there was evidence he was right, but he didn’t push me.

I was right. Profiling showed we had tons of performance problems unrelated to reading the data. It was clear the I/O in question accounted for a smaller percentage of our running time than we had assumed. Caching would not get us where we needed to be.
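A back-of-the-envelope calculation shows why the two doubts mattered together. The sketch below is an Amdahl's-law style bound written for illustration: the function name, its parameters, and the sample numbers are all assumptions, not measurements from this project.

```python
def runtime_reduction_from_cache(io_fraction, hit_rate, hit_cost_ratio=0.0):
    """Estimate the fraction of total running time a cache could remove.

    io_fraction:    share of running time spent on the I/O being cached (0..1)
    hit_rate:       share of reads the cache would serve (0..1)
    hit_cost_ratio: cost of a cache hit relative to a real read
                    (0.0 assumes hits are free, which is the best case)
    """
    for name, value in (("io_fraction", io_fraction),
                        ("hit_rate", hit_rate),
                        ("hit_cost_ratio", hit_cost_ratio)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")

    untouched = 1.0 - io_fraction                      # work the cache cannot help
    misses = io_fraction * (1.0 - hit_rate)            # reads that still hit the I/O path
    hits = io_fraction * hit_rate * hit_cost_ratio     # reads served by the cache
    return 1.0 - (untouched + misses + hits)


if __name__ == "__main__":
    # Illustrative numbers only: a modest I/O share and a mediocre hit rate.
    saved = runtime_reduction_from_cache(io_fraction=0.2, hit_rate=0.5)
    print(f"Best-case running time saved: {saved:.0%}")
```

Even with free cache hits, the saving can never exceed `io_fraction * hit_rate`. If profiling shows the I/O share is small, or usage patterns keep the hit rate low, the cache cannot reach an aggressive target no matter how well it is built.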

Why was the cache idea so appealing? I think partly because it landed in the middle, between two other options.

A) Profile & tune: 1 month

B) Cache: 2 months

C) Rewrite: 3 months

It was easy to be suspicious of option A. It was too cheap. My manager didn’t believe we could buy any significant performance from simple low-level tuning of the code. However, we were able to shave 30% off the algorithm’s running time. (Donald Knuth would have been proud, as the problem code was a collection of poor attempts at premature optimization that had actually made things worse.)

Option C, rewriting the algorithm, seemed risky and expensive. No one wanted to do it. As a result, caching seemed “just right.” But data showed it wasn’t. We did end up rewriting the algorithm after all.

I’ve moved away from embedded systems and now work on large distributed ones. But the Goldilocks situations still present themselves. For example, recently, my team realized the YAML-based configuration in our system was unwieldy and prone to mistakes. This meant that a small number of experts on the team were the only ones trusted to change anything, which was a bad situation. The cheap fix was to create configuration templates that people would copy and paste, with comments on what to touch and what not to touch. The medium-cost fix was to refactor the configuration so that people performing operations would interact with a smaller subset of it. The right fix was to revisit all the configuration, determine what was essential, remove what wasn’t, and build domain-driven services that exposed the configuration more intuitively.

We went with the refactoring approach thinking it was good enough to get us out of the mess. We didn’t get out of the mess. People still fat-fingered things in the smaller subset we exposed and triggered incidents in the system. We wished we had argued for the expensive option. The problem is, it felt hard to justify now. We had spent some development effort on the refactoring, and the result of that effort was a reduction in the number of incidents. From an ROI perspective, you could argue it was a success. You could argue it was good enough, not perfect.

Like the wines in the middle of the wine list, the middling solution is tempting. No one will be accused of skimping on the problem, and the risk of failure is smaller. Why not take the middle option?

There’s a tyranny in the middling solution. It works just well enough that it will be hard to justify its eventual replacement.

The good news is that we’ve decided not to stick with good enough. We are embarking on the domain-driven services work now. This is, again, due to having leadership that is willing to trust the team when we say we need to do something different. However, looking back, I wish we’d taken a different approach initially. I wish we had created a plan similar to the one for the algorithm performance work I described. That is, I wish that instead of picking the middle option first, we had done the cheap one to buy us just enough time to do the expensive one.

You may ask, “Why not just do the right one up front?” We were in operations hell. Part of the appeal of the middling solution is its lower cost. When the team is feeling pain, whether from operations or market pressure, faster always feels better. People don’t feel they have time to do it right. We can say in hindsight, “Well, did they have time to do it twice?” but that ignores the emotional component of the decision-making.

I’m arguing you may have to do it twice: once as cheaply as you can to buy some time, and then a second time to do it right. Goldilocks didn’t go straight to the right bed, the right porridge. Even she had to try a couple of options first.

Written by Michelle Brush: math geek turned computer geek, leader of teams and wrangler of data.
