Big O Analysis and the Top Talent Fallacy

How basic analysis can trump over-engineering

A long time ago, when I was a junior engineer, I worked at a place that claimed to hire top talent. You know the place. Almost everyone had gone to MIT or Carnegie Mellon, or had some intimidating subset of the alphabet come after their name. It seemed requisite to get in. I learned a lot while I worked there, but not always what I expected to.

The algorithm I worked on involved a fair amount of statistical analysis. At the heart of one of our data analysis modules was an algorithm that needed to build up a sorted linked list of items. This portion of the algorithm went something like this pseudocode:

val list = Node(null, null) // sentinel head; real data starts at list.nextNode
while (itemGenerator.hasMoreItems()) {
    val item = itemGenerator.nextItem()
    var node = list
    // walk until the next node's value would be >= the new item
    while (node.nextNode != null && node.nextNode.value < item) {
        node = node.nextNode
    }
    node.nextNode = Node(item, node.nextNode)
}

Quite simply: loop through the list until you find the correct place to insert the new value, and do the insert there. The algorithm would then analyze this list to give its final output.
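To make the pseudocode concrete, here's a minimal runnable sketch in Python. The class and function names are mine, not the original code; I use a sentinel head node so inserting at the front needs no special case:

```python
class Node:
    """Singly linked list node holding a sortable value."""
    def __init__(self, value, next_node=None):
        self.value = value
        self.next_node = next_node

def sorted_insert(head, value):
    """Insert value after sentinel `head`, keeping the list sorted."""
    node = head
    # Walk until the next node's value would be >= the new value.
    while node.next_node is not None and node.next_node.value < value:
        node = node.next_node
    node.next_node = Node(value, node.next_node)

def build_sorted_list(items):
    """O(n^2) overall: each insert rescans the list built so far."""
    head = Node(None)  # sentinel; real data starts at head.next_node
    for item in items:
        sorted_insert(head, item)
    # Collect the values into a plain list for inspection.
    out, node = [], head.next_node
    while node is not None:
        out.append(node.value)
        node = node.next_node
    return out
```

For example, `build_sorted_list([3, 1, 2])` returns `[1, 2, 3]`, with each element inserted by scanning from the head.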

The overall algorithm was slow. It took almost half a day to run on only a few hundred thousand data points, and that was with the workload distributed across four nodes in a map/reduce operation. Much to my surprise, this was just accepted as a fact of the nature of “large and complicated” data. Not only that, but the top engineers were advocating some seriously heavyweight solutions to speed up the algorithm: think data compression algorithms, approximation algorithms, and the like. I brought up in one of our retrospectives that this particular part of the algorithm was O(n^2) and probably a big bottleneck.

Much to my surprise, according to the engineers, I was wrong. This clearly was not an O(n^2) algorithm, because we start with an empty list and we probably don’t always hit the worst case of adding the new element at the end of the list on every iteration.

Ok, so maybe I was wrong. I figured that, as a learning experience, it would be easy to take a stab at seeing where I had gone wrong. So I set out to see where the proof would fail.

This problem can be expressed as looping over a list, doing an O(1) comparison against each element and then an O(1) insert. Doing this once is clearly O(n), and doing it n times is seemingly O(n^2). But our list starts out very small and grows with each pass of the algorithm. How can we account for that while using big O?

Well, let’s just think about how many comparisons we need to insert an element while maintaining sorted order. In the worst case we’ll need to examine every element already in the list. At first, we simply insert the first element into an empty list, with no comparisons at all. Then we do one comparison against that first element to place the second. From there, the number of comparisons grows by one for each element we add to the list. How can we express this mathematically in a form we can analyze?

We just have a simple arithmetic sequence at this point. This can be expressed fairly simply with summation:

S = 1 + 2 + 3 + … + n

This still doesn’t give us something useful to do big O analysis on. But oftentimes in mathematics, it’s just a matter of perspective. Let’s rewrite the above in reverse order and compare it to what we originally wrote:

S = 1 + 2 + 3 + … + n
S = n + (n-1) + (n-2) + … + 1

Now something interesting becomes apparent. What if we were to add the first two terms of each summation to each other? And then the next two? The first pair gives us 1 + n = n+1; the second pair gives us 2 + (n-1), which just so happens to also equal n+1. It turns out that if we add the two sums term by term, every pair comes out to (n+1). Thus we wind up with:

2S = (n+1) + (n+1) + … + (n+1)

Now what happens when we add the same term to itself n times? Easy, that’s just multiplication:

2S = n(n+1)

Now we can just divide by two to get a succinct formula for 1+2+3+…+n:

S = n(n+1)/2

Now let’s be generous and assume the average case, where we only have to scan halfway into the list on each insert. We’ll divide by two again and do the big O analysis on the resulting value:

n(n+1)/4 = (n^2 + n)/4, and O((n^2 + n)/4) = O(n^2)
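As a sanity check on the algebra, we can instrument a worst-case run and count comparisons directly. This is my own quick sketch, not from the original post; it feeds in strictly increasing input so every insert walks the entire list built so far:

```python
def count_worst_case_comparisons(n):
    """Build a sorted sequence from strictly increasing input,
    counting value comparisons; every insert scans the whole prefix."""
    comparisons = 0
    values = []  # plain list standing in for the linked list
    for item in range(n):  # 0, 1, ..., n-1: each item lands at the end
        i = 0
        while i < len(values) and values[i] < item:
            comparisons += 1
            i += 1
        values.insert(i, item)
    return comparisons

# Inserting the k-th element costs k-1 comparisons, so the total is
# 0 + 1 + ... + (n-1) = n(n-1)/2 -- the same arithmetic series as above,
# just shifted by one term, and still quadratic in n.
```

Running this for n = 100 gives 4950 = 100·99/2 comparisons, matching the closed form.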

So I wasn’t wrong. We arrive at this algorithm being O(n^2) quite easily with some very simple algebra. When I took the linked list out and replaced it with a more appropriate data structure, the program sped up almost 100 percent.
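The post doesn’t say which data structure replaced the list, so the following is a hypothetical sketch of two common fixes rather than the original change: either stop maintaining sorted order incrementally and sort once at the end (O(n log n) total), or, if the data must stay sorted between inserts, use a binary search to find each insertion point:

```python
import bisect

def build_sorted_once(items):
    """Collect everything, then sort once at the end: O(n log n) total,
    versus O(n^2) for repeated linear-scan insertion."""
    out = list(items)
    out.sort()
    return out

def build_sorted_incremental(items):
    """Keep the data sorted after every insert. bisect.insort finds the
    insertion point in O(log n) comparisons (the element shift within
    the underlying array is still O(n))."""
    out = []
    for item in items:
        bisect.insort(out, item)
    return out
```

Either approach removes the linear scan per insert that made the original loop quadratic in comparisons.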

Once this was presented to the other engineers, of course the algorithm was changed right away. Later on I was able to chip away at other similar inefficiencies leading to a drastic speedup, again, without any crazy algorithms. Just old fashioned bottleneck analysis.

It turns out the overall algorithm was capable of being pretty fast to begin with. It was just poorly implemented by the same top talent who thought further algorithmic complexity was the solution instead of basic analysis.

Lessons learned

Despite there being some very smart people on the team, as evidenced by their degrees and titles, the extreme over-engineering and lack of execution were astounding. Moreover, the willingness to brush aside someone’s valid concerns without any real thought about the problem at hand was also disconcerting. Sadly, I feel like this is a general problem in the tech industry, one I’ve had to deal with and one I’ve seen others have to deal with.

I think being humble is a key to being truly successful as a leader. It’s always important to realize that no matter how smart or talented you are, you’re going to miss things. You’re going to get things wrong. Sometimes very wrong. That should be perfectly ok. Having other people there, regardless of their experience level or depth of knowledge, who can catch your mistakes makes you stronger as an individual and as a leader. Not only that, but it makes the person who saw your errors that much stronger and wiser as well. Even if you wind up being right, it can be an extremely valuable lesson for both parties.

I think a lot of people at that position and level tend to develop a sense of infallibility. It’s easy to get overly cocky about your abilities, especially if your opinion is validated because you’re “top talent”.

Top talent is a lie that managers tell themselves to sleep better at night. Mathematically, you can’t hire “top talent” consistently, by virtue of it being an extremely small subset of the population. Furthermore, what qualifies someone as “top talent” is inherently immeasurable and can vary from person to person. If you apply arbitrary metrics for what “top talent” means, like a prestigious degree or a PhD, you’re potentially setting yourself up for disappointment. Instead, focus on the metric that really matters: positive results.