Big-O really REALLY matters

A response to the recent article ‘Fuck You Startup World’

A response to the recent Medium article by user shem:

Shem’s recent article is an enjoyable (if profane) lashing out at startup culture, and silicon valley culture in general. I mostly took it as intended, tongue-in-cheek and not meant to be taken too seriously. But one section irked me, specifically in reference to the kinds of questions we face in software engineering job interviews:

I never had to shift a bit in a C array in my life! […] I don’t fucking care about the complexity of this code block because I can afford another EC2 instance!

Now, I find the white-boarding, coding gauntlet inherent in my industry of choice as excruciating as the next introverted and insecure hacker (me personally to a ‘T’ I wholly admit), but nevertheless, both statements above frustrate me. As to the first statement, I have bitten off quite a bit of bit-shifting in C and C++ and find it to be a pretty fun area of low-level processing, but I won’t bother with that further here.

The implication of the second statement is that complexity inefficiencies don’t matter in the modern cloud-computing era since they are easily compensated for by mere cluster-scaling, which itself is pretty easy these days now that cloud-processing tools offer slick interfaces for cluster management. The statement implies that complexity inefficiencies are harmless because they can be overcome quite easily by adding another instance. Is this true?

Well, right of the bat, perhaps it is true! If the complexity inefficiency to be overcome is an additive inefficiency, then an additive increase to our cluster will compensate for it and we can continue on our merry way. Adding a constant number of nodes to our cluster qualifies here. If our initial cluster consists of 10 nodes, then adding some constant number of new nodes, say one more node as per the quote above, will yield a cluster of 11 and we have overcome the inefficiency of our code. I think there’s an artisanal fusion microbrewery/pure-origin-cafe around the corner from our office. Let’s allocate a new node and take a long lunch.

But most Big-O errors in code are not additive-time (or space) errors! What if we have chosen a data structure or algorithm which is inefficient in a multiplicative fashion, that is to say it isn’t “one unit” or even “100 units” inefficient, but rather it is “2X inefficient”? To overcome this inefficiency we must multiply our cluster size by some constant, saying scaling our initial 10-node cluster by 2X, yielding a new 20-node cluster. That’s more problematic that just adding a small constant number of nodes.

Oh, but we’re just getting started. A common inefficiency that crops up in poorly designed software is not multiplicative errors, but quadratic errors. To compensate for this inefficiency, we must scale our 10-node cluster by 10², yielding 100 nodes. At this point, someone up the chain is likely to start questioning our approach to these kinds of problems. We should probably make a slide-deck that explains our methodology, just in case we are called in to explain ourselves (the graphics must be flat-designed though! We don’t want to look like we’re from 2002 do we?). I think the wifi is good at that brewery/cafe. Let’s go make our slides there.

Is that it? Are we done? Hell fucking no we’re not done. What if, due to our lack of concern of computational complexity, we made the mistake of choosing an inefficiency that is exponential. To work around this problem by allocating more nodes, we must now take our 10-node-cluster to, oh say, 2¹⁰ nodes. Our 10 node cluster just went to 1024 nodes in a single allocation to overcome the complexity problems inherent in our code.

Well shit. I think I just saw a head rolling across the bespoke, reclaimed wine-barrel floor of our startup’s office. Oh, it’s our head.

I wonder if we’ll have to face any algorithmic complexity questions in our next job interview, which by the look of things will be in the very near future.

Yeah, Big-O analysis matters.