“Premature optimization is the root of all evil” is the root of evil

Let’s face it: the quote in the title has been used to advocate bad programming far too often. Sometimes it’s accompanied by the numbers from the full quote, as if they were permanent truth or at the very least a well-measured observation:

We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil. Yet we should not pass up our opportunities in that critical 3%.

Sometimes it’s simply backed by the authority of Donald Knuth, the original author of the quote. However, Knuth himself attributes the “evil quote” to Hoare, and Hoare attributes it back to Knuth, so it’s more of a nobody’s child.

The thing is, it was first printed 42 years ago, in 1974, in an article devoted to justifying the use of the “goto” statement in structured languages. In black-and-white terms, it was pro-optimization, not contra. And if you read TAOCP, you know that Donald Knuth himself puts algorithms and the machine far above paradigm, be it structured, functional, or any other type of programming. He publishes code examples in MMIX (formerly MIX) machine code instead of high-level languages because this is what really matters. The algorithm and its machine implementation were the core of computer programming 42 years ago, and there is no good reason for this to change.

However, it is common knowledge now that you should first build things to work and only then optimize them to work fast. To me it sounds like: “let’s build a house of straw and beeswax and call the fire brigade when it catches fire”. That would be absurd in every other engineering discipline apart from software engineering. We are here to anticipate and prevent problems, not to spawn them ignorantly after 40 years of self-deception.

In 1974, optimization indeed meant sacrificing code clarity for a few percent of performance improvement. Sometimes you manage to sacrifice less and gain more, which is good. In the very same article from which the “evil quote” is taken, Knuth published actual results for one such optimization:

The improvement in speed from Example 2 to Example 2a is only about 12%, and many people would pronounce that insignificant. The conventional wisdom shared by many of today’s software engineers calls for ignoring efficiency in the small; but I believe this is simply an overreaction to the abuses they see being practiced by penny-wise-and-pound-foolish programmers, who can’t debug or maintain their “optimized” programs. In established engineering disciplines a 12% improvement, easily obtained, is never considered marginal; and I believe the same viewpoint should prevail in software engineering. Of course I wouldn’t bother making such optimizations on a one-shot job, but when it’s a question of preparing quality programs, I don’t want to restrict myself to tools that deny me such efficiencies.

Nowadays, however, mostly because of years of performance negligence, there is so much low-hanging fruit that you don’t really have to scramble your code to gain hundreds or even thousands of percent of improvement. Here are some anecdotes from my practice as evidence.

Once we got a 10% speedup in a routine by changing `unsigned short` to `size_t` in a loop counter. It happened to be the inner loop of a matrix transformation, and apparently the compiler we used at the time didn’t unroll it solely because of the counter type. Weird, but compilers may mis-optimize things to this day, which is why it’s a good idea to check their output with a disassembler, at least for the most intense parts of the system. It’s not even that hard; in fact, anyone can read disassembly.
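
To illustrate (this is a minimal sketch of the pattern, not the original routine; the names are mine): the only difference between the two functions below is the counter type, and in the story above that alone decided whether the inner loop got unrolled.

```cpp
#include <cstddef>

// 16-bit counter: in our case the compiler refused to unroll this form
// of the loop, apparently because of the narrow counter type.
void scale_row_short(float* row, unsigned short n, float k)
{
    for (unsigned short i = 0; i < n; ++i)
        row[i] *= k;
}

// Same loop with std::size_t: the counter matches the pointer width,
// and the compiler unrolled and vectorized it as expected.
void scale_row_size_t(float* row, std::size_t n, float k)
{
    for (std::size_t i = 0; i < n; ++i)
        row[i] *= k;
}
```

Feeding both versions to a disassembler (or an online compiler explorer) shows the difference immediately.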

Another time we got a 50% speedup by simply caching an operation that was considered cheap and hence not worth caching. Perhaps it was cheap some 10 or 20 years ago, but nowadays simply putting numbers in cache-friendly order once, instead of gathering them from all over RAM every time, grants a huge win because of the difference between cache and RAM read latency. Machines change, and we can only account for these changes by re-verifying the facts we know about our systems: testing and measuring things that were considered improbable in the past.
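
A rough sketch of the idea, with made-up types (assuming the values originally lived in scattered heap objects): gather them into a contiguous buffer once, then let every subsequent pass stream through cache-friendly memory.

```cpp
#include <cstddef>
#include <vector>

struct Node { double value; /* ... other fields that scatter nodes in memory ... */ };

// Scattered version: every dereference is a potential cache miss.
double sum_scattered(const std::vector<const Node*>& nodes)
{
    double s = 0.0;
    for (const Node* n : nodes)
        s += n->value;
    return s;
}

// Pack the values contiguously once...
std::vector<double> pack_values(const std::vector<const Node*>& nodes)
{
    std::vector<double> packed;
    packed.reserve(nodes.size());
    for (const Node* n : nodes)
        packed.push_back(n->value);
    return packed;
}

// ...and every later pass reads sequential, cache-friendly memory.
double sum_packed(const std::vector<double>& packed)
{
    double s = 0.0;
    for (double v : packed)
        s += v;
    return s;
}
```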

We also got a 100% speedup by eliminating some extra calls to a really heavy procedure. It should have been called once, but because of an architectural quirk it was called at least once more in the same frame. The moral is, you don’t necessarily know what’s going on under the hood unless you peek there periodically. A profiler helps not only in looking for bottlenecks when the whole thing’s on fire, but as an investigation tool as well.
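
A minimal sketch of that kind of fix, assuming a per-frame update loop and with hypothetical names (the real architecture was different): tag the heavy procedure with the frame it last ran in, so a second call within the same frame becomes a no-op.

```cpp
struct Scene
{
    unsigned long frame_id = 0;
    unsigned long last_heavy_frame = ~0ul;  // "never ran" sentinel

    void really_heavy_procedure()
    {
        // ... expensive work elided ...
    }

    // Previously several code paths called the heavy procedure directly;
    // routing them through this guard makes any repeated call in the
    // same frame free.
    void run_heavy_once_per_frame()
    {
        if (last_heavy_frame != frame_id)
        {
            really_heavy_procedure();
            last_heavy_frame = frame_id;
        }
    }
};
```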

And that one time we got a 400% speedup by rearranging an image processing pipeline. Images were stored on disk in one of several integer formats; they were loaded, converted to double-precision floating point, and then converted back to integers, only to have some minor transformations applied to them. The whole thing was losing most of its performance simply by creating and filling intermediate structures. Alas, since the processing pipeline was rather heavy and had many variants, it wouldn’t have been wise to apply it to every image value because of the cost of dispatch. So we found a way to statically create all the possible combinations and then dispatch once per image, enabling cheap per-value processing without any intermediate structures (see the sketch below). And the resulting code happened to be about the size of the original. Well, sometimes it pays to know your way around meta-programming.
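
Here is a hedged sketch of the approach, with made-up pixel formats and transforms rather than the originals: each (pixel type, transform) combination is instantiated statically as its own tight loop, and the runtime dispatch happens once per image instead of once per value.

```cpp
#include <cstdint>
#include <cstddef>
#include <stdexcept>

// Two toy per-value transforms standing in for the real pipeline steps.
struct Brighten { double amount; double operator()(double v) const { return v + amount; } };
struct Scale    { double factor; double operator()(double v) const { return v * factor; } };

// One tight loop per (pixel type, transform) combination; no
// intermediate double-precision image is ever materialized.
template <typename Pixel, typename Transform>
void process(Pixel* data, std::size_t count, Transform t)
{
    for (std::size_t i = 0; i < count; ++i)
        data[i] = static_cast<Pixel>(t(static_cast<double>(data[i])));
}

enum class PixelFormat { U8, U16 };

// The single per-image dispatch: pick the statically generated
// combination once, then run the cheap per-value loop.
template <typename Transform>
void process_image(void* data, std::size_t count, PixelFormat fmt, Transform t)
{
    switch (fmt)
    {
    case PixelFormat::U8:  process(static_cast<std::uint8_t*>(data),  count, t); break;
    case PixelFormat::U16: process(static_cast<std::uint16_t*>(data), count, t); break;
    default: throw std::runtime_error("unsupported pixel format");
    }
}
```

Inside `process` the compiler sees concrete types for both the pixels and the transform, so the per-value work stays branch-free and vectorizable; the dispatch cost is paid once per image.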

But the largest performance boost I ever got came from a very simple thing. I used the default .NET facility for matrix multiplication, the Matrix.TransformPoints method, and it was very slow, so I decided to re-implement it on site. After all, it’s just a vector-matrix multiplication, a simple operation. That re-implementation easily gave me a 20000% improvement! When I peeked under the hood to see how the original operation works, I saw this:

- System.Drawing.Drawing2D.Matrix.TransformPoints:
- System.Drawing.SafeNativeMethods.Gdip.ConvertPointToMemory,
- System.Drawing.SafeNativeMethods.Gdip.ConvertGPPOINTFArrayF:
- System.Drawing.UnsafeNativeMethods.PtrToStructure:
- System.Drawing.Internal.GPPOINTF..ctor (which is empty),
- System.RuntimeType.CreateInstanceSlow:
- System.Runtime.InteropServices.Marshal.PtrToStructure.

This, ladies and gentlemen, is exactly what years of performance unawareness look like. The perfect example. It has conversions and marshalling, constructors and creators, and only somewhere deep at the bottom does it have the matrix multiplication I was looking for, which doesn’t even show up in the profiler.
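
For comparison, the re-implementation itself is a handful of lines. Here is a minimal sketch (in C++ for brevity, with my own names; the original was .NET), assuming the usual 3x2 affine layout that GDI+ matrices represent: two multiply-adds per coordinate and nothing else.

```cpp
#include <cstddef>

struct PointF { float x, y; };

// Affine 2D matrix in the (m11, m12, m21, m22, dx, dy) layout.
struct Affine2D
{
    float m11, m12, m21, m22, dx, dy;
};

// Transform the points in place: no marshalling, no reflection,
// no intermediate structures.
void transform_points(const Affine2D& m, PointF* pts, std::size_t count)
{
    for (std::size_t i = 0; i < count; ++i)
    {
        const float x = pts[i].x;
        const float y = pts[i].y;
        pts[i].x = x * m.m11 + y * m.m21 + m.dx;
        pts[i].y = x * m.m12 + y * m.m22 + m.dy;
    }
}
```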

My point is, it’s all bad. It is the fear of evil that brought us to the point where the most basic desktop software is sluggish as hell. At work I use Visual Studio, and Word, and Excel, and Outlook, and they are all tremendously bad. Sometimes, when Visual Studio goes “busy”, I have to load my trusty Vim just to keep working. And I’m not even a Vim fan; its only good quality for me is that it works predictably. On the Internet it’s even worse. A guy wrote a message board engine in assembly and everybody was amazed at how fast it is. Well, it’s not. There’s no magic in assembly per se; it’s just that all the other implementations are horribly slow. Assembly only brings performance awareness upfront.

So, it is bad. It’s all bad. But it’s good. For me. Because I make money and reputation by making things better. I fear no evil walking through the darkest valley, and it pays well in the end. So to all those who cover their own negligence with Donald Knuth’s “evil quote”, I have one thing to say.

Thank you.

Ranunculus Root Cross Section. By Sadierath — Own work, CC BY-SA 4.0