Blame game

Leon Fayer
homo technologicus’ asylum
Jul 31, 2012


Every technologist has a technology preference. PHP vs Perl, MySQL vs PostgreSQL, Chef vs Puppet, Microsoft vs … well, pretty much everyone else. However, good technologists know the advantages and disadvantages of the different tools well enough to argue for or against any of them. Others just blame the technology for their own lack of knowledge.

I am a Perl advocate. I’ve used it for a good part of the last two decades; I’ve seen it evolve, and I know its flaws and advantages. I also have my opinions on PHP. With that said, I fully understand the benefits PHP provides, so my arguments with true PHP advocates tend to be on the academic/philosophical side, or they revolve around a particular problem and the different ways of solving it with a technology of choice. Most of the time those conversations are entertaining, and once in a while educational. However, if you proclaim that “Perl sucks!”, you had better make damn sure you know how Perl works before making that statement. If I hear you say “Perl doesn’t perform as well as PHP” and I see n+1 problems in your code, I will completely disregard your opinion.

Recently, my colleague and I talked to a group of people who were looking for help with PostgreSQL performance. Their benchmark took over 20 hours to insert 5 million rows. They also claimed that they were able to achieve the same 5 million INSERTs with SQL Server in minutes. Needless to say, they were extremely dissatisfied with the product. We heard the usual “in XX years of working with databases we’ve never…”, and other not-so-flattering characterizations of PostgreSQL. So, after a few initial answers to our questions, each followed by “it is unacceptable for 5 million INSERTs”, we started looking at the actual code. We looked at the primary function invoked for each row, which emulated an UPSERT (not an INSERT, mind you), since PostgreSQL doesn’t support UPSERTs natively. The function performed about 10 operations per row, of which roughly a third were INSERTs, a third UPDATEs, and a third SELECTs (with full table scans, no less). So instead of 5 million INSERTs, as claimed, the process performed about 50 million operations. Even then, the comparison with SQL Server was still on the table. However, when asked if the same function was used for SQL Server, the answer was “no”. So we asked to run COPY to INSERT the 5 million records directly into PostgreSQL, to compare apples to apples. It took 42 seconds. So, PostgreSQL can insert 5 million rows in 42 seconds directly, yet takes over 20 hours through the internally developed function. You see where the problem is? Clearly, it’s choosing PostgreSQL as your database…
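To make the shape of that problem concrete, here is an illustrative sketch, not the team’s actual code. It uses Python’s sqlite3 purely so the example is self-contained; in PostgreSQL the bulk path would be a single COPY command. The point is the operation count: a per-row emulated UPSERT issues multiple statements per row, while a bulk load is one statement over the whole data set.

```python
# Sketch: per-row emulated UPSERT vs. bulk load.
# sqlite3 stands in for PostgreSQL here so the example runs anywhere;
# table and column names are made up for illustration.
import sqlite3

rows = [(i, f"value-{i}") for i in range(10_000)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")

# Per-row emulated UPSERT: a SELECT, then an UPDATE or an INSERT, for
# every single row -- the pattern that turns "5 million INSERTs" into
# tens of millions of operations.
ops = 0
for id_, val in rows:
    cur = conn.execute("SELECT 1 FROM t WHERE id = ?", (id_,))
    ops += 1
    if cur.fetchone():
        conn.execute("UPDATE t SET val = ? WHERE id = ?", (val, id_))
    else:
        conn.execute("INSERT INTO t (id, val) VALUES (?, ?)", (id_, val))
    ops += 1

# Bulk load: one statement over the whole data set, the analogue of
# PostgreSQL's COPY ... FROM.
conn.execute("DELETE FROM t")
conn.executemany("INSERT INTO t (id, val) VALUES (?, ?)", rows)

print(ops)  # 20000 -- already twice the row count, before any real UPDATE work
```

Even in this toy version the emulated path does at least two operations per row, and that is without the full-table-scan SELECTs that the real function was running.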

Now, don’t get me wrong, PostgreSQL may not be the best tool for the job in this case, because of the lack of native support for UPSERT functionality. But that is far from clear in this use case, and I haven’t seen or heard any evidence that it is. Even at first glance, the .conf file was left at its defaults (a no-no for production), the function could be significantly optimized by removing unnecessary operations, and, with a few tweaks to the schema, optimized even more. The list goes on. But you can’t assess the viability of a technology if you’re not using it correctly. The point is: don’t diss a technology because you don’t know how to use it. Hatred for technology is healthy, but don’t formulate an opinion based on your comfort level (or lack thereof) with a particular technology. Similarly, don’t become overzealous about a technology you’ve only read about on the Internet and have never successfully operated in production (I am looking at you, casual MongoDB fans). Be familiar with all the tools so you can choose the right one when the time comes, and hate them all without discrimination.
