How Fast Can Google’s Go Really Go?
I started programming in the ’70s as a teenager, both on the good old TRS-80, with its cassette drive and BASIC, and on an IBM mainframe with RPG II and COBOL. The TRS-80 made an attempt at games, text-based ones, but they were fun. In the real world, we used computers to batch-process data, and the languages were optimized for that. RPG (not to be confused with role-playing games) was super powerful but super odd, because it was built entirely around the 80-column punch card and getting the job done with as little effort as possible, while COBOL and BASIC were far more free-form. Fortunately, few RPG and BASIC installations remain these days, but what about COBOL? Can it be replaced?
According to reports, COBOL still makes up 70% of the installed software in the world and handles 200 times as many transactions as Google and YouTube handle search queries. A big problem is that hardly anyone learns COBOL anymore, and as a language, COBOL is nothing like what is taught today, although it is very easy to learn. With that preamble, you might wonder where I’m going with this. I’d long thought that Python would make a worthy successor, and I still believe that, but over the last few years I’ve been doing some small projects in Google’s relatively new Go language. I didn’t want to write yet another language comparison, but rather a performance comparison on something COBOL excels at: raw batch processing.
As a test case, I downloaded a geographic data file: a CSV file made up of numeric strings. The file was about 190 MB and contained 2,173,762 records, which seemed large enough to give us meaningful timings.
I also wanted to use an older, underpowered computer, so we went with a machine that had just 640 MB of RAM and ran Windows XP.
For the COBOL compiler I used my own product, KOBOL, which in our tests a few years ago was the fastest COBOL out there. One of the most expensive verbs in COBOL is UNSTRING, which parses a string on one or more delimiters into one or more variables. It’s very convenient, except that every receiving variable must be defined ahead of time, so you need to know the layout of the data you’re working with. Go has no such restriction, but I wanted to write both programs as similarly as possible.
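For readers who haven’t used UNSTRING, its effect can be approximated in Go by splitting a record on the delimiter and copying the pieces into fields declared up front. This is only a minimal sketch, not the code used in the test: the `GeoRecord` type, its three fields, the `unstring` helper, and the sample record are all hypothetical, since the article doesn’t describe the file’s column layout.

```go
package main

import (
	"fmt"
	"strings"
)

// GeoRecord is a hypothetical fixed layout standing in for the COBOL
// receiving fields; the real file's columns are not described here.
type GeoRecord struct {
	Field1 string
	Field2 string
	Field3 string
}

// unstring splits line on delim and moves the pieces into the predefined
// fields, roughly like COBOL's UNSTRING ... DELIMITED BY ... INTO.
// Extra pieces are ignored and missing pieces leave fields at their zero
// value, so the caller must know the layout in advance, as in COBOL.
func unstring(line, delim string) GeoRecord {
	var r GeoRecord
	targets := []*string{&r.Field1, &r.Field2, &r.Field3}
	for i, p := range strings.Split(line, delim) {
		if i >= len(targets) {
			break
		}
		*targets[i] = p
	}
	return r
}

func main() {
	// Made-up sample record for illustration only.
	r := unstring("40.7128,-74.0060,10001", ",")
	fmt.Println(r.Field1, r.Field2, r.Field3)
}
```

Declaring the struct first mirrors the COBOL constraint the article describes; idiomatic Go would often just keep the `[]string` returned by `strings.Split`.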
To keep the test simple, all we do is read the file and parse each record on the comma delimiter into the requisite number of variables. Given the small amount of RAM and the large number of records, I assumed the results would be fairly similar, since both programs would spend most of their time reading from disk. I was rather surprised by the results, though. We rebooted between runs so the system was in a similar state each time, and we ran each test multiple times to confirm the results; the figures below are the aggregates.
To add a bit of a twist, Go has multiple ways to read a file. I kept it simple and restricted myself to READ and READALL. The functional difference is that READALL slurps the entire file into memory first and you then perform your operations on it there, whereas READ works through the file as it goes. Here are the numbers from the Windows XP machine, and then we’ll review the results:
COBOL 2 minutes 28.27 seconds
Go READ 40.55 seconds
Go READALL 6 minutes 17.5 seconds
For the COBOL app, I even tested removing all the record processing and leaving just the reads, and that changed the time by only a few seconds, which really surprised me. Apparently my KOBOL compiler was even more efficient than I thought. I then ran the Go programs on a 64-bit Windows 7 machine with 4 GB of RAM to see how they’d do. I didn’t do this with KOBOL because we don’t have a 64-bit version.
Go READ 23.10 seconds
Go READALL 23.61 seconds
Look at that difference for READALL. It is still a hair slower than READ, but the change in RAM and architecture made an enormous difference: READ improved by less than 2x (40.55 to 23.10 seconds, about 1.8x), while READALL improved by nearly 16x (377.5 to 23.61 seconds). What this tells me is that using READALL for any serious batch processing is a bad idea, because its performance changes so dramatically between systems, presumably because holding the entire 190 MB file in memory strains a machine with only 640 MB of RAM. It might be a great choice for small files, but I would stay away from it for batch processing.
I like COBOL quite a bit for doing what it was intended to do, but Go is modern, clean, portable, compiled, fast and very similar to the languages being taught today. Go is light years ahead of Java, which is in no way suited to replacing COBOL. The more I work with Go, the more I like it as a replacement for almost every other language out there, from the Web to servers churning through millions of healthcare records.
So, to answer the question in the title: Go can go really fast. Using READ, it was faster in this test than the COBOL workhorse that still runs so much of the world’s computing.