Exploiting Your Powerful Cloud Servers with Go's Concurrency

Provisioning costly machines is not useful unless your programming language and programs can actually utilize them.

Machine power keeps growing. But really, the line that matters is that big machine power is getting more accessible. Google recently announced high-performance machine types for Google Compute Engine instances.

While you feast your eyes on that and imagine how much raw power it can give you, you should also think about the reality of actually utilizing all of it.

As a business leader, your assumption might be that simply allocating that high-end instance with 200 GB of memory and 32 virtual CPUs will give you much faster throughput, so you agree without argument to all the extra cost of provisioning it. Except. Well, except that it doesn't give you the results you were hoping for. It doesn't matter how much computing power is available if your programming languages and programming models don't take advantage of it.

As an example, here is our problem statement, which involves some memory- and CPU-intensive work that also accesses your storage (a sketch in code follows the list):

  • Create a large number of images (say 1000 to start with)
  • Make each one reasonably big: a 1024 x 960 JPEG
  • Encode it at 100% quality, in JPEG terms
  • Save each file to disk
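
As a rough sketch of what one unit of that work might look like in Go, using only the standard library (the createImage name and the arbitrary pixel fill are my stand-ins here, not the project's actual code):

    package main

    import (
        "fmt"
        "image"
        "image/color"
        "image/jpeg"
        "os"
    )

    // createImage builds one 1024 x 960 image, fills it with some
    // computed pixel data, and saves it to disk as a 100%-quality JPEG.
    func createImage(n int) error {
        img := image.NewRGBA(image.Rect(0, 0, 1024, 960))
        for y := 0; y < 960; y++ {
            for x := 0; x < 1024; x++ {
                // Arbitrary CPU-bound pixel math standing in for real work.
                img.Set(x, y, color.RGBA{uint8(x % 256), uint8(y % 256), uint8((x + y) % 256), 255})
            }
        }
        f, err := os.Create(fmt.Sprintf("image-%d.jpg", n))
        if err != nil {
            return err
        }
        defer f.Close()
        return jpeg.Encode(f, img, &jpeg.Options{Quality: 100})
    }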

A Straightforward Loop

I first wrote this program as a straightforward loop. (The entire code is available in the GitHub project.)
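
In sketch form, assuming the createImage helper above, the loop version is just:

    // Sequential version: images are created one at a time.
    func main() {
        for i := 0; i < 1000; i++ {
            if err := createImage(i); err != nil {
                panic(err)
            }
        }
    }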

Running this took ~42 seconds on my local machine, and the CPU utilization told the real story:

My 8 CPU cores were used at less than half their capacity. That is as much as most programming languages give you by default. From here on is where it gets difficult. To fully utilize the CPUs you've already paid for, you could reach for system resources like threads, but threads are not easy to program, and when not handled perfectly they are a major source of errors and hard-to-debug issues. Even then, whether the hardware gets fully exploited really depends on how the program is compiled for that machine.

Go Concurrency and sync.WaitGroup

With Go, concurrency is given to you on a platter. I’m going to add a few lines to the same program and:

  • start each function call with the ‘go’ keyword, which makes it run concurrently
  • track each goroutine that starts, and wait for all of them to finish, with sync.WaitGroup (see the sketch after this list)
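
A minimal sketch of that change, again assuming the createImage helper from earlier (with "sync" added to its imports):

    func main() {
        var wg sync.WaitGroup
        for i := 0; i < 1000; i++ {
            wg.Add(1) // register one more goroutine before starting it
            go func(n int) {
                defer wg.Done() // mark this goroutine done, even if createImage fails
                if err := createImage(n); err != nil {
                    fmt.Println("image", n, "failed:", err)
                }
            }(i)
        }
        wg.Wait() // block until every goroutine has called Done
    }

Goroutines are cheap (a few kilobytes of stack each), so spawning a thousand of them is perfectly reasonable; for much larger counts you might bound concurrency with a worker pool, but that isn't needed here.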

And now let's see what we've got with just those few lines of clean, easily maintainable code added.

In about a third of the time: the concurrent version finished in ~14 seconds versus ~42 for the loop, with pretty much full utilization of all the computing power. Taking some liberty and extrapolating those results rather loosely: if your instances were billed on actual usage, paying for a third of the time would save you 66 cents on every dollar otherwise spent. (In reality, you are unlikely to save that much, given how pricing structures work and how instances are costed.)

P.S. With current versions of Go (1.5.x), the program automatically uses all cores. With earlier versions, you can get the same result by adding runtime.GOMAXPROCS(runtime.NumCPU()).
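
On those older versions, that one line goes at the top of main, something like:

    import "runtime"

    func main() {
        // Before Go 1.5, GOMAXPROCS defaulted to 1, so goroutines were
        // multiplexed onto a single core unless you raised it.
        runtime.GOMAXPROCS(runtime.NumCPU())
        // ... rest of the program as before
    }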

On a High-Performance Machine

So then I tried it again on a Google Compute Engine instance with 16 vCPUs and 104 GB of memory. (I tried to allocate a 32 vCPU, 208 GB instance, but GCE didn't allow me right away; it looks like those are limited to users who specifically request them.) I ran both the straight loop version and the concurrent version.

Straight loop results: ~37 seconds. Some improvement (about 12%) over the ~42 seconds it took on my local machine.

Concurrent results: ~4 seconds. That's roughly 70% faster than the ~14 seconds the concurrent program took on my machine, and about 90% faster than the loop version on either machine.

Clearly, the concurrent version actually utilizes the hardware you've paid for far better and delivers results much faster. And a language like Go, which lets you get there this easily, is well worth looking at.

Edit: I gave a talk at GopherCon India about this.