Go Sort Faster

sagivo
sagivo
Published in
3 min readFeb 6, 2015

Sorting is one of the most basic operations we do on arrays.
Each computer-science grad knows that the running time of sorting can be dramatically changed based on the algorithm you choose.

This post will discuss the different ways to sort arrays and why it’s important at all.

TL;DR

Computer science algorithms combined with Go’s built in parallelism can create magic and make your sorting X2 faster!

The Default Way

In order to sort in Go, you need to import the sort package. Let’s have a look at this code:

By looking into go’s source code you can see that go uses QuickSort as the sorting algorithm. Quicksort is nice, it usually runs at O(NlgN) — the best sorting time for every comparable sorting algorithm.

The problem is that Quicksort depends on the array being randomly shuffled and the order of the elements can dramatically affect the running time. In a worse case scenario, Quicksort takes O(N²) to run. This can be really bad for very large arrays.

The Merge Sort Way

We can try to sort our arrays in another way by using MergeSort algorithm. This will guarantee us O(NlgN) running time using divide-and-conquer paradigm:

  • Recursively split the array to 2 (merge_sort) and sort them.
  • Merge the 2 sorted parts (merge).

here is the code to do it in Go:

https://gist.github.com/sagivo/2983f18ffb811b2fec8b

Show Me the $$$!

Let’s let the numbers to speak. I decided to compare the two functions with an array of 1M unique numbers in random order. The numbers were in a file that loaded by the program on the beginning and the loading time was not part of the measurements. In order to measure the results, I used Go’s testing and benchmarking library.

First Try

I started by comparing quicksort and mergesort:

$ go test -bench=. -benchtime 10s compare_test.go
testing: warning: no tests to run
PASS
BenchmarkMergesort 5 2285339714 ns/op
BenchmarkQuicksort 10 1661941827 ns/op
ok command-line-arguments 42.501s

As you can see — the built in sorting runs 70% faster than our mergesort.

Second Try

Mergesort first splita the array and then sorts each part aside, only then merges the 2 sorted slices.

That sounds like a great parallel case and this is Mergesot’s chance to really shine. In order to do that we need to use Go routines and channels (in my next post i’ll explain more about them). So instead of our mergeSort function, I created the mergeSortAsync function. The merge function can stay the same:

My second attempt was even more crazy:

$ go test -bench=. -benchtime 10s compare_test.go
testing: warning: no tests to run
PASS
BenchmarkMergesort 5 2333488860 ns/op
BenchmarkMergesortAsync 1 66508145528 ns/op
BenchmarkQuicksort 10 1620829079 ns/op
ok command-line-arguments 105.063s

The Async mergesort runs x30 slower than the sync one.

Third Try

Using go-routines and channels creates a lot of overhead for the system and in our case we actually created new go routine for each split of array (the first iteration will create 1M(!) of them). We need to do better than that. Since there’s a tradeoff (over-head vs. parallel) we can fine-tune the current number and fire go routine only if the array’s size is above some limit:

And the results are:

$ go test -bench=. -benchtime 10s compare_test.go
testing: warning: no tests to run
PASS
BenchmarkMergesort 5 2331591468 ns/op
BenchmarkMergesortAsync 10 1776488616 ns/op
BenchmarkQuicksort 10 1597424781 ns/op
ok command-line-arguments 62.164s

Nice. The Async version is faster than the sync one but still neck-to-neck with the built in version.

Last Try

Looking in Go’s docs revealed something interesting. We need to tell the benchmarking tool how many CPUs to use in the testing. By adding extra flag -cpu 8 we got:

$ go test -bench=. -cpu 8 -benchtime 10s compare_test.go
testing: warning: no tests to run
PASS
BenchmarkMergesort-8 5 2282453034 ns/op
BenchmarkMergesortAsync-8 20 841382601 ns/op
BenchmarkQuicksort-8 10 1606719735 ns/op
ok command-line-arguments 62.018s

BINGO! the Async mersort runs faster than the sync version and x2 faster than the built in sort!

This number can be even higher if you tweak the number that switches between the sync and async functions. It will also be higher in case the initial order of the array is ordered.

The full source code can be found at my Github.

--

--