Sorting Algorithms that don’t Hate You

Erik Corry
Jan 11, 2023 · 6 min read


(This is a sort of sequel to Hash Maps that don’t Hate You.)

Many years ago on the V8 JavaScript project I made a lot of people very unhappy.

You can still see the bug report for the notorious issue 90. Many people were upset to find that the sorting algorithm in V8, used by Chrome and Node.js, was not stable. I, and others on the V8 project, refused to replace it with a stable algorithm. This was called absolutely unacceptable, unreasonable, etc.

Not a stable sorting algorithm

Let’s take a step back, in case anyone has forgotten what it means for a sorting algorithm to be unstable. If there is a strict ordering between every pair of elements in an array, then sorting it is completely predictable: there is only one order the array can end up in. Consider sorting this list of country names in Toit.
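Sketched in Python for readers who don’t have Toit handy (the particular country list is my own, for illustration):

```python
# Default string comparison is lexicographic, so a plain sort
# gives the familiar dictionary order.
countries = ["Germany", "Denmark", "Norway", "Sweden"]
print(sorted(countries))
# ['Denmark', 'Germany', 'Norway', 'Sweden']
```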

So far, so straightforward. The default comparison function for strings produces the order we are used to from dictionaries and libraries, so this is an alphabetic sort.

But what if we have a comparison function where some array elements compare equal? For example, let’s sort so that the short names come before the long names.
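In Toit this is done with a comparison block; the same idea in Python (which, like Toit, happens to have a stable built-in sort) might look like this, with the country list again illustrative:

```python
# Sort by length. "Germany" and "Denmark" are both 7 letters,
# so they compare equal; a stable sort keeps their input order.
countries = ["Germany", "Denmark", "Norway", "Sweden"]
print(sorted(countries, key=len))
# ['Norway', 'Sweden', 'Germany', 'Denmark']
```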

Here we added a comparison block that compares by size instead of dictionary order. In this case the sort returned “Germany” before “Denmark”, but could it just as easily have returned “Denmark” before “Germany”? After all, they both have 7 letters, so they are equal for the purposes of sorting.

In Toit, sorting is stable, and this means the elements that test equal will always stay in the order they had before sorting. So because the input array had “Germany” before “Denmark”, that’s also the order they will be after sorting for size. After I left the V8 project, the Munich-based developers of V8 took pity on the poor users and also fixed V8’s sort algorithm to be stable. The notorious issue 90 was finally closed with a fix in 2018, just under a decade after it was opened. And in 2019 the new ECMAScript standard mandated a stable sort implementation.

https://twitter.com/mathias/status/1036626116654637057

So why was I so reluctant to make the sorting algorithm stable? The standard never prohibited a stable algorithm. The explanation lies in this chart from Wikipedia.

https://en.wikipedia.org/wiki/Sorting_algorithm

The ideal sorting algorithm is the one that is all-green. As you can see, there’s only one algorithm with that property: Block Sort. At the time I didn’t know about Block Sort (the Wikipedia page was only started in 2014), and now that I do know about it, I haven’t been able to understand it! Apparently an implementation is 1300 lines of Java. The performance isn’t great either, so I’m not sure we would have considered something so complex in 2008.

Clear as mud. See also https://www.youtube.com/watch?v=NjcSyD7p660

So given that I knew of no all-green algorithm, I had to compromise on one of the columns. I had no problem with log n extra space (yellow in the space column). We had selected Quicksort, which is a very fast algorithm. If you avoid some pitfalls around pivot selection, you can avoid the n-squared worst-case running time in practice (we did have a few bug reports, which we fixed by making pivot selection more complex). However, Quicksort is not stable, hence the complaints.
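One classic pivot-selection trick is median-of-three. This sketch is my own illustration of the general idea, not V8’s actual code (V8’s eventual scheme was more elaborate than this):

```python
def median_of_three(a, lo, hi):
    # Pick the median of the first, middle, and last elements as the
    # pivot. This defeats the already-sorted and reverse-sorted inputs
    # that drive a naive first-element pivot to O(n^2) behaviour.
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x > y: x, y = y, x
    if y > z: y, z = z, y
    if x > y: x, y = y, x
    return y  # the median of the three samples
```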

The obvious alternative is some version of Merge Sort. (Recently Timsort, another variation of Merge Sort, has become popular. However, Timsort had not been invented yet in 2008. Correction: Timsort is actually from 2002.) Merge Sort has one big disadvantage: it requires a temporary buffer. This doesn’t affect the big-O space requirement. After all, you can’t sort an array of length n without having an array of length n, so you can’t do better than O(n) space. But it feels wrong to temporarily require a second array of length n just to sort an array of length n. It increases peak memory use, and limits the maximum size of array that can be sorted in a given amount of memory.

There are many variants of Merge Sort that attempt to deal with this issue. Here I will present some of the simpler ones. After all, Toit is optimized for small devices so we don’t want the code size to explode.

The ‘textbook’ Merge Sort looks like the animation below. It uses a second array the same size as the original, and each stage copies the data from one array into the other. The first stage sorts subarrays of length 2, then subarrays of length 4, and so on, until the whole array is sorted.

Classic Merge Sort
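A minimal bottom-up version in Python (my sketch, not the Toit code):

```python
def merge_sort(a):
    """Bottom-up merge sort with a full-size scratch buffer.

    Each pass merges sorted runs of length `width` from `src` into
    `dst`, then the two arrays swap roles, mirroring the ping-pong
    copying in the animation. Returns the buffer holding the result.
    """
    n = len(a)
    src, dst = a, [None] * n
    width = 1
    while width < n:
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)
            hi = min(lo + 2 * width, n)
            i, j = lo, mid
            for k in range(lo, hi):
                # Take from the left run on ties: this is what makes
                # the sort stable.
                if i < mid and (j >= hi or src[i] <= src[j]):
                    dst[k] = src[i]
                    i += 1
                else:
                    dst[k] = src[j]
                    j += 1
        src, dst = dst, src
        width *= 2
    return src
```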

However, we can do better than that. These days, blocks of memory can be copied very cheaply, so it doesn’t cost much to copy the left half into a temporary buffer and then merge into the same array that the right half is still occupying. This way we reduce the overhead to half the array size. Certainly a space optimization worth having. In the following animation we show that a half-sized buffer is enough. (We could do the operations in a different order, which would not change the space used.)

Merge Sort using a temporary buffer that is only half size
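The key step, merging in place with a buffer only as large as the left run, can be sketched like this (again in Python, my own illustration):

```python
def merge_halves(a, lo, mid, hi):
    """Merge sorted runs a[lo:mid] and a[mid:hi] in place.

    The only extra space is a copy of the left run. The write position
    always trails the read position in the right run, so nothing is
    overwritten before it has been read.
    """
    left = a[lo:mid]          # copy the left run out (the only extra space)
    i, j, k = 0, mid, lo
    while i < len(left):
        if j < hi and a[j] < left[i]:
            a[k] = a[j]       # strict < means left wins ties: stable
            j += 1
        else:
            a[k] = left[i]
            i += 1
        k += 1
    # Any remaining right-run elements are already in their final place.
```

To sort a whole array this way: recursively sort each half (the recursion reuses the same buffer), then call `merge_halves` once at the top.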

Once I had this version, I noticed that it is only at the very end that we need the whole of the half-sized temporary buffer. So I realized we can do even better, if we are willing to lose a tiny bit of performance: by doing a somewhat asymmetric merge at the top level, we can reduce the overhead to only a quarter of the input size.

The way this works is that instead of the final 50–50 merge we do a 25–50 merge, followed by a 25–75 merge. This has no effect on the big-O time complexity, and it makes very little difference to the run time.

By merging asymmetrically at the top level we reduce the temporary buffer to quarter size
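A sketch of the asymmetric top level in Python (the helper names are mine, and for brevity the three runs are sorted with the built-in sort rather than by recursing with the same scheme):

```python
def merge_with_left_buffer(a, lo, mid, hi):
    # In-place merge of sorted runs a[lo:mid] and a[mid:hi]; the extra
    # space is only a copy of the left run.
    left = a[lo:mid]
    i, j, k = 0, mid, lo
    while i < len(left):
        if j < hi and a[j] < left[i]:
            a[k] = a[j]; j += 1
        else:
            a[k] = left[i]; i += 1
        k += 1

def sort_quarter_scratch(a):
    """Asymmetric top level: a 25-50 merge, then a 25-75 merge.

    Each merge copies out a run of at most len(a) // 4 elements, so the
    peak scratch space is a quarter of the input size.
    """
    n = len(a)
    q, h = n // 4, n // 2
    a[:q] = sorted(a[:q])     # first quarter
    a[q:h] = sorted(a[q:h])   # second quarter
    a[h:] = sorted(a[h:])     # right half
    merge_with_left_buffer(a, q, h, n)   # 25-50 merge -> a[q:n] sorted
    merge_with_left_buffer(a, 0, q, n)   # 25-75 merge -> a[0:n] sorted
```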

In practice the Toit implementation does the same operations in a different order. It’s not many lines of code — quite a lot fewer than it took to make these animations. If you click on that link, you’ll see that we switch to insertion sort when the sub-array to be sorted goes below a certain size. For the animations, that size limit is 2, the initial pair swapping. A larger cut-off for insertion sort is a clear win, and almost universal in sorting implementations.
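The cut-off idea in a nutshell (the threshold of 16 below is illustrative; Toit’s actual value may differ):

```python
INSERTION_CUTOFF = 16  # illustrative; the best value is platform-dependent

def insertion_sort(a, lo, hi):
    # Stable insertion sort of a[lo:hi]. For runs below the cut-off,
    # its low constant factors beat the recursive machinery of a
    # merge sort, despite the O(n^2) worst case.
    for i in range(lo + 1, hi):
        v = a[i]
        j = i
        while j > lo and a[j - 1] > v:
            a[j] = a[j - 1]   # shift larger elements right
            j -= 1
        a[j] = v
```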

Looking back, it’s ironic that I was so protective of memory use in V8, which typically runs on devices with multiple gigabytes of RAM. Yet now that I’m writing for the ESP32, which has only a few hundred kilobytes, I’m willing to spend a bit of RAM to get a more programmer-friendly algorithm. I think I’m getting old and soft!
