Image for post
Image for post

Sorting algorithms are an awkward combination of fundamentally necessary, and deeply contentious. From new engineers looking to impress in an interview to older engineers searching out a solution to a rapidly scaling database, there are myriad factors to take into consideration. What is the speed of comparison between two objects? What’s the swap time? How large is the database? What type of objects does it contain? Is it already semi-sorted? Do the results need to be stable?

Each of these questions may draw out arguments in favor of one algorithm or another. Is the source data large and complex? Most languages default to the standard, Quick Sort with its O( n log n) time complexity. Is it smaller? Insertion Sort works wonders on those. Mostly sorted? Heck, Bubble Sort might almost work for that. If you want to read/visualize the merits of each, check out this comparison by toptal.com.

One sorting algorithm you won’t find on that site, or almost any other, is Tim Sort. This obscure sort is currently unique to Python, and is used as its default sorting algorithm. Call array.sort in Python, and Tim Sort is what gets executed. In spite of this, it’s rare to find engineers who both know and understand Tim Sort. So: what is it?

Image for post
Image for post
Fig 1: Tim Peters, inventor of Timsort

Origins

Image for post
Image for post
Fig 2: complexity comparison of the various sorting algorithms (courtesy of http://bigocheatsheet.com/)

Why Tim Sort?

In terms of space, Tim Sort is on the worse end of the spectrum, but the space consideration for most sorting algorithms is pretty sparse. O(n) isn’t too rough in most instances; it’s worth noting as a possible deficiency, and the one place where Quick Sort really outshines Tim Sort.

The final item that sorting algorithms are often judged on is stability. Stability is the concept that when sorted, objects of equal value maintain their original order. Now, you might be wondering why we care about that. The items are of equal value — why do we mind how they are ordered?

The simple answer is that stability matters for stacked sorts. That is to say, you first sort based on one criteria, then based on a second one. If you do this in an unstable algorithm, you instantly lose any reliability from your first sort when you run the second one. For reference, Quick Sort is unstable, and Merge Sort is stable.

Tim Sort is also stable, not to mention fast if slightly heavyweight(in comparison to Quick Sort only). While sorting algorithms can (and should) be judged on other considerations, these are the big three.

The Implementation In Three Steps

1. Binary Search

For reference: Binary Search Algorithms

2. Insertion Sort & Merge Sort

Image for post
Image for post
Fig 3: Insertionsort (courtesy of https://www.geeksforgeeks.org/insertion-sort/)

Insertion Sort is a very basic sorting algorithm. It runs through the array, and whenever it encounters an item that is out of order (strictly less/more than the item before it), it moves it into the appropriate position in the already sorted array. Insertion Sort is notorious for working very quickly on already sorted arrays, as well as smaller arrays. In fact, we can see from Fig 2 that Insertion Sort has an impressive best case run time of O(n). Keep in mind moving forward with Tim Sort: best case for Insertion Sort is an already sorted array. It might sound silly, but that will be relevant.

Image for post
Image for post
Fig 4: Merge Sort (courtesy of https://commons.wikimedia.org/wiki/File:Merge_sort_algorithm_diagram.svg)

Merge Sort on the other hand operates by a basic principle: it is exceedingly easy to merge already sorted arrays. So, it splits a starting array in half over and over until it is nothing but single elements. Then it slowly rebuilds the main array by merging those elements back together in sorted order. Because we started from building blocks of size one, it was very easy to build initial sorted arrays. Then, it’s easy to merge them. In the end, we spend O(n log n) time, and (importantly) we do so in a manner that is guaranteed to be stable.

For example implementations see:

Merge Sort: https://www.geeksforgeeks.org/merge-sort/

Insertion Sort: https://www.geeksforgeeks.org/insertion-sort/

3. Implement Tim Sort

First we set a minrun size. What we mean by this is that we want to ensure that all our runs are at least a certain length. Please note that we are not guaranteeing that we will find runs of this size — we’ll get into this later. We’re simply saying that a run must be at least of a certain length.

When we encounter a run, we set it aside. When we find the longest run within a minrun range. We now have a familiar data structure: a small, sorted array. If it is at least minrun in length, then huzzah! We’re good to move on. If it isn’t, we put Insertion Sort into play.

You may remember from above that Insertion Sort is especially efficacious on two types of arrays: small ones, and already sorted ones. What we just made is a small, sorted array. If it isn’t at least minrun in length, we reach ahead and grab enough other elements to complete the run, then use Insertion Sort to push them into our sorted array, quick and easy. Obviously, if a run encounters the end of the array you can let it be a little short.

Once you’ve created all of your runs (i.e. sorted subarrays), you utilize your Merge Sort to join them together. In a best case scenario the entire array is already sorted and Tim Sort is smart enough to know it doesn’t need to do anything else. Other times, it tends to just be extremely efficient. As an added benefit, both Insertion Sort and Merge Sort are stable, so the resulting array is stable.

For those who prefer bullets:

  1. Establish a minrun size that is a power of 2 (usually 32, never more than 64 or your Insertion Sort will lose efficiency)
  2. Find a run in the first minrun of data.
  3. If the run is not at least minrun in length, use Insertion Sort to grab subsequent or prior items and insert them into the run until it is the correct minimum size.
  4. Repeat until the entire array is divided into sorted subsections.
  5. Use the latter half of Merge Sort to join the ordered arrays.

Conclusion

For the curious, you can check out the entire Tim Sort code on github.

Thanks

Sources

Written by

Software engineer and writer who likes weird edge cases and talking about the world.

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store