Breaking Down MergeSort

And Understanding O(N log N) Time Complexity

Sergey Piterman
Outco
12 min read · Nov 30, 2020


Some of the first algorithms you’re taught when taking an introductory computer science course, or when learning how to pass technical interviews, are sorting algorithms. There are a few reasons for this.

First, they are very intuitive to understand. Because we use sorting in our everyday lives to organize real things, this makes the problem setup trivial to explain: Your input is a list of unsorted things, and your job is to sort them.

In their most basic form, you are simply taking an unordered list of numbers and putting them into ascending order.

If you haven’t tried writing your own sorting algorithm, I recommend you pause reading this article now and just give it a go. Think about how you might go about it intuitively and systematically with something like a deck of cards, and then try to translate that into code. Odds are you’ll actually end up accidentally re-implementing a previously discovered algorithm, which is a good exercise in and of itself.

Second, you can also extrapolate the concept of sorting further to show how you might apply a sorting algorithm practically in a real-world scenario. For example, you may want to order a list of objects (in this example, fruits) by a given property, like their price or their weight:

You may even want to implement your sorting function in such a way that it lets you set which property you want to sort by and whether the order is ascending or descending. This starts to get into the notion of higher-order functions, which is a bit outside the scope of this article, but it’s worth looking into the documentation for how JavaScript implements its native array sorting method:
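
To make that concrete, here’s a minimal sketch using JavaScript’s built-in Array.prototype.sort, which accepts a comparator function. The fruit data and the byProperty helper are just made up for illustration:

```javascript
// A hypothetical list of fruits, each with a few properties we might sort by.
const fruits = [
  { name: "banana", price: 0.5, weight: 120 },
  { name: "apple", price: 1.2, weight: 180 },
  { name: "mango", price: 2.0, weight: 200 },
];

// Higher-order helper: returns a comparator for the given property and direction.
function byProperty(prop, ascending = true) {
  return (a, b) => {
    if (a[prop] < b[prop]) return ascending ? -1 : 1;
    if (a[prop] > b[prop]) return ascending ? 1 : -1;
    return 0;
  };
}

// Array.prototype.sort takes a comparator, so we can sort by any field we like.
console.log([...fruits].sort(byProperty("price")));          // cheapest first
console.log([...fruits].sort(byProperty("weight", false)));  // heaviest first
```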

And finally, sorting is an interesting topic because there are so many different ways to achieve the same outcome. Typically when you see an algorithm problem there are only a handful of solutions; either some are clearly superior to the others, or the options are roughly equivalent to one another.

With sorting you have a lot of very different ways of arriving at a solution. And what’s more, while there is a large variety of solutions, there isn’t a clear “best” sorting algorithm. You’re really deciding between tradeoffs, based on what guarantees your system needs to make and what you know about the data you need to sort. For example, if you are dealing with a small array, or an array that is mostly sorted except for a few items, Insertion Sort or Bubble Sort will usually be faster than most other algorithms.

But out of all of them, MergeSort is probably my favorite, for a number of reasons.

The first is that it’s very intuitive to understand and explain to others. In a sentence:

“You keep dividing the array into halves until you’re left with subarrays with one element, then you merge those subarrays back up together in sorted order.”

Visually this is what the breakdown phase looks like:

And then you just merge the arrays back together in sorted order like so:
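
In code, the merge step might look something like this sketch: a helper that walks two already-sorted halves and repeatedly takes the smaller front element.

```javascript
// Merge two already-sorted arrays into one sorted array.
function merge(left, right) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    // Using <= means ties take the element from the left half first,
    // which is what preserves the original relative order (stability).
    if (left[i] <= right[j]) {
      result.push(left[i]);
      i++;
    } else {
      result.push(right[j]);
      j++;
    }
  }
  // One of the halves may still have leftover elements; append them.
  return result.concat(left.slice(i)).concat(right.slice(j));
}

console.log(merge([2, 5, 9], [1, 5, 7])); // [1, 2, 5, 5, 7, 9]
```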

The second thing I really like about MergeSort is that it’s a stable sort.

This means that if there’s ever a tie between two elements, the initial relative ordering is ALWAYS preserved.

So what does that mean from a practical standpoint?

For example, say you were building an app for a school and you needed to do something with student data. That data might just be a simple list of students, where each entry has a name, a birth year, and some other fields.

If you wanted to sort that list of students by birth year, and then within each year, have an alphabetized list of all the students, you could use MergeSort.

First, you would just sort in alphabetical order by name, and then you would sort it a second time by birth year, and that would give you the result you’re looking for.
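
As a rough sketch of that two-pass idea (the student data here is made up, and for brevity I’m using JavaScript’s built-in sort, which is required to be stable in modern engines; a stable MergeSort would behave the same way):

```javascript
const students = [
  { name: "Casey", year: 2004 },
  { name: "Alex", year: 2005 },
  { name: "Blake", year: 2004 },
];

// Pass 1: alphabetize by name.
students.sort((a, b) => a.name.localeCompare(b.name));

// Pass 2: sort by birth year. Because the sort is stable, students born in
// the same year keep the alphabetical order from the first pass.
students.sort((a, b) => a.year - b.year);

console.log(students);
// [ { name: "Blake", year: 2004 },
//   { name: "Casey", year: 2004 },
//   { name: "Alex",  year: 2005 } ]
```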

Unstable sorting algorithms like QuickSort or HeapSort, on the other hand, can destroy any previous ordering applied to your list, which limits your flexibility when you want several sorts to be applied in sequence.

Personally, I’ve come across the need for this kind of functionality occasionally when using spreadsheets, specifically Airtable, when I’ve wanted to organize my tables using several fields at a time:

The last thing I like about MergeSort is that it’s efficient. Its worst-case time complexity is O(N log N), which is as efficient as a general-purpose, comparison-based sorting algorithm can get.

While there are certain optimizations you can make if you know you have a limited set of things you need to sort (like characters in a string) or a limited range of values in your data set, both those scenarios require some prior knowledge about the data you are sorting.

If you know nothing about your data ahead of time, then O(N log N) is as good as it gets.

And while QuickSort may perform slightly better in the average case, and HeapSort uses less auxiliary memory, the combination of efficiency, stability, and simplicity makes MergeSort my favorite sorting algorithm.

How exactly should we think about O(N log N) time complexity?

Other time complexities like constant, linear, or even quadratic are somewhat easier to understand intuitively. That is, it’s easier to see how the amount of work required scales in relation to the input size.

But for a long time, I didn’t have a great way of seeing how that worked for O(N log N). Like what does it actually mean to multiply N and log N?

Understanding what this means is important because you’ll see variations of it outside of the context of sorting, usually in binary tree algorithms or binary heap algorithms.

At a basic level, it falls somewhere in between linear and quadratic time complexities, but I feel that this description misses a lot of the nuance and doesn’t give you a feel for the algorithm. So to understand it a little better, I’m going to break down the time complexities of its component parts and then show how they work together.

So first, we’ll look at how O(log N) compares to O(N), and then we’ll see what happens when you multiply them together.

O(log N) vs. O(N)

Suppose you had a dictionary. And I don’t mean the dictionary data structure, also known as a hash map. I’m talking about a physical book with a list of alphabetized English words, for those of you who are old enough to remember those:

Now my question to you is: “What’s the most efficient way to look up a word?”

One way you might go about this is to just start at the first page and flip through until you get to the word you’re looking for. This would be equivalent to an O(N) linear-time search.

Linear time search is probably one of the most intuitive time complexities because it’s naturally how we think about the world: proportionally. If traveling 100 miles takes around 2 hours, traveling 200 miles should take around 4 hours.

The only real caveat about linear time complexity is that you can generally ignore the number of constant-time operations performed on each element in your data set.

For example, say you have a function that takes in a list of numbers and prints each one to the console. That will have O(N) complexity:
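
A minimal sketch of that function might look like this (the name printAll is just mine):

```javascript
// O(N): one console.log call per element.
function printAll(nums) {
  for (const num of nums) {
    console.log(num);
  }
}
```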

Now say you have another function that first adds 10 to each number and then multiplies it by 3 before printing it:
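
Again, as a sketch:

```javascript
// Still linear: three constant-time operations per element.
function transformAndPrint(nums) {
  for (const num of nums) {
    const result = (num + 10) * 3;
    console.log(result);
  }
}
```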

You might think the complexity of this function is O(3N) since we’re performing 3 operations per element (adding, multiplying, and printing). But that’s a common misconception. When performing this kind of analysis we drop the constant in the front since all we care about is how the amount of work scales as you add more elements, which in this case just depends on how many elements are in the array.

Now, a linear search might work okay in some systems if you have no other option. Computers are relatively fast these days so it’s possible to scan through a lot of data very quickly, especially if you can parallelize things.

But issues tend to arise as you grow your users, bandwidth, memory usage, or the number of requests to a server or database. Performing full-table scans repeatedly is wasteful because it uses excessive compute power, which translates to electricity and ultimately money.

In the case of the dictionary example, if you’re looking for a word that starts with the letter “A” you’ll likely find an answer quickly doing a linear scan. But if you’re looking for something towards the end, then it’s going to take you a long time, and in complexity analysis typically you consider the worst-case scenario.

One way you could try to speed things up is by flipping pages faster, but that’s not the best way to improve performance. The best way to actually perform this search is by doing less work relative to the amount of data you’re given, and it’s something we tend to do intuitively already.

First, flip to a random page somewhere in the middle and check to see if your word is somewhere on the page.

If your word comes later in alphabetical order, then flip to a page somewhere in the latter half of the dictionary. And if your word comes earlier in alphabetical order, then flip to a page somewhere in the first half.

Then, repeat this process until you’ve narrowed down your search to the page that contains the word you’re looking for.

What you’re doing here is cutting the problem in half repeatedly until you’ve arrived at a solution, which is the essence of binary search.
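
Here’s a rough sketch of that idea as a binary search over a sorted array of words (the word list is just an example):

```javascript
// Binary search over a sorted array. Returns the index of target, or -1 if absent.
function binarySearch(sortedWords, target) {
  let low = 0;
  let high = sortedWords.length - 1;
  while (low <= high) {
    const mid = Math.floor((low + high) / 2);
    if (sortedWords[mid] === target) return mid;
    if (sortedWords[mid] < target) {
      low = mid + 1;  // target comes later alphabetically: search the right half
    } else {
      high = mid - 1; // target comes earlier: search the left half
    }
  }
  return -1;
}

console.log(binarySearch(["apple", "banana", "cherry", "mango", "pear"], "mango")); // 3
```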

There are two caveats to what actually makes this efficiency possible, though.

First, the data you are searching through has to be ordered in some way, typically alphabetically, chronologically, or numerically. Without this ordering, you can’t rule out portions of the data set and you’re forced to scan through the whole thing.

Second, you need a way of quickly finding out if a piece of data comes before or after another. It’s a subtle point, but if you’re comparing two numbers or strings of characters, you need to be able to quickly determine which comes first. If those comparisons are time-consuming then they need to be taken into account.

But if you have both those conditions, then you can perform an efficient binary search.

Going back to the original example of the dictionary, what that means is that you can double the number of words in the dictionary, and you’ll only add one additional page flip to find the word you’re looking for. When performing a linear search, doubling the input size means double the amount of work. The work scales proportionally to the amount of data you’re faced with.

That holds as long as the new words don’t contain significantly more characters and they’re inserted in alphabetical order.

This efficiency in search is why it’s so important to have indexing in your databases. You want to be able to search through an ordered set of data, rather than an unordered one because it allows you to retrieve data at much larger scales, without a significant decrease in performance.

Typically you’ll see this kind of complexity arise in balanced binary search trees too, because the data structure itself maintains the ordering, meaning you can easily translate back and forth between a BST and a sorted list.

One last detail worth mentioning about logarithms is that there is always an implied base, and usually, in the context of programming, it’s base 2. But if you remember your log rules, you’ll know there’s a simple formula for changing that base.

If you’re unfamiliar with logarithms I’d suggest reviewing them and some of their properties; they’re just a good thing to feel comfortable with because they’re so foundational.

They’re related to exponential functions, which I’ll talk about in more detail in a later post, but the basic relationship is this:
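
In plain notation:

log2(N) = k   means exactly that   2^k = N

In other words, log2(N) answers the question “how many times do I need to double 1 to reach N?” (or, equivalently, how many times can I halve N before I get down to 1). And the change-of-base rule mentioned above is just:

log_b(N) = log_c(N) / log_c(b)

Since switching bases only changes the result by a constant factor, we can safely leave the base out of Big-O notation.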

Putting it All Together

MergeSort, QuickSort, and HeapSort are all considered “Quasilinear Sorts” because of their worst-case time complexities (or in the case of QuickSort, average-case). This term, quasilinear, hints at what to expect from O(N log N), but what does it mean exactly?

If we look back at the table we constructed for N and log N, we can put them together to create a new column for N log N. But instead of plugging in the value of N for each row, we’ll just leave it written as N.

The result is this:
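
Written out as plain text, the pattern looks something like this:

N = 2         → log N = 1  → N log N = 1 · N
N = 4         → log N = 2  → N log N = 2 · N
N = 8         → log N = 3  → N log N = 3 · N
N = 16        → log N = 4  → N log N = 4 · N
N = 1,024     → log N = 10 → N log N = 10 · N
N = 1,048,576 → log N = 20 → N log N = 20 · N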

As we double N, the constant that we multiply N by only increases by one.

Let that sink in for a minute.

We essentially have linear time complexity, with a relatively small constant in the front, that scales VERY slowly. You have to DOUBLE your input size for you to have an increment in that constant.

Because this constant scales with N, we can’t drop it from our analysis the same way we dropped the constant in the earlier O(3N) example. But it scales so slowly that we can still call the overall complexity “quasilinear,” because it’s ALMOST as good as linear.

In the context of our earlier MergeSort example, we can notice a pattern that I’ll illustrate in the original diagram I used.

As you merge each level of sub-arrays back up, you’re iterating through every single element at least once. Doing one layer of the merge operations runs in O(N) time.

But because our algorithm keeps splitting the array into halves, we’re doing something similar to the dictionary search problem: in order to add another layer to this diagram, we would need to double the input size. That means the number of layers is proportional to log N.

So if MergeSort performs a full scan of the array once per layer, you can simply multiply the number of elements in the array by the number of layers to get the full complexity: O(N log N).
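
To put some rough numbers on that: for an array of about a million elements (N ≈ 2^20), there are roughly 20 layers of merging, so MergeSort touches on the order of 20 × 1,000,000 = 20 million elements in total. A quadratic sort like Bubble Sort, by contrast, could need on the order of 1,000,000² = 10^12 comparisons in the worst case.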

Another way of imagining how the amount of work scales is geometrically in the following diagram, where the area of each rectangle represents the amount of additional work required to sort an array of twice the size:

And that’s really all there is to it!

If you haven’t implemented MergeSort yet, I’d encourage you to give it a go:
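
If you’d like a reference point afterward, here’s one possible sketch in JavaScript, reusing the same merge helper from earlier:

```javascript
// Recursively split the array into halves, then merge them back in sorted order.
function mergeSort(arr) {
  if (arr.length <= 1) return arr; // base case: a single element is already sorted
  const mid = Math.floor(arr.length / 2);
  const left = mergeSort(arr.slice(0, mid));
  const right = mergeSort(arr.slice(mid));
  return merge(left, right);
}

// Same merge helper as before: combine two sorted halves into one sorted array.
function merge(left, right) {
  const result = [];
  let i = 0;
  let j = 0;
  while (i < left.length && j < right.length) {
    if (left[i] <= right[j]) {
      result.push(left[i++]);
    } else {
      result.push(right[j++]);
    }
  }
  return result.concat(left.slice(i)).concat(right.slice(j));
}

console.log(mergeSort([5, 2, 9, 1, 5, 6])); // [1, 2, 5, 5, 6, 9]
```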

And if you’re interested in practicing your algorithm skills in a more intensive environment, then you should definitely consider enrolling in Outco’s fully remote course. Details at Outco.io

Happy coding!
