A Closer Look at Heapsort

Parul Baweja
5 min read · Apr 30, 2018


I’ve been reviewing sorting and heaps — so why not do both?!

What is a heap?

A heap is an array, visualized as a complete binary tree. Heaps tend to have the following methods: insert, pop/delete, and lookup/peek. Insert and pop run in O(log(n)), while peek is O(1). Every level of the binary tree is completely filled except possibly the last, which is filled from left to right. A heap is great for quick access to the largest or smallest element: we can implement either a min heap or a max heap. For this article, we’ll build a max heap, in which every parent is greater than or equal to its children. Take a look at the following example:

Binary Tree Visualization of a Heap

For each parent, we can find its children at the following indices:

  • left child — 2i + 1
  • right child — 2 i + 2

So, for lst[0] = 10, the children are:

  • left child — 2(0) + 1 = 1 → lst[1] = 6
  • right child— 2(0) + 2 = 2 → lst[2] = 8
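
As a quick sanity check, here is the same index arithmetic in Python (only the first three values, 10, 6, and 8, come from the example; anything beyond them would be illustrative):

# Index arithmetic for an array-backed heap.
# Only the first three values come from the example above.
lst = [10, 6, 8]

def children(i):
    # Return the (left, right) child indices of the node at index i.
    return 2 * i + 1, 2 * i + 2

left, right = children(0)
print(lst[left], lst[right])  # 6 8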

Why use heaps for sorting?

Heaps always follow a particular ordering (the max is the root in this case). So, by taking advantage of this property, we can ‘pop’ off the max of the heap repeatedly and establish a sorted array.
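
To see this property in action before building our own heap, here is a quick illustration with Python's standard-library heapq (not the in-place heapsort we build below). heapq is a min heap, so we negate values to simulate a max heap:

import heapq

nums = [6, 10, 3, 8, 1]
heap = [-x for x in nums]  # negate so the largest original value is popped first
heapq.heapify(heap)

# Popping repeatedly yields the elements from largest to smallest.
in_order = [-heapq.heappop(heap) for _ in range(len(heap))]
print(in_order)  # [10, 8, 6, 3, 1]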

The Heapsort Play-by-Play

  1. Create a max heap from the unsorted list.
  2. Swap the max element, located at the root, with the last element of the heap. The max is now in its final position, and the heap shrinks by one.
  3. Reheapify (rebalance) the remaining heap to restore max-heap order, since the new root is likely not in its correct place.
  4. Repeat Steps 2–3 until the heap size is 1.

Here’s the code:

Step 1: Heapify a list
We start looking at nodes on the second-to-last level of the tree, i.e. the deepest nodes that still have children. The start index is n / 2, the middle of the array, which lands on that level. What’s the intuition behind starting here?

Naively, we could push each element onto a heap one by one, which would take O(n*log(n)). Starting at index n/2 and sinking each node down toward its children instead takes O(n) (see the Analysis section).

Then, we compare each node to its larger child: if the node is smaller, we swap them and keep sinking it down. Once the node is in place, we decrement the current index and repeat. Take a look at the example below:

In this example, the root, 3, must be rebalanced recursively until it finds its correct place; this takes two swaps. The heapify function below starts at the middle of the list and calls the rebalancing function (balance) on each index, working back to the beginning of the list.
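
One way to sketch heapify along those lines (it leans on the balance helper sketched under Step 3 below):

def heapify(lst):
    # Build a max heap in place: start at the middle of the list (the deepest
    # nodes that still have children) and sink each node into place.
    n = len(lst)
    for i in range(n // 2, -1, -1):
        balance(lst, i, n)  # balance is sketched under Step 3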

Step 2: Swap the max element with the last element

Let’s use a helper function to get this done.
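
A sketch of such a helper:

def swap(lst, i, j):
    # Exchange the elements at indices i and j in place.
    lst[i], lst[j] = lst[j], lst[i]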

Step 3: Rebalance/reheapify
The heap is shrinking, so take care to check whether a node’s children are out of bounds of the heap when rebalancing. (I probably could have abstracted this check into the find_max_child function.)

Both heapify and heapsort use balance() to rebalance the list. The rebalancing function swaps the current node down the tree until a) the node is greater than the children or b) the node is a leaf.
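
A sketch of the rebalancing pieces along those lines, with the bounds check pulled into find_max_child, plus a heapsort driver that ties Steps 1–4 together using the heapify and swap sketches above:

def find_max_child(lst, i, size):
    # Return the index of the larger child of node i, or None if node i has no
    # children within the first `size` elements (the live portion of the heap).
    left, right = 2 * i + 1, 2 * i + 2
    if left >= size:
        return None
    if right >= size or lst[left] >= lst[right]:
        return left
    return right

def balance(lst, i, size):
    # Sink the node at index i down the tree until it is greater than its
    # children or it becomes a leaf.
    child = find_max_child(lst, i, size)
    while child is not None and lst[i] < lst[child]:
        swap(lst, i, child)
        i = child
        child = find_max_child(lst, i, size)

def heapsort(lst):
    # Step 1: build a max heap, then repeatedly move the max into its final
    # position and rebalance the shrinking heap (Steps 2-4).
    heapify(lst)
    for end in range(len(lst) - 1, 0, -1):
        swap(lst, 0, end)     # Step 2: the max lands at index `end`
        balance(lst, 0, end)  # Step 3: restore max-heap order on lst[:end]

nums = [3, 6, 10, 1, 8]
heapsort(nums)
print(nums)  # [1, 3, 6, 8, 10]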

Analysis

Let’s look at the implementation in parts.

Heapify

This implementation balances downward: starting at the second to last level of the tree, we swap the elements down if they are less than one of their children.

I thought of this in terms of comparisons, as in ‘How many comparisons does each node make to decide whether to swap?’

The leaves (n/2 nodes in total) don’t make any comparisons. The second-to-last level (n/4 nodes in total) makes 2 comparisons per node. The third-to-last level (n/8 nodes in total) makes 4 comparisons per node, and so on. In general, a node makes at most 2 comparisons for each level it can sink through, and there are n / 2^(i+1) nodes at the ith level above the leaves. Putting this together, the ith level makes the following number of comparisons:

2 * i * n / 2^(i + 1)
= i * n / 2^i

So, we need to sum the number of comparisons over the levels, i.e. sum over i from 1 to h, where h is the height of the tree and 2^h = n due to the binary tree structure. Below is the work:

Σ (i * n / 2^i)                   # summation from i = 1 to h
= Σ (i * 2^h / 2^i)               # substitute n with 2^h per the assumption above
= 2^h Σ (i / 2^i)                 # factor out 2^h, since it's a constant
= (2^h / 2) Σ (i * (1/2)^(i-1))   # factor out 1/2, another constant
= (2^h / 2) Σ (i * x^(i-1))       # temporarily substitute x for 1/2
= (2^h / 2) d/dx Σ x^i            # Σ i * x^(i-1) is the derivative of Σ x^i
                                  # Σ x^i is a geometric series, bounded by the infinite sum 1 / (1 - x)
= (2^h / 2) d/dx (1 / (1 - x))    # substitute the closed form
= (2^h / 2) * 1 / (1 - x)^2       # take the derivative
= (2^h / 2) * 1 / (1 - 1/2)^2     # plug 1/2 back in for x
= (2^h / 2) * 4
= 2^h * 2
= 2n
= O(n)
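
To double-check the algebra numerically, the sum stays at or below 2n for any height:

# Evaluate sum_{i=1..h} i * n / 2^i for a few heights and compare it to 2n.
for h in range(1, 11):
    n = 2 ** h
    total = sum(i * n / 2 ** i for i in range(1, h + 1))
    print(f"h={h:2d}  n={n:5d}  comparisons<={total:7.1f}  2n={2 * n:5d}")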

Swap

This is constant.

Reheapify

To reheapify, we essentially pop off a node each iteration. Popping (swapping the max out and sinking the new root) takes O(log(n)). Each time, the size of the heap decreases by 1, so the successive pops take O(log(n-1)), O(log(n-2)), and so on.

log(n-1) + log(n-2) + … + log(1) = log((n-1)!) = O(n * log(n))
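
A rough numerical check that the sum of logs grows on the order of n * log(n):

import math

# log(n-1) + log(n-2) + ... + log(1) versus n * log(n) for a sample n.
n = 1000
popping_cost = sum(math.log2(k) for k in range(1, n))
print(popping_cost, n * math.log2(n))  # both on the order of 10^4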

Conclusion

As such, the time complexity of heapsort is:

O(n) + O(n*log(n)) = O(n*log(n))
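
As a final check, the heapsort sketch from the play-by-play above can be compared against Python's built-in sort:

import random

# Compare the heapsort sketch above against Python's built-in sort.
nums = [random.randint(0, 100) for _ in range(1000)]
expected = sorted(nums)
heapsort(nums)  # sorts in place
assert nums == expected
print("heapsort matches sorted()")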
