Understanding Heap Sort

Jacky Feng
The Startup
Published in
5 min readApr 5, 2020

A heap sort is a sorting algorithm based on the binary heap data structure. The idea behind a heap sort is to find the highest value and place it at the end, repeating the process until everything is sorted.

In order to fully understand how a heap sort works, we first have to understand a binary heap, and subsequently, a binary tree. As explained in a previous post, a binary tree is a tree data structure where each element has at most 2 children. A binary heap is simply a complete binary tree, in which each level of the tree (except, perhaps, the last level) is completely filled and all nodes are stored starting from the left to right. A binary heap also takes into account how the values are stored. Either the value of the parent node is always greater than the value of its children (called a max heap), or vice versa, where the value of the parent is smaller than that of its children (called a min heap). A heap sort will utilize a max heap to sort its elements.

An example of a max heap, By Ermishin — Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=12251273

A heap sort works as follows:

  1. Build a max heap from the data.
  2. Switch the root (or the top of the tree) with the last node and then remove it from the heap.
  3. Rebuild the max heap again and repeat until there is only one element remaining.

Building a Max Heap

A max heap can be represented using an array. The binary tree and its nodes can be represented by the indexes of an array, starting from the top level and working our way down, from left to right (root has an index of 0, while the left child will have an index 1 and the right child will have an index of 2, so on, and so forth). If the parent node has an index of x, then the left child will have an index of 2x + 1 and the right child will have an index of 2x + 2.

The function to build a max heap will take three arguments: the array, the length/size of the heap and the index of the parent node we are trying to heapify. We will initialize a variable to use to compare to other elements and swap them in as such. We will also define the left and right children of the parent node.

Our first check is to see if the left child exists and if it is greater than the root. If so, we assign our variable with the value of the left child. We do the same for the right child. We also want to make sure that the index is less than the size of our heap that we pass the function. As we sort the heap and pop off elements off the heap, the size will get smaller and we can leave those elements alone, as they will be sorted already.

We then check to see if there were any changes to our variable (if we swapped any elements). If so, we swap those elements in the array, thus putting the largest element as the parent. We call the function again, this time to check the lower elements are also in the correct order for a max heap.

In full, our function will look like this

Building the Heap Sort Algorithm

Moving on to the actual sorting, our first step is to define certain variables. We need the length of the array, the last parent node in the heap and the last child of the heap.

From there, we create a max heap out of our data, iterating through each node, starting at the last parent node and working our way up.

Once we have our max heap, we initiate step 2, which is to swap the root with the last node. After that is done, we run step 1 again (creating a max heap), but this time, we decrease the length of our data, as we no longer need to work with the last element (since it has been sorted). Once this process is complete, we return our sorted array.

Here is our heap sort function in full:

There are a couple of things to note about a heap sort. The time complexity of a heap sort is O(n log n). This sorting algorithm is an in-place algorithm, which means it transforms our data without using a supplemental data structure. In other words, it will directly alter the original array that is passed.

--

--