Mergesort & Quicksort: a simple Python implementation.

Diletta Goglia · Published in Analytics Vidhya · 4 min read · Jan 11, 2022

Sorting algorithms (image from Crio.Do)

I know what you’re thinking right now: Python’s built-in sort() method is such a quick and easy option, so why should I implement a sorting algorithm from scratch?

If you’re a CS student, or if you just want to learn programming, there is no better way to understand code than to get your hands dirty. The more you work on your own implementations, the better their inner workings will stick in your mind. Algorithms will have no more secrets for you.

However, as a Computer Scientist with 7 years of experience in studying algorithms, I know that writing your own code from scratch is not easy at all. For this reason, in this article I want to suggest the easiest implementation (in Python) of some of the most famous sorting algorithms. For each one I first introduce the pseudocode (as a starting framework, in case you prefer to build your own implementation in some other programming language) and then I suggest an off-the-shelf Python implementation that you can immediately copy and test.

⬆⬆⬆ The link above points to my repl, where you can find and try all the implementations shown in this article!

I really hope this will help your understanding and give you some hands-on coding experience.

Let’s get started!

Premise: notation and auxiliary function.

  • V is the array to be sorted
  • n is the number of items in the array
  • l and r are, respectively, left and right pointers
  • swap is an auxiliary procedure that swaps the two items at positions i and j in the array V (i.e. it simultaneously assigns two values), defined as follows:
Auxiliary procedure “swap”
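
In Python this boils down to a single simultaneous assignment, for example (a minimal sketch of the procedure described above):

def swap(V, i, j):
    # Exchange the items at positions i and j with one simultaneous assignment
    V[i], V[j] = V[j], V[i]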

1. Mergesort

Mergesort GIF from Wikipedia

Pseudocode:

Mergesort pseudocode by D. Goglia
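
Roughly, those steps translate into the following Python skeleton (one possible rendering, assuming 0-based indices with r inclusive and a simple choice of the split point m; the merge procedure is given in the implementation below):

def MergeSort(S, l, r):
    if l < r:                      # Step 1: are there at least two items to sort?
        m = (l + r) // 2 + 1       # split point: the halves are S[l, m-1] and S[m, r]
        MergeSort(S, l, m - 1)     # recurse on the left half
        MergeSort(S, m, r)         # recurse on the right half
        merge(S, l, m, r)          # Step 5: fuse the two ordered halves into S[l, r]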

Mergesort is a recursive algorithm based on the Divide & Conquer paradigm.

Step 1 checks whether the array to be sorted consists of at least two items; otherwise it is already ordered and nothing has to be done. If there are at least two items, it splits the input array S into two halves and then recurses on each part. As the recursion ends, the two halves S[l, m-1] and S[m, r] are ordered, so Step 5 fuses them into S[l, r] by invoking the procedure merge. This merging step needs an auxiliary array (that we will call temp in our implementation) of size n, so MergeSort is not an in-place sorting algorithm (unlike Heapsort and Quicksort) but needs O(n) extra working space. Given that at each recursive call we halve the size of the input array to be sorted, the depth of the recursion is O(log n).

The merge procedure can be implemented in O(r - l + 1) time by using two pointers, i and j, that start at the beginning of the two halves S[l, m-1] and S[m, r]. Then S[i] is compared with S[j], the smaller is written out to the fused sequence, and its pointer is advanced. Given that each comparison advances one pointer, the total number of steps is bounded above by the total number of pointer advancements, which in turn is bounded by the length of S[l, r]. So the time complexity of MergeSort(S, 1, n) can be modeled via the recurrence relation T(n) = 2T(n/2) + O(n) = O(n log n).

Implementation:
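
Here is a compact Python version you can copy and test (one possible implementation of the scheme above; the merge procedure writes the fused sequence into the auxiliary list temp and then copies it back into S[l..r]):

def merge(S, l, m, r):
    # Fuse the two ordered halves S[l..m-1] and S[m..r] into S[l..r],
    # using the auxiliary list temp (the O(n) extra working space).
    temp = []
    i, j = l, m                        # the two pointers
    while i < m and j <= r:
        if S[i] <= S[j]:               # write out the smaller item and advance its pointer
            temp.append(S[i])
            i += 1
        else:
            temp.append(S[j])
            j += 1
    temp.extend(S[i:m])                # copy whatever is left of the first half...
    temp.extend(S[j:r + 1])            # ...or of the second half
    S[l:r + 1] = temp                  # write the fused sequence back into S[l..r]

def MergeSort(S, l, r):
    if l < r:                          # at least two items to sort
        m = (l + r) // 2 + 1           # halves are S[l..m-1] and S[m..r]
        MergeSort(S, l, m - 1)
        MergeSort(S, m, r)
        merge(S, l, m, r)

V = [38, 27, 43, 3, 9, 82, 10]
MergeSort(V, 0, len(V) - 1)
print(V)   # [3, 9, 10, 27, 38, 43, 82]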

2. Quicksort

Quicksort GIF from Wikimedia Commons

Pseudocode:

Quicksort pseudocode by D. Goglia

The key idea is to partition the input array S[i, j] into two pieces such that one contains items that are smaller than (or equal to) the items contained in the other piece. This partition is order-preserving, because no subsequent steps are necessary to recombine the ordered pieces after the two recursive calls. Partitioning is typically obtained by selecting one input item as a pivot and by distributing all the other input items into two sub-arrays according to whether they are smaller or greater than the pivot; items equal to the pivot can be stored anywhere. In the pseudocode, for simplicity, the pivot is forced to occur approximately in the middle of the array to be sorted (step 4).
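
For example, a two-pointer partition in this spirit could look like the following sketch (with the pivot simply taken from the middle position, which is just one possible choice, and the two boundaries returned instead of a single split point p):

def partition(S, i, j):
    # Distribute S[i..j] into a left piece of items <= pivot and a right
    # piece of items >= pivot; items equal to the pivot may end up anywhere.
    pivot = S[(i + j) // 2]            # pivot from the middle position (one possible choice)
    while i <= j:
        while S[i] < pivot:            # already in the left piece
            i += 1
        while S[j] > pivot:            # already in the right piece
            j -= 1
        if i <= j:
            S[i], S[j] = S[j], S[i]    # the auxiliary swap from the premise
            i += 1
            j -= 1
    return j, i                        # the left piece ends at index j, the right one starts at index i

V = [5, 3, 8, 1, 9, 2]
last_left, first_right = partition(V, 0, len(V) - 1)
print(V, last_left, first_right)       # every item up to last_left is <= every item from first_right on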

Note that, to achieve efficiency in the execution of Quicksort, we should pay attention to the ratio between the sizes of the two pieces: the more balanced they are, the closer Quicksort comes to Mergesort and thus to the optimal time complexity of O(n log n). In the worst case of a totally unbalanced partition, in which one piece is possibly empty (i.e. p = i or p = j), the time complexity of Quicksort is O(n²).

Implementation:
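
A compact, self-contained Python version along these lines (an illustrative sketch, with the pivot taken from the middle position and the partition step folded directly into the function):

def Quicksort(S, i, j):
    # Sort S[i..j] in place (i and j are inclusive indices)
    if i >= j:
        return                         # zero or one item: already ordered
    pivot = S[(i + j) // 2]            # pivot from the middle position
    l, r = i, j
    while l <= r:                      # partition step
        while S[l] < pivot:            # skip items already in the left piece
            l += 1
        while S[r] > pivot:            # skip items already in the right piece
            r -= 1
        if l <= r:
            S[l], S[r] = S[r], S[l]    # swap the two misplaced items
            l += 1
            r -= 1
    Quicksort(S, i, r)                 # recurse on the left piece
    Quicksort(S, l, j)                 # recurse on the right piece

V = [7, 2, 9, 4, 1, 5]
Quicksort(V, 0, len(V) - 1)
print(V)   # [1, 2, 4, 5, 7, 9]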

I really hope that this will be helpful for you. If you like this article, please support me with a 👏🏻. Thank you, and enjoy your algorithmic journey!

References:

  • P. Ferragina, The magic of Algorithms!, University of Pisa, 2020
  • F. Romani, Elementi di Algoritmica con esercizi svolti ed esempi in Python, 2nd edition, Pisa University Press, 2017
  • T. H. Cormen, C. E. Leiserson, R. L. Rivest, C. Stein, Introduction to Algorithms, 3rd edition, The MIT Press, 2009
