Sort() in STL

Published in

The Startup

7 min readJun 9, 2020

Whether or not you’re new to sorting algorithms or familiar with some of them already, you’ve probably heard or read about sort() in some context or another. It’s reputation tends to precedes itself!

Introduction to STL and Sort()

The Standard Template Library (STL) is a set of C++ template classes to provide common programming data structures and functions such as lists, stacks, vectors, etc. It is a library of container classes, algorithms, and iterators.

Sorting is one of the most basic functions applied to data. It means arranging the data in a particular fashion, which can be increasing or decreasing. There is a built-in function in C++ STL by the name of sort().

Syntax

sort(start address, end address, comparator)

where
-start address : the address of the first element of the array
-end address : the address of the last element of the array
-comparator : the comparison to be done with the array.(This parameter is optional)

Internal structure of Sort()

This function internally uses Intro Sort. In more details it is implemented using hybrid of Quick Sort, Heap Sort and Insertion Sort.By default, it uses Quick Sort but if Quick Sort is doing unfair partitioning and taking more than N*log N time, it switches to Heap Sort and when the array size becomes really small, it switches to Insertion Sort.

Time complexity:
-Best Case — O(N log N)
-Average Case- O(N log N)
-Worse Case- O(N log N)

Inching towards Insertion Sort

On a very basic level, an insertion sort algorithm contains the logic of shifting around and inserting elements in order to sort an unordered list of any size. The way that it goes about inserting elements, however, is what makes insertion sort so very interesting!

The insertion sort algorithm maintains two subsets (often referred to as subsections or sub lists) — a sorted subset, and an unsorted subset.

step by step implementation of Insertion sort

Stepping through Insertion sort:
1) Assume the first item is sorted.
2) Find the next value to compare to the sorted value.
3) Shift over any necessary elements to make space for the value to be added.
4) Insert value into sorted subset and repeat steps 1–3.

Effectively, once our first item is pulled out into the “sorted” subset, we continue the same process for every single unsorted item in the “unsorted” subset, until we have sorted the entire collection.

In other words, each iteration of insertion sort causes the sorted subset to grow, and the unsorted subset to shrink.

Some important characteristics of Insertion Sort() :
-It has one of the simplest implementation.
-It is efficient for smaller data sets, but very inefficient for large data sets.
-It is in-place sorting for iterative approach, but for recursive, we have to use a stack, so it causes O(N) space complexity.
-Its worst case time complexity is : O(N²)

Quadratic running time is generally a bad idea, because it means as our input doubles, the time it will take for our insertion algorithm to sort our input will quadruple. This does not bode well for us if we have a large datasets that we’re trying to sort.
So, we prefer to use this algorithm when, either we have mostly sorted dataset or the size of dataset is quite small.

Diving into Heap Sort

Someone once told me that everything important in computer science boils down to trees. Literally just trees. We can use them to build things, parse things, and interpret things (yes, there might be some foreshadowing happening here, don’t worry about it if it doesn’t make any sense to you just yet, because soon, it will!). And we can even use them to
— you guessed it! — sort things.

Before we dive into heap sort, let’s make sure that we have heaps straight in our heads. We might remember that a heap is really nothing more than a binary tree with some additional rules that it has to follow: first, it must always have a heap structure, where all the levels of the binary tree are filled up, from left to right, and second, it must either be ordered as a max heap or a min heap.

Heap sort is a comparison based sorting technique based on Binary Heap data structure. It is similar to selection sort where we first find the maximum element and place the maximum element at the end. We repeat the same process for remaining element.

Why array based representation for Binary Heap?
Since a Binary Heap is a Complete Binary Tree, it can be easily represented as array and array based representation is space efficient. If the parent node is stored at index I, the left child can be calculated by 2 * I + 1 and right child by 2 * I + 2 (assuming the indexing starts at 0).

Heap Sort Algorithm for sorting in increasing order:
1. Build a max heap from the input data.
2. At this point, the largest item is stored at the root of the heap. Replace it with the last item of the heap followed by reducing the size of heap by 1. Finally, heapify the root of tree.
3. Repeat above steps while size of heap is greater than 1.

We have an unsorted array Arr having 6 elements and then max-heap we have built the max-heap.

After building max-heap, the elements in the array Arr will be:

Step 1: 8 is swapped with 5.

Step 2: 8 is disconnected from heap as 8 is in correct position now

Step 3: Max-heap is created and 7 is swapped with 3.

Step 4: 7 is disconnected from heap.

Step 5: Max heap is created and 5 is swapped with 1.

Step 6: 5 is disconnected from heap.

Step 7: Max heap is created and 4 is swapped with 3.

Step 8: 4 is disconnected from heap.

Step 9: Max heap is created and 3 is swapped with 1.

Step 10: 3 is disconnected.

After all the steps, we will get a sorted array.

These two observations are actually the key to the question of how and why heap sort is as fast as it is. Calling buildMaxHeap takes O(n) time, since every single item must be added to the heap, and a larger amount of elements mean a larger heap. However, remember that we are dealing with a binary tree, and binary trees are logarithmic in nature. So, even though we have to call heapify again and again, invoking this function is actually fairly fast, since it will run in logarithmic time, or O(log n).

The combination of these two time complexities :
Heap sort runs in linearithmic time, or in Big O notation, O(n log n). So, even though heap sort seems so much like selection sort, it’s a lot faster! Selection sort runs in quadratic time, or O(n²), which is so much less efficient than linearithmic time.

Pivoting to Quick Sort

The quick sort algorithm is a sorting algorithm that sorts a collection by choosing a pivot point, and partitioning the collection around the pivot, so that elements smaller than the pivot are before it, and elements larger than the pivot are after it. It continues to choose a pivot point and break down the collection into single-element lists, before combining them back together to form one sorted list.
This is also known as a divide and conquer algorithm, which breaks down a problem into simpler versions of itself.

step by step implementation of quick sort

Choose a pivot value. We take the value of the last element as pivot value, but it can be any value, which is in range of sorted values, even if it doesn’t present in the array.

•Partition. Rearrange elements in such a way, that all elements which are lesser than the pivot go to the left part of the array and all elements greater than the pivot, go to the right part of the array.

•Sort both parts. Apply quick sort algorithm recursively to the left and the right parts.

Quick sort can sort items without creating a new or duplicated list, it doesn’t require external memory; instead, it can store its entire dataset without using external memory, making it an internal sorting algorithm.

Time Complexity:
The average runtime for an unsorted list, and a partition close to the median is O(NlogN).
The average runtime for a sorted (nearly sorted) list, or a partition that is far from the median is O(N²).