Understanding sorting: A super-simplified explanation

Python and javascript implementations for algorithms

XQ
XQ
Nov 13, 2021 · 8 min read

Sorting is… well, sort of everywhere around us. In the real world, we have the sort option in a lot of applications, say sorting files based on their timestamp. The basic gist is simple- we want to arrange the items in a list in a specific order. It is generally the ascending or descending order of values. Sorting is used as a step in many different types of problems and algorithms. Most programming languages have inbuilt sort functions which you can use directly. You can kind of treat it like a black box. Whenever you identify that you want to sort something, you can simply call the sort function and proceed with the next step. For example, let’s say you have a question like finding the maximum value in the array of numbers. In python, you can simply do array.sort()to get your sorted array, and then you can return the last value i.e. array[n] as the largest value. In javascript, the default sort function sorts values as strings. For sorting numbers, you can use:

array.sort(function(a, b) {return a - b;});

In some interviews, you may be asked to explain it or solve for something without using the inbuilt functions. So, it is necessary to have a basic understanding of the same.

I personally would like to treat sorting as a black box as I know what the exact output would be. Why do we want to reinvent the wheel again and again? Anyways, let’s see how to logically dissect sorting.

Problem Statement

You have an array of n integers. You have to sort them in ascending order and you cannot use any inbuilt libraries.

I will explore a basic approach and help you understand how we are constructing logic.

The solution

  • First, let us think of something straightforward. If we have an array of numbers, we need to reassemble it so that the numbers are arranged in ascending order. Whenever there is an array to deal with, the first obvious thing that can be done is to write a for loop to visit each element of the array.
  • Let me take a sample array and try to construct logic by doing a dry run. Assume you have an array = [5, 7, 3, 2].
  • You can easily tell that the sorted array is [2, 3, 5, 7]. As humans, we simply look at numbers and rearrange them. How do we make a computer understand that?
  • One direct observation you can notice is that in a sorted array, all the elements to the left of any given element are lesser than that element and those to the right are greater than that element.
  • We just need to ensure the same for all elements. Numbers to the right of any given element must be greater than that number.
  • Let me do a dry run on [5, 7, 3, 2]. Let us visit the first element 5. 7 > 5 and is to the right of 5. Logically correct. Let us now go to the element 7. The next element 3 is less than 7. Hence it should be pushed to the left of 7. This can be done by swapping the positions. Our new array would now be [5, 3, 7, 2].
  • Similarly 2 < 7. So we swap again to get [5, 3, 2, 7] after running this logic across the array once. Notice that 7 is now in the correct position.
  • Loop once more with the same logic and you will get [3, 2, 5, 7]. Do it once again, and you’ll get the required answer = [2, 3, 5, 7].
  • What we did just now is like the bubble sort. It’s a brute force approach to sorting. As you can see, we are looping through the array multiple times.

There are some optimizations you can do to this. Several sorting techniques have come up like quicksort, mergesort, etc. which are more efficient. You can check them out online if interested to know the inner workings.

Here, let us focus on practically using sorting as a step in solving different types of coding problems where you can use the inbuilt sort functions.

Let’s take a Leetcode problem statement:

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

  • Median is basically the middle element of a sorted array if the array length is odd and the average of the middle numbers in case of even length.
  • The direct way we can find this is by creating a combined array of nums1 and nums2 of length m + n. Then we sort this array and find the median using the standard formula.

Here’s the python implementation of the same. Do read the code comments to understand what each line is doing:

Once you finish reading this blog and understand the basic gist, try to solve it yourself on Leetcode.

Let’s take another problem statement:

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

  • There are many different ways to solve this. Sorting directly could be one brute force approach.
  • Python has another inbuilt function called sorted() which can be used to sort strings in order of the alphabet.
  • If given strings are anagrams, then the sorted strings will be equal. That’s the logic we can use.

Here’s the single line code solution in python:

def isAnagram(self, s: str, t: str) -> bool:
return sorted(s) == sorted(t)

You can try different approaches on Leetcode and check for yourself.

If you are looking for more practice in simple and straightforward questions, you can try another problem:

Try to apply all the inbuilt functions you have read here like len(), sorted(), range()etc.

Let’s take up one more problem. This time let us do it in javascript.

Given an array nums containing n distinct numbers in the range [0, n], return the only number in the range that is missing from the array.

  • There are several ways to do this. Here, let us focus on how to use sorting to solve and then optimize it further.
  • So, we have an array of n integers from [0, n] with one number missing.
  • If we sort this array, the ith number will be in the ith index, except for when we reach the missing number.
  • Let’s take an example array [4, 0, 1, 2]. We have n = 4.
  • Sort this array and you get [0, 1, 2, 4].
  • Here, we can see that array[0] = 0, array[1] = 1, array[2] = 2.
  • But array[3] != 3, and its value is 4. Hence, 3 is the missing number.
  • If we loop through the entire array and find that all numbers are equal to their index, then the missing number will be the final number, the length of the array itself.

Let’s translate this logic to code in Javascript. Do read the code comments for each line. See if you can convert this code into python when you attempt on Leetcode.

Now, let’s think about whether there is a more efficient way to solve this. A better approach is just right in front of us. We need to tweak our thinking a bit.

Given n numbers, if one of them is missing, we can actually use the sum of numbers formula, right? The sum will be less by that missing number.

Here’s a one-liner solution of the same:

var missingNumber = function(nums) {
const n = nums.length
return ( n*(n+1)/2 - nums.reduce(function(a, b) { return a + b;}, 0));
};

Here, we are basically returning the difference between the sum of n numbers and the sum of the given array. This is a simple way how the same problem can be solved more efficiently by finding a value (sum in this case) that can help us determine the answer we want.

Notice how we used the reduce() function to find the sum of an array in javascript. I will cover more about it in future articles. #StayTuned.

Try solving the same problem with different approaches in Python on Leetcode.

And we just solved 4 Leetcode questions in this one blog. Kudos to you! We will explore more complex problems in future blogs.

How do you sort a billion files efficiently?

Beyond the online tests and DSA rounds, this is a common interview question you may come across. When given a huge amount of data, we can’t do things normally. There are some approaches that can be used to optimize stuff.

You can classify sorting into two types in terms of memory used:

  • Internal Sorting

Every computer has some memory allocated where all these calculations happen and the algorithms run to give you the output. If your sorting is done within this main memory, it is called internal sorting. This is what happens in most sorting algorithms when the data to sort is small.

  • External Sorting

When the data is very huge, say a billion files, your computer's main memory is not enough. How would you sort it then? The answer is external sorting where some external memory is used beyond the RAM.

How exactly is this external memory used?

Among the many ways to do it, there are two standard ways called external merge sort and external distribution sort.

In external merge sort, the basic idea is to break down the sorting into smaller chunks where each chunk can be sorted within the RAM memory. Then, these sorted chunks are merged together. To explore more on how exactly it happens, you can refer to these links:

I’ll explore in more detail and distill the math and complexity that’s explored here for easy understanding with practical examples in a different blog.

The Research Nest

Empowering humanity with exclusive insights.