Given an integer array, return the k-th smallest distance among all the pairs. The distance of a pair (A, B) is defined as the absolute difference between A and B.
Let nums be the input array, k be the target rank of the pair, and n be the size of the input array.

nums = [1, 3, 1], k = 2 -> d = 2
Explanation: the pairs are (1, 1), (1, 3), and (1, 3) (i.e. (3, 1)), so the 2nd smallest distance is 2.

nums = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2], k = 5 -> d = 0
Explanation: Of all the 45 pairs, we know that 12 pairs have distance 0, and k <= 12, so d = 0.
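Both examples above can be verified with a quick brute force that enumerates every pair (a sketch in Python; the function name is mine, and k is taken as 1-indexed):

```python
from itertools import combinations

def kth_smallest_distance_brute(nums, k):
    """Brute force: collect all pair distances, sort them, take the k-th (1-indexed)."""
    dists = sorted(abs(a - b) for a, b in combinations(nums, 2))
    return dists[k - 1]

print(kth_smallest_distance_brute([1, 3, 1], 2))                        # -> 2
print(kth_smallest_distance_brute([0, 0, 0, 0, 1, 1, 1, 2, 2, 2], 5))   # -> 0
```

This is O(n^2 * log(n)) time and O(n^2) space, which is exactly what the solutions below try to beat.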
- Sort the input array
- Get the numbers' frequencies (will run faster if there are a lot of repeating numbers)
- Store n-1 sorted lists in a binary heap (refer to the explanation below)
- Count how many pairs' distance is 0 (repeating numbers)
- Merge the n-1 sorted lists
- Use a while loop to pop the smallest distance, then push the next smallest one from the same list onto the binary heap if there is any element left
- Return the k-th smallest pair's distance
Explanation: If we look at the 2nd example, we know that once the array is sorted, the closest distance for the number at the i-th index will always be (nums[i + 1] - nums[i]). We also know that there are (n - 1) * n / 2 possible pairs: each number but the last one in the array can pair with all the numbers after it. Knowing this fact, we can think of this problem as merging n-1 sorted lists, where each list holds the distances of all the possible pairs that start from a particular number.

1st list: 0, 0, 0, 1, 1, 1, 2, 2, 2 (@ nums[0])
2nd list: 0, 0, 1, 1, 1, 2, 2, 2 (@ nums[1])
...
6th list: 0, 1, 1, 1 (@ nums[5])
...
9th list: 0 (@ nums[8])
Time Complexity: O((n + k)log(n))

Sorting the array takes O(n*log(n)), and each iteration of the while loop pops the smallest distance in O(log(n)), as we always have at most (n - 1) items in the binary heap. In the worst case, when there is no repeating number and k is too large, k will be very close to n^2, because the number of possible pairs = (n - 1) * n / 2 = (n^2 - n) / 2. Therefore, the runtime actually becomes O(n^2 * log(n)) in the worst case; the n from (n + k) is dominated by the k.

Space Complexity: O(n)

The memory required for this solution is directly proportional to the input size, as we need an array of size (n - 1) to store the binary heap.
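The steps above can be sketched in Python like this (the function name is mine; k is 1-indexed, and the frequency shortcut for repeating numbers is omitted to keep the merge itself clear):

```python
import heapq

def smallest_distance_pair_heap(nums, k):
    """Merge n-1 sorted 'distance lists' with a binary heap; pop k-1 times.

    The i-th list holds nums[j] - nums[i] for all j > i, which is already
    sorted once nums is sorted.
    """
    nums.sort()
    n = len(nums)
    # Each heap entry is (distance, i, j) for the pair (nums[i], nums[j]),
    # seeded with the head of each list: the pair (i, i + 1).
    heap = [(nums[i + 1] - nums[i], i, i + 1) for i in range(n - 1)]
    heapq.heapify(heap)
    for _ in range(k - 1):
        _, i, j = heapq.heappop(heap)
        if j + 1 < n:
            # Push the next smallest distance from the same list.
            heapq.heappush(heap, (nums[j + 1] - nums[i], i, j + 1))
    return heap[0][0]  # the k-th smallest distance is now on top
```

After k - 1 pops, the heap's top is the k-th smallest distance overall, since each list is consumed in sorted order.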
Now let’s see if we can optimize the runtime and maybe space as well…
The goal is to improve the k in (n + k) from solution 1's runtime. Sorting will always take O(n*log(n)), so we know there is nothing we can do to improve the n. Let's focus on the k. Obviously, we want to prevent it from becoming n^2 when it gets too large; in other words, we want to solve the question without going over k pairs. If we look at the numbers' correlation after sorting, we realize that we can immediately get the maximum distance (nums[-1] - nums[0]) in constant time. We cannot really get the minimum distance in constant time, but we know for a fact that the smallest it could be is 0. Thus, the distance we're looking for will certainly fall in the range 0 to (nums[-1] - nums[0]), and so far gathering this information only takes O(n*log(n)).

nums = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2], k = 25

Let's use this example again. Instead of using a binary heap and popping the k smallest elements, whose runtime depends on k, we can try another approach with the information we just gathered. The maximum distance is 2, so we know all the possible distances are [0, 1, 2]. WHAT IF we could efficiently calculate how many pairs' distance is <= a specific distance chosen from the range above? This is where the sliding window technique comes in handy: two pointers, one fast and one slow, go through the entire array in linear time. For instance, let's start from distance 1. See the illustration below.

1. Start from the 0th index:

0 0 0 0 1 1 1 2 2 2
i j

2. Increment j until nums[j] - nums[i] > 1, then decrement j by 1:

0 0 0 0 1 1 1 2 2 2
i           j

3. # of pairs with distance <= 1 @ the i-th index = (j - i)
4. Increment i by 1
5. Repeat steps 2-5

This sliding window technique runs about 2*n steps to get the result, since each pointer traverses the array once, so it only takes O(n). Now we have all the possible distances and an efficient way to calculate the # of pairs <= d; only the search is left to do. Since the distance range is sorted by nature, we can binary search it: let d be the size of the distance range, then the search takes O(log(d)) iterations rather than O(d) for a linear scan, and each iteration spends O(n) counting the # of pairs <= the selected distance. Therefore, the total runtime comes to O(n*log(n)) for sorting plus O(n*log(d)) for searching. As for the memory, it is optimized to O(1) since no extra memory is needed.
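Combining the binary search with the <=-counting window slide gives a sketch like this (function name is mine; k is 1-indexed):

```python
def smallest_distance_pair_bsearch(nums, k):
    """Binary search the answer distance; count pairs with distance <= mid in O(n)."""
    nums.sort()
    n = len(nums)

    def pairs_le(d):
        # Sliding window: the fast pointer j never moves backwards across i.
        count, j = 0, 0
        for i in range(n):
            while j < n and nums[j] - nums[i] <= d:
                j += 1
            count += j - i - 1  # pairs (i, m) for i < m < j
        return count

    lo, hi = 0, nums[-1] - nums[0]  # the answer lies in [0, max distance]
    while lo < hi:
        mid = (lo + hi) // 2
        if pairs_le(mid) >= k:
            hi = mid        # enough pairs at or below mid: answer is <= mid
        else:
            lo = mid + 1    # too few pairs: answer is > mid
    return lo
```

The search converges on the smallest distance d such that at least k pairs have distance <= d, which is exactly the k-th smallest pair distance.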
That sounds like a perfect solution for this problem, BUT… can we do better than that???
Optimizing Window Sliding
Previously, we mentioned that the window slide takes O(n), and we know there is no way we could bring it down to constant time. So the question is: can we let it run slightly faster? Counterintuitively, what if we count the # of pairs > the distance we choose instead of <=?
Say the distance is 1.

1. Both i and j start from the 0th index:

0 0 0 0 1 1 1 2 2 2
i,j

2. Only increment j when nums[i] - nums[j] > 1, and increment the total count by (n - i) every time j is incremented:

0 0 0 0 1 1 1 2 2 2
        j     i

3. j stops @ the 4th index while i runs to the end:

0 0 0 0 1 1 1 2 2 2
        j         i

Both sliding window methods run n iterations of the outer loop, but the difference is in the 2nd pointer. In method 1, the fast pointer j stops @ the 6th index when i = 0 and eventually goes up to the last index, which takes another n iterations in total. On the other hand, in method 2, the slow pointer j stops @ the 4th index and never gets incremented again. Both approaches are linear, but generally speaking, method 2 will run slightly faster than method 1.
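Method 2 can be sketched as a standalone counting helper (function name is mine; nums must already be sorted):

```python
def pairs_greater(nums, d):
    """Count pairs with distance > d, given nums sorted (method 2 above)."""
    n = len(nums)
    count, j = 0, 0
    for i in range(n):
        # While the pair (j, i) is too far apart, so are (j, i+1), ..., (j, n-1)
        # since the array is sorted: that's (n - i) pairs for this j.
        while nums[i] - nums[j] > d:
            count += n - i
            j += 1
    return count

nums = sorted([0, 0, 0, 0, 1, 1, 1, 2, 2, 2])
print(pairs_greater(nums, 1))  # -> 12 (the twelve (0, 2) pairs)
```

Note the # of pairs with distance <= d is then simply n*(n-1)/2 minus this count.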
- Sort the input array
- Initialize the left and right bounds as 0 and nums[-1] - nums[0]
- Use a while loop to binary search the distance
- Count the # of pairs with distance <= the middle distance; if the count >= k, move the right bound down to the middle, otherwise move the left bound up past it
- Return the left bound once the two bounds meet
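Putting the steps together with the faster >-counting window from method 2 gives the final solution (a sketch; function name is mine, and k is 1-indexed):

```python
def smallest_distance_pair(nums, k):
    """Binary search d in [0, nums[-1] - nums[0]], counting pairs > d per step."""
    nums.sort()
    n = len(nums)
    total = n * (n - 1) // 2  # total number of pairs
    lo, hi = 0, nums[-1] - nums[0]
    while lo < hi:
        mid = (lo + hi) // 2
        greater, j = 0, 0  # pairs with distance > mid (method 2's window)
        for i in range(n):
            while nums[i] - nums[j] > mid:
                greater += n - i
                j += 1
        if total - greater >= k:  # at least k pairs have distance <= mid
            hi = mid
        else:
            lo = mid + 1
    return lo
```

This runs in O(n*log(n) + n*log(d)) time and O(1) extra space (ignoring the sort's internals), matching the analysis above.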
Please leave a comment below if you have any suggestion/thought!
Thanks for reading :)