Binary Search — Find K-th Smallest Pair Distance

Timothy Huang
Apr 16, 2020 · 6 min read
Reference: https://www.unite.ai/what-is-k-nearest-neighbors/

LeetCode: https://leetcode.com/problems/find-k-th-smallest-pair-distance/

Given an integer array, return the k-th smallest distance among all the pairs. The distance of a pair (A, B) is defined as the absolute difference between A and B.

Let nums be the input array, k be the targeted pair, and n be the size of the input arraynums = [1, 3, 1], k = 2 -> d = 2
Explanation: (1, 1), (1, 3), (1, 3) or (3, 1)
nums = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2], k = 5 -> d = 0
Explanation: Of all the 45 pairs, know there are 12 pairs' distance are 0 and k < 12 so d = 0.

Solution 1:

  1. Sort the input array
  2. Get numbers’ frequencies (will run faster if there are a lot of repeating numbers)
  3. Store n-1 sorted lists in a binary heap (Refer to the explanation below)
  4. Count how many pairs’ distance are 0 (repeating numbers).
  5. Merge n-1 sorted lists
  6. Use a while loop to pop the smallest distance push the next smallest one from the same array onto binary heap if there is any element left
  7. Return the k-th smallest pair’s distance
Explanation: If we look at the 2nd example, we know that once the array is sorted the closest distance for a number at i-th index will always be (nums[i + 1] - nums[i]). We also know that there are (n - 1) * n / 2 possible pairs. Each number but the last one in the array can pair with all the numbers after them. Knowing this fact, we can think of this problem as merging n-1 sorted lists. The list represents the distance of all the possible pairs that start from a particular number.1st list: 0, 0, 0, 1, 1, 1, 2, 2, 2 (@ nums[0])
2nd list: 0, 0, 1, 1, 1, 2, 2, 2 (@ nums[1])
.
.
.
6th list: 0, 1, 1, 1 (@ nums[5])
.
.
.
9th list: 0 (@ nums[8])

Complexity Analysis:

Time Complexity: O((n + k)log(n))Sorting the array will take O(n*log(n)) and each iteration in the while loop to pop the smallest distance will take log(n) as we always have at most (n-1) items in the binary heap. In the worst case, when there is no repeating number and k is too large, k will be very close to n^2 as the number of possible pairs = (n - 1) * n / 2 = (n^2 - n) / 2 . Therefore, the runtime would actually become O(n^2log(n)) for the worst case. The n from (n + k) is dominated by the k.Space Complexity: O(n)The memory required for this solution is directly proportional to the input size as we need an array with size of (n-1) to store binary heap.

Now let’s see if we can optimize the runtime and maybe space as well…

Solution 2:

The goal is to improve the k from (n + k) in solution 1’s runtime. Sorting will always take O(nlog(n)) so we know that there is nothing we can do to improve n. Let's focus on k here. Obviously, we want to prevent it from becoming n^2 when it gets too large, to solve the question without the need to go over k pairs in other words. If we look at the numbers' correlation after sorting, we realize that we can immediately get the maximum distance (nums[-1] - nums[0]) in constant time. We cannot really get the minimum distance in constant time. However, we know for a fact that the smallest it could be is 0. Thus, we know the distance we're looking for will certainly fall in the range of 0 to (nums[-1] - nums[0]), and so far gathering these informations only takes O(nog(n)).nums = [0, 0, 0, 0, 1, 1, 1, 2, 2, 2], k = 25Let's use this example again. Now instead of using binary heap and popping k smallest elements, which the runtime will depend on k, we can try another approach with the informations we just get. The maximum distance is 2 so we know the all the possible distances are [0, 1, 2]. WHAT IF we can efficiently calculate how many pairs' distance are <= to a specific distance we choose from the range above. This is where the window sliding technique comes in handy, having two pointers, one fast and the other slow, to go through the entire array in linear time. For instance, we start from distance 1. See the illustration below. 1. Start from the 0th index     0 0 0 0 1 1 1 2 2 2 
i j
2. Increment j until nums[j] - nums[i] > k and decrement j by 1 0 0 0 0 1 1 1 2 2 2
i j
3. # of pairs <= 1 @ i-th index = (j - i + 1)4. Increment i by 15. Repeat 2-5 againThis window sliding technique will run 2*n to get the result but only n will change proportionally based on the input size so this technique will only take O(n). Now we have all the possible distances and a efficient way to calculate # of pairs <= d. It is only searching left to do. Given a sorted array (distance array), we can perform a binary search that will only take O(log(n)) rather than O(n) for linear search. In each search, calculating # of pairs <= the selected distance takes O(n). Therefore, let d be the size of the distance array. The total runtime comes to O(nlog(n)) for sorting plus O(nlog(d)) for searching. As for the memory, it will be optimized to O(1) since no extra memory is needed.

That sounds like a perfect solution for this problem, BUT… can we do better than that???

Optimizing Window Sliding

Previously, we mention that window sliding takes O(n) and we know there is no way we could bring it down to constant time. So the question is can we let it run slightly faster? Counter intuitively, what if we count # of pairs > the distance we choose instead of <=?
Say the distance is 1.
1. i starts from the 0th index 0 0 0 0 1 1 1 2 2 2
i,j
2. Only increment j when nums[i] - nums[j] > 1 and increment the total count by (n - i) every time j is incremented 0 0 0 0 1 1 1 2 2 2
j i
3. i stops @ 4th index 0 0 0 0 1 1 1 2 2 2
j i
Both window sliding methods will run n iterations but the difference is the 2nd pointer. In method 1, the fast pointer j stops @ 6th index when i = 0 and eventually go up to the last index which also takes n iterations in total. On the other hand, for method 2, the slow pointer j stops @ 4th index and never gets incremented. Both are very optimal approach but generally speaking, method 2 will run slightly faster than method 1.
  1. Initialize the left and right bounds as 0 and len(nums)-1
  2. Use a while loop to binary search the distance
  3. Find # of pairs with distance <= the middle one

Please leave a comment below if you have any suggestion/thought!

Thanks for reading :)

The Startup

Get smarter at building your thing. Join The Startup’s +787K followers.

Sign up for Top 10 Stories

By The Startup

Get smarter at building your thing. Subscribe to receive The Startup's top 10 most read stories — delivered straight into your inbox, once a week. Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Timothy Huang

Written by

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +787K followers.

Timothy Huang

Written by

The Startup

Get smarter at building your thing. Follow to join The Startup’s +8 million monthly readers & +787K followers.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store