CODEX

Median of two sorted arrays

Madhumitha Raghu
CodeX

--

We are given two arrays which are sorted and our goal is to find the median of these arrays in logarithmic time and O(1) space complexity (LeetCode). This is a very common question that has numerous solutions posted all over the internet. However, it took me some time to wrap my head around it. I would like to describe my thought process and solution here.

Problem statement

To begin with, this is how we compute the median for a single sorted array:

Fig 1. Finding the median of a single even length sorted array
Fig 2. Finding the median of a single odd length sorted array

Let’s first visualize the problem statement. Consider two arrays as shown below:

Fig 3. Finding the median of two sorted arrays (problem statement)

Here, we have two sorted arrays A and B. In order to find the median of these arrays, we can need to combine these two arrays, sort it and compute the median of the combined array. This approach does not satisfy the requirements of the problem statement because:

a) Combining the two arrays into a single array and sorting it(using an algorithm like merge sort) is an O(n*log(n)) time operation

b) The space complexity is also O(n) because we need to create a new array to accommodate the two arrays

The combined array A+B cannot be obtained by simply appending one array to another. This is why the approach of trying to combine the arrays to determine the median won’t satisfy the logarithmic time constraint imposed by the problem— We know the relationship between elements within each array because they are sorted. So, the smallest is at the beginning and the largest is at the end. However, we do not know the relationship between elements across the arrays. Without iterating through both the arrays completely we can’t find the relationship between them and this violates the time constraint.

Solution approach

If you think about it, the constraints are really pushing us in the direction of not having to worry about the combined array. An O(log(n)) algorithm (like binary search) basically keeps discarding a section of the array at every iteration based on some criteria. So, we need to think of a way to incorporate that in our solution.

That is, if we know that A and B are sorted, we do not need to know what A+B will look like, to compute the median. We only need to know the first half of the combined array to find the median. Depending on the length of A+B, the position of the median will be slightly different:

Fig 4. Index of median element in A+B when it has an odd length
Fig 5. Index of median element in A+B when it has an even length

The bottom line is, once we know which elements in A and B contribute to the first half (left side) of the combined array A+B, we don’t really care about the remaining elements and that is what is going to help us solve this in logarithmic time.

Ok, so what about the O(1) space constraint? What this implies is that, we should not be allocating memory for any temporary arrays. We technically don’t need to. We will discuss this more once we get into the implementation of the solution.

In a nutshell, we are going to apply binary search across two arrays to draw a line that indicates how much of a given array will contribute to the left half or first half of the combined array. This can be visualized like this:

Fig 6. Visualizing the binary search approach to determine which elements in each array contribute to the combined array (odd length)
Fig 7. Visualizing the binary search approach to determine which elements in each array contribute to the combined array (even length)

Implementation details

Remember that our goal is to use binary search to figure out how much each array contributes to the combined array. What’s the structure of a binary search problem?

Fig 8. Binary search pseudocode

So, we need to map our problem statement into this format. We have 2 arrays to work with. We should focus on one array (the smaller one) and use it to determine contributions from both arrays. Let’s assume A is the smaller array.

a) What should keep our loop going?

We should keep looping till we have explored all possible ways to combine both the arrays to determine the left half of A+B. As an example, if the size of the left side of A+B is 4 and we have picked one element from A , this automatically implies we need three elements from B. We can either pick nothing from A, a few elements from A or everything from A. So, we keep looping till the contribution of A is equal to its length.

b) What is our target and how should we be testing it?

We want to find the left-side of A+B so that we can easily compute the median. So, we want to find a split that helps us get to the median.

Each split partitions both the arrays into 4 parts and we are combining the elements on the left halves of the splits to form the first half of A+B, like so:

Fig 9. Target visualization

If all elements on the left side are less than all elements on the right side, then, we know that the split has given us the first half of A+B.

That is:

A[aL] < B[bR] and B[bL] < A[aR], where aL, bL, bR and aR correspond to the indices of the elements on either sides of the split.

c) How do we compute median if we find the target split?

Consider a situation where the length of A+B is odd (Fig 9) and also assume that we have found the target split. Now, we have a split where a1< b4 and b3 < a3. Since A+B has a length of 7, the median is the 4th element in A+B and this has to be either a1,b1,b2 or b3 (elements on the left half). We have 4 numbers and there are 4! ways of arranging them (assuming no repetition), as shown below:

Fig 10. Different ways in which elements on the left side of the A+B array (odd length)can be arranged, depending on their values

But, do we really care about these arrangements? We are only interested in the fourth element:

Fig 11. Position of median in A+B (odd length) that we are interested in

The 4th element has to be the largest number on the left-hand side and we only need to compare the largest numbers from A and B to get this! So:

Similarly, let’s look at a situation where A+B has an even length:

Fig 12. Finding median of A+B which has an even length

Now, we have a split where a2 < b3 and b2 < a3. Since A+B has a length of 8, the median is the average of the 4th and 5th elements in A+B.

There are 4! ways of arranging {a1,a2,b1,b2} and for each arrangement, there are 2 ways of picking the 5th element (a3 or b3):

Fig 13. Different ways in which elements on the left side of the A+B array (even length)can be arranged, depending on their values

Here again, do we really care about these arrangements? We are only interested in the 4th and 5th elements:

Fig 14. Position of median in A+B (odd length) that we are interested in

The 4th element is the largest on the left side of A+B and it must be max(a2,b2), just like what we saw for the odd length case. The 5th element has to be either a3 or b3. But, it must be just next to the 4th element. So, we pick the minimum value, i.e. min(a3,b3):

d) How do we keep the loop going if we don’t find the target?

In a nutshell, if we haven’t found the target split, we should either include more or less of A. Let’s walk through a couple of examples to understand this.

Fig 15: First split of two arrays to compute median when length of A+B is odd

In this example, A+B has an odd length of 7 and the median is the 4th element. We first pick one element from A and 3 elements from B to compute the left-half of A+B.

1 < 6. However, 5 > 4. This indicates that there are smaller elements to the right of A.This means neither 5 cannot be the 4th smallest element. We need to include more of A to discover that element. So, we continue to the next iteration:

Fig 16: Second split of two arrays to compute median when length of A+B is odd

Let’s again compare the elements on both the sides — 4 < 5 and 3 < 7 and there are 4 elements on the left side. So, we have found the right split!

Since A+B has an odd length, we need to find the last element in this subarray. This has to be the max of the values on the right most extreme of both the arrays A and B, i.e. max(4,3) and that is 4.

Let’s look at another example where A+B has an even length:

Fig 17: First split of two arrays to compute median when length of A+B is even

A+B has an even length of 8 and the median is the average of the 4th and 5th elements (mean(4,5)). We pick 1 element from A and 3 of them from B.

1 < 6. However, 5 > 4. This means neither 5 cannot be the 4th smallest element. We need to include more of A to discover that element. So, we continue to the next iteration:

Fig 18: Second split of two arrays to compute median when length of A+B is even

Here, 4 < 6 and 5 < 7. So, we have found the median split! So, the median is the average of max(4,3)and min(5,7).

One thing to remember is, we can be in a situation where the split results in one more more of the halves being empty. For example:

Fig 19: Special situation to consider in implementation

This is an interesting case where two of the splits to the right are empty! What do we compare 1 and 2 to? The smaller values are on the left and the larger ones are on the right. Since we have no valid number to compare to, we just assume that 1 and 2 satisfy the condition for a median split. This can be done by comparing them to very large values (+∞). Similarly, if there are empty values on the left, we compare it with -∞.

Time complexity analysis

Now that we know how to implement this algorithm, it is time to make sure it satisfies the constraints of the problem.

We have not used any temporary arrays and we just rely on the indices for our implementation. So, the space complexity is O(1).

We always pick the smaller array to iterate and and every step we either include more of it (less of the other array) or less of it (more of the other array). So, the time complexity is O(log(min(A,B))).

Link to Github repository: https://github.com/madhu90/CodeExamplesForBlogPosts.git

References

[1] https://www.geeksforgeeks.org/median-of-two-sorted-arrays/

[2] https://www.geeksforgeeks.org/median-of-two-sorted-arrays-of-different-sizes/

[3] https://www.youtube.com/watch?v=LPFhl65R7ww&ab_channel=TusharRoy-CodingMadeSimple

[4] https://www.youtube.com/watch?v=MHNTl_NvOj0&ab_channel=IDeserve

--

--

Madhumitha Raghu
CodeX
Writer for

Passionate about computer science and machine learning.