Solving Codepur Cares Fund

Gautam Abhyankar
The Coding Culture
Published in
4 min readDec 11, 2020

--

Looking at the question, removing all the verbiage, the problem translates to finding the sub-matrix with the maximum sum in the given 2D matrix.

Obviously, we could go over all the possible sub-matrices in the matrix and find the one with the maximum sum. But that would require us to use 6 nested loops. 4 for each of the corners of the sub-matrices & 2 for the summing up of all the elements of the sub-matrix, thereby taking its complexity to O(n⁶). For the sake of brevity, I’m gonna skip this code.

Welp, we definitely don’t want that happening. There has to be a better solution. Turns out there is.

Kadane’s algorithm

We know that we can find the subarray with the maximum sum in a 1D “optimally” using Kadane’s algorithm (If you didn’t, have a look here). Maybe, we could use this somehow. How about we iterate over the left & right bounds of the submatrix? Then maybe, we could sum up all the row-wise elements to obtain an array. We could then apply Kadane’s on this array. Wouldn’t this return us the top & bottom bounds of our required sub-matrix? Yes, it would. We also know what left & right bounds this array has. And there, we have our answer. Now, let’s see how this works in code.

Hold on, look at the constraints

Yikes, this is one of the most common mistakes programmers make in competitive coding competitions (Chill out, even I have done it). If we look closely, we see that the bounds of the matrix are between 0 and 10¹⁵. The elements of the matrix are also between -10¹⁵ and 10⁵. Using an int (i.e. a 32-bit int) would be disastrous. You would potentially have an overflow which could even result in having negative bounds of the matrix. Let’s just say that would not go very well. Okay, finally let’s go to the code.

Finally, the code

Now, to accommodate the bounds of the matrix, I’m gonna use size_t which C++ defines as an unsigned long long . And for the elements, we clearly can’t use the above type since it’s given they’re negative. So we’ll use long long , however, I’m a Rust programmer, so I’m gonna typedef it to be a i64 .

You may ask why references instead of pointers for the start & finish variables? Well, you don’t really want to be creating another variable on the stack just to hold the address. Now to iterate over the left & right bounds.

I’ll leave it up to you on how to get the input of the matrix and printing it out. Now, since we can know (or can tell) that Kadane’s algorithm is O(n), and the findMaxSum will turn out to be O(n²), we just brought down the complexity of the solution to O(n³).

We’re now done here but you can read ahead if you want.

Bonus#1: Benchmarking

We wrote all this code, we should really have a look at how fast it is. Turns out there are a bunch of methods to do this. I’ll use the high_resolution_clock provided by the chrono library. Here’s how to use it:

#include <chrono>.
using namespace std::chrono;
.
.
int main() {
.
auto start = high_resolution_clock::now();
findMaxSum(matrix, m, n, finalLeft, finalRight, finalBottom, finalTop);
auto end = high_resolution_clock::now();
auto duration = duration_cast<microseconds>(end - start);
cout << "Time taken:" << duration.count();
}

I’m gonna run this against a randomly generated 87x78 matrix on an Intel 10th Gen i7 CPU. I’m also gonna run it about 1000 times and average out the time.

CLI> ./a.out
87 78
Time taken: 2572

Bonus#2: Tell the compiler to optimize

The compiler by default compiles your code at optimization level 0. You can specify to run at a certain optimization level by specifying any one of the options: -O0 -O1 -O2 -O3 respectively with 0 being the lowest level and 3 being the highest level.
You could also add another option -march=native which would enable architecture native instructions, possibly allowing your code to go faster, but you shouldn’t do this in real-word since this might make the generated binary incompatible on other machines. But, it’s okay to do it here. I’m gonna use the previous inputs here as well.

CLI> g++ solution.cpp -O3
CLI> ./a.out
87 78
Time taken: 1027
CLI> g++ solution.cpp -O3 -march=native
CLI> ./a.out
87 78
Time taken: 640

Bonus#3: Row-major instead of column-major

If you notice in line 13 of the findMaxSum function, we’re incrementing i in every iteration i.e. going down the column. This does not sound good in terms of spatial locality considering we have to go down to dynamic memory on every iteration. If we could somehow increment i along a row, that would be so much better because it is very likely that its adjacent elements are already present in the L1 or L2 cache making the memory accesses faster.

Thinking over this, then maybe instead of iterating over the left & right bounds of the matrix, we could iterate over the top & bottom bounds of the matrix and get the left & right bounds from the kadane function.

One more thing before we do this, we don’t really the loop present on line 8. We could simply replace it with the fill function from the STL.

I’ll be making the necessary changes and posting the entire program below now.

PS: If you’re wondering about the benchmark of this,

CLI> g++ solution.cpp -O3
CLI> ./a.out
87 78
Time taken: 112
CLI> g++ solution.cpp -O3 -march=native
CLI> ./a.out
87 78
Time taken: 43

--

--