Elephant in the room: why isn't this preferred over Gradient Descent‽ What an elegant solution this is, you say. Well, here's the main reason why: computing the seemingly harmless inverse of XᵀX, an n-by-n matrix (where X is m-by-n), is, even with today's most efficient algorithms, of roughly cubic time complexity in n. This means that as the number of features in X increases, the number of operations required to compute the final result grows cubically. If X is rather small, and in particular has a low number of features n, then using the Normal Equation is perfectly feasible. But for any industrial application with large, high-dimensional datasets, the Normal Equation can take extremely, sometimes nonsensically, long to compute.
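As a concrete sketch of what the Normal Equation computes (the data below is made up purely for illustration), here it is in NumPy. Note the explicit inverse of XᵀX, the n-by-n factor whose inversion dominates the cost:

```python
import numpy as np

# Made-up data: m = 5 samples, n = 3 columns (a bias column plus 2 features).
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([6.0, 5.0, 12.0, 11.0, 16.0])

# Normal Equation: theta = (X^T X)^{-1} X^T y.
# Inverting the n-by-n matrix X^T X is the O(n^3) step discussed above.
theta = np.linalg.inv(X.T @ X) @ X.T @ y
```

With n small, as here, this is instantaneous; the cubic cost only bites as the feature count grows.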
Rohan #3: Deriving the Normal Equation using matrix calculus
Rohan Kapur

What a great article. You totally connected the dots regarding what I’ve learned in a machine learning course I took and more recently, a linear algebra class I attended.

I remember my linear algebra professor mentioning that it is more computationally efficient to solve the normal equations

X’XΘ = X’y

and then just use back substitution to find the solution, instead of calculating the inverse of X’X. Do you know if this would be more efficient than gradient descent?
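The commenter's point can be sketched in NumPy (illustrative data only): `np.linalg.solve` factorizes X’X and back-substitutes rather than forming the inverse explicitly. Both routes are O(n³) asymptotically, but solving has a smaller constant factor and better numerical stability; whether either beats gradient descent depends on n versus the number of iterations, since each gradient step costs roughly O(m·n).

```python
import numpy as np

# Made-up data: m = 5 samples, n = 3 columns.
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([6.0, 5.0, 12.0, 11.0, 16.0])

A = X.T @ X   # the n-by-n normal-equations matrix
b = X.T @ y

# Route 1: factorize A and back-substitute (the commenter's suggestion).
theta_solve = np.linalg.solve(A, b)

# Route 2: form the explicit inverse, as in the article's formula.
theta_inv = np.linalg.inv(A) @ b
```

Both routes give the same θ here; for ill-conditioned X’X the factorize-and-substitute route is the safer choice.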
