What a great article. You totally connected the dots between what I learned in a machine learning course I took and, more recently, a linear algebra class I attended.
I remember my linear algebra professor mentioning that it is more computationally efficient to solve the normal equations
X’XΘ = X’y
and then just use back substitution to find the solution, instead of calculating the inverse of X’X. Do you know if this would be more efficient than gradient descent?
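
To make the question concrete, here is a rough sketch of the three approaches in NumPy. The synthetic data, the variable names, and the step size are my own assumptions for illustration, not something taken from the article. np.linalg.solve factors X’X (LU with pivoting) and then substitutes, so it never forms the inverse.

```python
import numpy as np

# Synthetic least-squares problem (assumed sizes, illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
true_theta = np.array([1.0, -2.0, 0.5, 3.0, 0.0])
y = X @ true_theta + 0.01 * rng.normal(size=200)

A = X.T @ X   # X'X
b = X.T @ y   # X'y

# 1) Solve the normal equations A theta = b directly.
#    Internally: LU factorization followed by forward/back substitution.
theta_solve = np.linalg.solve(A, b)

# 2) Form the inverse explicitly, then multiply (the approach to avoid).
theta_inv = np.linalg.inv(A) @ b

# 3) Gradient descent on 0.5 * ||X theta - y||^2, whose gradient is A theta - b.
#    Step size 1 / (largest eigenvalue of A) is a safe, assumed choice.
step = 1.0 / np.linalg.eigvalsh(A).max()
theta_gd = np.zeros(X.shape[1])
for _ in range(500):
    theta_gd -= step * (A @ theta_gd - b)

print(theta_solve)
print(theta_inv)
print(theta_gd)
```

On a small, well-conditioned problem like this one, all three should agree to many decimal places, so the sketch only shows that they reach the same answer. Which one is cheaper presumably depends on the number of features and on how well conditioned X’X is, which is really what I am asking about.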