My Machine Learning Diary: Day 12

Junhong Wang
3 min read · Oct 31, 2018

--

This is day 12 of my machine learning diary (MLD) series.

Day 12

Today I learned about the normal equation. This was pretty math-heavy, but very interesting.

Normal Equation

So what’s a normal equation? For regression problems, we’ve been using gradient descent to find the best-fitting line. But there is another way to find it, and that’s the normal equation.

Normal Equation
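Written out in LaTeX form (this matches what we derive at the end of the post), the expression is:

\theta = (X^T X)^{-1} X^T y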

That’s it! We just need to compute this expression to get the optimal value of θ right away. Note that the X in this expression is a matrix. This X is called the design matrix and is defined as follows:

Design Matrix

The little x in the expression is a single sample, and the superscript 𝑖 means it’s the 𝑖th sample.
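Written out in LaTeX (I’m using m for the number of samples here), the design matrix stacks one sample per row:

X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}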

Gradient descent or normal equation?

Both algorithms have advantages and disadvantages. Gradient descent takes O(kn²), where k is the number of iterations and n is the number of features. The normal equation takes O(n³) because it has to invert an n×n matrix. When n is small, the normal equation generally runs faster. However, once n exceeds roughly 10,000, it’s better to switch to gradient descent, since that matrix inversion becomes too expensive.
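To make the trade-off concrete, here is a minimal NumPy sketch (my own toy example, not from the course) that fits the same line both ways. The data, learning rate, and iteration count are made up for illustration.

import numpy as np

# Toy data: 100 samples, 2 features, plus a leading column of ones for the intercept.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
X = np.hstack([np.ones((100, 1)), features])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

# Normal equation: solve (X^T X) theta = X^T y in one shot.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same least-squares cost.
theta_gd = np.zeros(3)
alpha = 0.1                   # learning rate, picked by hand
for _ in range(2000):         # k iterations
    gradient = X.T @ (X @ theta_gd - y) / len(y)
    theta_gd -= alpha * gradient

print(theta_normal)  # both should be close to [1, 2, -3]
print(theta_gd)

Note that np.linalg.solve is used instead of explicitly inverting XᵀX; solving the linear system directly is the more stable way to evaluate the normal equation in practice.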

Behind the magic

So where did this magic expression come from? In short, we get the expression by solving (the derivative of the cost function) = 0. Let’s see how to derive it step by step.

These properties of matrix derivatives will be helpful for the proof.

Helpful Formula 1
Helpful Formula 2
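Written out in LaTeX, the two properties I’m assuming these formulas showed are the derivatives of a linear form and of a quadratic form:

\frac{\partial}{\partial \theta} (b^T \theta) = b

\frac{\partial}{\partial \theta} (\theta^T A \theta) = (A + A^T)\theta, \quad which equals 2A\theta when A is symmetric.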

The cost function is where we are going to start.

Cost Function
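Written out (assuming the usual least-squares cost, with the constant 1/2 kept only to make the derivative clean), it is:

J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2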

This can be rewritten with matrices as follows:
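Stacking all m residuals into one column vector, this is:

J(\theta) = \frac{1}{2} \begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix}^T \begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix}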

The big matrix can be expressed in terms of the design matrix X as follows:
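In LaTeX form:

\begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix} = X\theta - y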

Using what we’ve got above, the cost function can be rewritten as follows:

Cost Function (Matrix notation)
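That is:

J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)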

We can distribute the transpose using the properties of the matrix transpose.
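Expanding the product gives:

J(\theta) = \frac{1}{2} \left( \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y \right)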

Since yᵀXθ is a scalar value, taking its transpose gives the same value.
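In symbols:

y^T X \theta = (y^T X \theta)^T = \theta^T X^T y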

Replacing yᵀXθ with what we’ve got above, we get another expression for the cost function.
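The two middle terms merge:

J(\theta) = \frac{1}{2} \left( \theta^T X^T X \theta - 2\theta^T X^T y + y^T y \right)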

Now, using the properties of matrix derivatives, we get the following expression:
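Applying the quadratic-form rule to the first term and the linear rule to the second (the yᵀy term doesn’t depend on θ):

\nabla_\theta J(\theta) = \frac{1}{2} \left( 2 X^T X \theta - 2 X^T y \right) = X^T X \theta - X^T y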

We get the minimal error when the derivative of the cost function is 0.
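Setting the derivative to zero:

X^T X \theta - X^T y = 0 \implies X^T X \theta = X^T y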

We get the magic expression after some simplifications!
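Assuming XᵀX is invertible, multiplying both sides by its inverse gives the expression from the top of the post:

\theta = (X^T X)^{-1} X^T y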

Phew… it was pretty math intensive, but I made it through.

--

Junhong Wang

I'm Junhong. I'm a Software Engineer based in LA. I specialize in full stack web development and writing readable code. junhong.wang