My Machine Learning Diary: Day 12

Junhong Wang
3 min read · Oct 31, 2018

--

This is day 12 of my machine learning diary (MLD) series.

Day 12

Today I learned about the normal equation. This was pretty math-heavy, but very interesting.

Normal Equation

So what’s a normal equation? For regression problems, we’ve been using gradient descent to find the best-fitting line. But there is another way to find it, and that’s the normal equation.

Normal Equation
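Written out in LaTeX form (this matches what we derive at the end of the post), the expression is:

\theta = (X^T X)^{-1} X^T y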

That’s it! We just need to compute this expression to get the optimal value of θ right away. Note that the X in this expression is a matrix. This X is called the design matrix and is defined as follows:

Design Matrix

The little x in the expression is a single sample, and the superscript 𝑖 means it’s the 𝑖th sample.
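Written out in LaTeX (I’m using m for the number of samples here), the design matrix stacks one sample per row:

X = \begin{bmatrix} (x^{(1)})^T \\ (x^{(2)})^T \\ \vdots \\ (x^{(m)})^T \end{bmatrix}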

Gradient descent or normal equation?

Both algorithms have advantages and disadvantages. Gradient descent takes O(kn²), where k is the number of iterations and n is the number of features. The normal equation takes O(n³) because it has to invert an n×n matrix. When n is small, the normal equation generally runs faster. However, once n exceeds roughly 10,000, it’s better to switch to gradient descent, since that matrix inversion becomes too expensive.
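To make the trade-off concrete, here is a minimal NumPy sketch (my own toy example, not from the course) that fits the same line both ways. The data, learning rate, and iteration count are made up for illustration.

import numpy as np

# Toy data: 100 samples, 2 features, plus a leading column of ones for the intercept.
rng = np.random.default_rng(0)
features = rng.normal(size=(100, 2))
X = np.hstack([np.ones((100, 1)), features])
true_theta = np.array([1.0, 2.0, -3.0])
y = X @ true_theta + rng.normal(scale=0.1, size=100)

# Normal equation: solve (X^T X) theta = X^T y in one shot.
theta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Batch gradient descent on the same least-squares cost.
theta_gd = np.zeros(3)
alpha = 0.1                   # learning rate, picked by hand
for _ in range(2000):         # k iterations
    gradient = X.T @ (X @ theta_gd - y) / len(y)
    theta_gd -= alpha * gradient

print(theta_normal)  # both should be close to [1, 2, -3]
print(theta_gd)

Note that np.linalg.solve is used instead of explicitly inverting XᵀX; solving the linear system directly is the more stable way to evaluate the normal equation in practice.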

Behind the magic

So where did this magic expression come from? In short, we get the expression by solving (the derivative of the cost function) = 0. Let’s see how to derive it step by step.

These properties of matrix derivatives will be helpful for the proof.

Helpful Formula 1
Helpful Formula 2
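Written out in LaTeX, the two properties I’m assuming these formulas showed are the derivatives of a linear form and of a quadratic form:

\frac{\partial}{\partial \theta} (b^T \theta) = b

\frac{\partial}{\partial \theta} (\theta^T A \theta) = (A + A^T)\theta, \quad which equals 2A\theta when A is symmetric.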

The cost function is where we are going to start.

Cost Function
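Written out (assuming the usual least-squares cost, with the constant 1/2 kept only to make the derivative clean), it is:

J(\theta) = \frac{1}{2} \sum_{i=1}^{m} \left( \theta^T x^{(i)} - y^{(i)} \right)^2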

This can be rewritten with matrices as follows:
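Stacking all m residuals into one column vector, this is:

J(\theta) = \frac{1}{2} \begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix}^T \begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix}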

The big matrix can be expressed in terms of the design matrix X as follows:
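In LaTeX form:

\begin{bmatrix} \theta^T x^{(1)} - y^{(1)} \\ \vdots \\ \theta^T x^{(m)} - y^{(m)} \end{bmatrix} = X\theta - y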

Using what we’ve got above, the cost function can be rewritten as follows:

Cost Function (Matrix notation)
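That is:

J(\theta) = \frac{1}{2} (X\theta - y)^T (X\theta - y)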

We can distribute the transpose using the properties of the matrix transpose.
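Expanding the product gives:

J(\theta) = \frac{1}{2} \left( \theta^T X^T X \theta - \theta^T X^T y - y^T X \theta + y^T y \right)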

Since yᵀXθ is a scalar value, taking its transpose gives the same value.
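In symbols:

y^T X \theta = (y^T X \theta)^T = \theta^T X^T y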

Replacing yᵀXθ with what we’ve got above, we get another expression for the cost function.
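The two middle terms merge:

J(\theta) = \frac{1}{2} \left( \theta^T X^T X \theta - 2\theta^T X^T y + y^T y \right)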

Now, using the properties of matrix derivatives, we get the following expression:
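Applying the quadratic-form rule to the first term and the linear rule to the second (the yᵀy term doesn’t depend on θ):

\nabla_\theta J(\theta) = \frac{1}{2} \left( 2 X^T X \theta - 2 X^T y \right) = X^T X \theta - X^T y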

We get the minimal error when the derivative of the cost function is 0.
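Setting the derivative to zero:

X^T X \theta - X^T y = 0 \implies X^T X \theta = X^T y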

We get the magic expression after some simplifications!
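Assuming XᵀX is invertible, multiplying both sides by its inverse gives the expression from the top of the post:

\theta = (X^T X)^{-1} X^T y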

Phew… it was pretty math intensive, but I made it through.

--

Junhong Wang

I'm Junhong. I'm a Software Engineer based in LA. I specialize in full stack web development and writing readable code. junhong.wang