Normal Equation
6/20 Machine Learning via Stanford notes
The normal equation is another algorithm for choosing the parameters: instead of iterating, it solves for theta analytically in one step, theta = (X^T X)^-1 X^T y, so it needs far fewer steps/iterations than gradient descent.

X is the design matrix; it includes x0 (which is a column of 1s)
this reminds me of something in stat102b… what is it??
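a minimal sketch of the normal equation in NumPy, assuming a tiny made-up dataset (the feature values are just illustrative):

```python
import numpy as np

# made-up training data: m = 4 examples, one feature (e.g. house size)
x = np.array([2104.0, 1416.0, 1534.0, 852.0])
y = np.array([460.0, 232.0, 315.0, 178.0])

# design matrix X: prepend x0 = 1 to every example
X = np.column_stack([np.ones_like(x), x])

# normal equation: theta = (X^T X)^-1 X^T y
theta = np.linalg.inv(X.T @ X) @ X.T @ y
print(theta)  # [intercept, slope]
```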
Pros and Cons
Gradient descent
- requires many iterations
- works well even when the number of features n is large
- need to choose the learning rate alpha
Normal Equation
- no need to choose alpha
- solved in one step (no iterations)
- slow when the number of features n is large (n > 10,000), since computing (X^T X)^-1 is roughly O(n^3); see the sketch below
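a quick sketch of the trade-off in the list above, fitting the same data both ways (the learning rate and iteration count here are just guesses, not tuned values):

```python
import numpy as np

np.random.seed(0)
m, n = 100, 3
X = np.column_stack([np.ones(m), np.random.randn(m, n)])  # x0 = 1 plus n features
true_theta = np.array([1.0, 2.0, -3.0, 0.5])
y = X @ true_theta + 0.1 * np.random.randn(m)

# normal equation: one linear solve, no alpha to pick
theta_ne = np.linalg.inv(X.T @ X) @ X.T @ y

# gradient descent: many iterations, and alpha must be chosen
alpha, iters = 0.1, 500
theta_gd = np.zeros(n + 1)
for _ in range(iters):
    theta_gd -= alpha / m * X.T @ (X @ theta_gd - y)

print(theta_ne)
print(theta_gd)  # should be close to theta_ne after enough iterations
```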
What if X^T X is non-invertible?
- there are redundant features (linearly dependent columns, e.g. size in ft^2 and size in m^2)
- too many features relative to examples (m <= n); fix with regularization or by deleting some features
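a small sketch of the non-invertible case, assuming a deliberately redundant feature (the duplicated column is contrived for illustration); the pseudo-inverse still gives a usable theta where a plain inverse fails or is unstable:

```python
import numpy as np

np.random.seed(1)
m = 50
size_ft2 = np.random.rand(m) * 3000
size_m2 = size_ft2 * 0.0929  # redundant: same feature, just rescaled
X = np.column_stack([np.ones(m), size_ft2, size_m2])
y = 50 + 0.12 * size_ft2 + np.random.randn(m)

# X^T X is singular because the last two columns are linearly dependent,
# so use the pseudo-inverse instead of np.linalg.inv
theta = np.linalg.pinv(X.T @ X) @ X.T @ y
print(theta)
```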