Linear Regression with Multiple Variables (Part 2) — Using the Normal Equation
This is a Python implementation of the Linear Regression exercise in week 2 of Coursera’s online Machine Learning course, taught by Dr. Andrew Ng. We are provided with data for house sizes in square feet and number of bedrooms. The task is to use linear regression to determine how the size and the number of bedrooms affect the price of a house. Our ultimate aim is to predict the price of a new house given its size in square feet and its number of bedrooms.
In part 1, we used gradient descent to obtain the coefficients of the linear regression by minimizing the prediction error. In this part we implement linear regression using the normal equation. We obtain values for theta (the coefficients), which we then use to predict the price of a 3 bedroom house with an area of 1650 square feet, just like in part 1, and we expect to get the same result. When using the normal equation, there is no need to normalize the input values because, unlike gradient descent, there are no iterations; the coefficients are obtained in a single calculation.
Let’s start by importing relevant python libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
Let’s import the data into a Pandas dataframe called data2. There are 3 columns in the data set: Size, Bedrooms, and Price.
data2 = pd.read_csv('ex1data2.txt', names=['Size', 'Bedrooms', 'Price'])
View the first 5 records of the data set.
data2.head()
Assign all but the price field to the independent variable X
X = data2.drop(['Price'], axis=1)
Assign the price field to the dependent variable y
y = data2['Price']
Define the length of the data set as m
m=len(y)
Append the bias term (field containing all ones) to X
X = np.append(np.ones([m,1]), X, axis=1)
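As a quick sanity check (optional, and not part of the original exercise), you can confirm that the column of ones was added by inspecting the shape and the first few rows of X.
print(X.shape)  # should report 3 columns: bias, Size, Bedrooms
print(X[:3])    # first three rows, each beginning with a 1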
Implementing the normal equation algorithm
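For reference, the closed-form solution computed in this section is the standard normal equation, where X is the design matrix (including the bias column) and y is the vector of prices:
\theta = (X^{T} X)^{-1} X^{T} y
Because this is a single matrix calculation, there is no learning rate to choose and no iterations to run.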
There are different ways to implement the normal equation in python. You may choose to implement it one step at a time, or you may do it all at once. I have done the latter.
theta = np.linalg.inv(X.T.dot(X)).dot(X.T.dot(y))
print(theta)
[89597.9095428 139.21067402 -8738.01911233]
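If you prefer the step-by-step approach mentioned above, a possible equivalent sketch (the intermediate variable names here are just illustrative) is:
XtX = X.T.dot(X)              # X transpose times X
XtX_inv = np.linalg.inv(XtX)  # inverse of (X^T X)
Xty = X.T.dot(y)              # X transpose times y
theta = XtX_inv.dot(Xty)      # theta = (X^T X)^-1 (X^T y)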
From the one-line version above, we get theta as a flat array of 3 values; we reshape it into a 3x1 column vector, which makes it easier to use in our predictions.
theta = theta.reshape(3,1)
print(theta)
[[89597.9095428 ]
 [ 139.21067402]
 [-8738.01911233]]
Prediction
Our task is to predict the price of a house with an area of 1650 square feet and 3 bedrooms. We represent the information we have as follows:
X = [1650, 3]
Note that we don’t need to normalize X, but we still need to add the bias term (1). In the section below, we add the bias term to X.
X_new = np.append(1, X)
print(X_new)
[ 1 1650 3]
We now have a vector ([1, 1650, 3]) representing the house, which we can pass into a predictor function to get the price of the house.
Now we need to create the predictor function.
def prediction(X, theta):
    '''
    This function takes in the features of the house
    as well as the coefficients, and returns the
    predicted price.
    '''
    return np.dot(X, theta)
We now call the function and provide the feature vector (X_new) as well as the coefficients (theta).
Pred = prediction(X_new, theta)
print(Pred)  # Print out the prediction.
[[293081.46433489]]
The value obtained for the prediction, 293081.46433489, is the same as the one we obtained in part 1 after using gradient descent with the best performing learning rate.
We can use the coefficients generated by the normal equation to predict the price of any house, as long as we have its size and number of bedrooms.
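For example, to price a hypothetical house of 2000 square feet with 4 bedrooms (these numbers are made up purely for illustration), we build the feature vector the same way and reuse the predictor function.
X_other = np.append(1, [2000, 4])   # remember to add the bias term
print(prediction(X_other, theta))   # predicted price for this house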
The normal equation, though simple to implement, is not used on large data sets. When the data set becomes too large, the matrix inversion it requires becomes too expensive to be practical, and in that case we resort to the gradient descent algorithm.
An IPython notebook can be found on GitHub here
You can check out other articles in the series by visiting the links below: