House Prices Prediction using Andrew Ng’s Machine Learning Algorithm

Benjamin Lau
5 min read · Dec 11, 2018

As a continuation of Andrew Ng’s Machine Learning Course in Python (Linear Regression), I decided to use my Python code in a Kaggle competition to test its robustness and practicality.
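For readers who have not seen the earlier post, here is a minimal sketch of the kind of linear regression taught in that course: batch gradient descent on normalized features. It is not the author's code. The function names, the learning rate `alpha`, the iteration count, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np


def normalize_features(X):
    """Scale each column to zero mean and unit standard deviation."""
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    sigma[sigma == 0] = 1.0  # avoid division by zero for constant columns
    return (X - mu) / sigma, mu, sigma


def compute_cost(X, y, theta):
    """Squared-error cost J(theta) = 1/(2m) * sum((X @ theta - y)^2)."""
    m = len(y)
    residuals = X @ theta - y
    return float(residuals @ residuals) / (2 * m)


def gradient_descent(X, y, alpha=0.1, num_iters=500):
    """Batch gradient descent; X must already include a bias column of ones."""
    m, n = X.shape
    theta = np.zeros(n)
    history = []
    for _ in range(num_iters):
        gradient = X.T @ (X @ theta - y) / m
        theta = theta - alpha * gradient
        history.append(compute_cost(X, y, theta))
    return theta, history


# Synthetic data (not the Kaggle dataset): y = 3 + 2*x1 - 1*x2 + noise
rng = np.random.default_rng(0)
X_raw = rng.normal(size=(200, 2))
y = 3 + 2 * X_raw[:, 0] - 1 * X_raw[:, 1] + rng.normal(scale=0.1, size=200)

X_norm, mu, sigma = normalize_features(X_raw)
X = np.column_stack([np.ones(len(y)), X_norm])  # prepend the bias column
theta, history = gradient_descent(X, y)
print("final cost:", history[-1])
```

Normalizing first keeps the features on a similar scale, which is what lets a single learning rate work for every column. The `mu` and `sigma` values would need to be reused to scale any new data before predicting.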

Since I am testing a linear regression algorithm, the playground competition ‘House Prices: Advanced Regression Techniques’ seems like a good choice for this task. The dataset consists of 80 columns of independent variables, with 1460 rows in the training data and 1459 rows in the test data. The goal of this competition is to predict the sale prices of houses based on these variables. The data can be downloaded here.
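A quick way to confirm those row counts after downloading is to load both files and compare them. The sketch below is only an illustration: the file names `train.csv` and `test.csv`, the target column name `SalePrice`, and the helper `describe_split` are assumptions, not taken from the original code.

```python
from pathlib import Path

import pandas as pd

TRAIN_PATH = Path("train.csv")  # assumed local file name
TEST_PATH = Path("test.csv")    # assumed local file name
TARGET = "SalePrice"            # assumed name of the target column


def describe_split(train, test, target=TARGET):
    """Summarize row counts and the columns found only in the training data."""
    train_only = sorted(set(train.columns) - set(test.columns))
    return {
        "train_rows": len(train),
        "test_rows": len(test),
        "train_only_columns": train_only,
        "has_target": target in train.columns,
    }


# Only runs when the files are present in the working directory.
if TRAIN_PATH.exists() and TEST_PATH.exists():
    train = pd.read_csv(TRAIN_PATH)
    test = pd.read_csv(TEST_PATH)
    print(describe_split(train, test))
```

The columns that appear only in the training data should be the target alone. Anything else in that list is worth investigating before modelling.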

Kaggle competitions are the best place to test your skills as a data scientist, thanks to their large library of real datasets and the variety of competitions built around real business problems. Because the dataset I am using comes from a playground competition, there are lots of kernels available to guide beginners through it. Be sure to check them out if you are new.

Let’s start by importing all the libraries we need:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.preprocessing import StandardScaler

As for the data, I found that preprocessing the data before splitting into testing and…
