[Week 3 - House Price Prediction]

Harun Özbay
Published in bbm406f18 · Dec 21, 2018

Team Members: Harun Özbay, Halis Taha Şahin, Cihat Duman

This week, we reviewed several machine learning methods and applied them to the training data.

As mentioned in the previous post, we reviewed the training data and preprocessed it so that it was ready for several regression methods, such as linear regression, random forests, and boosting.

First, we kept all 80 features, even the ones containing None values. Here are the root-mean-squared errors (RMSE) between the logarithm of the predicted value and the logarithm of the actual value:

Linear Regression: 0.171

Random Forest: 0.134

Gradient Boosting: 0.134
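As a minimal sketch, a baseline like this can be evaluated with scikit-learn roughly as follows (the file path, encoding, train/validation split, and hyperparameters here are illustrative assumptions, not necessarily our exact setup):

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.metrics import mean_squared_error

train = pd.read_csv("train.csv")            # assumed path to the Kaggle data
y = train["SalePrice"]
X = train.drop(columns=["SalePrice", "Id"])
X = pd.get_dummies(X).fillna(0)             # naive encoding/imputation for the baseline

X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    pred = model.predict(X_val)
    # RMSE between the log of the prediction and the log of the actual price
    # (log1p and clipping guard against non-positive linear-model outputs)
    rmsle = np.sqrt(mean_squared_error(np.log1p(y_val),
                                       np.log1p(np.clip(pred, 0, None))))
    print(f"{name}: {rmsle:.3f}")
```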

Next, to observe the effect, we omitted the features shown below, which have negative correlations with the sale price (a sketch of how they can be found and dropped follows the list):

MiscVal          -0.020021
OverallCond      -0.036868
YrSold           -0.037263
LowQualFinSF     -0.037963
MSSubClass       -0.073959
KitchenAbvGr     -0.147548
EnclosedPorch    -0.149050
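A quick sketch of how these negatively correlated features can be identified and dropped with pandas (continuing with the train DataFrame and feature matrix X from the sketch above):

```python
# Pearson correlation of each numeric feature with the sale price
corr = train.corr(numeric_only=True)["SalePrice"].sort_values()

# Keep only the features with a negative correlation
neg_features = corr[corr < 0].index.tolist()
print(corr[corr < 0])   # MiscVal, OverallCond, YrSold, ...

# Drop them from the (one-hot encoded) feature matrix
X_reduced = X.drop(columns=[c for c in neg_features if c in X.columns])
```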

And here are the results:

Linear Regression: 0.174

Random Forest: 0.135

Gradient Boosting: 0.135

[Image: data preprocessing, taken from https://kadirnar.com/post/veri-on-isleme-data-preprocessing-/50]

After that, we came up with the idea of filling some of the numerical features (e.g. LotFrontage), which we had previously filled with zeros, with the median of the non-null values of the other samples sharing the same value of a related non-numerical feature (e.g. Neighborhood). The RMSE results are below:

LotFrontage-Neighborhood:

Linear Regression: 0.171

Random Forest: 0.136

Gradient Boosting: 0.136
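In pandas, this group-median imputation can be sketched as follows (column names as in the Kaggle dataset; our implementation may differ in detail):

```python
# Fill missing LotFrontage values with the median LotFrontage of the
# sample's Neighborhood, instead of a blanket zero
train["LotFrontage"] = (
    train.groupby("Neighborhood")["LotFrontage"]
         .transform(lambda s: s.fillna(s.median()))
)
```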

Here are the results after considering one more relation between a numerical and a non-numerical feature:

LotFrontage-Neighborhood + GarageYearBuilt-Neighborhood:

Linear Regression: 0.172

Random Forest: 0.134

Gradient Boosting: 0.134

We plan to make better predictions by finding more robust relations between the features. We will also try more regression methods, such as support vector machines and k-nearest neighbors.
