[Week 5–6 - House Price Prediction]

Halis Taha Şahin
Published in bbm406f18
2 min read · Jan 14, 2019

Harun Özbay, Halis Taha Şahin, Cihat Duman

Photo by Jesse Roberts on Unsplash

Hello there,

In the previous post, we discussed GridSearchCV and validation techniques. We explained the situations in which these techniques are used and talked about how we implemented them in our own project.

I’m going to talk about two new regression techniques in this week’s post. We will also create a sample submission with the test data. After we submit the CSV file we created, we will share our results with you and compare our results on the train and test data. This will be our last post.

Bagging Regressor

Let’s first examine the bagging regressor. Bagging (Bootstrap Aggregation) is used when our goal is to reduce the variance of a decision tree. The idea is to create several subsets of the training data, chosen randomly with replacement. Each subset is then used to train its own decision tree, so we end up with an ensemble of different models. The average of all the predictions from the different trees is used, which is more robust than a single decision tree.
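To make this concrete, here is a minimal sketch with scikit-learn. It is not our exact pipeline: we keep only the numeric columns of the Kaggle train.csv for brevity, and we predict the log of the sale price because the competition scores submissions on the log scale.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

# Load the Kaggle training data; for brevity we keep only numeric columns,
# while our real pipeline does more preprocessing than this.
train = pd.read_csv("train.csv")
X = train.drop(columns=["Id", "SalePrice"]).select_dtypes(include=[np.number]).fillna(0)
y = np.log1p(train["SalePrice"])  # predict the log of the price, matching the metric

# Bagging: fit many decision trees on bootstrap samples of the data
# (BaggingRegressor uses a decision tree as its base estimator by default)
# and average their predictions to reduce variance.
bagging = BaggingRegressor(n_estimators=100, random_state=42)

# Cross-validated RMSE on the training data
mse = -cross_val_score(bagging, X, y, scoring="neg_mean_squared_error", cv=5)
print("Bagging RMSE: %.3f" % np.sqrt(mse).mean())
```

Our current scores on the training side are listed below.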

Linear Regression: 0.134

Random Forest: 0.051

Gradient Boosting: 0.051

Support Vector Regression: 0.099

Bagging Regression: 0.053

We have also optimized the support vector regression technique using GridSearchCV; you can find the score of the optimized version above. Our previous score was around 0.440.
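For reference, a grid search over SVR can look like the sketch below (reusing X and y from the previous snippet). The parameter grid here is only illustrative; the values we actually searched may differ.

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Illustrative parameter grid; the values we actually searched may differ.
param_grid = {
    "C": [0.1, 1, 10, 100],
    "gamma": [0.0001, 0.001, 0.01, 0.1],
    "epsilon": [0.01, 0.05, 0.1],
}

# Exhaustive search over the grid with 5-fold cross-validation
grid = GridSearchCV(SVR(kernel="rbf"), param_grid,
                    scoring="neg_mean_squared_error", cv=5)
grid.fit(X, y)

print("Best parameters:", grid.best_params_)
print("Best CV MSE: %.4f" % -grid.best_score_)
```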

Decision Tree Regressor

Secondly, we examine the decision tree regressor. A decision tree builds regression or classification models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.

Example Decision Tree

When we apply the decision tree technique, we can see that the score on the training data is very high. This means that the tree memorizes the training data, so the results on the test data will not be the same.
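The gap can be seen with a simple sketch like the one below (again reusing X and y from the first snippet); the exact numbers depend on the data split and are not our project's results.

```python
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Hold out part of the training data so the memorization becomes visible
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

tree = DecisionTreeRegressor(random_state=42)
tree.fit(X_tr, y_tr)

# An unpruned tree fits the rows it has seen almost perfectly,
# while its score on the held-out rows is clearly lower.
print("Train R^2:      %.3f" % tree.score(X_tr, y_tr))
print("Validation R^2: %.3f" % tree.score(X_val, y_val))
```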

Results with Test Data

We created a CSV file according to the submission format. This file includes the ids and the predicted prices of the houses. We submitted separate files for the different regression techniques.
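A minimal sketch of how such a file can be built is shown below, using gradient boosting as an example. The Id and SalePrice column names follow the competition's sample submission; the preprocessing of test.csv is simplified compared to our real pipeline, and the model uses default hyperparameters rather than our tuned ones.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingRegressor

# Prepare the test set the same way as the training set (simplified here)
test = pd.read_csv("test.csv")
X_test = (test.drop(columns=["Id"])
              .select_dtypes(include=[np.number])
              .fillna(0)
              .reindex(columns=X.columns, fill_value=0))

# Fit on the full training data and predict the prices of the test houses
model = GradientBoostingRegressor(random_state=42)
model.fit(X, y)
preds = np.expm1(model.predict(X_test))  # undo the log transform

# Write the file in the required format: an Id column and a SalePrice column
submission = pd.DataFrame({"Id": test["Id"], "SalePrice": preds})
submission.to_csv("submission.csv", index=False)
```

With these submissions, our scores on the test data were as follows.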

Linear Regression: 0.134

Random Forest: 0.145

Gradient Boosting: 0.123

Support Vector Regression: 0.419

Bagging Regression: 0.146

Decision Tree Regression: 0.198

The best result on the test data was obtained by the gradient boosting regression method.
