[Week 5–6 - House Price Prediction]
Harun Özbay, Halis Taha Şahin, Cihat Duman
Hello there,
In the previous post, we discussed GridSearchCV and validation techniques, explained the situations in which they are used, and described how we applied them in our own project.
In this week's post, I am going to talk about two new regression techniques. We will also create a sample submission with the test data; after submitting the CSV file, we will share our results and compare them with our results on the training data. This will be the last post of the series.
Bagging Regressor
Let’s first examine the bagging regressor. Bagging (Bootstrap Aggregation) is used when our goal is to reduce the variance of a decision tree. The idea is to create several subsets of the training data by sampling randomly with replacement, and to train a separate decision tree on each subset. As a result, we end up with an ensemble of different models; averaging the predictions of all the trees gives a model that is more robust than a single decision tree.
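As a minimal sketch of this idea, here is how bagging can be run with scikit-learn's BaggingRegressor (the toy data below is illustrative and stands in for our actual house-price features):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import BaggingRegressor
from sklearn.model_selection import cross_val_score

# Toy regression data standing in for the house-price features
# (our real pipeline uses the competition's training set).
X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)

# Bagging: each of the 50 trees is fit on a bootstrap sample of the
# training data (drawn with replacement), and predictions are averaged.
# BaggingRegressor's default base estimator is a decision tree.
model = BaggingRegressor(n_estimators=50, random_state=0)

# 5-fold cross-validated R^2 of the ensemble
scores = cross_val_score(model, X, y, cv=5)
print(scores.mean())
```

Because each tree sees a slightly different bootstrap sample, the averaged prediction varies less than any single tree's.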
Our results on the training data so far:
Linear Regression: 0.134
Random Forest: 0.051
Gradient Boosting: 0.051
Support Vector Regression: 0.099
Bagging Regression: 0.053
We have also optimized the support vector regressor using GridSearchCV; the score above is for the optimized model. Our previous, untuned version scored around 0.440.
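A sketch of how such a search can look is below. The grid values and data are illustrative assumptions, not the exact ranges we searched:

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Toy data in place of our real feature matrix
X, y = make_regression(n_samples=200, n_features=8, noise=5.0, random_state=0)

# SVR is sensitive to feature scale, so scaling belongs in the pipeline
pipe = make_pipeline(StandardScaler(), SVR())

# Hypothetical grid -- our actual search used different ranges
param_grid = {
    "svr__C": [1, 10, 100],
    "svr__gamma": ["scale", 0.01, 0.1],
}

# Exhaustively tries every combination with 5-fold cross-validation
search = GridSearchCV(pipe, param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_)
```

Refitting on the best combination is done automatically, so `search` can then be used directly as the tuned model.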
Decision Tree Regressor
Secondly, we examine the decision tree regressor. A decision tree builds regression or classification models in the form of a tree structure. It breaks the dataset down into smaller and smaller subsets while, at the same time, an associated decision tree is incrementally developed. The final result is a tree with decision nodes and leaf nodes.
When we apply the decision tree technique, we see that the score on the training data is very high. This means the tree has memorized (overfit) the training data, so the results on the test data will not be nearly as good.
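This memorization effect is easy to reproduce on synthetic data (a sketch, not our actual pipeline): an unconstrained tree grows until its leaves are pure, so it fits the training targets almost exactly while doing much worse on held-out data.

```python
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=10, noise=10.0, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# With no depth limit, the tree keeps splitting until every leaf
# is pure, effectively memorizing the training set.
tree = DecisionTreeRegressor(random_state=0).fit(X_tr, y_tr)

print(tree.score(X_tr, y_tr))  # near-perfect R^2 on the training split
print(tree.score(X_te, y_te))  # noticeably lower on the held-out split
```

Limiting the tree (e.g. via `max_depth` or `min_samples_leaf`) trades some training accuracy for better generalization.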
Results with Test Data
We created a CSV file in the required submission format, containing the id and predicted price of each house, and submitted one file for each regression technique.
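Building such a file comes down to a few lines of pandas. The ids, prices, and column names below are hypothetical placeholders (column names assumed from the competition's sample submission, not confirmed by our post):

```python
import pandas as pd

# Hypothetical ids and model predictions standing in for our real test set
test_ids = [1461, 1462, 1463]
predictions = [120500.0, 158900.0, 98750.0]

# One row per house: its id and the predicted sale price
# (column names assumed from the competition's sample submission)
submission = pd.DataFrame({"Id": test_ids, "SalePrice": predictions})

# index=False keeps the row index out of the uploaded file
submission.to_csv("submission.csv", index=False)
print(submission.head())
```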
Linear Regression: 0.134
Random Forest: 0.145
Gradient Boosting: 0.123
Support Vector Regression: 0.419
Bagging Regression: 0.146
Decision Tree Regression: 0.198
The best result on the test data was obtained by the gradient boosting regression method.