[WEEK 5–ARTIFICIAL REAL ESTATE AGENT]

Ilkin Sevgi Isler
3 min read · Dec 30, 2018


Theme: Image Classification and House Price Estimation with Visual and Textual Features

Team Members: Gökay Atay, Ilkin Sevgi Isler, Mürüvet Gökçen, Zafer Cem Özcan

PRICE ESTIMATION WITH TEXTUAL FEATURES

This week we will introduce how we use our textual data for price estimation. The textual data contains the number of bedrooms, the number of bathrooms, the area, and the zipcode of each house. To use these fields efficiently, for each field we compute the average price of the houses that share the same value (for example, the same zipcode), and our models take these averages as features for predicting the house price. We use Linear Regression and the Random Forest algorithm. Before getting into the details of the results, let's look at what these methods are in general.
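As a rough sketch of this averaging step, assuming a pandas DataFrame (the column names and toy values below are illustrative, not our actual dataset):

```python
import pandas as pd

# Toy listing data; the schema is an assumption for illustration only.
df = pd.DataFrame({
    "bedrooms":  [3, 3, 2, 4],
    "bathrooms": [2, 2, 1, 3],
    "area":      [1500, 1600, 900, 2100],
    "zipcode":   [94110, 94110, 94131, 94110],
    "price":     [700000, 750000, 450000, 950000],
})

# For each field, attach the mean price of all houses that share the
# same value of that field (e.g. the same zipcode).
for col in ["bedrooms", "bathrooms", "area", "zipcode"]:
    df[f"avg_price_by_{col}"] = df.groupby(col)["price"].transform("mean")
```

`groupby(...).transform("mean")` keeps the original row count, so the averaged columns line up with each house and can be fed directly to a regressor.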

Linear regression is a model that assumes a linear relationship between the input variables and the output variable. More specifically, the output can be calculated as a linear combination of the input variables.
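A minimal sketch of this idea with scikit-learn, on made-up data that follows y = 2x + 1 exactly:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

X = np.array([[1.0], [2.0], [3.0], [4.0]])  # single input variable
y = np.array([3.0, 5.0, 7.0, 9.0])          # exactly y = 2*x + 1

# Fitting recovers the linear combination: slope ~2 and intercept ~1.
model = LinearRegression().fit(X, y)
```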

Random Forest is a supervised learning algorithm that builds multiple decision trees and merges them together to get a more accurate and stable prediction. Random Forest can be used for both classification and regression problems.

These are the R² values between each feature and the price.
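One way such a per-feature R² value can be computed (a sketch, not necessarily the exact code we ran) is to fit a one-variable linear regression of price on that feature and take its score:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def r_squared(feature, price):
    """R^2 of a one-variable linear fit of price on a single feature."""
    X = np.asarray(feature, dtype=float).reshape(-1, 1)
    y = np.asarray(price, dtype=float)
    return LinearRegression().fit(X, y).score(X, y)
```

A feature that determines the price exactly linearly gives an R² of 1.0; weaker features give values closer to 0.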

LINEAR REGRESSION

You can see the graphs that we obtained with linear regression above.

These are the coefficients we use to predict the price.

So, our final equation is:

Predicted Price = (-0.0027 × Average Price by Number of Bedrooms)
                + (-0.0458 × Average Price by Number of Bathrooms)
                + (0.9673 × Average Price by Area)
                + (0.0855 × Average Price by Zipcode)
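Applying the equation is a plain weighted sum. The coefficients below are the fitted ones above; the per-feature average prices for the example house are made-up values for illustration:

```python
# Fitted coefficients from the linear regression above.
coefs = {
    "avg_by_bedrooms":  -0.0027,
    "avg_by_bathrooms": -0.0458,
    "avg_by_area":       0.9673,
    "avg_by_zipcode":    0.0855,
}

# Hypothetical house: each value is the average price of houses sharing
# the same bedrooms / bathrooms / area / zipcode (assumed numbers).
features = {
    "avg_by_bedrooms":  540000.0,
    "avg_by_bathrooms": 560000.0,
    "avg_by_area":      600000.0,
    "avg_by_zipcode":   580000.0,
}

predicted = sum(coefs[k] * features[k] for k in coefs)
```

Note how the prediction is dominated by the area-based average (coefficient 0.9673), while the bedroom- and bathroom-based averages contribute small negative corrections.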

RANDOM FOREST

We used RandomForestRegressor from the scikit-learn library on the training set as a benchmark. Random Forest is a tree-based machine learning algorithm that is robust to overfitting because it is an aggregation of imperfect decision trees. When the predictions of all the trees are averaged, the individual errors tend to cancel out; training each tree on a bootstrap sample of the data and averaging the results is called bagging.

Using 10 trees, we obtained 84.13% training accuracy, as shown below.
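A sketch of this benchmark on synthetic data (our real features and target are not reproduced here). For a regressor, scikit-learn's score method returns R², which is presumably what the reported training figure measures:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 4))  # stand-in for the four averaged features
# Synthetic target loosely mirroring the fitted weights: area dominates.
y = X @ np.array([0.0, 0.0, 1.0, 0.1]) + rng.normal(0, 0.1, 200)

# n_estimators=10 matches the 10 trees used in the post.
forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)
train_r2 = forest.score(X, y)  # R^2 on the training set
```

Training-set R² is optimistic for a random forest, since each tree has seen most of the training points; a held-out test score would be the fairer number to compare models on.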
