[WEEK 2] Prediction Of Real Estate Price

Enes Koçak
Published in bbm406f18
Dec 10, 2018

Team Members: Batuhan Ündar, Muhammed İkbal Arslan, Enes Koçak

Photo by rawpixel on Unsplash

First Steps Towards the Project

Every project needs data to start with, so we began by collecting ours. We wrote a small script that gathers real estate features from different websites. We then filtered these features, keeping only those relevant to price estimation. For example, the location of a house is one of the most important features, whereas information such as the host and the announcement date is not a factor in the estimate. In this way, we obtained a database that contains only the necessary information.
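To make the filtering step concrete, here is a minimal sketch of how a scraped listing could be reduced to the price-relevant fields. Our actual script is not shown in this post, so the field names (`city`, `rooms`, `bathrooms`, `size_m2`, `price`, `host`, `announcement_date`) and the sample values are assumptions made up for illustration.

```python
# Hypothetical field names: the real schema of the scraped data is not shown here.
KEPT_FEATURES = ("city", "rooms", "bathrooms", "size_m2", "price")


def keep_relevant_features(listing):
    """Return a copy of a scraped listing with only the price-relevant fields."""
    return {key: listing[key] for key in KEPT_FEATURES if key in listing}


# Made-up example record, not taken from the real data set.
raw_listing = {
    "city": "Ankara",
    "rooms": 3,
    "bathrooms": 1,
    "size_m2": 120,
    "price": 350000,
    "host": "example-agency",           # dropped: not a price factor
    "announcement_date": "2018-11-30",  # dropped: not a price factor
}

cleaned = keep_relevant_features(raw_listing)
print(cleaned)
```

Keeping an explicit list of wanted fields, rather than a list of fields to drop, means that any unexpected extra field from a website is ignored by default.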

Understanding the Data

First, we analyzed the data. To determine which data our model can use, we investigated the effect of certain criteria on price. Some of the results are as follows:

Average price distribution by cities

As the chart clearly shows, the location of a house strongly affects its price, so location will be one of the decisive features of our model. Using location data does raise some problems, though. For example, we cannot use this feature when estimating a price in a location that does not appear in our data. To overcome this, we considered representing locations as coordinates and discussed how coordinate distances would affect the estimates. At that point, however, we thought this could reduce prediction accuracy, because prices can change suddenly between nearby neighborhoods.
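For readers wondering what a "coordinate distance" could look like in practice, the sketch below computes the great-circle distance between two points with the haversine formula. This is one common choice and an assumption on our part, not necessarily the measure the model will end up using. The city coordinates are rough approximations used only for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate city-centre coordinates, for illustration only.
ankara = (39.93, 32.86)
istanbul = (41.01, 28.98)

distance = haversine_km(*ankara, *istanbul)
print(f"Ankara to Istanbul: about {distance:.0f} km")
```

A distance like this says how close two houses are, but not whether they sit in similarly priced neighborhoods, which is exactly the concern described above.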

Average price distribution by number of rooms

As this chart shows, the number of rooms also has a serious effect on house prices. For example, we observed a significant increase in prices for houses with 9–12 rooms. We therefore expect our model to produce higher-than-usual estimates for a house with 9–12 rooms, and we concluded that the room count will be useful for our model.
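Charts like the ones in this post come from grouping the listings by one feature and averaging the price within each group. A minimal sketch of that computation follows. The function name `average_price_by`, the field names, and the sample records are hypothetical and are not taken from our data set.

```python
from collections import defaultdict


def average_price_by(listings, feature):
    """Map each distinct value of `feature` to the mean price of its listings."""
    totals = defaultdict(lambda: [0.0, 0])  # value -> [price sum, listing count]
    for listing in listings:
        bucket = totals[listing[feature]]
        bucket[0] += listing["price"]
        bucket[1] += 1
    return {value: total / count for value, (total, count) in totals.items()}


# Made-up records, for illustration only.
sample = [
    {"rooms": 2, "price": 200000},
    {"rooms": 2, "price": 300000},
    {"rooms": 10, "price": 2000000},
]

averages = average_price_by(sample, "rooms")
print(averages)
```

The same function works for any categorical feature, for example `average_price_by(listings, "city")`, as long as every record contains that field.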

Average price distribution by number of bathrooms

As the figure shows, an increase in the number of bathrooms did not have a significant impact on the average price. It does not make sense for our model to use features that have no pronounced effect on price, so we will need to reconsider using this and similar features.
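One way to put a number on "no pronounced effect" is the Pearson correlation coefficient between a feature and the price. The charts in this post are based on averages, so this is an additional check we are sketching here and not the analysis we actually ran. The sample values are invented, and the coefficient only captures linear relationships.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Invented values, for illustration only.
bathrooms = [1, 2, 3, 4]
prices = [500, 480, 520, 500]  # in thousand TL

weak = pearson(bathrooms, prices)
print(f"bathrooms vs. price: r = {weak:.2f}")
```

A value close to 0 suggests that the feature carries little linear information about the price, while a value close to 1 or -1 suggests a strong linear relationship. The function fails with a division by zero if one of the sequences is constant, so such features need to be handled separately.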

Average price distribution by house size

As the figure shows, size alone is not a sufficient parameter: price would have to increase linearly with size for it to be used alone for estimation. The effect of size on price may also vary with other factors. Size is therefore useful for our model, but not suitable as the sole basis for inference.

Other interesting findings

  • Average price of all houses: 603.252 TL
  • Highest house price: 50 billion TL
  • Lowest house price: 10.000 TL
  • Most expensive houses: Istanbul
  • Cheapest houses: Çankırı
  • City with the most data: Ankara
  • City with the least data: Amasya
  • Maximum number of rooms: 52
  • Minimum number of rooms: 1

Looking at this kind of data, we can also draw other inferences. For example, we can expect more accurate estimates for houses in Ankara, because we have more data for Ankara.
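Summary figures like the ones listed above can be produced with a few lines of standard-library Python. The sketch below shows one possible way to do it. The `summarize` function, the field names, and the three sample records are hypothetical, so the printed numbers do not correspond to our real data.

```python
from collections import Counter
from statistics import mean


def summarize(listings):
    """Compute basic summary statistics over a list of listing dictionaries."""
    prices = [listing["price"] for listing in listings]
    rooms = [listing["rooms"] for listing in listings]
    city_counts = Counter(listing["city"] for listing in listings)
    return {
        "mean_price": mean(prices),
        "max_price": max(prices),
        "min_price": min(prices),
        "max_rooms": max(rooms),
        "min_rooms": min(rooms),
        "most_data": city_counts.most_common(1)[0][0],
        "least_data": min(city_counts, key=city_counts.get),
    }


# Made-up records, for illustration only.
sample = [
    {"city": "Ankara", "rooms": 3, "price": 350000},
    {"city": "Ankara", "rooms": 2, "price": 250000},
    {"city": "Istanbul", "rooms": 4, "price": 900000},
]

summary = summarize(sample)
print(summary)
```

The per-city counts are the part that matters for the point about Ankara: a city with many listings gives the model more examples to learn from than a city with only a few.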

One of the most important parts of solving a machine learning problem is analyzing the data correctly. With this data analysis, we completed the most important first step of our project. Good analysis saves lives :)
