Theme: Image Classification and House Price Estimation with Visual and Textual Features

Team Members: Gökay Atay, Ilkin Sevgi Isler, Mürüvet Gökçen, Zafer Cem Özcan

Frontal View of a House

We all know the place online sales platforms hold in our lives. Whether we are buying a house or a car, googling is the first choice. There are many websites we can use; Zillow, Trulia, and Sahibinden are some of them. Everything is smooth for the customer, who can filter by price and by every feature they ask for. But for the seller, there is a gap when it comes to estimating the price of the house. Nobody wants to sell a house below its actual value, and nobody wants to wait for months because the asking price is higher than its real value. The seller has to do exhaustive research to decide on the right price. This is where the topic of our project came from.


The goal is to predict house values accurately. To do this, we have to follow the steps below.

We’ll use features such as location and number of rooms, but the main idea is to add each house's luxury level, estimated from its photos, to this data. To do that, we first have to categorize the pictures as bathroom, bedroom, and so on, because we do not expect all users to tag their photos by room type. With that, we will be able to compare rooms of the same kind with each other.

To summarize,

1. We need to categorize the photos according to their types.
E.g. Bedroom | Kitchen

For the image classification task, we are planning to use a deep learning model called a Convolutional Neural Network (CNN or ConvNet). CNNs are a class of deep, feed-forward artificial neural networks most commonly applied to analyzing visual imagery. Compared with a regular fully connected Neural Network (NN), a CNN needs far fewer parameters for images, because each filter's weights are shared across all positions in the image. This parameter sharing also lets the network detect a pattern regardless of where it appears in the image, which gives it a degree of translation invariance.
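
To make the parameter saving concrete, here is a small back-of-the-envelope sketch. The sizes are assumptions chosen only for illustration (a 224x224 RGB photo, 64 units in the fully connected layer, 64 filters of size 3x3 in the convolutional layer); they are not the sizes of our final model.

```python
def dense_params(height, width, channels, units):
    """Weights plus biases of a fully connected layer over the flattened image."""
    return height * width * channels * units + units


def conv_params(kernel, in_channels, filters):
    """Weights plus biases of a convolutional layer.

    The count does not depend on the image size, because each filter
    is reused at every spatial position (parameter sharing).
    """
    return kernel * kernel * in_channels * filters + filters


# Assumed example sizes: 224x224 RGB input, 64 units vs. 64 filters of 3x3.
dense = dense_params(224, 224, 3, 64)
conv = conv_params(3, 3, 64)
print(f"fully connected: {dense:,} parameters")
print(f"3x3 convolution: {conv:,} parameters")
```

Under these assumptions the fully connected layer needs several million parameters, while the convolutional layer needs fewer than two thousand.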

2. We need to classify these rooms according to their luxury levels.

Image Taken From Omid Poursaeed, Tomáš Matera, Serge Belongie’s Vision-Based Real Estate Price Estimation Work

3. Using these luxury levels and the other data we have, we need to estimate the price of the house.

For real estate price estimation, we intend to use Random Forest Regression. Random Forest is a flexible algorithm that often produces good results even without hyper-parameter tuning. It can be used for both classification and regression tasks.

A random forest builds multiple decision trees, each trained on a random sample of the data, and combines their outputs (by averaging, in the regression case) to get a more accurate and stable prediction.
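
The idea of building several trees and merging them can be sketched in a few lines of plain Python. This is only a toy illustration of the mechanics (bootstrap samples, random feature subsets, averaged predictions), not our planned implementation; in practice a library implementation such as scikit-learn's RandomForestRegressor would be the natural choice. The feature layout (bedrooms, bathrooms, area, luxury level), the helper names, and all numbers below are made up for the example.

```python
import random
from statistics import mean


def best_split(rows, targets, feature_ids):
    """Find the (feature, threshold) pair with the lowest squared error."""
    best, best_err = None, float("inf")
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, targets) if r[f] <= t]
            right = [y for r, y in zip(rows, targets) if r[f] > t]
            if not left or not right:
                continue
            m_left, m_right = mean(left), mean(right)
            err = sum((y - m_left) ** 2 for y in left)
            err += sum((y - m_right) ** 2 for y in right)
            if err < best_err:
                best, best_err = (f, t), err
    return best


def build_tree(rows, targets, rng, depth=0, max_depth=3):
    """Grow one regression tree; a leaf stores the mean target value."""
    if depth >= max_depth or len(rows) < 4:
        return mean(targets)
    n_features = len(rows[0])
    # Each split only looks at a random subset of the features.
    candidates = rng.sample(range(n_features), max(1, n_features // 2))
    split = best_split(rows, targets, candidates)
    if split is None:
        return mean(targets)
    f, t = split
    left = [(r, y) for r, y in zip(rows, targets) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, targets) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left],
                       rng, depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right],
                       rng, depth + 1, max_depth))


def predict_tree(node, row):
    """Walk from the root to a leaf and return the leaf value."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node


def fit_forest(rows, targets, n_trees=25, seed=0):
    """Train each tree on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], rng))
    return forest


def predict_forest(forest, row):
    """Merge the trees by averaging their individual predictions."""
    return mean(predict_tree(tree, row) for tree in forest)


# Made-up toy data: (bedrooms, bathrooms, area, luxury level) -> price.
rows = [(2, 1, 900, 1), (3, 2, 1500, 2), (3, 2, 1700, 3), (4, 3, 2400, 3),
        (2, 1, 1000, 2), (5, 4, 3200, 4), (4, 2, 2000, 2), (3, 1, 1300, 1)]
prices = [120000.0, 210000.0, 260000.0, 380000.0,
          150000.0, 560000.0, 300000.0, 170000.0]

forest = fit_forest(rows, prices)
estimate = predict_forest(forest, (3, 2, 1600, 2))
print(f"estimated price: {estimate:,.0f}")
```

Because every tree sees a different sample and different candidate features, the individual trees disagree; averaging them is what makes the final estimate more stable than any single tree.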

Note: Our approach to the problem, the algorithms used, and the solutions may change during development. The solutions above may not precisely reflect the algorithms that will eventually be used.


Our preferred dataset is the Houses Dataset, which covers a total of 535 houses. Each house is represented by four images: bedroom, bathroom, kitchen, and a frontal view of the house. The dataset folder therefore contains 2140 images (535 × 4). It also contains a text file with the textual metadata of the dataset. Each row in the file corresponds to one house, in order of house number. The values in a row are the number of bedrooms, the number of bathrooms, the area of the house, the zip code, and the price.
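
As a rough sketch of how the metadata file could be read, the snippet below parses rows with the five fields listed above. It assumes the values are separated by whitespace and that the bathroom count may be fractional; both are assumptions, as are the House class, the parse_metadata helper, and the two sample rows, which are made up and not taken from the dataset.

```python
from dataclasses import dataclass


@dataclass
class House:
    house_id: int      # position of the row in the file, starting at 1
    bedrooms: int
    bathrooms: float   # assumed to allow values such as 2.5
    area: float
    zipcode: str       # kept as text so leading zeros survive
    price: float


def parse_metadata(text):
    """Turn whitespace-separated metadata rows into House records."""
    rows = [line for line in text.splitlines() if line.strip()]
    houses = []
    for house_id, line in enumerate(rows, start=1):
        fields = line.split()
        if len(fields) != 5:
            raise ValueError(f"row {house_id}: expected 5 values, got {len(fields)}")
        bedrooms, bathrooms, area, zipcode, price = fields
        houses.append(House(house_id, int(bedrooms), float(bathrooms),
                            float(area), zipcode, float(price)))
    return houses


# Made-up sample rows in the assumed layout:
# bedrooms bathrooms area zipcode price
sample = """\
3 2 1500 12345 250000
4 2.5 2100 12345 410000
"""

houses = parse_metadata(sample)
for house in houses:
    print(house)
```

Keeping the row position as house_id makes it possible to match each metadata row with the four images of the same house later on.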

We think that the Houses Dataset will be sufficient for our project. Even so, a lack of training data in a machine learning task can cause problems such as overfitting. Noise also becomes a real issue for small datasets. Considering these problems and the results we get on our dataset, we might extend it with the LSUN, Houzz, and Zillow datasets. Another way to extend it is data augmentation. For example, the mirror image of a house photo carries the same characteristics of the house that we want the model to learn. Adding these left/right flipped images can increase the robustness of our model. As a last resort, synthetic data can be constructed using simulated house environments in the Unity Engine. Since manually labeling our synthetic data with prices would be hard and time-consuming for us, synthetic data can only be considered for training the room category classification part.
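
The left/right flip mentioned above is simple enough to sketch without any image library. Here an image is represented as a plain list of pixel rows, which is an assumption made to keep the example self-contained; the helper names and the tiny 2x3 "photo" are made up for illustration.

```python
def flip_left_right(image):
    """Mirror an image stored as a list of pixel rows."""
    return [row[::-1] for row in image]


def augment_with_flips(samples):
    """Return the original (image, label) pairs plus a mirrored copy of each.

    The label is reused unchanged, since a mirrored kitchen is still a kitchen.
    """
    augmented = list(samples)
    for image, label in samples:
        augmented.append((flip_left_right(image), label))
    return augmented


# Tiny made-up 2x3 grayscale "photo".
photo = [[1, 2, 3],
         [4, 5, 6]]
dataset = [(photo, "kitchen")]

augmented = augment_with_flips(dataset)
print(len(dataset), "->", len(augmented), "samples")
```

Each original sample yields one extra sample, so this step doubles the size of the training set.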


Some previous studies have already investigated the most important factors that affect house prices. Most of them focused on the textual attributes of houses; only a few used both visual and textual attributes. Eman H. and Mohamed N. propose a multi-layer neural network for house price estimation from visual and textual features. Their model also outperforms the Support Vector Regression technique.

Another team uses a deep Convolutional Neural Network to classify images of the different parts of a house. The price estimation problem can also be viewed as a regression problem. Using a crowd-sourcing framework for comparing images based on their luxury levels, the team builds a price estimation network that uses both the luxury level information and the textual metadata to emulate an Automated Valuation Method.