House Sales Predictor Using Deep Learning

Umair Ayub
Published in The Startup · 4 min read · Sep 24, 2020

Housing prices are an important reflection of the economy, and housing price ranges are of great interest to both buyers and sellers. In this project, house prices are predicted from explanatory variables that cover many aspects of residential houses. The goal is to create a regression model that can accurately estimate the price of a house given its features.

Acquiring data

The data for this project is available on Kaggle. The main objective is to predict the price of a given house based on previously available data. The dataset is available here: https://www.kaggle.com/harlfoxem/housesalesprediction
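A minimal sketch of loading the data with pandas is shown here. The file name kc_house_data.csv is an assumption about how the downloaded CSV is saved locally, and the two sample rows are made-up stand-ins so the snippet runs without the download.

```python
import io

import pandas as pd

# Hypothetical usage, assuming the Kaggle CSV was saved next to the script:
# df = pd.read_csv("kc_house_data.csv")

# Tiny in-memory stand-in (illustrative rows, not real data) so the sketch is runnable.
sample_csv = io.StringIO(
    "id,date,price,bedrooms\n"
    "1,20141013T000000,221900,3\n"
    "2,20141209T000000,538000,3\n"
)
df = pd.read_csv(sample_csv)

# Quick first look at the size and the first rows.
print(df.shape)
print(df.head())
```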

Checking out the data

As we can see, the dataset has 21 columns, including the price itself and the features on which it may depend. Not all of these features have a high correlation with the price, so we will only look at those that do.
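One way to find the features that correlate most with the price is to rank the Pearson correlations. The helper name price_correlations and the toy values below are illustrative assumptions, not the original notebook code.

```python
import pandas as pd


def price_correlations(df, target="price"):
    """Rank numeric columns by the absolute Pearson correlation with the target."""
    numeric = df.select_dtypes(include="number")
    corr = numeric.corr()[target].drop(target)
    # Order by strength of the relationship, regardless of sign.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)


# Toy data for illustration only.
toy = pd.DataFrame(
    {
        "price": [100, 200, 300, 400],
        "sqft_living": [10, 20, 30, 40],
        "condition": [3, 1, 4, 2],
    }
)
ranked = price_correlations(toy)
print(ranked)
```

On the real dataset, the same call would be made on the full DataFrame, and only the top-ranked columns would be inspected further.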

Since this is quite a large dataset, we have to check for missing values. Fortunately, there are none!
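The missing-value check can be sketched with pandas as follows. The toy frame deliberately contains one missing price so that the output is not trivially zero; on the real dataset every count is expected to be zero.

```python
import pandas as pd

# Toy frame with one deliberately missing price (illustrative values only).
toy = pd.DataFrame({"price": [221900.0, None, 538000.0], "bedrooms": [3, 4, 2]})

# Number of missing entries per column.
missing_counts = toy.isnull().sum()
print(missing_counts)

# True only when no column has a missing value.
has_no_missing = bool(missing_counts.sum() == 0)
print(has_no_missing)
```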

Exploratory data analysis

Now we visualize the relationship between the price and the various features present in our dataset.

The majority of houses fall under the one-million-dollar mark.

As is evident from the figure, most of the houses have 3 to 4 bedrooms.
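The original figures are not reproduced here, but the numbers behind such plots can be sketched with a couple of pandas calls. The column names price and bedrooms follow the dataset; the values are made up for illustration.

```python
import pandas as pd

# Illustrative values only, not real rows from the dataset.
toy = pd.DataFrame(
    {
        "price": [221900, 538000, 180000, 1225000, 604000],
        "bedrooms": [3, 3, 2, 4, 4],
    }
)

# How many houses have each bedroom count (the data behind a count plot).
bedroom_counts = toy["bedrooms"].value_counts().sort_index()
print(bedroom_counts)

# Fraction of houses priced below one million dollars.
share_under_1m = (toy["price"] < 1_000_000).mean()
print(share_under_1m)
```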

There are several other features, such as the year in which the house was built, the number of floors, the square feet of living space, and many more. Several of these features are strongly correlated with the price.

Based on these features alone we can predict the price of a house, but the most important feature is the location of the house, which is discussed next.

Geographical Plotting

As is clear from the heatmap, the majority of expensive houses lie along the waterfront, which is also clear from the actual map of King County, USA. This is understandable: the better the view from the house, the more expensive it tends to be.
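As a numeric companion to the map, the waterfront effect can be checked by comparing median prices. This sketch assumes the dataset's 0/1 waterfront column and uses made-up prices.

```python
import pandas as pd

# Illustrative values only; waterfront is 1 for waterfront houses, else 0.
toy = pd.DataFrame(
    {
        "price": [300000, 450000, 500000, 1500000, 2100000],
        "waterfront": [0, 0, 0, 1, 1],
    }
)

# Median price for non-waterfront (0) and waterfront (1) houses.
median_by_waterfront = toy.groupby("waterfront")["price"].median()
print(median_by_waterfront)
```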

Feature Extraction & Engineering

Many columns in our dataset are of no use to us, so we simply drop them. It is also better to convert the date column, which is initially of type object, to datetime.
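A sketch of this step is shown below. The article does not name the dropped columns, so id and zipcode are assumptions, as is the helper name prepare_features. The date format string assumes raw values that look like 20141013T000000, and splitting the date into year and month is one possible way to make it usable by a model.

```python
import pandas as pd


def prepare_features(df, drop_cols=("id", "zipcode")):
    """Drop unused columns and turn the date into numeric year and month."""
    # drop_cols is a hypothetical choice; adjust it to the columns you consider useless.
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])
    # Convert the object-typed date strings to real datetimes.
    out["date"] = pd.to_datetime(out["date"], format="%Y%m%dT%H%M%S")
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    # The raw datetime column is no longer needed once year and month exist.
    return out.drop(columns="date")


# Illustrative rows only.
toy = pd.DataFrame(
    {
        "id": [1, 2],
        "date": ["20141013T000000", "20150225T000000"],
        "price": [221900, 180000],
        "zipcode": [98178, 98028],
    }
)
prepared = prepare_features(toy)
print(prepared)
```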

Creating the model

First we do the train-test split on our dataset and scale the data using MinMaxScaler.
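The split and scaling step might look like the sketch below. The synthetic X and y, the 25% test size, and the random_state are assumptions for illustration. Note that the scaler is fitted on the training data only, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the feature matrix and the prices.
rng = np.random.default_rng(0)
X = rng.uniform(0, 5000, size=(100, 3))
y = X @ np.array([200.0, 50.0, 10.0]) + rng.normal(0, 1000, size=100)

# Hold out a quarter of the rows for testing (the split ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=101
)

# Fit the scaler on the training set, then apply the same transform to the test set.
scaler = MinMaxScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled.shape, X_test_scaled.shape)
```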

The second step is to create a neural network using TensorFlow with the Keras API.
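The article does not show the network itself, so the following is only a sketch of a small fully connected regression model. The function name build_model, the layer sizes, the optimizer, and the loss are all assumptions rather than the author's actual configuration.

```python
def build_model(n_features, hidden_units=19):
    """Build a small dense regression network (hypothetical architecture)."""
    # Imported inside the function so TensorFlow is only loaded when a model is built.
    from tensorflow import keras

    model = keras.Sequential(
        [
            keras.Input(shape=(n_features,)),
            keras.layers.Dense(hidden_units, activation="relu"),
            keras.layers.Dense(hidden_units, activation="relu"),
            # Single linear output unit, since the target is a continuous price.
            keras.layers.Dense(1),
        ]
    )
    model.compile(optimizer="adam", loss="mse")
    return model


# Hypothetical training call, assuming scaled arrays from the previous step:
# model = build_model(X_train_scaled.shape[1])
# model.fit(X_train_scaled, y_train,
#           validation_data=(X_test_scaled, y_test),
#           batch_size=128, epochs=100)
```

Mean squared error is a common loss for regression; the number of epochs and the batch size in the commented call are placeholders to tune.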

Model Evaluation

We will be evaluating our model using the various metrics available to us through scikit-learn.
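A sketch of the evaluation with scikit-learn follows. Which score the article reports is not stated, so explained variance is an assumption here, and the prices below are made up rather than actual model output.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, mean_absolute_error

# Made-up true prices and predictions, for illustration only.
y_true = np.array([300_000, 450_000, 600_000, 900_000])
y_pred = np.array([320_000, 430_000, 650_000, 850_000])

# Average absolute difference between prediction and true price, in dollars.
mae = mean_absolute_error(y_true, y_pred)

# Share of the variance in the prices that the predictions explain (1.0 is best).
evs = explained_variance_score(y_true, y_pred)

print(mae, evs)
```

With a trained model, y_pred would come from calling model.predict on the scaled test features.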

We get a score of around 0.8 (a regression score such as explained variance or R², rather than classification accuracy) and a mean absolute error of around 100,000. This means our predictions are off by about a hundred thousand dollars on average, which is a small error on a 2 to 3 million dollar house, so the model seems to be doing a good job.

Umair Ayub

Author of “Machine Learning - A Comprehensive Approach”. Interested in Data Science, Machine Learning, and Blockchains.