Housing prices are an important reflection of the economy, and housing price ranges are of great interest to both buyers and sellers. In this project, house prices are predicted from explanatory variables that cover many aspects of residential houses. The goal is to create a regression model that can accurately estimate the price of a house given its features.
The data for this project is available on Kaggle. The main objective is to predict the price of a given house based on the previously available data. The dataset is available here: https://www.kaggle.com/harlfoxem/housesalesprediction
Checking out the data
As we can see, there are 21 different features on which the price of a house may depend. Not all of these features are highly correlated with the price, so we will only look at those with a high correlation.
Since this is quite a large dataset, we have to check for missing values, and fortunately there are none!
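One way to pick out the strongly correlated features is to rank every numeric column by its correlation with the price. The sketch below is only illustrative and not necessarily the notebook's exact code: the helper name `price_correlations`, the 0.5 cutoff, and the tiny made-up frame are all assumptions. With the real data you would pass in the DataFrame loaded from the Kaggle CSV instead.

```python
import pandas as pd


def price_correlations(df, target="price", min_abs_corr=0.5):
    """Return features whose absolute correlation with `target` meets the cutoff."""
    # Only numeric columns can be correlated; a raw date column is skipped here.
    corr = df.select_dtypes(include="number").corr()[target].drop(target)
    strong = corr[corr.abs() >= min_abs_corr]
    # Order the result so the strongest relationships come first.
    order = strong.abs().sort_values(ascending=False).index
    return strong.reindex(order)


# Tiny made-up sample for illustration only (not real King County data).
sample = pd.DataFrame({
    "price": [200000, 350000, 500000, 650000, 800000],
    "sqft_living": [900, 1500, 2100, 2600, 3400],
    "bedrooms": [2, 3, 3, 4, 4],
    "yr_built": [1990, 1960, 2005, 1975, 1985],
})
print(price_correlations(sample))
```

In this toy sample the living area and bedroom count pass the cutoff while the build year does not, which is the kind of filtering described above.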
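A missing-value check of this kind usually comes down to counting nulls per column. Here is a minimal sketch, assuming the data is in a pandas DataFrame; the helper name `missing_value_report` and the sample frame (with one deliberate gap) are made up for illustration.

```python
import numpy as np
import pandas as pd


def missing_value_report(df):
    """Count missing values per column, keeping only columns that have any."""
    counts = df.isnull().sum()
    return counts[counts > 0]


# Made-up frame with one deliberate gap, to show what a non-empty report looks like.
sample = pd.DataFrame({
    "price": [200000, 350000, 500000],
    "bedrooms": [2, np.nan, 4],
})
report = missing_value_report(sample)
print(report)        # lists the bedrooms column with 1 missing value
print(report.empty)  # an empty report would mean there are no missing values
```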
Exploratory data analysis
Now we have to visualize the relationship between the price and various features present in our dataset.
The majority of houses are priced around the one-million-dollar mark.
As is evident from the figure, most of the houses have 3-4 bedrooms.
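Plots like the two described above can be produced with a histogram of prices and a bar chart of bedroom counts. The following is a rough sketch using matplotlib, assuming the columns are named `price` and `bedrooms`; the function name and the sample values are hypothetical.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_price_and_bedrooms(df):
    """Draw a price histogram and a bedroom-count bar chart side by side."""
    fig, (ax_price, ax_beds) = plt.subplots(1, 2, figsize=(12, 4))

    ax_price.hist(df["price"], bins=50)
    ax_price.set_xlabel("price")
    ax_price.set_ylabel("number of houses")

    # Count how many houses have each bedroom count, ordered by bedroom count.
    bedroom_counts = df["bedrooms"].value_counts().sort_index()
    ax_beds.bar(bedroom_counts.index.astype(str), bedroom_counts.values)
    ax_beds.set_xlabel("bedrooms")
    ax_beds.set_ylabel("number of houses")

    fig.tight_layout()
    return fig


# Made-up values for illustration only.
sample = pd.DataFrame({
    "price": [250000, 400000, 450000, 520000, 610000, 980000],
    "bedrooms": [2, 3, 3, 4, 3, 4],
})
fig = plot_price_and_bedrooms(sample)
```

In a notebook the figure is displayed automatically; in a script you would call `plt.show()` or `fig.savefig(...)` afterwards.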
There are several other features, such as the year in which the house was built, the number of floors, the square feet of living space available, and many more. Each of these features has a very high correlation with the price.
Based on these features alone we can predict the price of a house, but the most important feature is the location of the house, which is discussed next.
As is clear from the heatmap, the majority of pricey houses lie along the waterfront, which also matches the actual map of King County, USA. This is quite understandable: the better the view from the house, the more expensive it tends to be.
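A simple stand-in for such a location heatmap is a scatter plot of longitude against latitude, coloured by price. This sketch is not the original figure's code: it assumes the coordinate columns are named `long` and `lat`, and the function name, the 99th-percentile cap, and the sample coordinates are hypothetical choices.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_price_map(df, clip_quantile=0.99):
    """Scatter houses by longitude and latitude, coloured by price."""
    # Cap the most expensive houses so a few outliers do not wash out the colour scale.
    capped_price = df["price"].clip(upper=df["price"].quantile(clip_quantile))

    fig, ax = plt.subplots(figsize=(8, 6))
    points = ax.scatter(df["long"], df["lat"], c=capped_price, cmap="viridis", s=5)
    fig.colorbar(points, ax=ax, label="price (capped)")
    ax.set_xlabel("longitude")
    ax.set_ylabel("latitude")
    return fig


# Made-up coordinates and prices for illustration only.
sample = pd.DataFrame({
    "long": [-122.30, -122.25, -122.10, -122.35],
    "lat": [47.60, 47.55, 47.70, 47.50],
    "price": [450000, 620000, 380000, 1500000],
})
fig = plot_price_map(sample)
```

Capping the colour scale is a design choice: without it, a handful of very expensive houses would push almost every other point into the same colour.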
Feature Extraction & Engineering
Many columns in our dataset are of no use for prediction, so we simply drop them. It also helps to convert the date column, which is initially stored as an object, to datetime.
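The two steps above can be sketched roughly as follows. Which columns get dropped is an assumption here (only `id` by default), and splitting the date into numeric `year` and `month` columns is one common follow-up rather than something stated above; the helper name `prepare_features` is hypothetical.

```python
import pandas as pd


def prepare_features(df, drop_cols=("id",)):
    """Drop unused columns and replace the raw date with numeric year/month columns."""
    # Drop only the columns that are actually present, leaving the input frame untouched.
    out = df.drop(columns=[c for c in drop_cols if c in df.columns])

    # Convert the object-typed date column to datetime.
    out["date"] = pd.to_datetime(out["date"])

    # Assumed extra step: a network needs numbers, so keep year and month instead of the date.
    out["year"] = out["date"].dt.year
    out["month"] = out["date"].dt.month
    return out.drop(columns="date")


# Made-up rows for illustration only.
sample = pd.DataFrame({
    "id": [1, 2],
    "date": ["2014-10-13", "2015-02-25"],
    "price": [221900, 180000],
})
prepared = prepare_features(sample)
print(prepared)
```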
Creating the model
First we do the train-test split on our dataset and scale the data using MinMaxScaler.
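Here is a minimal sketch of the split-and-scale step with scikit-learn. The 30% test size, the random seed, the helper name `split_and_scale`, and the synthetic data are assumptions for illustration. Note that the scaler is fitted on the training set only, so no information from the test set leaks into training.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler


def split_and_scale(X, y, test_size=0.3, random_state=101):
    """Split into train/test sets, then scale features to [0, 1] using the train set."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=random_state
    )
    scaler = MinMaxScaler()
    X_train = scaler.fit_transform(X_train)  # learn min/max from training data only
    X_test = scaler.transform(X_test)        # reuse the same min/max on the test data
    return X_train, X_test, y_train, y_test, scaler


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(500, 5000, size=(100, 3))
y = 150 * X[:, 0] + rng.normal(0, 10000, size=100)

X_train, X_test, y_train, y_test, scaler = split_and_scale(X, y)
print(X_train.shape, X_test.shape)
```

Because the scaler only knows the training range, scaled test values can fall slightly outside [0, 1]; that is expected.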
The second step is to create a neural network using TensorFlow's Keras API.
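A network for this kind of regression might look like the sketch below. The architecture is an assumption and not necessarily the one used in the notebook: the layer sizes, the Adam optimizer, the mean squared error loss, and the helper name `build_model` are illustrative choices.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.layers import Dense
from tensorflow.keras.models import Sequential


def build_model(n_features, hidden_units=(19, 19, 19)):
    """Build a small fully connected network with one linear output for the price."""
    model = Sequential()
    model.add(Input(shape=(n_features,)))
    for units in hidden_units:
        model.add(Dense(units, activation="relu"))
    # No activation on the last layer: the model predicts an unbounded price.
    model.add(Dense(1))
    model.compile(optimizer="adam", loss="mse")
    return model


model = build_model(n_features=19)
model.summary()
```

Training would then be a call along the lines of `model.fit(X_train, y_train, validation_data=(X_test, y_test), batch_size=128, epochs=400)`, where the batch size and epoch count are again assumed values to tune.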
We will be evaluating our model using the various metrics available to us through scikit-learn.
So we get a score of around 0.8 and a mean absolute error of around 100,000. This means our predictions are off by about a hundred thousand dollars on average, which is a reasonably good result for a 2 to 3 million dollar house.
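For a regression model, a score near 0.8 is typically an R² or explained variance score rather than a classification-style accuracy; which one was used is not stated here, so the sketch below computes both alongside the mean absolute error. The helper name `regression_report` and the three made-up prices are for illustration only.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, mean_absolute_error, r2_score


def regression_report(y_true, y_pred):
    """Collect common regression metrics in one dictionary."""
    return {
        "mae": mean_absolute_error(y_true, y_pred),
        "r2": r2_score(y_true, y_pred),
        "explained_variance": explained_variance_score(y_true, y_pred),
    }


# Made-up true and predicted prices for illustration only.
y_true = np.array([200000, 300000, 400000])
y_pred = np.array([210000, 290000, 420000])
report = regression_report(y_true, y_pred)
print(report)

# Relative size of a 100,000 error on a 2.5 million dollar house.
print(100_000 / 2_500_000)  # 0.04, i.e. about 4% of the price
```

The last line shows why the error looks acceptable for expensive houses: the same 100,000 would be a much larger share of the price of a cheaper house.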
Here is the full notebook on Kaggle:
House sale prediction with Keras & ANN
Here is the GitHub repository for the same:
Umair-1119/House-Predictor
Feel free to reach out to me on LinkedIn!