Business questions from the Houses Dataset

Angel Moreno Torres
Modeling House Prices
4 min read · Mar 29, 2020
Photo by Luca Bravo on Unsplash

Using the model I developed in my first post to predict house sale prices in Ames, Iowa, I'm going to try to answer 5 interesting business questions:

  1. What are the 5 most important factors that affect the price in that area?
  2. What would be the best opportunities to buy a house below the market price in Ames?
  3. What would be the worst properties in the dataset to buy?
  4. In which months of the year are the best deals cheaper, and in which are they more expensive?
  5. Which year in the dataset had the most good-deal houses?

Which features are most related to the Sale Price?

The features most related to the Sale Price, in decreasing order, are:

  • The total surface area of the house: the larger the area, the higher the price.
  • The overall material and finish quality: the better the quality, the higher the price.
  • The parking capacity: the more capacity, the higher the price.
  • The original construction date: the newer the house, the higher the price.
  • The number of bathrooms: the more bathrooms, the higher the price.
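One simple way to produce a ranking like this is to sort features by the absolute value of their correlation with the sale price. The sketch below assumes Pearson correlation; the ranking above may instead come from the model itself. The column names and all numbers are illustrative assumptions, not values from the Ames dataset.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def rank_features(columns, target):
    """Return (feature, correlation) pairs, strongest absolute correlation first."""
    scores = {name: pearson(values, target) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: abs(item[1]), reverse=True)

# Toy data, invented for illustration only (hypothetical column names).
sale_price = [120000, 150000, 180000, 210000, 260000, 320000]
columns = {
    "GrLivArea": [900, 1100, 1400, 1600, 2000, 2500],
    "OverallQual": [4, 5, 6, 6, 7, 9],
    "MoSold": [6, 1, 12, 3, 7, 5],
}

ranking = rank_features(columns, sale_price)
for name, corr in ranking:
    print(f"{name}: {corr:+.2f}")
```

Correlation only captures linear, one-feature-at-a-time relationships, so a ranking built this way can differ from the importances a fitted model reports.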

Best and worst deals in the dataset

Houses priced far below our model's prediction may be a really good opportunity, taking into account that our model is really accurate below $400,000.

On the other hand, houses priced far above our model's prediction may not be the best deal, as we would be buying a heavily overpriced house.

Based on those ideas, I have classified the houses using a Traffic Light Rating system:

  • The 10% most extreme cases where the real price is below the prediction are classified as green. A priori, these are the best deals.
  • The 10% most extreme cases where the real price is above the prediction are classified as red. A priori, these are the worst deals.
  • The 80% in the middle are classified as amber. Whether they are good deals depends on the case, and we do not have the certainty of the two groups above.
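The classification above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the gap is measured relative to the prediction, and each tail holds 10% of all houses. The function name and the toy prices (in thousands of dollars) are invented for the example and are not taken from the notebook.

```python
def traffic_light(actual, predicted, tail=0.10):
    """Label each house green/amber/red from its relative gap to the prediction."""
    # Relative gap: negative means the house sold below the model prediction.
    gaps = [(a - p) / p for a, p in zip(actual, predicted)]
    # House indices ordered from most underpriced to most overpriced.
    order = sorted(range(len(gaps)), key=lambda i: gaps[i])
    k = int(len(gaps) * tail)  # number of houses in each tail
    labels = ["amber"] * len(gaps)
    for i in order[:k]:
        labels[i] = "green"  # sold furthest below the prediction
    for i in order[len(order) - k:]:
        labels[i] = "red"    # sold furthest above the prediction
    return labels

# Toy prices in thousands of dollars, invented for illustration only.
actual = [100, 150, 200, 180, 300, 250, 120, 90, 210, 400]
predicted = [140, 155, 195, 185, 290, 255, 118, 92, 205, 300]

labels = traffic_light(actual, predicted)
print(labels)
```

Using a relative gap rather than an absolute dollar difference keeps expensive houses from dominating both tails, although either choice is defensible.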

The table below shows the means of 5 important features, to highlight the differences between Score groups.

As we can see:

  • Best-deal profiles are large houses of good quality, with parking for 2 cars and 1.67 bathrooms on average. Their average price is 7% below the mean SalePrice of the whole dataset ($180,921) and, on average, 25% below the model predictions.
  • Worst-deal profiles are small houses of low quality, with parking for 1.58 cars and 1.53 bathrooms on average. Their average price is 18% above the mean SalePrice of the whole dataset ($180,921) and, on average, 24% above the model predictions.
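The two percentages quoted for each group measure different things: one compares the group's average price with the dataset mean, the other averages each house's gap to its own prediction. A rough sketch of both calculations follows; the field names and toy figures (in thousands of dollars) are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

def summarize(rows):
    """Per score group: average price, % vs dataset mean, mean % vs prediction."""
    overall = mean(r["price"] for r in rows)
    groups = defaultdict(list)
    for r in rows:
        groups[r["score"]].append(r)
    summary = {}
    for score, members in groups.items():
        avg_price = mean(r["price"] for r in members)
        summary[score] = {
            "avg_price": avg_price,
            # Group average compared with the mean price of all houses.
            "vs_dataset_mean_pct": 100 * (avg_price / overall - 1),
            # Average of each house's relative gap to its own prediction.
            "vs_prediction_pct": 100 * mean(
                r["price"] / r["predicted"] - 1 for r in members
            ),
        }
    return summary

# Toy rows in thousands of dollars, invented for illustration only.
rows = [
    {"score": "green", "price": 150, "predicted": 200},
    {"score": "green", "price": 90, "predicted": 120},
    {"score": "amber", "price": 200, "predicted": 200},
    {"score": "amber", "price": 180, "predicted": 180},
    {"score": "red", "price": 250, "predicted": 200},
]

summary = summarize(rows)
for score, stats in summary.items():
    print(score, {key: round(value, 1) for key, value in stats.items()})
```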

In which months of the year are the best deals cheaper, and in which are they more expensive?

As we can see in the graphs below, the good deals average above $200,000 in January, the highest monthly average, and far below $150,000 in December, the lowest. September has the second-lowest average price.
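A monthly comparison like this boils down to grouping the green-scored houses by month of sale and averaging their prices. The sketch below assumes each deal is a (month, price) pair; the data is invented and only mimics the shape of the result described above.

```python
from collections import defaultdict
from statistics import mean

def mean_price_by_month(deals):
    """Average sale price per month for a list of (month, price) pairs."""
    by_month = defaultdict(list)
    for month, price in deals:
        by_month[month].append(price)
    return {month: mean(prices) for month, prices in sorted(by_month.items())}

# Toy green deals as (month sold, price), invented for illustration only.
green_deals = [
    (1, 210000), (1, 205000), (6, 170000),
    (9, 140000), (12, 130000), (12, 125000),
]

averages = mean_price_by_month(green_deals)
cheapest = min(averages, key=averages.get)
priciest = max(averages, key=averages.get)
print(averages)
print("cheapest month:", cheapest, "priciest month:", priciest)
```

With few green deals per month, a single sale can move a monthly average a lot, so it is worth checking the counts alongside the means.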

Which year in the dataset had the most good-deal houses?

As we see in the graphs, 2007 and 2009 had the highest number of Green Score houses and, therefore, of good deals. The difference between the two years lies in the large gap in the average price of these good deals, which was much lower in 2009.
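Counting good deals per year and comparing their average prices can be done in one pass. The sketch below assumes each house is a (year sold, score, price) tuple; the values are invented for illustration and do not reproduce the real counts or prices.

```python
from collections import defaultdict

def green_deals_by_year(houses):
    """Return {year: (count, average price)} for houses scored green."""
    prices = defaultdict(list)
    for year, score, price in houses:
        if score == "green":
            prices[year].append(price)
    return {year: (len(p), sum(p) / len(p)) for year, p in sorted(prices.items())}

# Toy houses as (year sold, score, price), invented for illustration only.
houses = [
    (2007, "green", 210000), (2007, "green", 190000), (2007, "amber", 180000),
    (2008, "green", 160000), (2008, "red", 250000),
    (2009, "green", 150000), (2009, "green", 130000),
]

deals_per_year = green_deals_by_year(houses)
for year, (count, avg_price) in deals_per_year.items():
    print(year, count, round(avg_price))
```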

The Jupyter notebook can be found on GitHub. Enjoy the COVID-19 quarantine!

This is my LinkedIn profile, add me!

Angel Moreno Torres

Actuary and Data Scientist living in Madrid, Spain