Ames Iowa Housing Predictions

Hadi Morrow
Jul 8, 2019

Is predicting a home’s price feasible? Can a computer do it faster than a person can by hand? Here is my case for why Ames, Iowa is a great place to apply my predictive model for homes priced at $300,000 or less.

With a mean home price of $186,000, the large majority of homes fall below the threshold for my model. Only 55 of the 2,050 homes (about 2.7%) were priced above $300,000, as you can see in the image below.

Regression Models Compared ElasticNet and Linear

You may ask yourself what the black and green colors indicate. This is a crude attempt at showing the difference between the linear model, shown in black, and the ElasticNet, shown in green. The elastic net model is interesting because it combines the Ridge and Lasso penalties. The two differ greatly: Lasso can shrink coefficients all the way to zero, which effectively removes features, while Ridge shrinks coefficients but keeps every feature. Furthermore, it is clear that no one model is worth dropping in favor of another, because all four (Lasso, Ridge, ElasticNet and Linear Regression) perform basically the same.
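To make the comparison of the four models concrete, here is a minimal sketch using scikit-learn. It is not the code behind this post: the data is synthetic, and the helper names (`make_synthetic_housing`, `compare_models`) and the penalty settings (`alpha`, `l1_ratio`) are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression, Ridge
from sklearn.model_selection import train_test_split


def make_synthetic_housing(n_rows=500, seed=0):
    """Stand-in data (NOT the Ames set): 8 features, 4 of them irrelevant."""
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n_rows, 8))
    true_coef = np.array([5.0, -3.0, 2.0, 0.0, 0.0, 1.5, 0.0, 0.0])
    y = X @ true_coef + rng.normal(scale=1.0, size=n_rows)
    return X, y


def compare_models(X, y, seed=0):
    """Fit the four regressors on one split; report test R^2 and zeroed coefficients."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=seed
    )
    # Penalty strengths are illustrative assumptions, not tuned values.
    models = {
        "Linear": LinearRegression(),
        "Ridge": Ridge(alpha=1.0),
        "Lasso": Lasso(alpha=0.1),
        "ElasticNet": ElasticNet(alpha=0.1, l1_ratio=0.5),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        results[name] = {
            "r2": model.score(X_test, y_test),
            # Count coefficients shrunk exactly to zero (dropped features).
            "zero_coefs": int(np.sum(model.coef_ == 0)),
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_housing()
    for name, stats in compare_models(X, y).items():
        print(f"{name:<10} R^2={stats['r2']:.3f} zeroed={stats['zero_coefs']}")
```

On data like this the four scores tend to land close together, while only the L1-penalized models set any coefficients exactly to zero, which is the Lasso versus Ridge difference described above.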

The way these models were built was by running a correlation matrix to first find highly correlated features, then transforming them to essentially make more features that are combinations of others. After transforming, a second correlation matrix was analyzed for highly correlated features, which were then removed because they are indicative of collinearity. There is still much to be done on this model, but companies like Trulia and its competitors could benefit from it, as could realtors and homeowners looking to price and sell their homes.

Correlation matrix #2 used to find covariance and correlation
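The two correlation passes described above could look roughly like the sketch below, written with pandas. The post does not say which transforms were used, so multiplying the top features together is an assumption, as are the function names, the `top_k` and `threshold` values, and the column names in the demo data.

```python
from itertools import combinations

import numpy as np
import pandas as pd


def add_interactions(df, target, top_k=3):
    """Pass 1: combine the top_k features most correlated with the target.

    Products of feature pairs are an assumed transform for illustration.
    All columns must be numeric for DataFrame.corr to work here.
    """
    corr_with_target = df.corr()[target].drop(target).abs()
    top = corr_with_target.sort_values(ascending=False).index[:top_k]
    out = df.copy()
    for a, b in combinations(top, 2):
        out[f"{a} x {b}"] = out[a] * out[b]
    return out


def drop_highly_correlated(features, threshold=0.9):
    """Pass 2: drop one feature from every pair correlated above threshold."""
    corr = features.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return features.drop(columns=to_drop), to_drop


if __name__ == "__main__":
    # Synthetic stand-in data with hypothetical column names.
    rng = np.random.default_rng(0)
    basement = rng.normal(1000, 200, size=300)
    demo = pd.DataFrame({
        "basement_area": basement,
        "first_floor_area": basement + rng.normal(0, 10, size=300),
        "garage_area": rng.normal(400, 80, size=300),
    })
    demo["price"] = 100 * demo["basement_area"] + 50 * demo["garage_area"]
    wide = add_interactions(demo, target="price", top_k=2)
    kept, dropped = drop_highly_correlated(wide.drop(columns="price"))
    print("dropped:", dropped)
```

Note that the target column is left out of the second pass, so only feature-to-feature collinearity triggers a drop.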

Finally, I would like to point out why I am confined to a simpler model rather than a pipeline version that also standardizes the features. Note that there is heteroskedasticity in the model. It comes from some of the data at higher price points having multiple features that are named differently but in essence measure the same thing, like the basement and the first floor, which for modern homes are essentially the same. The moral of the story is that the second correlation matrix needs to be optimized in order to improve the model further and extend its reach. As you can see in the above plot labeled ‘Regression Models Compared ElasticNet and Linear’, the models underpredict. So I compared the residuals of the plain regression models against those of the regression models in a pipeline, and it is clear that the pipeline (more complex) model does significantly worse than the simpler model.

Residuals
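A residual comparison of this kind can be sketched as follows. What the pipeline in this post contained is not stated, so pairing `StandardScaler` with a `Ridge` model is an assumption, and `residual_summary` and `compare_plain_vs_pipeline` are hypothetical helper names. Residuals are taken as actual minus predicted, so a positive mean residual means the model underpredicts.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def residual_summary(y_true, y_pred):
    """Summarize residuals, defined here as actual minus predicted."""
    residuals = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return {
        "mean_residual": float(residuals.mean()),  # > 0 means underprediction
        "rmse": float(np.sqrt(np.mean(residuals ** 2))),
        "share_underpredicted": float(np.mean(residuals > 0)),
    }


def compare_plain_vs_pipeline(X_train, y_train, X_test, y_test):
    """Fit the same estimator with and without standardization (assumed setup)."""
    plain = Ridge(alpha=1.0).fit(X_train, y_train)
    piped = make_pipeline(StandardScaler(), Ridge(alpha=1.0)).fit(X_train, y_train)
    return {
        "plain": residual_summary(y_test, plain.predict(X_test)),
        "pipeline": residual_summary(y_test, piped.predict(X_test)),
    }


if __name__ == "__main__":
    # Synthetic stand-in data with features on very different scales.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(400, 3)) * np.array([1.0, 10.0, 100.0])
    y = X @ np.array([3.0, 0.5, 0.02]) + rng.normal(size=400)
    report = compare_plain_vs_pipeline(X[:300], y[:300], X[300:], y[300:])
    for name, stats in report.items():
        print(name, stats)
```

Plotting the residuals against the predicted price, rather than only summarizing them, is what would reveal the fan shape typical of heteroskedasticity.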

In conclusion, with a mean home price of $186,000, treating $300,000 as the model's ceiling is a safe assumption, and within that range the model should predict home prices to a high level of accuracy. A home buyer would likely find value in this model as a third-party alternative. The model is fast, impartial and good at estimating home prices. It underestimates as a whole, so it would be a tool favored by home buyers, but when you have no other tool to go by, it is especially useful for home sellers too.
