A Stroll Through Small-Town America: Real Estate Data from Ames, Iowa

Sean Holland
8 min read · Jul 10, 2019

If you have read up on your small-town American trivia, you might know Ames, Iowa as the home of Iowa State University and a city boasting over 65,000 residents, at least when school is in session and ISU's roughly 30,000 students are in town. If you happen to be a real expert, you might even know that Ames is the seventh largest city in the 31st largest state by population.

Or, if you are like me and the 99 percent of Americans who do not live in Iowa, you have not once heard of or given thought to the bustling metropolis of Ames. Thankfully, dear Reader, I have done enough research for the both of us.

This past Spring, I took Introduction to Econometrics and our final empirical project involved analyzing real estate data in Fairfax County, Virginia. So when I saw a Kaggle competition for predicting real estate values in the hip, hot market of Ames, Iowa I could not possibly resist. So here it is, my analysis of real estate values in everybody’s seventh favorite Iowa city.

You’re welcome.

The very first result when I typed “Ames, Iowa” into a royalty-free image site. Warning: may not actually be Ames, Iowa.

Data

Okay, that’s probably enough bagging on small-town America. These data were pretty fun to work with. The dataset is relatively small, with only 1,450 residential properties to analyze, all of which were sold between 2006 and 2010. However, there are 80 explanatory variables with a combination of continuous, discrete, and categorical variables. The categorical variables posed an initial challenge and I am henceforth eternally grateful for Christopher Brown’s “dummies” package in R which saved me hours of manually coding categoricals into dummies.
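
For readers who have not met dummy coding before, here is a minimal sketch of the idea, written in Python rather than with the R package named above. The function name `to_dummies` and the sample building types are made up for illustration. Dropping the alphabetically first level as the baseline is one common convention and not necessarily what that package does.

```python
def to_dummies(values, prefix):
    """One-hot encode category labels, dropping the first level as the baseline."""
    levels = sorted(set(values))
    baseline, kept = levels[0], levels[1:]
    # One dict per observation: 1 if the row belongs to that level, else 0
    rows = [{f"{prefix}_{lvl}": int(v == lvl) for lvl in kept} for v in values]
    return baseline, rows

# Hypothetical building types, not rows from the Ames data
building_types = ["SingleFamily", "Townhouse", "Duplex", "SingleFamily"]
baseline, dummies = to_dummies(building_types, "bldg")

for label, row in zip(building_types, dummies):
    print(label, row)
```

A row from the baseline level gets zeros in every dummy column, which is why one level has to be left out when the regression also includes an intercept.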

In terms of continuous and discrete variables, notable variables included lot area, above grade living area, basement size, number of bedrooms, year built, and overall quality and condition. Categorical variables included sale type and condition (such as foreclosure or family sale), neighborhood, and building type (such as single family or townhouse). And, of course, we have our dependent variable: sale price.

As with most real estate data, some data transformations are necessary. From the real estate data I’ve worked with in the past, I’ve noticed that housing prices and living and lot areas are best understood with natural logarithmic transformations. These data tend to be right-skewed by large, high-value properties that, if I had to guess, include mansions and farmland. Plus, log-log models still provide for easy interpretation — even if I have to look it up just about every time. A model that looks like

ln(Sale Price) = β0 + β1 * ln(Lot Area)

indicates that every 1 percent increase in “Lot Area” is associated with a β1 percent increase in “Sale Price”. And, as a newly minted apartment lessee in downtown DC, I’m quite certain that an extra 50 square feet is a lot more valuable to a broke Iowa State student than to a corn farmer in North Ridge Heights.
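
The percent-for-percent reading of a log-log model can be checked numerically. The sketch below assumes illustrative coefficients (an intercept of 11.0 and a slope of 0.09), which are placeholders and not estimates from this analysis. It bumps lot area by 1 percent and measures the resulting percent change in predicted price.

```python
import math

def predicted_price(lot_area, b0, b1):
    """Price implied by ln(price) = b0 + b1 * ln(lot_area)."""
    return math.exp(b0 + b1 * math.log(lot_area))

# Illustrative coefficients only, not fitted values
b0, b1 = 11.0, 0.09

base = predicted_price(9500, b0, b1)
bigger = predicted_price(9500 * 1.01, b0, b1)  # lot area up by 1 percent
pct_change = (bigger / base - 1) * 100

print(f"1% more lot area -> {pct_change:.4f}% higher predicted price")
```

The percent change comes out very close to the slope itself, and it does not depend on the intercept or on the starting lot area, which is what makes the log-log form easy to interpret.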

Starting with the sale price, we can certainly see evidence of right-skewness. The mean house price in Ames is $181,654.90, while the median price is $163,945; a mean sitting that far above the median is the signature of a long right tail. The skew comes in at 1.9, which, in economic academia, is known as gross. Of course, there’s no need to worry since we have our hero, natural log, to get everything back to normal.

Much Better.

Our new mean is 12.01 log-dollars with a median of 12.03 log-dollars. And skewness is at a much more palatable 0.23.
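
A minimal sketch of how a skew figure like these can be computed, assuming the plain moment-based definition (the text does not say which formula or software routine was used). The prices are made up to have a long right tail and are not the Ames data.

```python
import math

def skewness(xs):
    """Moment-based skewness: third central moment over the cubed standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

# Made-up prices with one very expensive house, not the Ames data
prices = [90_000, 120_000, 140_000, 160_000, 180_000, 220_000, 750_000]

raw_skew = skewness(prices)
log_skew = skewness([math.log(p) for p in prices])

print(f"raw skew: {raw_skew:.2f}, log skew: {log_skew:.2f}")
```

Taking logs compresses the gap between the expensive house and the rest, so the skew of the logged values is smaller than the skew of the raw prices.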

The histogram for lot area is just ugly. There’s a tight grouping of properties near the median of roughly 9,500 square feet. And then there are some monstrosities that can only be effectively described in acres (our max lot area is about 4.5 acres). Those properties give us a skew of 12.15. Gross. But the natural log transformation normalizes our distribution, and skewness is down to -0.14 with a mean and median of 9.11 and 9.14, respectively.

Like lot area and sale price, above grade living area (ABLA) gets the natural log treatment. The mean ABLA is 1517.7 square feet with a median of 1466 square feet and a skew of 1.37. Again, gross. After applying a log transformation, we have a mean of 7.27 log-ABLA and a skew of 0.01.

Before starting a more in-depth analysis I want to take a quick look at two charts. The first is a scatterplot with ABLA and sale price, and the second is a scatterplot with log-ABLA and log-sale price.

Just applying the eye test shows us that the log-model has a tighter grouping of points which should indicate lower residuals and thus a better model. Of course, no one has won the Nobel Prize in Economics with the eye test, so I ran two simple regression models. Using R² as a diagnostic, the linear-linear model records an R² of 0.5022 while the log-log model gives us an R² of 0.5364. The log-log model wins out.
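
The comparison above can be sketched as two one-variable least-squares fits. This is a stdlib-only illustration with made-up (living area, price) pairs, and `r_squared` is a hypothetical helper name, not the routine used for the numbers in the text.

```python
import math

def r_squared(xs, ys):
    """R² of a one-variable OLS fit of ys on xs, with an intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Made-up data, not the Ames sales
area = [900, 1200, 1500, 1800, 2400, 3200]
price = [95_000, 130_000, 160_000, 185_000, 260_000, 410_000]

r2_linear = r_squared(area, price)
r2_loglog = r_squared([math.log(a) for a in area], [math.log(p) for p in price])

print(f"linear-linear R²: {r2_linear:.4f}, log-log R²: {r2_loglog:.4f}")
```

One caveat worth keeping in mind: the two R² values describe variation in different quantities (dollars versus log-dollars), so the comparison is a rough guide rather than a strict test.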

Also, side note, but I want to see some of these houses in North Ridge. Seriously, $750,000 houses in 2006–2010 in Iowa? I have questions and I want answers, Ames.

Analysis

In my article about NBA salaries, I discussed hedonic pricing models. In case you missed it, hedonic pricing models allow us to break down the prices of heterogeneous goods like residential properties by analyzing the impact of individual characteristics of that good.

When you’re at Trader Joe’s picking out hormone-free hot dogs for a Fourth of July cookout, the unit price is probably the determining factor in which brand you select. Rational consumers want to pay less for very similar products. However, there is no single demand curve for houses the way there might be one demand curve for hormone-free hot dogs. Rather, real estate valuation can be broken down into the relationships between demand-generating characteristics such as size, number of bedrooms, and age of the building.

So, we first need to pick out a selection of characteristics that determine housing prices in the tropical paradise of Ames, Iowa. Since I already determined that log-ABLA accounts for 53.64 percent of the variation in log-sale price, that’s a good place to start.

Model Set 1

I started by just looking at the relationship between lot dimensions and price. Adding basement and garage area indicators helped boost our R²-value by a decent amount, but there is still plenty of variation to account for. So, back to the drawing board.

Model Set 2: Miscellaneous

Before choosing my final pricing model, I had some miscellaneous questions about the Ames housing market. First, I wanted to know what housing prices are like across the town's neighborhoods, both controlling and not controlling for house characteristics. The first model, not controlling for characteristics, gives us average prices for houses in each neighborhood. The second model should be able to tell us which neighborhoods actually provide a value-add. If a neighborhood has a significant, positive relationship with the sale price, controlling for house-level characteristics, then we can conclude that the neighborhood adds value to the property.

Our big winners in this model are North Ridge, Stone Bridge, and North Ridge Heights, all of which have significant coefficients, indicating average housing prices over $300,000.

Brookdale and Meadow Village housing prices pale in comparison, with average house prices around $100,000.

Our second model, controlling for house-level characteristics, gives us more insight into the value-add of each individual neighborhood. Note that the dependent variable here is log-sale price, while in the previous model the dependent variable was the standard sale price.

Controlling for lot size, ABLA, overall quality, and other characteristics, several neighborhoods have significant value-adds.

The most notable winners are Somerset, Stone Bridge, and the North Ridges. These four neighborhoods all have significant, positive coefficients, indicating that they provide a value-add to property values given a certain size and quality.

I also wanted to see what impact, if any, the Great Recession had on Ames’ property values. Bloomberg Business named Ames one of the “15 Cities That Have Done the Best Since the Recession” and I wanted to see if housing prices reflected that, since these sales occurred between 2006 and 2010.

Nothing.

Not even a little. The intercept represents 2006 housing prices. The slight decrease in housing prices in 2008 is not significant at any meaningful level of confidence. Looks like Bloomberg got this one right.
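
Why the intercept stands for 2006 can be seen with a small sketch. It assumes the regression contains only an intercept and sale-year dummies with 2006 as the omitted year, which is an assumption about the model and not something the text spells out. In that setup, least squares reduces to group means. The log prices are made up.

```python
from collections import defaultdict

def dummy_regression(groups, ys, baseline):
    """OLS of ys on an intercept plus group dummies reduces to group means:
    intercept = baseline mean, each coefficient = group mean minus baseline mean."""
    buckets = defaultdict(list)
    for g, y in zip(groups, ys):
        buckets[g].append(y)
    means = {g: sum(v) / len(v) for g, v in buckets.items()}
    intercept = means[baseline]
    coefs = {g: m - intercept for g, m in means.items() if g != baseline}
    return intercept, coefs

# Made-up log sale prices by sale year, not the Ames data
years = [2006, 2006, 2007, 2007, 2008, 2008]
log_prices = [12.0, 12.1, 12.05, 12.15, 11.95, 12.05]

intercept, coefs = dummy_regression(years, log_prices, baseline=2006)
print(f"2006 baseline: {intercept:.2f}")
for year, coef in sorted(coefs.items()):
    print(f"{year} relative to 2006: {coef:+.2f}")
```

Each year's coefficient is a difference from the 2006 average, so a small, insignificant 2008 coefficient means 2008 prices were statistically indistinguishable from 2006 prices.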

Final Model Set

After toiling away at log-transformations and coding categorical variables I have finally arrived at my final two models.

Now I know this table rounds to two decimal places, but Model 1 actually has an adjusted R²-value of 0.8726 compared to Model 2’s adjusted R²-value of 0.8717, so Model 1, the higher of the two, is officially my final model for predicting housing prices in Ames, Iowa. And there are certainly some interesting findings to discuss.
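
For reference, adjusted R² penalizes plain R² for the number of predictors. The sketch below uses the standard textbook formula. The sample size of 1,450 comes from the Data section, while the R² value and predictor counts are placeholders, since the regression tables are not reproduced here.

```python
def adjusted_r_squared(r2, n, k):
    """Adjusted R² for a model with k predictors (excluding the intercept) and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

n = 1450                    # properties in the dataset, per the Data section
r2 = 0.875                  # placeholder R², not a reported value
for k in (20, 25):          # hypothetical predictor counts
    print(k, round(adjusted_r_squared(r2, n, k), 4))
```

With the same raw R², the model carrying more predictors gets the lower adjusted value, which is why the adjusted figure is the fairer yardstick when two candidate models differ in size.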

First, there is a negative relationship between bedrooms above grade and log-sale price. While I can’t hope to give an exact explanation as to why this relationship exists, my intuition tells me that bedrooms are a necessity when considering housing. A buyer only demands enough bedrooms to house themselves and their family or friends. Because the model already controls for living area, each additional bedroom occupies space that could instead function as a den or office in which to read my articles.

Further, the sale type and condition are important factors to consider in evaluating home prices. Newly constructed homes sold for more on average than houses being resold.

Houses sold between family members or under abnormal circumstances such as foreclosure went for less than houses sold under normal conditions.

Townhouses sold for less than single-family homes, even when controlling for size, number of bedrooms, and other characteristics. However, the interaction term between log-lot area and townhouses indicates that the coefficient for the log-lot area for townhouses is actually around 0.16 as opposed to 0.09 for single-family homes.
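
The arithmetic behind an interaction term is simple enough to sketch. The 0.09 and 0.16 slopes are the figures quoted above. The interaction coefficient of 0.07 is an assumption implied by their difference, not a number reported directly, and `lot_area_slope` is a hypothetical helper.

```python
def lot_area_slope(base_coef, interaction_coef, is_townhouse):
    """Slope on log-lot area: the base coefficient, plus the interaction when the dummy is 1."""
    return base_coef + interaction_coef * int(is_townhouse)

base = 0.09         # slope for single-family homes, as quoted in the text
interaction = 0.07  # assumed: implied by 0.16 - 0.09, not quoted directly

print(f"single-family: {lot_area_slope(base, interaction, False):.2f}")
print(f"townhouse:     {lot_area_slope(base, interaction, True):.2f}")
```

So while the townhouse dummy on its own shifts the price level down, each extra percent of lot area is worth more for a townhouse than for a single-family home.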

Finally, and perhaps most shockingly, pools are negatively related to sale prices. Okay, maybe I shouldn’t be as shocked as I am. While Iowa summers tend to be hot and humid — an ideal climate for pool volleyball — winters are cold with January temperatures averaging about 14 degrees Fahrenheit, boosting the costs of pool maintenance for half the year. But still, come on Iowa.

Conclusion

Well, it has sure been a journey. I hope you enjoyed this little tour of small-town America and its apparent disdain for pools.

Sean Holland

Consultant - Deloitte Consulting | George Washington University Class of 2020 | International Affairs and Economics