You should have bought that condo in 2012 and you should have bought it in Oakland, plus other findings about U.S. home value over the past four years
After thinking about buying a small condo in Chicago last year (I didn’t) and then watching The Big Short last month, I became fascinated with the U.S. housing market. There have been a lot of discussions about the housing bubble that mushroomed spectacularly until 2007 and then burst equally spectacularly afterward. There doesn’t seem to be as much analysis, though, into what our housing market has been up to in more recent years. So I decided I’d do a little digging on my own.
This project is made possible by Zillow, which has kindly opened up its data for public use. I tapped into multiple Zillow datasets, with home value information for every month over the past twenty years, spanning the national, state, city, and neighborhood level. As I’ll describe below in the Learnings section, having access to massive, multidimensional data is both a blessing and a curse. I ended up spending much more time figuring out what questions I wanted to answer with all this data than doing actual analysis.
To help me focus, I narrowed the scope of my analysis to the following:
- I looked only at 2-bedroom condos. Mostly for selfish reasons. If I were to purchase a place, a 2-bedroom would be the most likely type.
- I concentrated on growth rate instead of the absolute value of housing prices, as for the purposes of this specific project I was primarily interested in how home values have changed over time (and there already seem to be a lot of studies on different cities’ absolute housing values). So, for example, I’m not digging into how high San Francisco’s housing value is (we already know it’s super high). Instead, I’m interested to know whether San Francisco’s housing value is rising faster or slower than other cities’.
- One limitation here is Zillow doesn’t have data on every single state, city, and neighborhood in the country. For example, it’s missing data on Texas and Louisiana altogether for some reason but includes Washington, D.C. separately. It also has limited information on neighborhoods in Detroit and a few other cities. That said, the datasets are still super thorough overall.
To get an initial lay of the land, I took a look at how our median national home value has changed in the recent past. (Note that for this exercise and everything following in this post, I’m using “home value” and “housing value” synonymously and they refer to the financial value of 2-bedroom condos only.)
The left part of this graph is probably no surprise. The country saw massive increases in home value, hitting a peak in March and April 2007 when the median value was $166,500. It goes haywire from there, pretty much dropping nonstop for five years straight. For everyone who graduated college or owned a home in that period, I’m sorry about bringing back all your bad memories. The housing market hit a low in February 2012, when the median U.S. home value was $116,000, a 30% decrease from the heyday of spring 2007. Cue video clip of Ryan Gosling’s character in The Big Short pulling out Jenga blocks until the tower topples over, as an analogy for the U.S. housing market.
But the right side of the graph looks promising. Since 2012, the U.S. housing market seems to be on the rise again. In the past four years, the median housing value in the U.S. has gone up about 26%, to $146,800 as of June 2016. That’s just ~$20,000 short of the high peak we reached as a country prior to the bubble bursting in 2007.
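For anyone who wants to check the arithmetic behind those percentages, here is a minimal sketch. It uses only the three median values quoted above. The function and variable names are made up for illustration, and the rebound is measured from the February 2012 low, which is an assumption about how the "about 26%" figure was derived.

```python
def percent_change(old, new):
    """Percent change from old to new, relative to old."""
    return (new - old) / old * 100

# Median U.S. 2-bedroom condo values quoted in the post
peak_2007 = 166_500     # March/April 2007 peak
trough_2012 = 116_000   # February 2012 low
june_2016 = 146_800     # June 2016

drop = percent_change(peak_2007, trough_2012)     # about a 30% decline
rebound = percent_change(trough_2012, june_2016)  # roughly 26 to 27% growth
gap_to_peak = peak_2007 - june_2016               # dollars still short of the peak

print(f"Drop from peak: {drop:.1f}%")
print(f"Rebound from low: {rebound:.1f}%")
print(f"Still short of the 2007 peak by ${gap_to_peak:,}")
```

Note that the two percentages use different bases, which is why a 30% fall followed by a rise of about 26% still leaves the median roughly $20,000 below the peak.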
Optimists and Barack Obama are cheering the recovery of our economy. Critics are calling it the beginning of another bubble. Either way, you probably should have bought that condo back in 2012 when your parents told you to.
Based on this graph, I wanted to further explore: how evenly distributed is this 2012–2016 growth across the U.S.? What parts of the country are the biggest winners? Are they the same winners as those from the pre-2007 bubble? What regions are growing most equitably? Let’s take a look. (Note: for the purposes of this project, I’m defining “2012–2016” more specifically as the four-year period from July 2012 through June 2016.)
This growth is not evenly distributed. Florida, Michigan, and the West have benefited the most.
The rise in home value over the past four years has benefited some parts of the country a lot more than others. 61% of U.S. states are seeing moderate growth in the 0–20% range, but two states’ housing value is actually declining while 22% of the states are experiencing massive growth over 30%.
Who are these big winners and losers? Turns out the western part of the country is experiencing the most dominant growth. Florida and Michigan are also winning big. Nevada leads the nation with an impressive 83% growth in median home value in 2012–2016, followed by Florida (62%), California (60%), Colorado (55%), Michigan (51%), Oregon (45%), and Washington (42%).
The Great Plains area, the Midwest, and the eastern part of the United States are seeing relatively more sluggish growth, with Maine actually declining by 19%. That puts a whopping gap of 102 percentage points in home value growth rate between our fastest-growing state and slowest-growing state. Choose wisely when you buy, indeed.
The biggest winners of 2012–2016 are also the biggest winners of 2003–2007.
Interestingly, a similar roster of states seems to have reaped the biggest benefits from both of our most recent periods of growth. States that were growing fast in home value in 2003–2007 also tend to be growing fast now, and states that were growing slower in 2003–2007 tend to be seeing slower growth rates now as well. This correlation is actually statistically significant (for ye statisticians, the p-value is 0.02).
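The post doesn't say how that p-value was calculated, so the following is only a sketch of one common way to test such a relationship: compute the Pearson correlation between the two periods' growth rates, then estimate a p-value with a permutation test. The six pairs of numbers are made-up placeholders, not Zillow figures, and the function names are hypothetical.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def permutation_p_value(xs, ys, trials=5_000, seed=0):
    """Two-sided p-value: share of random shuffles whose |r| is at least
    as large as the |r| observed in the real pairing."""
    rng = random.Random(seed)
    observed = abs(pearson_r(xs, ys))
    shuffled = list(ys)
    hits = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= observed:
            hits += 1
    return hits / trials

# PLACEHOLDER growth rates (percent) for six imaginary states, NOT real data
growth_2003_2007 = [95, 80, 70, 20, 15, 10]
growth_2012_2016 = [50, 35, 30, 12, 8, -5]

r = pearson_r(growth_2003_2007, growth_2012_2016)
p = permutation_p_value(growth_2003_2007, growth_2012_2016)
print(f"r = {r:.2f}, permutation p-value = {p:.4f}")
```

A small p-value says the pattern is unlikely to be a fluke of random pairing. It doesn't by itself say how strong the relationship is; that is what r is for.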
From this chart we can see that Nevada, Florida, and California are the three biggest victors of both 2012–2016 and 2003–2007. States like Indiana, Ohio, New Mexico, and Maine didn’t fare so well in either period. Most other states followed a similar pattern. Colorado is one of the few exceptions—it’s experiencing the 4th highest growth rate in the U.S. in 2012–2016 but saw a dismal growth rate in 2003–2007.
The other insight from this chart is that most states experienced more aggressive growth in 2003–2007 than in 2012–2016. Whether or not that’s a good thing is yet to be determined, given that explosive growth in home value doesn’t always mean rainbows and butterflies (as we saw in 2007).
The faster a state is growing, the more unequal that growth is.
It turns out the home value growth rate of a state is also related to how unequal that growth is within the state. In other words, the faster the state’s median growth rate, the bigger the difference between the growth rate of the state’s fastest-growing city and slowest-growing city.
Let’s take our fastest-growing state Nevada as an example. Median housing value in the Silver State (that’s how Nevada describes itself according to its license plates) is rising at 83%. But that growth is very unequally distributed among Nevada’s cities. There is a difference of 120% between Nevada’s fastest-growing city Fernley (whose median home value is rising at a breathtaking 131%) and its slowest-growing city Laughlin (growing at just 11%). Contrast this to Alaska, whose median growth rate as a state is a humble 9%, but it’s also growing more equitably. The growth rate difference between Alaska’s fastest-growing and slowest-growing city is just 13%. There appears to be a similar trend with other states: the faster a state’s median housing value is rising, the wider the spread in growth rate among the cities in that state. And vice versa.
The statistical significance of this correlation is even stronger (p-value is < 0.0001). From the chart above, it’s also interesting to see that basically no state falls in that magical lower-right quadrant of “Growing fast as a state while keeping inequality among cities low.” New York and Maine have the distinct honor of stagnating (or declining) in median state housing value while also having high inequality among their cities. California is actually the most unequal state, with a spread of 159% between its fastest-growing city (Vallejo, located in the Bay Area, growing at 158%) and slowest-growing city (Bishop, located about halfway between Yosemite and Sequoia National Park, declining at 1%). The rising tide is definitely not lifting all boats equally; in some cases it’s not lifting certain boats at all.
It’s probably not shocking to anyone that what city you buy your condo in matters a ton from an investment perspective (and other perspectives). But it was surprising to me just how huge of a difference it could make, even among cities in the same state. Naranja, Florida? Jackpot. 117% rise in value in the past four years. Havana, Florida? Not so much. Your 2-bedroom condo has actually lost 20% in value between 2012 and 2016. Ouch.
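The "spread" figures quoted above for Nevada and California are simply the fastest-growing city's rate minus the slowest-growing city's rate. Here is a minimal sketch of that calculation. Only the two extreme cities per state are included because those are the only city-level numbers given in the post, and the function name is made up for illustration.

```python
def growth_spread(city_growth):
    """Return (fastest city, slowest city, spread in percentage points)
    for a mapping of city name -> growth rate in percent."""
    fastest = max(city_growth, key=city_growth.get)
    slowest = min(city_growth, key=city_growth.get)
    return fastest, slowest, city_growth[fastest] - city_growth[slowest]

# 2012-2016 growth rates (percent) quoted in the post
states = {
    "Nevada": {"Fernley": 131, "Laughlin": 11},
    "California": {"Vallejo": 158, "Bishop": -1},
}

spreads = {state: growth_spread(cities) for state, cities in states.items()}

for state, (fastest, slowest, spread) in spreads.items():
    print(f"{state}: {spread} points between {fastest} and {slowest}")
```

With a full list of cities per state, the same function would pick out the extremes on its own.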
You think San Francisco’s housing market is out of control? Take a look at Oakland.
To see how these trends play out on a more micro level, I dug into seven cities in the U.S. that I’ve lived in at some point or gotten to know on a deeper level. I was curious to know what the growth trajectory looks like for these cities and how unequal that growth is among neighborhoods in the same city. Here are the magnificent seven: Los Angeles, CA; Oakland, CA; San Francisco, CA; Chicago, IL; Detroit, MI; Boston, MA; New York, NY.
Before I show a box-and-whisker plot below about our seven cities, I’ve included an explanation on how box-and-whisker plots work in general for folks who are newer to them. I certainly didn’t know how to read these prior to taking my data analytics class. Feel free to skip this explanation below if you’re already a stats expert.
A box-and-whisker plot usually looks something like this image below. There are five horizontal lines in the chart (they would be vertical lines if the plots are rotated 90 degrees as they sometimes are).
The line at the very top represents the maximum value in the dataset. Imagine, for example, that we have a dataset with the current age of the Backstreet Boys — 42 (for Howie), 36 (for Nick), 44 (for Kevin), 41 (for Brian), and 38 (for A.J.). In this example, the maximum value would be 44. Kevin has come a long way from his boy band days. The line at the very bottom represents the minimum value, so that would be 36 in our Backstreet Boys example.
To figure out the value for the other three lines, we need to rearrange the Boys’ ages in order, so something like 36, 38, 41, 42, 44. Then, the median is the value that’s in the middle or halfway from either end of the dataset (so 41 in this case), the 1st Quartile is the value that’s one-fourth of the way from the smallest item in the dataset (so the 1st Quartile would be 38 in this case), and the 3rd Quartile is the value that’s one-fourth of the way from the largest item in the dataset (so the 3rd Quartile would be 42 in our example).
So in one chart, you’re able to see the max, min, and median, plus you get a visual sense of the spread between max and min and where most of the items in the dataset fall between max and min. Pretty powerful stuff.
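If you'd rather see those five numbers computed than read about them, here is a small sketch using Python's standard library and the Backstreet Boys ages from the example. The choice of the "inclusive" quartile method is an assumption: it happens to match the hand calculation above, while other quartile conventions can give slightly different values on a dataset this small.

```python
import statistics

# Ages from the example: Howie, Nick, Kevin, Brian, A.J.
ages = [42, 36, 44, 41, 38]

def five_number_summary(values):
    """Min, 1st quartile, median, 3rd quartile, and max of a dataset."""
    ordered = sorted(values)
    # method="inclusive" treats the data as the whole population,
    # so the quartiles land on actual data points here
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }

summary = five_number_summary(ages)
print(summary)
```

Some charting tools draw the whiskers differently (for example, at 1.5 times the interquartile range rather than at the min and max), so it's worth checking what your tool does before reading a spread off the plot.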
That’s my attempt to show you the meaning of being box-and-whiskered. But now it’s time to quit playing games with my post and get back to the subject at hand.
Let’s look at just the median growth rates for a second, which are represented by the horizontal line in the middle of the rectangle for each city. That’s what this chart is sorted by in ascending order from left to right. What this shows is that Oakland’s median home value growth rate (taking into account the growth rate of all the neighborhoods within Oakland) is the fastest among these seven cities in 2012–2016. The interactive version of this chart goes into more detail, but to be precise, Oakland’s median growth rate is 106%, which is about 1.7x the median growth rate of San Francisco (61%), 4x the national median growth rate (26% as mentioned near the beginning of this post), and about 4.6x the median growth rate of Chicago (23%). Chicago has grown the slowest among these seven cities and is the only one that’s below the national median. One caveat here is that Zillow has very limited data on Detroit, so the Motor City’s results might be skewed.
The fun (and sometimes confusing) thing about box-and-whisker plots is that they can show a variety of metrics within one chart. Let’s take a look next at the spread between max and min, which represents the growth rate difference between each city’s fastest-growing neighborhood and slowest-growing neighborhood. In other words, it shows how unequal the home value growth has been within each city. Here, Oakland “wins” again. Oakland not only has the highest median growth rate, but it’s also growing most unequally, with a 129% difference between its fastest-growing neighborhood (Harrington, by the Fruitvale BART station) and slowest-growing neighborhood (Lakewide, between Downtown Oakland and Lake Merritt). Financially speaking, this means that Oakland is a market where your return on investment could vary especially dramatically depending on which part of the city you bought your condo in. The good news is that every neighborhood there has seen at least some positive growth in 2012–2016.
Here’s how the spread between max and min looks in all seven cities:
- Oakland: 129% spread (fastest: Harrington | slowest: Lakewide)
- Los Angeles: 71% spread (fastest: West Adams | slowest: West Hills)
- New York: 68% spread (fastest: Williamsburg | slowest: Great Kills)
- Chicago: 65% spread (fastest: Old Irving Park | slowest: East Hyde Park)
- San Francisco: 59% spread (fastest: Bayview | slowest: Nob Hill)
- Detroit: 44% spread (fastest: Grandale | slowest: Warrendale)
- Boston: 25% spread (fastest: East Boston | slowest: West Roxbury)
Another thing to note is that San Francisco’s home value growth is actually less unequal than that of Oakland, LA, New York, or Chicago. This was surprising to me, given SF’s reputation for having almost inhumane-level prices. Turns out while San Francisco might have extremely high absolute home values, those home values have actually grown more slowly compared to cities like Oakland, and SF’s growth has actually been distributed more equally among its neighborhoods compared to many other major cities.
Finally, in case you’re wondering: the fastest-growing neighborhood in the whole country (that Zillow has data for) is, in fact, Harrington in Oakland, California, whose housing value has increased 168% in the 2012–2016 period. The U.S. neighborhood that’s grown the slowest is Weston, located in the city of Winston-Salem in North Carolina, which has actually declined by 31%. That’s almost a 200% difference from Harrington!
Takeaways, recommendations, further exploration, learnings
To summarize the biggest takeaways from this analysis:
- U.S. home value has increased a lot over the past four years. That rising tide has lifted most but not all boats, and it’s lifted them very unequally. This is true on every level—national, state, city, and neighborhood.
- The biggest winners in this period of growth have been Florida, Michigan, and the West. Among the seven cities I dived deeper into, Oakland is growing by far the fastest, followed by San Francisco.
- The regions of the country seeing the slowest growth (and sometimes negative growth) in 2012–2016 have been the Great Plains area, the Midwest, and the East. Among the seven cities I dived deeper into, Chicago’s median growth rate has been the most sluggish.
- There is a statistically significant, positive correlation between how fast a state has grown in 2012–2016 and how fast it grew in 2003–2007.
- There is an even more significant positive correlation between a state’s growth rate in 2012–2016 and how unequal that growth has been within the state (the faster the growth, the wider the difference in growth among cities in that state).
Recommendations I’d make based on this project:
- If you’re thinking about buying a home partly for investment purposes, make sure you do research into growth trends on the state, city, and neighborhood level. There could be wide variations on every one of those levels. This is probably not shocking.
- A similar roster of states (e.g. California, Florida, Nevada) has done very well in both periods of recent growth. More longitudinal analysis is needed, but this could mean that these are good states to consider investing in, especially when we’re about to enter another period of housing market growth.
Further exploration that could add additional color to this analysis:
- Regarding the correlation between a state’s growth rate and the inequality of that growth—does the same trend hold on the city level? In other words, is there also a strong positive correlation between how fast a city’s home value is growing and how big the difference in growth rate is among that city’s neighborhoods?
- Similarly, does the correlation between growth rate in 2012–2016 and growth rate in 2003–2007 also apply on the city level?
- Is there any relationship between how quickly a region is growing in 2012–2016 and how quickly that region declined in the post-2007 recession? Are certain states simply more volatile (win big in times of growth and lose big in times of decline), or have the same states been “winning” (or declining the least) through each period?
- Why I want to learn more data science: I would love to be able to create a model that predicts which states/cities/neighborhoods will grow the most in the future.
This has been the most time-consuming data analysis project I’ve done to date. Here are some of my biggest learnings from the process:
- The more data you have, the more disciplined you need to be about defining the focus of your analysis upfront. I spent about 20 hours total on this project, and 10+ of those hours were spent going down analytical rabbit holes, changing my mind about what direction to go in, and then going down more rabbit holes related to the new direction. Over and over and over again. The sheer volume and complexity of the Zillow datasets didn’t help, as they presented a lot of distracting “shiny objects” AKA possible questions I could have answered. I became an analyst without direction, spending a lot of time excavating answers with Excel and SQL before I knew which questions I really wanted to find answers for. A much better process would be for me to (1) get some initial, high-level understanding of my data, (2) decide what questions I most want to answer using my data and limit the number of questions to just a few (this step is usually already done for you if you’re doing data analysis for a company or class), and (3) do the actual analysis to get answers to those targeted questions (which is almost never the hardest part). And iterate through those steps as the need arises.
- Tools matter. In trying to figure out what regions of the country grew the fastest in 2012–2016, I first attempted to plot growth rates on a map of the United States by hand, using the custom Google Maps app. The app is fantastic in so many ways, but Tableau can (and ultimately did) get the job done in one-tenth of the time. Same goes for making scatterplots in Excel versus in Tableau.
- Medium and Tableau are not friends yet. This was the first data project in which I used Tableau extensively for my visualizations, for the reasons mentioned above. Tableau visualizations are fast, slick, and interactive. But unfortunately you can’t embed them into Medium posts yet. I ended up having to screenshot my visualizations in order to add them into this post as still graphics. So be sure to take a look here as well for the original, interactive visualizations!