How much did the Second Avenue Subway increase home prices?

Measuring the impact of new infrastructure on real estate

Published in data.tale() · 8 min read · Dec 18, 2017

Photo: MTA Capital Construction / Rehema Trimiew

A little over a year ago, the first phase of the new Second Avenue Subway finally opened to great fanfare. Along with a new train line, the opening brought higher housing prices along for the ride: rents went up, as did home values. StreetEasy put out a great post on increasing rents before the line opened. We want to conduct a similar analysis, but instead of looking at rents, we will focus on home sale prices.

Why the Second Avenue Subway?

The Second Avenue Subway line’s impact on real estate prices is interesting for a few reasons. First, East Harlem was just rezoned, in part to prepare for the next extension of the Second Avenue Subway line. By understanding what happened in Yorkville, we may better understand how property values will change in East Harlem. Personally, having lived in the neighborhood (at 116th and Lexington), I know that a new subway line will dramatically improve the area’s accessibility, which will, without a doubt, bring a lot of change. Quantifying that change using something easily measurable, namely real estate prices, is an important first step toward understanding the long-term changes that the new subway will bring.

Second, quantifying the value increase brought about by the subway line is useful for potential value capture strategies for funding infrastructure projects. If the city understands how much money real estate owners will make from proximity to a new subway line, then the city can take a cut of that value increase for itself. Of course, this is only one step toward helping fund new transportation projects — but if something like the Second Avenue Subway could pay for itself, then maybe New York State Governor Cuomo could funnel some of that money into the basic maintenance the subway system so desperately needs.

The data and data limitations

The data for this project is easily accessible:

Department of Finance (DoF) sales data. This dataset records each property sale in NYC, residential and commercial alike, including condos, co-ops, and other apartments. We restricted our analysis to residential apartment sales.
PLUTO. This dataset contains basic geographic and zoning information about each tax lot in the city, along with other details such as when the building was built or renovated. We use it to measure how far each sold unit’s lot is from the subway.
Zillow data, to get some meta-information about each property. Sarah put together a Jupyter notebook that scrapes the API for us.
Some subway GIS data to tie it all together. We did this work in ArcGIS, though it wouldn’t be hard to replicate in GeoPandas, my preferred geospatial analysis library.

We had two main issues with the data:

The DoF sales data is messy. For starters, it labels apartment numbers inconsistently. Additionally, we think it records things like deed transfers (e.g. when someone passes ownership of an apartment on to a family member) as low-valued sales.
Zillow data is not free, and when we tried to pull information on the apartments we had found in the DoF data, the fields we got back were inconsistent.
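One way to handle the suspected deed transfers is a simple price floor. The sketch below is only an illustration: the `Sale` record, the `unit_id` key, and the $10,000 cutoff are all assumptions made here, not values taken from our analysis.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Sale:
    unit_id: str      # hypothetical key, e.g. tax lot plus apartment number
    sale_date: date
    price: float      # recorded sale price in dollars


# Assumed cutoff (not from the analysis): anything below this is treated
# as a likely deed transfer rather than a market sale.
MIN_MARKET_PRICE = 10_000


def drop_likely_transfers(sales, min_price=MIN_MARKET_PRICE):
    """Keep only the sales recorded at or above min_price."""
    return [s for s in sales if s.price >= min_price]


# Made-up records for illustration.
sales = [
    Sale("unit-A", date(2010, 5, 1), 450_000),
    Sale("unit-A", date(2012, 3, 9), 10),       # nominal price, probably a transfer
    Sale("unit-B", date(2015, 7, 20), 0),       # zero price, probably a transfer
    Sale("unit-B", date(2017, 6, 2), 610_000),
]
kept = drop_likely_transfers(sales)
```

A fixed floor is crude, since it cannot tell a family transfer from a genuinely cheap sale, so the cutoff is worth checking against the data before relying on it.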

Distance to the subway stops after the first phase of the Second Avenue Subway (SAS) opened on January 1, 2017.

Working through all this, we were still able to get ~1,700 sales records from the past ten years (January 2008 through October 2017). This left us with a dataset of sales with the following baseline features:

Sale date
Sale price
Distance to nearest Second Avenue Subway stop
Building year built / renovated
Number of bedrooms in apartment

This image shows the area that we’ll consider, along with the subway stop locations, and finally, the distance of each lot (where again, lot boundaries are given by PLUTO) to the nearest subway station.

If you want to dive into the data loading and munging, it’s all on GitHub.

The methods: experimental design

This may have been my favorite aspect of the project. I haven’t spent a ton of time digging through the nuances of experimental designs, so it was fun to learn more. I had prior exposure to the following traditional experimental setup, which we’ll build on:

A group of people is partitioned into “treatment” and “control” groups.
The “treatment” group receives some sort of “intervention” — think, a new drug to lower cholesterol — and some “outcome” is measured — think, cholesterol levels.
The “control” group does not receive the “intervention”, but they have the same “outcome” measured as for the treatment group. In a drug study, the control group would likely receive a placebo pill and instructions to measure their cholesterol in the same way as the treatment group — the idea being, there is absolutely no difference (in aggregate) between the control group and the treatment group, aside from the fact that the treatment group is taking pills that are specifically designed to lower cholesterol.
The change in measured outcome is compared between treatment and control group — typically as an aggregated value. In the case of the cholesterol drug, we’d compare the mean cholesterol levels before and after treatment, and see which group experienced a bigger change.
Because we set up the treatment and control groups to be identical aside from the contents of the pill they are taking, a larger drop in cholesterol for the treatment group lets us conclude that the drug itself had an effect. If the two groups change by about the same amount, we conclude that it did not.
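The comparison in the steps above can be sketched in a few lines. The cholesterol readings are invented purely for illustration, and `mean_change` is a helper name introduced here.

```python
from statistics import mean


def mean_change(before, after):
    """Average per-person change in the outcome (after minus before)."""
    return mean(a - b for b, a in zip(before, after))


# Made-up cholesterol readings (mg/dL), one value per participant.
treatment_before = [240, 250, 230, 260]
treatment_after = [200, 215, 205, 220]
control_before = [245, 235, 255, 250]
control_after = [240, 236, 250, 247]

treatment_change = mean_change(treatment_before, treatment_after)
control_change = mean_change(control_before, control_after)

# The estimated effect of the drug is the gap between the two changes.
effect = treatment_change - control_change
```

A real study would also test whether that gap is larger than what chance alone would produce. This sketch only shows the bookkeeping.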

In the social sciences, things are a little different. Setting up an “experiment” around a question like the impact of a subway line on real estate values is tough, because we can’t assign treatment and control groups ourselves, as in the drug study example above. We can’t set aside some apartments, deny those units access to the Second Avenue Subway, and then measure how their values change compared to those that do have access. Instead, we have to be a little bit more clever.

Our team’s digging came up with a few possible experimental designs. For more detail, you can refer to the original post. Here, I’ll summarize the most effective method we used.

Design: repeat-sales

One popular quasi-experimental design for analyzing changes in real-estate values is repeat-sales. The idea is that rather than comparing sales of different units and controlling for features that may affect sale price (e.g. apartment square footage), we look at the change in sale price of the same apartment between two sales. Think of it as a regression that includes terms for the unit’s fixed features: when we take the difference between two sales of the same unit, those terms cancel, and we’re left with only time-based changes.

As an example (with unrealistic numbers), let’s say we develop a linear model for housing price:

price = $300,000 * (# of bedrooms) + $200 * (year of sale)

And then, suppose we use this model to look at the sale of a 2 bedroom apartment in 2007 and 2010. In particular, we can look at the change in price with our model:

price(2010) - price(2007) = $300,000 * (2 - 2) + $200 * (2010 - 2007) = $600

Because we’re looking at the exact same unit, only time-based features, such as the year in this case, remain in the model of the price growth. This greatly simplifies our assumptions.
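To see the cancellation concretely, here is the same toy model in code. The function names are made up for this sketch and the coefficients are the unrealistic ones from the example above.

```python
def toy_price(bedrooms, year):
    """The toy linear model: $300,000 per bedroom plus $200 per calendar year."""
    return 300_000 * bedrooms + 200 * year


def repeat_sale_delta(bedrooms, first_year, second_year):
    """Price change between two sales of the same unit under the toy model."""
    return toy_price(bedrooms, second_year) - toy_price(bedrooms, first_year)


# The bedroom term cancels, so the delta depends only on the years of sale.
delta_2br = repeat_sale_delta(2, 2007, 2010)
delta_4br = repeat_sale_delta(4, 2007, 2010)
```

Both deltas come out to $600 even though the two units sell at very different price levels. This is the property that lets repeat-sales skip controlling for unit features.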

This shows the change in distance to the subway stations after the Second Avenue Subway opened. (Note: ignore the washed out area — it’s incorrectly colored.)

In our case, we looked at the “distance improvement to nearest subway” between the two sales. That is, if the first sale in a pair occurred before the line opened and the second sale occurred after, we looked at how much shorter that apartment’s walk to the subway was now that the Second Avenue line was open, compared to its old walk to the 4/5/6 line. Under this measure, we would expect areas on the far east side of Yorkville to see the greatest price jump, as they have the most dramatic distance improvements.
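A minimal sketch of that measure follows, under some loud assumptions: coordinates are on an invented local grid in meters, distance is straight-line rather than along the street network, and a pair that does not straddle the opening gets an improvement of zero. We did the real computation in ArcGIS, so treat the function names and numbers here as illustrative only.

```python
from datetime import date
from math import hypot

SAS_OPENING = date(2017, 1, 1)  # phase one opening date


def nearest_distance(point, stops):
    """Straight-line distance from point to the closest stop."""
    return min(hypot(point[0] - sx, point[1] - sy) for sx, sy in stops)


def distance_improvement(point, old_stops, new_stops):
    """How much closer the nearest stop is once the new stops exist."""
    before = nearest_distance(point, old_stops)
    after = nearest_distance(point, old_stops + new_stops)
    return before - after


def pair_improvement(first_sale, second_sale, point, old_stops, new_stops):
    """Improvement for a sale pair; zero unless it straddles the opening (assumed)."""
    if first_sale < SAS_OPENING <= second_sale:
        return distance_improvement(point, old_stops, new_stops)
    return 0.0


# Invented coordinates in meters: old stops on the 4/5/6, new stops to the east.
lexington_stops = [(0.0, 0.0), (0.0, 800.0)]
second_ave_stops = [(400.0, 0.0), (400.0, 800.0)]
far_east_unit = (800.0, 0.0)
near_lex_unit = (100.0, 0.0)

far_east_gain = pair_improvement(
    date(2012, 6, 1), date(2017, 5, 1), far_east_unit, lexington_stops, second_ave_stops
)
near_lex_gain = pair_improvement(
    date(2012, 6, 1), date(2017, 5, 1), near_lex_unit, lexington_stops, second_ave_stops
)
both_before_gain = pair_improvement(
    date(2010, 1, 1), date(2015, 1, 1), far_east_unit, lexington_stops, second_ave_stops
)
```

Because the "after" distance considers both old and new stops, the improvement can never be negative: a unit that is still closer to the 4/5/6 simply gets zero.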

Looking at the map above, we can see the time-based variable we care about in the repeat-sales approach. In particular, this heatmap illustrates how much the subway’s opening improved a given property’s transit access. Red is “most improved”: properties on the far east side of the island gained the most in terms of travel distance improvement. Interestingly, the “green” areas on the map are still closer to a 4/5/6 train stop than to any Second Avenue stop.

This graph shows the change in distance on the X-axis, compared to the change in sale price on the Y-axis. Many properties in our area of interest experienced a change of 0 with the line’s opening, since they are already closer to the 4/5/6 line. However, there is a clear, though small in magnitude, upward linear trend for those properties which did get a walking distance improvement. We’ll quantify this improvement in our results below.
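The trend in a scatter like this can be summarized with an ordinary least-squares line. The sketch below is a deliberately simplified one-variable fit on invented numbers. The analysis in the notebook may be specified differently, and the helper names are introduced here only for illustration.

```python
def ols_fit(xs, ys):
    """Slope and intercept of the least-squares line through (xs, ys)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line."""
    y_bar = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Invented data: distance improvement in meters vs. price growth factor
# (second sale price divided by first sale price).
improvement_m = [0, 0, 0, 100, 200, 300, 400]
growth_factor = [1.18, 1.25, 1.21, 1.22, 1.27, 1.26, 1.31]

slope, intercept = ols_fit(improvement_m, growth_factor)
slope_per_100m = slope * 100  # extra growth per 100m of improvement
fit_quality = r_squared(improvement_m, growth_factor, slope, intercept)
```

The slope is the quantity of interest, and the R² value says how much of the spread in price growth the distance improvement accounts for on its own.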

The results

We saw a statistically significant indicator that the SAS has increased property values. In particular, for every 100m of walkability improvement a given unit gained with the SAS opening, its growth factor increased 2% on average. This translates to 8% above-average growth for the units on the far east side of Yorkville, which got a ~400m walkability improvement to the nearest subway. So according to this model, a unit that would have sold for $400,000 at 86th and York is now worth ~$432,000 … just by virtue of this subway line opening!*
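The arithmetic behind that example, written out as a small sketch. It assumes the 2% per 100m adds up linearly (4 x 2% = 8%) rather than compounding, and the function name is made up for illustration.

```python
def price_with_access_premium(base_price, improvement_m, rate_per_100m=0.02):
    """Scale a price by the estimated premium for a given distance improvement."""
    premium = rate_per_100m * (improvement_m / 100)  # 400m -> 0.08
    return base_price * (1 + premium)


# The 86th and York example: a $400,000 unit with a ~400m improvement.
new_price = price_with_access_premium(400_000, 400)
```

Rounded to the nearest dollar, `new_price` is the ~$432,000 quoted above.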

With that said, we had some methodological issues. In theory, with repeat-sales, we should be able to account for the variation in value increase to a high degree. A similar study in Montreal reported R² values of 0.7 for a baseline model; ours were significantly lower. I’m not sure why this is. Perhaps different-sized apartments are fundamentally different markets in New York, and we can’t consider studio apartments in conjunction with 2-bedroom apartments, for example.

Conclusion

You don’t have to be a real-estate professional to know that the Second Avenue Subway has jacked up values in Yorkville. We’ve put a number on how much transit accessibility improves property values, and we find that accessibility has a substantial impact. We can use this information to better fund, plan, and develop new infrastructure around the city, especially as the city begins to develop the next phase of the Second Avenue line. We hope it may be useful for policymakers to better understand the implications of these infrastructure decisions.

But what about renting?

Something we haven’t looked into, but are very interested in, is rental prices. Do rent increases behave differently than sale price increases? Are there anticipation effects? Do we see rents increase more immediately due to the shorter overall timescale of renting? Unfortunately, rental data is hard to acquire, but given access to the data, we would be very interested in continuing this research.

If you want to learn more, there’s a notebook where all of this analysis is captured.

Many thanks to the best teammates ever, Sarah Schoengold and Hao Xi, for working on this project! This was truly a group effort.

Originally published at www.christianmoscardi.com on December 18, 2017.