Estimating Census Population Margins of Error

And What it Means for DC’s Budget

Regular readers know that I really like to pry into the revisions of Census population numbers and see exactly how they work. However, one thing that’s always bugged me about Census numbers is that they don’t have a stated margin of error. Yes, they have residuals estimates, but that’s not quite the same as a margin of error.

Luckily, because Census publishes their whole time series every time they release a new estimate, we can do a direct measure of historical error. We can just compare Census estimates to Census data and to the “final” intercensal estimates, which I assume to be accurate, and see how large the errors are.

For today’s post, I want to accomplish two things, and they’re both basically about trying to help my own local government understand the population and demographics data that exists about DC. First, I want to look at the historic rates of error in forecasting DC’s population. Second, I want to look at DC government’s population forecasts, and question whether they’re appropriate.

DC’s population has shown substantial variation over time, so it may be harder than usual to predict compared with localities that have more stable growth patterns. But while that may lead to an overestimation of total error, it’s not total error in Census data I want to measure. I want to know error rates around periods where population trends may be disputed or interesting; that is, the error rate for hard-to-predict cases is the only error rate that’s really interesting. That being the case, DC is a pretty good case to look at.

Unfortunately, Census’ population estimates are not reported in an easily usable format before 2000. And we’ve only had one full Census round since 2000, so I can only look at a single analytic window: Census estimates between 2000 and 2010.

But how will we measure error? The core Census estimates don’t include forecasts of 2020 Census results, just current- and past-year estimates. What do we make of that?

Well, we can make a leap of faith and assume that intercensal estimates, usually produced 2–5 years after a Census for the decade preceding that Census, are probably correct. These estimates have the benefit of a maximally complete dataset for the whole period, and the benefit of knowing the endpoint, the Census. So if we treat the final intercensal estimates as gospel even though they’re technically still just estimates, not actual Census results, then we can get a “true” population figure for each year, versus Census estimates of that year for each vintage.

As you can see, Census was clearly of a divided mind about DC. From 2002 to 2005, Census was downgrading DC growth. Then they made a huge revision in 2006, showing fairly steady upward movement in DC population. They continued this optimism through 2009.

What we see in the intercensal estimates is that Census first overstated then understated the decline to 2005, but then overstated population from 2005–2009 while actually understating growth rates. By 2010, Census had probably caught up and was about right in time for the 2010 Census.

As you can see, these revisions are large. Changes in population due to revisions sometimes exceed change in population due to estimated growth or decline. So if you don’t pay attention to the revisions, you can seriously misunderstand what’s going on here.

So, okay, how big are these errors? Well, if we compare each year-vintage point estimate to the intercensal year estimate, we get an error value. If we then divide that error value by the estimate itself, we can get an error rate. We divide by the erroneous estimate, not the intercensal estimate, because we want a value that we can use to approximate errors for more recent population estimates, and we don’t have a priori knowledge of intercensal estimates. I won’t bore you with the chart, but yearly errors range from 0.12% to 3.02%. The average error is about 1.61%. However, what we really want is to know a band of population within which the intercensal estimates are very likely to fall; so we want to know, say, the 95% cutoff of error rates. Well, I observe 44 error rates here, so I want to drop the most extreme 5% of error rates. That’s 2.2 observations, so rounding up, we’ll say the 3 highest error rates. That’s 3.02%, 2.57%, and 2.56%. The 4th highest error rate is 2.55%. So we’ll say that there’s a pretty darn good chance that the final intercensal estimates will come within 2.55% of any given year’s actual estimates.
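For anyone who wants to reproduce the cutoff, here is a minimal Python sketch of the calculation described above. The function names are my own, and only the four highest rates in the usage example come from this post; the other 40 values are placeholders, not the actual observed error rates.

```python
import math


def error_rate(vintage_estimate, intercensal_estimate):
    """Absolute error as a share of the vintage (possibly wrong) estimate."""
    return abs(vintage_estimate - intercensal_estimate) / vintage_estimate


def cutoff_after_dropping_extremes(rates, drop_share=0.05):
    """Drop the most extreme share of rates (rounded up) and return the largest one left."""
    n_drop = math.ceil(len(rates) * drop_share)
    ordered = sorted(rates, reverse=True)
    return ordered[n_drop]


# The four highest rates (in percent) are from the post; the rest are placeholders.
top_four = [3.02, 2.57, 2.56, 2.55]
placeholders = [1.5] * 40
rates = top_four + placeholders

print(cutoff_after_dropping_extremes(rates))  # 2.55
```

With 44 observations, 5% rounds up to 3 dropped values, so the function returns the 4th highest rate.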

So let’s take our 2.55% error rate, and apply it to post-2010 population estimates for DC.

First off, look at the red lines around the center, and the black/gray lines. These are the actual population estimates. As you can see, the post-2010 period thus far has been much easier to forecast, with fairly steady growth under all projections, though the exact pace of growth has varied. So we may reasonably expect that the error rate will eventually come in under 2.55%.

But if we want to be on the safe side, we can apply our 2.55% error rate anyways. And here we see that the realm of potential error is vastly greater than the range of revisions. It is not unreasonable to think some future Census estimate could spit out a very different figure for DC. At the same time, the observed growth is robust enough that we can say fairly convincingly that DC in 2016 has more people than DC of 2012: the minimum estimate for 2016, 664,000, is above the maximum estimate for 2012, 617,000. We can be extremely confident that DC is growing.


Well, duh, you say. Of course we can be confident in that. Who doesn’t think DC is growing?

Well, before I get to that, let’s think about this growth. We can say for sure that DC is growing. But we cannot say with certainty that growth has been greater than 12,000 people since 2012. Now, the largest growth estimate is 82,000 people since 2012, with the core population estimate at 46,000. These differences matter. This is equivalent to saying population has grown somewhere between less than 2% and almost 13%. That’s our error range here. Even if we use a much more conservative estimate of error, like maybe 0.5%, the range of estimates of growth since 2012 is 39,000 to 55,000. A 16,000-person error range is sort of a big deal.
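The growth range comes from pairing the low end of one year’s band with the high end of the other. Here is a rough Python sketch of that logic. The function names are my own, and the two populations are round placeholder inputs, not official Census figures, so the output will only approximate the numbers quoted above.

```python
def error_band(estimate, rate):
    """Return (low, high) bounds for a point estimate given a symmetric error rate."""
    return estimate * (1 - rate), estimate * (1 + rate)


def growth_range(earlier, later, rate):
    """Return (minimum, central, maximum) growth between two point estimates."""
    earlier_low, earlier_high = error_band(earlier, rate)
    later_low, later_high = error_band(later, rate)
    return later_low - earlier_high, later - earlier, later_high - earlier_low


# Placeholder inputs for illustration only, not official estimates.
earlier_estimate = 635_000
later_estimate = 681_000

for rate in (0.0255, 0.005):
    low, mid, high = growth_range(earlier_estimate, later_estimate, rate)
    print(f"rate={rate:.2%}: growth between {low:,.0f} and {high:,.0f} (central {mid:,.0f})")
```

The minimum growth uses the later year’s low bound and the earlier year’s high bound, which is why even a small error rate opens up a wide range.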

How many housing units do 16,000 people need? Well, my guess is somewhere between 8,000 and 16,000, and that assumes that we allow the vacancy rate to creep lower (to maintain a fixed vacancy rate with a rising population means you have to build new vacant apartments, or allow a larger absolute number of old apartments to be vacant).

So, let’s compare housing units added in DC versus population growth.

Let’s just look since 2010 for simplicity. The chart below shows low, medium, and high estimates of the amount of population added since 2010:

So what do we see here? Well, population growth definitely outstripped permitting of new housing, and that’s not even accounting for any housing that depreciated beyond use or was converted to commercial use. If we take a low estimate of population growth, we get 2.2 people added per unit permitted. If we take a medium estimate of population, it’s 2.9. With a high estimate of growth, it’s 3.55.

Meanwhile, here’s DC’s housing vacancy rate:

The vacancy rate is falling. Indeed, from 2010 to 2015, DC added between 48,000 and 84,000 people and issued 21,574 building permits. Yet ACS shows only about a 13,000 net increase in housing units. Rather, vacancies fell, with occupied housing rising by 29,000 units: so between 1.63 and 2.86 people per additional occupied unit.

But is a 6.8% vacancy rate low or high?

Well, for the nation on the whole, the vacancy rate in 2015 was about 7.2%, so a bit higher than DC’s. To me, that says that a 6.8% vacancy rate is on the lower end, but not extremely low. It’s possible that DC can add new housing by pushing the vacancy rate even lower.

But we have population and housing data for 2016. It is likely, although not certain, that DC’s population rose in 2016. The main population estimate is of growth of 11,000 people. If we assume they need 1 unit per 2 people, that’s 5,500 units. Yet in 2016, only 4,700 housing units were permitted. We can assume that some fraction of the housing supply was demolished or converted as well, so it’s likely that somewhere south of 4,700 housing units were actually added. If we lag by a year, 2015 added 5,000 housing units, but, again, we’re still hundreds of units short. That means vacancies are likely to fall 300–1200 units, depending on your assumptions about people per unit, permits-to-units-constructed, etc. That means that the vacancy rate in 2016 was probably between 6.3% and 6.7%. If it’s 6.3%, that’s starting to get pretty far below the national average.
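The back-of-the-envelope shortfall above can be written out in a few lines of Python. This is just a sketch: the function names are my own, and it treats every permitted unit as a completed unit, which is an assumption that likely overstates supply.

```python
def units_needed(population_growth, people_per_unit):
    """Housing units required to absorb new residents at a given household size."""
    return population_growth / people_per_unit


def shortfall(population_growth, people_per_unit, units_added):
    """Demand not met by new supply (positive means vacancies have to fall)."""
    return units_needed(population_growth, people_per_unit) - units_added


growth_2016 = 11_000   # main estimate of 2016 population growth, from the post
permits_2016 = 4_700   # units permitted in 2016, from the post
units_2015 = 5_000     # units added in 2015, from the post

print(units_needed(growth_2016, 2))             # 5500.0
print(shortfall(growth_2016, 2, permits_2016))  # 800.0
print(shortfall(growth_2016, 2, units_2015))    # 500.0
```

Changing the people-per-unit assumption or discounting permits for demolitions moves the shortfall around, which is where the 300–1200 range comes from.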

In other words: in recent years, DC didn’t produce enough houses.

No surprise then:

Rent estimates are still rising as of February 2017.

Notably, this isn’t true everywhere. Here’s DC compared to a few other cities:

Everything here is indexed to January 2016. As you can see, SF, Chicago, Houston, and New York have all seen declines in rents, suggesting that, for whatever reason, demand and supply balances have been such that there are fewer circumstances where prices get bid up, and more where they are allowed to fall. I’m using circuitous language here to avoid a definitive statement about whether this is a supply or demand shock.

The point is: of the cities that are facing weak price conditions due to oversupply of housing or declining demand, DC isn’t a gold-star case. At least as of 2016, the data I’ve shown thus far suggests that it’s unlikely to be such a case.

The 2018 DC budget outlook says:

And:

So, um, what? DC is growing too slowly to occupy housing under construction or planned? That seems… like an interesting prediction?

So let’s look at what’s going on here. Here’s DC population and DC’s budget forecast for population:

At face value, this forecast looks fine. Growth continues, but it slows down. The budget calls this a “cautious” outlook, and it is cautious compared to recent years. However, I do not believe it is sufficiently cautious. DC essentially forecasts that growth will smoothly glide towards rough stability around 0.7% or 0.8% per year. Maybe so. But international and domestic migration are likely to fall given increasing suburbanization, aging of Millennials, less growth in Federal hiring, even possible relocation of some Federal agencies or functions. Birth rates also seem unlikely to rise. Here’s the same chart as above, with an alternative forecast where growth rates continue their linear decline since 2010:

Those lines are close together, but the difference in 2021 is about 10,000 people, which is nothing to sneeze at. That’s somewhere between 2,500 and 8,000 housing units not needed. DC already thinks there will be an excess of housing; my analysis would suggest they may even be overstating future demand.
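For anyone curious how an alternative forecast like this can be built, here is a minimal Python sketch: fit a straight line to past annual growth rates, extrapolate that line, and compound the population forward. The function names are my own and the input series is a hypothetical index, not DC data. It also does nothing to stop the growth rate from turning negative if you project far enough out.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def project_population(years, populations, horizon):
    """Extend a series by extrapolating the linear trend in annual growth rates."""
    growth_years = years[1:]
    growth_rates = [
        populations[i] / populations[i - 1] - 1 for i in range(1, len(populations))
    ]
    intercept, slope = fit_line(growth_years, growth_rates)
    forecast = {}
    population = populations[-1]
    for year in range(years[-1] + 1, years[-1] + 1 + horizon):
        population *= 1 + intercept + slope * year
        forecast[year] = population
    return forecast


# Hypothetical index series with decelerating growth, for illustration only.
years = [2010, 2011, 2012, 2013, 2014]
populations = [100.0, 102.0, 103.8, 105.4, 106.8]

for year, population in project_population(years, populations, 3).items():
    print(year, round(population, 2))
```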

But broadly speaking, DC’s government is probably correct on the key points: growth is going to slow down. However, two factors may help DC remain competitive. First, WMATA sucks, DC is miserable for driving, and there’s no prospect of any new infrastructure that will reduce traffic appreciably, which means commuting is going to get more and more miserable. This means that the value of living in DC is going to rise in the next few years, at least until WMATA gets its crap together. These problems could induce people to leave the DC metro area entirely, but, for DC, the within-metro gains from reduced commuting likely exceed DC’s share of whole-metro-area losses. Second, DC has been improving its quality of governance while reducing its fiscal burden, mostly by reducing tax burdens on the middle- and lower-income people who make up most residents. They’ve also cut business taxes and sales taxes. With better governance and lower cost of government, DC should simply be more appealing for the next year or two, before other jurisdictions have had time to reassert any relative advantage.

Okay, so that’s all cool. Growth is likely to slow down, but it’s not clear exactly how much. Now let’s look at housing. DC thinks there will be too much housing.

I have to massage the ACS data a bit to get it to match the DC budget data in back years, so take this with a little bit of caution.

The basic story here is DC expects housing construction to slow and then, in 2021, the housing stock actually shrinks. Which is pretty remarkable given they forecast still-rising population. I don’t want to quibble with DC’s housing forecast though; let it be what it will be. I’m not actually an expert in forecasting the housing market (rimshot). Instead, let’s take their housing estimates as a given. Here’s people-per-housing-unit for their forecast, and for my lower forecast:

So DC is saying there won’t be enough people to fill the housing units, but hey look the population-per-housing-unit is actually going up under either specification.

But this excludes vacant units. Let’s build a new model where we calibrate people-per-occupied-unit. But that’s a bit tricky since we don’t have vacancy estimates for the future years. So how about we make a fun assumption: let’s assume that the people-per-occupied-unit ratio remains at its 2016 level, and force any increase in population to reduce the vacancy rate. Then, let’s graph the vacancy rate, to see how tight DC’s housing market has to get under our various forecasts.
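Here is a rough Python sketch of that assumption: hold people-per-occupied-unit at its base-year value, work out how many occupied units each forecast population implies, and read off the vacancy rate from whatever is left. The function names are my own and the inputs are hypothetical, not DC’s actual figures.

```python
def implied_vacancy_rate(population, housing_units, people_per_occupied_unit):
    """Vacancy rate implied by a fixed people-per-occupied-unit ratio."""
    occupied_units = population / people_per_occupied_unit
    return 1 - occupied_units / housing_units


def vacancy_path(base_population, base_units, base_vacancy_rate, forecasts):
    """Implied vacancy rate for each (population, housing_units) forecast pair."""
    base_occupied = base_units * (1 - base_vacancy_rate)
    ratio = base_population / base_occupied  # held fixed at the base-year level
    return [implied_vacancy_rate(pop, units, ratio) for pop, units in forecasts]


# Hypothetical inputs for illustration only.
path = vacancy_path(
    base_population=1_000,
    base_units=500,
    base_vacancy_rate=0.07,
    forecasts=[(1_010, 503), (1_020, 505)],
)
print([f"{rate:.1%}" for rate in path])
```

Whenever population grows faster than the housing stock, the implied vacancy rate has to fall, because the fixed ratio leaves nowhere else for the extra people to go.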

DC’s forecast has critically low vacancy rates. My forecast has stable vacancy rates. To be blunt, my forecast makes more sense, and DC’s forecast doesn’t really jibe with the text of the budget. The implication of their numerical forecasts is that the vacancy rate is going to fall sharply to about 5%.

Either DC is being too optimistic about population, too pessimistic about housing supply, or they are forecasting a sharp increase in rents due to very low vacancies.

My suggestion is they are too optimistic about population growth, despite their view that they’re erring on the side of caution. It’s also possible that they’re too pessimistic about housing supply. However, these vacancy rates seem very unlikely to me.


And If There’s An Error…

Let’s say that Census estimates right now are wrong. Let’s say DC’s population is already 1% higher than we thought. That would mean population-per-housing-unit has risen by 7.6% since 2010, rather than 6.5% in current population estimates. And if DC’s population is 2% higher, it gets even more severe. On the other hand, if DC’s population is 2% lower than currently estimated, then population-per-housing-unit has risen just 4%. Depending on which error band you pick for forecasts, implied vacancy rates can range from 3% to 8%.
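The arithmetic behind those sensitivity figures is simple compounding. A small Python sketch, with a function name of my own, assuming the housing-unit count and the 2010 baseline stay fixed and only the latest population estimate is off:

```python
def adjusted_rise(reported_rise, population_error):
    """Rise in people per housing unit if the latest population is off by a given share."""
    return (1 + reported_rise) * (1 + population_error) - 1


reported = 0.065  # rise since 2010 under current estimates, from the post

for error in (0.01, 0.02, -0.02):
    print(f"{error:+.0%} population error -> {adjusted_rise(reported, error):.1%} rise")
```

A 1% undercount turns the 6.5% rise into roughly 7.6%, and a 2% overcount brings it down to a bit over 4%.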

I have two very separate points in this post:

  1. Provide a case study of how large Census revisions can be, and illustrate why that matters.
  2. Offer some friendly analysis as a DC resident of my government’s budget.

I literally don’t know diddly squat about what’s in the new budget. Policies, proposals, I got no clue. But whatever those policies are, I hope they’re being made with the most prudent population forecasts in mind.

I’m a native of Wilmore, Kentucky, a graduate of Transylvania University, and also the George Washington University’s Elliott School. My real job is as an economist at USDA’s Foreign Agricultural Service, where I analyze and forecast cotton market conditions. I’m married to a kickass Kentucky woman named Ruth.

My posts are not endorsed by and do not in any way represent the opinions of the United States government or any branch, department, agency, or division of it. My writing represents exclusively my own opinions. I did not receive any financial support or remuneration from any party for this research.