The true population migration rate remains shrouded in statistical fog.

How to Read the Census Population Estimates Numbers

Revision Histories Tell Us a Lot About the Usefulness of Data

Lyman Stone
In a State of Migration
9 min read · Jan 5, 2016


The flagship migration number that most commentators cite when they’re talking about net migration in a region is the U.S. Census Bureau’s Population Estimates Program (PEP) data. This data is based on a holistic accounting of all the factors of demographic change: births, deaths, domestic migration, and international migration. The migration estimates PEP produces tend to track IRS, CPS, and ACS migration estimates fairly closely, but not exactly. That being said, I’ve been asked why I prefer sources like the ACS to the apparently more rigorous PEP. The reasons are numerous: ACS/CPS/IRS provide more detail about who is moving, have a more direct relationship to underlying data, and, what I’ll focus on here, more clearly state their own limits.

In other words, I like ACS, for example, because it gives you a margin of error that is, ultimately, mathematically derived from relevant factors like sample size. PEP, meanwhile, gives you an exact number, no margin of error. I worry that PEP sacrifices accuracy in order to achieve precision. And furthermore, the manner in which PEP does this leads some commentators to make unjustifiably strong claims.

So for this post, I’m going to walk through the PEP revision history and point out how significant, and subtle, back-year revisions can be. My point here is not that the PEP is a bad data set (it’s not; I use it all the time to double-check other sources), or that Census isn’t doing their job (they do their job quite effectively), or even that they should alter what they’re doing. Rather, my point is simply that people don’t really know how to read and understand PEP data.

What is Population Estimates Data?

A Rolling Forecast of Hypothetical Census Data

If the Census were taken on July 1, 2015, what would it find for the total population of Alabama? That’s what PEP is trying to guess. They provide a few subcategories, but it’s really about estimating total population. That’s why ACS uses PEP total population data as a key underlying baseline number: because where PEP is strongest is in telling you how many people live somewhere.

But that strength derives from the “components of change.” These are births, deaths, and net migration. PEP derives this data various ways: births and deaths basically come from official records, net migration from a mixture of survey and administrative records. One place where PEP almost certainly messes up is on estimating net international migration, as they probably low-ball native-born emigration, but that’s not crucial for my purposes here.
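The components-of-change accounting amounts to a simple balancing identity: next year’s population is this year’s population plus births, minus deaths, plus net migration. A minimal sketch (all figures below are hypothetical placeholders, not actual PEP numbers):

```python
# Components-of-change identity underlying PEP-style estimates:
#   pop_end = pop_start + births - deaths + net_domestic + net_international
# (PEP also carries a small residual, ignored here for simplicity.)

def estimate_population(pop_start, births, deaths, net_domestic, net_international):
    """Roll a base population forward one year using components of change."""
    return pop_start + births - deaths + net_domestic + net_international

# Hypothetical state-level inputs, NOT real vital statistics:
pop_2014 = 4_850_000
pop_2015 = estimate_population(
    pop_start=pop_2014,
    births=59_000,
    deaths=50_000,
    net_domestic=-2_000,
    net_international=4_000,
)
print(pop_2015)  # 4861000
```

Note that the same identity run in reverse is how "residual" migration estimates work: if you trust the population counts and the vital records, whatever is left over must be migration.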

The point is that these inputs don’t all become available at the same time. Meanwhile, PEP data are released on a really aggressive schedule: the 2015 data for state-level units was released in December. That’s a really quick turnaround time. And that’s the point: PEP gives you this standardized baseline, and gives it to you quickly, in an accountable way, and in a fairly absolute way. Net migration was more-or-less exactly X. Yeah, yeah, there’s a residual, and the residual for domestic migration is larger than for many of the other categories, but it’s still a minuscule residual. For all intents and purposes, PEP gives you a fixed number.

And because it’s a standardized time series, it’s the only place you can go where the government produces real estimates of year-over-year change in migration. So that’s really cool.

There’s just one teensy-weensy problem. And that problem is that, in order to come up with that exact number and get the most accurate possible forecast, PEP carries out revisions. These revisions can be pretty darn big.

Just How Big ARE Revisions?

Big Enough to Matter

I put together some data that doesn’t easily turn into a pretty chart, so sadly no visual this time, but I’ll describe it here. I compare annual migration rates for 2011–2015 using PEP vintages 2012–2015. As an example, I compare the estimates made of 2013 migration for a given state in 2013, 2014, and 2015. For 2011, I compare the estimates made in 2012, 2013, 2014, and 2015. I selected those years for convenience; it makes no big difference what years you select, though adding more years will lead to bigger maximum revisions.
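The comparison above can be sketched in a few lines: for each state and estimate year, line up every vintage’s figure and measure the spread. The rates below are hypothetical placeholders, not real PEP estimates:

```python
# Sketch of the vintage comparison described above. All rates are made-up
# illustrations (net migration rate in %), NOT actual PEP data.
# rates[state][estimate_year] maps each vintage to its estimate of that year.
rates = {
    "Alaska": {
        2013: {2013: 0.45, 2014: 0.30, 2015: 0.38},
    },
    "California": {
        2013: {2013: 0.10, 2014: 0.11, 2015: 0.10},
    },
}

def max_revision_gap(vintage_estimates):
    """Largest spread between any two vintages' estimates of the same year."""
    values = list(vintage_estimates.values())
    return max(values) - min(values)

for state, years in rates.items():
    for year, vintages in years.items():
        print(state, year, round(max_revision_gap(vintages), 2))
```

With real vintage files downloaded from Census, the same loop would produce the average-gap figures discussed below.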

Some states have bigger revisions than others: Alaska, North Dakota, Rhode Island, Mississippi, DC, Massachusetts, New Hampshire, Idaho, West Virginia, South Dakota, and Hawaii are some of the worst offenders in terms of average revision size, while California, New Jersey, Michigan, Louisiana, Indiana, New Mexico, Missouri, Georgia, Ohio, Tennessee, Florida, and Minnesota have smaller average revision sizes. We might assume that the most recent data is the most reliable, but there’s no shortage of cases where revisions first went one way, then went another, and may switch again next year. For a sense of scale, the average absolute net migration rate across all years, states, and vintages was 0.32%. The average maximum revision gap for each year-state was 0.02%.

In other words, the average state’s migration rate estimated in a given year from the PEP series can be reasonably expected to shift higher or lower by about 0.02% in subsequent revisions. While small, that error is lumped unevenly across states. A state with high net migration may shift by 0.01% to 0.05% without noticing. A state that’s right along the break-even point, however, may care quite a bit about the difference between -0.02% and 0.02% net migration.

But that’s not all. A key component of the PEP method is that it relies on birth-death records for a kind of core population growth. These birth and death records are supposed to be really solid, basically they’re supposed to be as reliable as the Census itself. But of course, these birth and death records are revised too, and at almost the same rate as net migration. Which, sidenote, is why “residual methods” of calculating migration are still imprecise: because the core, ultra-reliable sources on births and deaths still have error.

But Lyman, Those Revisions Still Sound Small To Me

Especially When You Compare to ACS or CPS Error Bands!

The revision history shows that even the fairly absolute PEP numbers are unstable. After each new Census, the previous decade is shielded from serious revision, and that’s pretty much the only reason we stop revising. But still, the revision sizes I pointed out don’t seem big compared to error bands in other sources, right?

Wrong. The revision sizes are not statistical error; they are structural error. Structural errors due to historical revisions are in addition to any hidden underlying statistical error, not instead of it. The PEP series doesn’t offer you error bands on births and deaths, but clearly such error ranges do exist: no data source is perfect. But because the estimates come largely from administrative data, not just surveys, there’s no way to derive a systematic margin of error for the general population, and any errors will almost certainly relate to systematic bias caused by administrative priorities and coverage.

In other words, there’s no way we can know the size of underlying statistical error for the PEP series, even though we know it exists, and the PEP series’ historical revisions create a second source of error for commentators who want to identify “this year’s migration rate.” To be clear, these multiple errors don’t mean PEP is necessarily less accurate than ACS or IRS SOI. It might still be quite accurate and represent the true population very well. But these multiple sources of error mean that commentators must add in their own speculation about (1) the size and direction of underlying statistical error and (2) the size and direction of future statistical revisions.

If revisions make the data more accurate, then, paradoxically, the PEP series may lead commentators to make systematically error-prone in-year statements, even while the quality of the time series constantly improves. I know this is confusing: bear with me!

So What’s the Point of All This?

Track Changes, Discuss Revisions, Speak Moderately

Given these two sources of error that could lead a given year’s PEP estimate of migration to vary from the true migration rate and/or future PEP estimates of the true migration rate, commentators should use some very specific procedures when writing about PEP for specific areas. Here are some tips:

Moderation. You know these numbers are going to be revised, maybe significantly. You know that really sharp changes can be the product of statistical flukes, and can be adjusted later. So be wary of extremes, and avoid using PEP as your one source. When reporting on PEP, try to cite other sources to see how well they corroborate. PEP is a pretty good source, but being the official baseline doesn’t make it inherently the best estimate of true population trends for a specific area.

Compare to other sources and, where trends deviate, that is probably a story. When PEP diverges from IRS, maybe you’ve got a case of differential migration between non-filing people and filing people. When PEP diverges from ACS, maybe you’ve got a case of different survey and record sources with different coverage telling you something about changing populations themselves. If a big trend isn’t confirmed in other sources, be wary of leaning on it.

Track changes in the data from year to year. If PEP boosted your migration rate this year versus this year’s vintage estimate for last year, but this new vintage reduced the estimate for last year versus last year’s vintage, then, well, did you really see migration increase at all? Maybe not. Before you jump to proclaim the trend, see if revisions themselves have a trend. Make sure the “rise” you proclaim isn’t actually a decrease from last year’s unrevised estimate. This happens surprisingly frequently.
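That check can be made mechanical: compare this vintage’s apparent change against last year’s unrevised figure. A minimal sketch, with hypothetical rates in percent (not real PEP values):

```python
# Did migration really "rise", or did the revision just lower last year's figure?
# All numbers below are hypothetical net migration rates (%), for illustration.
this_vintage = {"2014": 0.02, "2015": 0.05}  # current vintage's estimates
last_vintage = {"2014": 0.06}                # last year's vintage estimate for 2014

# Within the new vintage, 2015 looks higher than 2014: an apparent rise.
within_vintage_change = this_vintage["2015"] - this_vintage["2014"]

# But against last year's UNREVISED 2014 estimate, 2015 is actually lower.
vs_unrevised = this_vintage["2015"] - last_vintage["2014"]

if within_vintage_change > 0 and vs_unrevised < 0:
    print("The 'rise' comes from revising last year down, not from new migration.")
```

When both comparisons point the same way, the trend is on firmer ground; when they disagree, the revision itself is the story.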

Discuss revisions and reasons for them. These revisions don’t have to be one of the dark mysteries of the universe. Contact your state Census partner organization and ask them about the revisions. Contact Census directly. Compare to local-specific records that maybe Census didn’t consider, like utility hookups or public school enrollments or property records, etc. Build a paper trail on why your area’s records changed. And if, in so doing, you find that Census’ revisions don’t make sense, then contact Census. The great thing about this system is they revise every year! If you think your state or county has been revised in an irrational way, then make your case! This kind of feedback ultimately makes the data more accurate, and helps Census do its job well.

Conclusion

Improving Migration Commentary, One Source at a Time

When I started this blog, I said that I wanted to improve the state of migration commentary. Sometimes I think I’m succeeding. Sometimes I think I’m not. But my hope is that this piece will be useful for journalists covering PEP data, especially journalists covering local area changes. If coverage of PEP were more readily cross-referenced with other sources, if local commentary explored revisions, if national commentary treated PEP with a bit more provisionality, then I think the state of the discipline would be greatly improved. Here’s hoping!

See my recent posts on urbanization in response to recent coverage of PEP data, explaining why we should let the cities die, how urbanization relates to regional trends in migration, and arguing that there is no real urban resurgence.

If you like this post and want to see more research like it, I’d love for you to share it on Twitter or Facebook. Or, just as valuable for me, you can click the recommend button at the bottom of the page. Thanks!

Follow me on Twitter to keep up with what I’m writing and reading. Follow my Medium Collection at In a State of Migration if you want updates when I write new posts. And if you’re writing about migration too, feel free to submit a post to the collection!

I’m a graduate of the George Washington University’s Elliott School with an MA in International Trade and Investment Policy, and an economist at USDA’s Foreign Agricultural Service. I like to learn about migration, the cotton industry, airplanes, trade policy, space, Africa, and faith. I’m married to a kickass Kentucky woman named Ruth.

My posts are not endorsed by and do not in any way represent the opinions of the United States government or any branch, department, agency, or division of it. My writing represents exclusively my own opinions. I did not receive any financial support or remuneration from any party for this research. More’s the pity.
