“Illegal Immigration” Is Not Zero or Negative

Terms Matter, Residual Estimates Are Tricky, and Methodology Papers Are Worth Understanding

I have gone around with Noah Smith a number of times about the question of illegal immigration and what the trends in it are. But what this really boils down to is a problem with wonkery. That is, wonks like to think the entire world is their expertise and you can manipulate data however you want, and it’ll still be meaningful. Turns out, that’s not the case. Noah ran a bad piece at Bloomberg View a while back and doesn’t want to take it down or revise it despite some egregious methodological errors. I don’t like to be a dick about this stuff. I’ve now argued with him about it 3 separate times. But because I have made attempts to resolve this less directly, and because I know Noah can take it, I’m really not going to pull any punches here. Noah keeps promoting a piece that is false. Let me count the reasons why.

In the above tweet, Noah links to a post. Presumably, he believes that the post indicates the claim, namely, “Illegal immigration has been zero or negative for years.”

What evidence exists in the post for this claim?

Hooooold up. Look at what this is actually measuring. Annual change in the unauthorized immigrant population. That change could come from several places: inflows (illegal immigration), outflows (emigration, deportation), and mortality.
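To make the accounting concrete, here is a minimal Python sketch of that identity. The function names and every number in it are hypothetical placeholders, not figures from Pew or from Noah’s post. The only point is that the change in the stock and net migration differ by exactly the number of deaths.

```python
def population_change(inflows, emigration, deportations, deaths):
    """Annual change in the unauthorized population (the stock)."""
    outflows = emigration + deportations
    return inflows - outflows - deaths


def net_migration(inflows, emigration, deportations):
    """Net migration only: inflows minus outflows, ignoring mortality."""
    return inflows - (emigration + deportations)


# Hypothetical inputs, chosen purely for illustration.
inflows, emigration, deportations, deaths = 300_000, 200_000, 100_000, 30_000

change = population_change(inflows, emigration, deportations, deaths)
net = net_migration(inflows, emigration, deportations)

print(f"change in stock: {change:,}")
print(f"net migration:   {net:,}")
print(f"gap (deaths):    {net - change:,}")
```

With these made-up inputs, net migration is exactly zero while the population still shrinks by 30,000, all of it from mortality.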

The actual empirical evidence in Noah’s own article does not support the claims he makes to sell the product, viz, that “illegal immigration is zero or negative.”

His actual evidence is that the illegal immigrant population has fallen since its peak. I 100% agree there. He’s totally correct. The stock of unauthorized residents in the US is almost certainly well below historic highs. But that was not the claim he made.

He claims that was his claim:

Stock is what matters.

Let’s review.

“Illegal immigration has been zero or negative” but “Stock is what matters.” How on earth can you get negative stocks of immigrants?

Noah is clearly referring to the net change, that is, the flow. He is also clearly referring to the migratory component of that flow: illegal immigration. His article does not address the question of mortality anywhere, and his graph does not claim to be net migration… yet he interprets it as if it’s net migration, when it clearly isn’t.

Noah seems confused. He says his own data doesn’t matter because it’s flows, and stocks are what matter; but his tweet is about flows, not stocks; and his tweet refers to data that he does not present, namely, net migration of the unauthorized population. He presents change in population, not net migration. Mortality is almost certainly of similar scale as outflows in at least some years, so this is an egregious omission.

More than that, Noah misrepresented a very important piece of data to his ~67k Twitter followers. We have some reasonable approximations of actual gross illegal immigration, based on border apprehensions. And although it is near record lows, it isn’t zero. And since I wrote that post, May came in a bit above April, so the trough may not last if U.S. economic strength continues.

The point is, everyone who works in this field, all the actual experts, including the folks at Pew whom Noah cites, use “illegal immigration” to refer to inflows which do not have legal authorization. That’s what the term means. It’s not just me. Here’s dictionary.com:

It means inflows. Exclusively.

Now, if a person explicitly says “net illegal immigration,” then, colloquially, we know what they mean. They mean illegal immigration (which, on its own without modifiers, always means gross) minus some measure of outflows of unauthorized residents.

Problem: Noah didn’t say net illegal immigration. Review the tweet; net ain’t there. In his post, he does specify net illegal immigration 7 paragraphs and 3 graphs into it:

But hold up, what data is he referring to?

Well, he’s referring to his net change figure. Which is not actually a measure of net illegal immigration, because it doesn’t include mortality. Noah is explicitly talking about net migration of unauthorized residents, but referring to data that also includes the rather large looming component of mortality, which of necessity must suppress estimates.

It’s important to use terminology correctly in heated political debates, because these words mean different things: the consequences of net migration of unauthorized people are different under different gross rates.

That is, if we get 10,000 net illegal migration, but that’s because just 10,000 people cross the border and none go the other way, then, whatever. But if it’s net 10,000 because 1 million cross but we deport 990,000, that has veeeeeery different ramifications for public expenditures, the labor market, and the families of those 990,000 deportees. Gross rates matter independently of net rates because even temporary migration has effects on the United States.

By sloppily using the wrong data to describe net migration of unauthorized residents and then sloppily referring to net migration of unauthorized residents as “illegal immigration,” Noah has done a one-two-punch of unintentional misinformation for his readers.

Plus, He Reads the Data Wrong

So far, I’ve shown that Noah’s argument is wrong on a semantic basis. That is, he misinforms by misidentifying and mislabelling. That alone is a serious problem. Public intellectuals have a responsibility to get it right the first time, especially those who obviously have the expertise and intellectual chops to know how to do so.

But there’s another problem: the data just doesn’t say what Noah thinks it says.

Pew gets their estimate by starting from American Community Survey 1-year estimates of the foreign-born population, then subtracting naturalized citizens. Then they use non-ACS data to estimate how many non-citizens are lawful permanent residents (LPRs) or legal temporary residents (LTRs). The residual must be unauthorized residents.
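The residual method is just subtraction, which a short Python sketch can show. The function name and all of the inputs below are hypothetical placeholders, not Pew’s or DHS’s actual figures, and the real procedure involves adjustments this sketch leaves out. What it does show is that any error in a subtracted component lands in the residual one-for-one.

```python
def residual_unauthorized(foreign_born, naturalized, lpr, ltr):
    """Unauthorized residents as what is left after removing known legal groups."""
    noncitizens = foreign_born - naturalized
    return noncitizens - lpr - ltr


# Hypothetical inputs for illustration only.
FOREIGN_BORN = 42_000_000   # survey estimate of the foreign-born population
NATURALIZED = 19_000_000    # naturalized citizens
LPR = 10_500_000            # lawful permanent residents (non-survey data)
LTR = 1_500_000             # legal temporary residents (non-survey data)

baseline = residual_unauthorized(FOREIGN_BORN, NATURALIZED, LPR, LTR)

# Suppose the LPR count is overstated by 1% (105,000 people).
# The residual falls by exactly the same 105,000.
shifted = residual_unauthorized(FOREIGN_BORN, NATURALIZED, LPR + 105_000, LTR)

print(f"baseline residual:      {baseline:,}")
print(f"residual with LPR +1%:  {shifted:,}")
```

A 1% miss on a 10.5 million input is trivial for that input, but it is the same size as a typical year’s change in the residual.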

This is the best method we have available and Pew does very good work. I have no criticism of Pew’s estimates insofar as they go. However, we need to be aware that Pew’s estimates have limitations, limitations which they or their sources usually acknowledge when asked. Noah did not acknowledge these limitations, and indeed totally ignored them and misused the data.

Let us begin. How many illegal immigrants are there in the US?

But hold up. The S in ACS stands for Survey. Surveys have margins of error.

Problem #1: Noah Disregards Margins of Error

If you take the margins of error at their widest and compute the most extreme YoY changes they allow, you get results very different from the central estimates.

But that’s not all. Pew rounds to the nearest 100,000. So we need to have a margin-of-rounding (+/- 50,000), and then add the margin of error to each of those, and then get the most extreme changes!

Problem #2: Noah Disregards Rounding Errors

These rounding errors are minuscule as a share of total population. They’re huge as a share of population change. Across many years rounding errors should have a net-zero error, but we’re only looking at a few years here, and it’s very easy to have 2 or 3 years in a row with same-direction rounding errors.
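A rough sketch of that worst-case arithmetic, in Python. The two estimates and the margin of error below are hypothetical placeholders rather than Pew’s published numbers, and stacking the margin of error on top of the rounding band is the deliberately extreme approach described here, not a formal statistical interval.

```python
def yoy_change_bounds(prev, curr, moe, rounding=50_000):
    """Most extreme year-over-year changes consistent with two published estimates.

    prev, curr: published (rounded) estimates for consecutive years
    moe:        survey margin of error applied to each estimate
    rounding:   half of the rounding unit (nearest 100,000 -> +/- 50,000)
    """
    half_width = moe + rounding
    central = curr - prev
    lowest = (curr - half_width) - (prev + half_width)
    highest = (curr + half_width) - (prev - half_width)
    return lowest, central, highest


# Hypothetical example: estimates of 11.3 million then 11.2 million,
# with an assumed margin of error of 150,000 on each.
low, mid, high = yoy_change_bounds(11_300_000, 11_200_000, moe=150_000)

print(f"central change: {mid:,}")
print(f"extreme range:  {low:,} to {high:,}")
```

With these placeholder inputs, a headline decline of 100,000 is consistent with anything from a drop of 500,000 to a gain of 300,000, so the sign of the change is not pinned down.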

So here’s the changes in the illegally residing population, and the actual margin of error around YoY changes:

As you can see, it’s a big range of error. Now, Noah is taking the central, best estimate. To reliably get the maximum, you’d need to have a reason to believe there is a persistent error that causes Pew to systematically and increasingly understate the population of illegal immigrants as time goes on.

But hold up! Noah’s claim was about the specific timing of net migration. Refer to the “negative since you were 35” comment. He’s arguing that migration was very high in 05/06 and 06/07, then went negative in 08 and has stayed zero/negative since.

Maybe. But the ACS sampling frame covers migration over a 24-month window and assigns it all to a specific 12-month period for migration data, and for year-of-arrival data it offers an inconsistent estimator over the course of the year (i.e. asking someone if they arrived “this year” in January captures a different likely share of the population than in December).

Problem #3: Noah Ignores Known Bias in ACS Estimates of the Foreign-Born

The result is a pretty well-known timing bias, whereby ACS doesn’t give very good estimates of the exact timing of immigration. The result is that you should allow yourself a year or so of mental wiggle room on when immigration occurred when looking at ACS data about year of entry or migration… which is the data Pew is using. Whoops. In other words, the tools Pew uses to assign legal permanent residency stats to ACS-measured foreign born could mis-assign a meaningful number of them just based on problems in how ACS structures the question. This matters less for people who migrated 20 years ago, more for people who migrated recently.

Pew is doubtless aware of this bias, but there’s really no resolution to it. It just exists. We can’t control for it in margins of error, it’s just an extra error out there that will relate systematically to the rate of inflow. Sucks to have that kind of error, but them’s the breaks.

Now, again, we can say with substantial confidence that the illegal immigrant population has declined since 2007. We just don’t know when, and we don’t know how much was from outflows versus mortality.

But… what if we did know?

What if we make the assumption that mortality for illegal immigrants is 25% of the rate we observe in the US generally? Illegal immigrants are younger and healthier, so this should give us a reasonable approximation. In reality I feel confident that this is a low estimate, but, whatever. But notably, Pew, the source Noah trusts, has shown that the illegal immigrant population is aging. So let’s assume mortality rises, say, 2% per year as a ratio of US mortality. Let’s also test an alternate assumption where illegal immigrants have 50% of US mortality, and rise 1% per year, just for funzies.
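Here is one way those two scenarios could be sketched in Python. Several things in it are my assumptions rather than anything stated above: a US crude death rate of roughly 8 per 1,000 per year, a constant unauthorized population of 11 million, and reading “rises 2% per year” as two percentage points of the US rate per year.

```python
def implied_deaths(population, us_deaths_per_1000, start_ratio, annual_step, years):
    """Deaths per year when a group's mortality is a rising fraction of the US rate."""
    deaths = []
    for t in range(years):
        ratio = start_ratio + annual_step * t  # share of the US death rate in year t
        deaths.append(round(population * us_deaths_per_1000 / 1000 * ratio))
    return deaths


POPULATION = 11_000_000   # assumption: held constant for simplicity
US_DEATHS_PER_1000 = 8    # assumption: approximate US crude death rate

# Scenario A: 25% of US mortality, rising 2 points per year.
low = implied_deaths(POPULATION, US_DEATHS_PER_1000, 0.25, 0.02, years=8)
# Scenario B: 50% of US mortality, rising 1 point per year.
high = implied_deaths(POPULATION, US_DEATHS_PER_1000, 0.50, 0.01, years=8)

for year, (a, b) in enumerate(zip(low, high)):
    print(f"year {year}: {a:,} to {b:,} deaths")
```

Under these assumptions the annual death count runs in the tens of thousands, which is the same order of magnitude as many of the year-to-year changes being debated.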

Mortality, like adjusting for ACS population estimation errors, has only a small impact. But again, it moves us up. And, of course, uncertainty over the mortality rate for illegal immigrants widened our error band, so we should have even less confidence about what’s happening.

Even Bigger Questions Remain

Noah identifies a net decline of 1.1 million from the peak in 2007. A large decline seems like the strongest prior to hold. But the margin of error even on that decline across two points is unclear. Accounting for all of the errors we’ve identified, the decline could be anywhere from 750,000 to 1.4 million. If it’s 750,000, then mortality may have accounted for between 35 and 60% of that decline. If it’s 1.4 million, then somewhere between 20 and 35%. In other words, mortality is a very large part of, possibly even the largest part of, the declining illegal immigrant population.
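The share arithmetic can be sketched in a few lines of Python. The two decline figures come from the paragraph above, but the cumulative death counts are hypothetical values I picked to land near the stated ranges, not numbers taken from the underlying analysis.

```python
def mortality_share(cumulative_deaths, total_decline):
    """Fraction of a population decline that deaths alone would explain."""
    return cumulative_deaths / total_decline


declines = (750_000, 1_400_000)        # low and high estimates of the decline
death_scenarios = (280_000, 450_000)   # hypothetical cumulative deaths since the peak

for decline in declines:
    shares = [mortality_share(d, decline) for d in death_scenarios]
    print(f"decline of {decline:,}: mortality share {shares[0]:.0%} to {shares[1]:.0%}")
```

The same number of deaths is a much larger share of a small decline than of a large one, which is why uncertainty about the size of the total decline matters so much here.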

So it’s a non-negligible error for Noah to toss out unadjusted YoY changes without any error bands and call them illegal immigration.

But… there are even bigger sources of possible error.

When Pew cuts the data, they assume that laws are followed correctly and answers are honest. That is, they assume that if someone receives welfare that illegal immigrants can’t legally receive, that the person is a legal immigrant. But alas, no system is perfect! Furthermore, fraud rates are probably cyclical: it is likely that during recessions, when bureaucrats are processing more applications and more financially pressured people like immigrants need help, rates of improper payments rise. It is plausible to think that some of the decline during the recession (and the bump afterwards) was driven by illegal benefits receipt.

This is a source of error Pew acknowledges, and which they can’t control for. Again, it just exists, but it is yet another reason to be skeptical of the timing of these changes, perhaps some of the total estimates as well.

Plus, Pew makes variable adjustments to account for undercounts of the Hispanic population. This is a responsible thing to do! But without seeing those, it’s hard to know if they may themselves be cyclical and relate to these annual net changes.

Then there’s the calculation of the LPR population and LTR population. These estimates are fraught with peril: kids living in a household with a citizen are identified as citizens, for example. That’s probably usually true. But not always. Or consider that DHS just assumes a fixed 1% outmigration rate for LPRs each year. That means whenever LPR outflows actually vary, the whole cyclicality will get dumped into the illegal immigrant population.
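A small Python sketch of how a fixed outmigration assumption pushes cyclicality into the residual. The LPR population and the “actual” emigration rate below are hypothetical; only the fixed 1% assumption comes from the text. Rates are expressed per 1,000 so the arithmetic stays in whole people.

```python
def residual_bias(lpr_population, assumed_per_1000, actual_per_1000):
    """Error pushed into the unauthorized residual by a mis-assumed LPR emigration rate.

    If more LPRs left than assumed, the LPR estimate is too high and the
    residual (unauthorized) estimate is too low by the same amount.
    """
    assumed_leavers = lpr_population * assumed_per_1000 // 1000
    actual_leavers = lpr_population * actual_per_1000 // 1000
    return assumed_leavers - actual_leavers


LPR_POPULATION = 12_000_000   # hypothetical
ASSUMED_RATE = 10             # the fixed 1% assumption, per 1,000

normal_year = residual_bias(LPR_POPULATION, ASSUMED_RATE, actual_per_1000=10)
recession_year = residual_bias(LPR_POPULATION, ASSUMED_RATE, actual_per_1000=15)

print(f"bias in residual, normal year:    {normal_year:,}")
print(f"bias in residual, recession year: {recession_year:,}")
```

In this made-up recession year, half a percentage point of extra LPR emigration shows up as 60,000 “missing” unauthorized residents who never actually left.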

Pew is doing the best they can, but the method used has too many uncertain moving parts to make it useful for estimating net migration of unauthorized residents.


What frustrates me is that Noah’s basic point, that illegal immigration is a vastly smaller problem now than 10 or 15 years ago, is totally correct. There’s tons of data to support it. He just didn’t marshal any of the actual evidence to his defense. He could have just shown the trend in border apprehensions, or shown the illegal immigrant share of the population, or other kinds of data. If he really wanted to be clever, he could have just lined up border apprehensions with deportations by fiscal year to see what direct migration trends might look like.

Let me be clear. I think Noah is correct that net migration of illegal immigrants has been negative in some periods since 2007. And I am very confident that he is correct that the illegal immigrant population is falling. But we do not have sufficient data to make the claims he is making with the precision he is making them. If this were some arcane topic that wasn’t politically significant today (like, say, Spanish population trends in the 1830s) I’d say, whatever, it’s a nifty indicator, not super precise but no harm done. But Noah is widely read and this issue is hotly debated; getting the facts right is important.

