Examining 1950 Census Records Reveals Traces of the Datafied State

What the traces left behind in “antique” US census records can tell us about the life of data and its official uses.

Dan Bouk
Data & Society: Points
8 min read · May 4, 2022


Last month, the National Archives released the long-hidden, original records from the 1950 census. It is a massive trove of data: twenty questions asked about each of roughly 150 million people comes to something like three billion responses. After this data was initially collected in 1950, it was processed and parsed to generate official statistics. But, by law, individual responses must be kept confidential for 72 years (a length of time once rumored to represent an average lifespan in 1952, the year the law was passed, but more likely resulting from “bureaucratic happenstance”). Only now has that pledge of confidentiality expired. Today, the records can be searched by name or browsed by place using a very nice interface built by the federal government. Any person — and not just the Ancestry.coms of the world — can also download the entire scanned collection of census sheets in bulk.

Ben Schmidt, a digital humanist at NYU with a knack for quickly teasing out important (and frequently beautiful) insights from large data sets, noticed something strange as he examined a wide swath of the scans. The grids of questions and answers seemed, at first, to be moving relative to the edges of the scans. Upon further investigation, he realized that as the entire grid moved, the six rows marked with bold, dark lines on each sheet always stayed in exactly the same place. Why was that?

Here’s an example of the moving grids in three adjacent census sheets from midtown Manhattan. The first line of responses rises a bit from the left-most sheet to the middle sheet and descends markedly on the right-most sheet.

Here’s another look at those three sheets, side by side. The bold sample rows, which I have highlighted in blue, remain at exactly the same level on all three papers.

What made them so special? What was their purpose?

The answer reminds us that when we look at historical census records, we are examining the traces left behind by a complicated technological system. They are evidence — or reminders — that what we read on a handwritten census sheet is not merely a latent fact, but instead an antique variety of data exhaust. The original census records made possible the mass production of facts about the American people; they are an administrative residue that we now value immensely.

The Meaning Behind a Bolded Line

The census enumerator who encountered each bold line, called a “sample line,” was required to treat the person listed there in a special way. One in every five people was to be asked a series of supplemental questions: where they had lived the prior year, the extent of their education, the birthplaces of their parents, and the income they or their family had received the year before. The bureau employed over 140,000 census takers and relied on these paper tools with their bold lines to guide and direct them, making it possible to randomly select individuals for extended questioning without relying on the judgment of its temporary workers. Those bolded lines controlled workers in the field. They also integrated field observations with finely engineered tabulating machinery, and realized the plans of technocratic experts trying to squeeze extra knowledge out of the census.

There were limits to the control that paper forms could exert over enumerators, and the forms themselves demonstrate this. Much of the work required judgment or could hit a multitude of snags, and so the bureau built in some flexibility and interactivity. After the initial census taking, local census officials verified and checked entries or solved problems, and later clerks would “edit” the sheets to ensure they had been filled out in a way that fit the bureau’s requirements (ensuring, for instance, that each person had been labeled with one of a handful of census-approved racial categories). Enumerators highlighted problems, questions, errors, or revisions in the “notes” sections on each page. Those sections created a space for communication across different stages of the enumerative process. They also served another purpose: the questionnaire’s designers could move the notes sections around the page (sometimes placing them at the top, sometimes at the bottom, sometimes a bit of both), which made it possible for the grid of names to shift while the sample lines held steady.
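
To make that mechanism concrete, here is a minimal sketch, in Python, of how fixed bold rows plus a shifting grid change which line numbers get sampled from sheet to sheet. The thirty-lines-per-sheet figure follows from six bold rows at a one-in-five rate; the offsets and row positions below are invented for illustration, not taken from the bureau’s actual layout rules.

```python
# An illustrative sketch, not the bureau's documented procedure: the
# bold sample rows sit at fixed physical positions, so shifting the
# whole grid (by moving the notes section) changes which person-lines
# land on them from sheet to sheet.

LINES_PER_SHEET = 30                 # implied by six bold rows at a 1-in-5 rate
BOLD_ROWS = [5, 10, 15, 20, 25, 30]  # assumed fixed physical row positions

def sampled_lines(grid_offset):
    """Person-line numbers that fall on the bold rows when the notes
    section shifts the whole grid down by grid_offset rows."""
    return [(row - grid_offset - 1) % LINES_PER_SHEET + 1 for row in BOLD_ROWS]

for offset in (0, 1, 2):  # e.g., notes at the bottom, split, or at the top
    print(f"grid shifted {offset} rows: sample lines {sampled_lines(offset)}")
```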

After those sheets were filled in by enumerators, verified by supervisors, and corrected by editors, they would make their way to offices in Philadelphia and Washington, D.C., where clerical workers employed a specially designed “Richards copyholder” to help them translate handwritten responses into punched paper cards. The worker placed each census sheet on the copyholder, which then guided their attention from line to line. The copyholder had to be custom-made so that it could accommodate the bold sample lines. “After the 100-percent questions were punched for the sample line,” explains the procedural history for 1950, “the machine spaced itself automatically to the sample questions, then returned to the next line.” Schmidt speculates that this explains why the sample lines all appear in the same physical place on the paper: to accommodate the machine that ensured card-punching clerks punched all the answers onto all the cards.

A Census Bureau clerk operates a machine used for punching cards that can also hold a large census sheet. The caption reads “Punching the population information using the Richards copyholder.”
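
Going by that description in the procedural history, the copyholder’s stepping logic might be sketched roughly as follows; the record layout and the fixed row positions here are assumptions made for illustration, not the machine’s real interface.

```python
# A rough simulation of the stepping logic the procedural history
# describes: punch the 100-percent questions for every line, and on a
# bold sample line, space over to the supplemental questions before
# returning to the next line. The data layout is invented for this sketch.

SAMPLE_ROWS = {5, 10, 15, 20, 25, 30}  # assumed physical positions of bold lines

def punch_sheet(sheet):
    """sheet: one dict per physical row, each holding 'basic'
    (100-percent items) and 'sample' (supplemental items)."""
    cards = []
    for row, person in enumerate(sheet, start=1):
        card = dict(person["basic"])       # punch the 100-percent questions
        if row in SAMPLE_ROWS:             # the machine pauses on a bold row...
            card.update(person["sample"])  # ...to take the sample questions too
        cards.append(card)
    return cards
```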

A Desire to Know More

The sample lines, the special machine to read them, and all of the efforts bent on getting them read stemmed from the desire of government planners and other data users to know more about the nation and its people. All the questions on each sheet had been suggested, amended, and selected by the Director of the Census with advice from bureau staff and “advisory committees of experts.” Because Congress and other data users required more information than the bureau could ask of everyone on a single sheet of paper, the census in 1950 employed (for only the second time) statistical sampling, cheating the limits of the census system. Later on, after people on sample rows had been asked more questions and their cards had been punched with the help of the Richards copyholder, probabilistic reasoning would allow bureau officials to translate this twenty percent sample into an accurate estimate of how the entire nation would have responded.
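
The bureau’s actual estimation procedures were more elaborate than this, but the core of that probabilistic step fits in a few lines; here is a minimal sketch with invented numbers.

```python
# With a one-in-five sample, each sampled response stands in for five
# people, so a total is estimated by weighting the sample total up by
# a factor of five (inverse-probability weighting). Figures are invented.

SAMPLING_RATE = 1 / 5

def estimate_total(sample_values):
    """Estimate a population total from values observed on sample lines."""
    return sum(sample_values) / SAMPLING_RATE

incomes_on_sample_lines = [2300, 3100, 0, 1800, 4200]  # made-up 1949 incomes
print(estimate_total(incomes_on_sample_lines))  # each response counts five times
```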

These technologies — paper forms that distribute and coordinate labor, devices that guide the reading of a form, and probabilistic sampling — are not the most famous data innovations commonly associated with the 1950 census. That honor belongs to UNIVAC (Universal Automatic Computer). The Census Bureau became the first civilian client to purchase the groundbreaking electronic digital computer, which it used to generate some of the final tables of statistics for 1950. Up until that point, electronic computing had been limited to military uses and sponsors. The machine cost the bureau $701,000 to purchase and install, a sum that grew to about $1.25 million when one includes the costs of making the office space suitable for the room-sized device by installing air conditioning and a backup power supply. The Census Bureau had earlier played a leading role as the patron for paper card tabulation in the 1880s, helping to bring about a new era in data processing. In 1950, some thought the UNIVAC would eventually bring its own revolution: “Information which requires several runs through punched card equipment can be obtained from the Univac in one run,” explained the procedural history. It continued, looking forward: “With machines like the Univac, future censuses should be processed with considerably greater speed.”

A census employee sits behind a large device with many dials. Another employee stands beside it, adjusting a series of magnetic tape reels. The computer takes up the entire room.

The allure of charismatic computing technologies can distract us from the rest of the data-gathering system. These census records, by contrast, draw us back to the bigger picture. They remind us that the data we rely on to govern, advocate, or plan (whether we’re in government, business, academia, or elsewhere) result from a massive cooperative effort.

Lessons from Antique Data

If we want facts to guide us, we have to support that system and work to make it as strong, accurate, and inclusive as possible. That is not just the work of experts, but work that each of us who depends on facts can contribute to: advocating for adequate funding for censuses that will seek out the people and communities who are hardest to count, and working together to create better census questions and categories. (It is well past time, for instance, to create more inclusive sex and gender questions.)

Census records, as antique examples of data exhaust, can also inoculate us against the claims of surveillance capitalism, which suggest that new technologies have created unprecedented new opportunities to watch, track, and document lives. To some degree that is true. But looking at past census sheets, I see how — even long ago — the value of data exhaust was realized, and then regulated carefully by experts subject to democratic oversight in a reasonably transparent process that took significant steps to protect individual privacy. Why not hold today’s data behemoths to the same standards?

When I asked Schmidt why these records matter and why he was drawn to the problem of those shifting grids in the first place, he emphasized the inherent interest of “immersive detailed records about individuals” from the past. Schmidt recently published a “dot density map” of the United States using published 2010 and 2020 census results, representing the entire US population as 300 million dots, each labeled with race and ethnicity categories, and paired with congressional district lines or maps showing evidence of historical racial redlining. Imagine the visualization he could build from the 1950 records if more than page scans were available. Here the problem is not technological but social, says Schmidt: “Partnerships with private genealogy companies have managed to keep historical census data from being freely available and usable in different ways.” He discovered the shifting grids while trying to think about how to make this national resource more readily available to everyone.

Dan Bouk is Associate Professor of History at Colgate University and was a Faculty Fellow at Data & Society. His book, Democracy’s Data: The Hidden Stories in the US Census and How to Read Them, is out in August (and available to pre-order).
