It seems jobs are always on our minds.
When the economy is doing poorly, Americans consider it the nation’s most important problem. When the economy is doing well, we worry that jobs are growing in some areas but not in others, or that the gap between high-paying and low-paying jobs is increasing.
Here at the Urban Institute, we routinely pull together important, underused data to perform novel research at the local level. The newest examples are public tract-level files derived from the US Census Bureau’s Longitudinal Employer-Household Dynamics Origin-Destination Employment Statistics (LODES), which provide a wealth of information on workers and jobs in all 11 million census blocks in America.
Just in the past year, Urban researchers have used LODES to analyze capital flows in Detroit (PDF) and to create scores for Opportunity Zones (PDF). To understand why the LODES data are so valuable, and why they’re hard to use, we need to take a closer look at them.
What are the LODES data?
The LODES data are derived from Longitudinal Employer-Household Dynamics microdata. The microdata link employee and employer data by combining administrative state unemployment insurance wage records with other administrative and survey data. The source data are aggregated and adjusted to protect confidentiality.
The LODES data are available for most states annually from 2002 to 2015 and include two datasets:
· Residence Area Characteristics (RAC): This file lists the total number of jobs by the census block where the employee lives. This dataset contains over 80 million rows.
· Workplace Area Characteristics (WAC): This file lists the total number of jobs by the census block where the employee works. This dataset contains over 30 million rows.
To say that each of these files represents one dataset is a bit of a simplification, since many different datasets describe different subsets of the data, but we’ll leave that discussion for another time. The Census Bureau also publishes a third valuable dataset, the Origin-Destination file, which we have not yet summarized but hope to in the future. This file lists the total number of jobs held by workers who live in one census block and work in another, for all block combinations in the US. The point is, there’s a lot of data.
So why are LODES data underused?
Data limitations partially explain why LODES data are underused. As of today, the data only go through 2015, so they are not as current as other estimates of the economy. LODES data also exclude a few categories of employment, such as domestic work, some agricultural jobs, railroad employment, and self-employment. Some states, like Massachusetts and Wyoming, chose not to participate for a few years, so coverage is incomplete.
In our experience, the primary barriers to using the data in research are their relative obscurity and their size: the datasets are too big for most traditional statistical software systems to handle.
Oh, and they’re made available via over 75,000 zipped CSV files.
For researchers who want to use LODES, this means writing a program that downloads more than 75,000 files, unzips them, and stitches them together. This can take a long time, and researchers often work on the problem separately, reinventing the wheel for each analysis.
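To give a sense of the "unzip and stitch" step, here is a minimal sketch in Python. It is not the system we built. It assumes the files have already been downloaded to local disk, that each one is a gzip-compressed CSV, and that all files being combined share the same header row. The function names and the added `source_file` column are our own, for illustration only.

```python
import csv
import gzip
from pathlib import Path
from typing import Iterable, Iterator


def iter_rows(paths: Iterable[Path]) -> Iterator[dict]:
    """Yield rows from many gzip-compressed CSV files as one stream."""
    for path in paths:
        # "rt" opens the compressed file in text mode so csv can parse it.
        with gzip.open(path, "rt", newline="") as handle:
            for row in csv.DictReader(handle):
                # Hypothetical extra column: remember which file a row came from.
                row["source_file"] = path.name
                yield row


def stitch(paths: Iterable[Path], out_path: Path) -> int:
    """Write every row to one CSV file and return the number of rows written.

    Assumes all input files share the same columns; a file with extra
    columns would make csv.DictWriter raise a ValueError.
    """
    writer = None
    count = 0
    with open(out_path, "w", newline="") as out:
        for row in iter_rows(paths):
            if writer is None:
                # Take the column order from the first row we see.
                writer = csv.DictWriter(out, fieldnames=list(row))
                writer.writeheader()
            writer.writerow(row)
            count += 1
    return count
```

Because rows are streamed one at a time, memory use stays flat no matter how many files are combined, although the output file itself can still be very large.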
At Urban, we’ve experienced these problems firsthand. And with the advent of new big data technologies and cheap, scalable cloud computing, we have built a big data system to address them (we’ll talk about this system in a future post, don’t worry).
We have summarized the data at the tract level and are making them available as open data
We have used our big data system to create an accessible, census tract–level file of the most requested LODES RAC and WAC data from 2002 to 2015. Today, we are releasing these data on Urban’s Data Catalog in CSV format. Researchers from anywhere can now easily run analyses on the LODES data without having to go through the tedious and error-prone process of collecting and summarizing the data.
Compared with the roughly 11 million census blocks, the roughly 70,000 census tracts in the US take much less time to process. Aggregating to the tract level also makes the data more useful for neighborhood-level analysis because many other datasets are summarized by census tract.
You can find the files on our Data Catalog. Each file summarizes all jobs by census tract and year; in other words, the unit of observation is a census tract-year. Related metadata and definitions are included in the Data Catalog entry and have been kept consistent with the original census documentation for clarity.
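For readers curious how block-level counts roll up to tracts, the sketch below shows one way to do it in Python. It relies on the standard structure of census geographic identifiers: a 15-digit block GEOID begins with the 11-digit GEOID of the tract that contains it (2-digit state, 3-digit county, 6-digit tract). The function names and the `(block, year, jobs)` row layout are assumptions for illustration, not the layout of the LODES files or of our pipeline.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# 2-digit state + 3-digit county + 6-digit tract
TRACT_GEOID_LENGTH = 11
BLOCK_GEOID_LENGTH = 15


def block_to_tract(block_geoid: str) -> str:
    """Return the tract GEOID that contains the given block GEOID."""
    if len(block_geoid) != BLOCK_GEOID_LENGTH or not block_geoid.isdigit():
        raise ValueError(f"expected a 15-digit block GEOID, got {block_geoid!r}")
    return block_geoid[:TRACT_GEOID_LENGTH]


def aggregate_to_tracts(
    block_rows: Iterable[Tuple[str, int, int]]
) -> Dict[Tuple[str, int], int]:
    """Sum job counts by (tract GEOID, year).

    Each input row is a hypothetical (block_geoid, year, jobs) tuple.
    """
    totals: Dict[Tuple[str, int], int] = defaultdict(int)
    for block_geoid, year, jobs in block_rows:
        totals[(block_to_tract(block_geoid), year)] += jobs
    return dict(totals)
```

GEOIDs should be kept as strings throughout. Reading them as integers drops the leading zero for states such as Alabama (01) and breaks the prefix logic.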
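One practical consequence of a tract-year unit of observation is that each combination of tract and year should appear only once. The short Python sketch below shows a sanity check you might run after loading a file. The column names `tract` and `year` are placeholders we made up for this example; check the metadata in the Data Catalog entry for the actual names.

```python
import csv
from collections import Counter
from typing import IO, List, Tuple


def duplicate_tract_years(
    csv_file: IO[str], tract_col: str = "tract", year_col: str = "year"
) -> List[Tuple[str, str]]:
    """Return every (tract, year) pair that appears more than once.

    An empty list means each row is a distinct tract-year. The default
    column names are hypothetical placeholders.
    """
    reader = csv.DictReader(csv_file)
    counts = Counter((row[tract_col], row[year_col]) for row in reader)
    return sorted(key for key, n in counts.items() if n > 1)
```

To use it, open the downloaded CSV with `open(path, newline="")` and pass the file object in, overriding `tract_col` and `year_col` as needed.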
Making our data more widely available
We are excited to see what new questions are answered by making these data more accessible — especially seeing how spatial patterns in the labor market connect with other topics like housing, child care, or transportation.
We hope that this is the first of many open data releases to come from the Urban Institute on our Data Catalog. We want to make our data and metadata accessible so researchers can spend less time doing mundane data organization tasks and more time delivering insights. We look forward to your feedback and hope you join us here at Data@Urban for more Urban open data releases throughout 2019.