Oct 2, 2018 · 4 min read

High-quality, accessible education data are necessary for evidence-based decision-making. Analysts have begun to harness the vast array of education-related data that are publicly available, but the difficulty of accessing these data and converting them into actionable information has slowed progress.

Consider an analyst who reads about disparities in graduation rates by race and ethnicity at her local university and wants to examine how this pattern has evolved and how it compares with peer institutions. The federal government has published this information since 1996, but compiling year-by-year information requires the analyst to download at least 19 separate files or laboriously select the desired information from a data tool just to build a simple table of information for a single college. If she wants to examine K–12 data from before 2007, she’s in worse luck: data for these years come in a fixed-width format, three files per year, and without user-friendly data dictionaries. In our experience, assembling the data to answer this simple question often takes hours, if not days, to complete.

To solve this problem, we built the Urban Institute’s new Education Data Portal, which makes K–12 and higher education data easily accessible to the public. We are bringing all major national datasets on schools, districts, and colleges under one roof and standardizing the information and data documentation so that it is easy to access data, measure change over time, and make connections across datasets. The portal is currently in beta mode, and we welcome user feedback and suggestions.

Why do we need an education data portal?

The federal government releases new education data every year, usually in a format that is similar, but not identical, to the prior year's. Over decades, these differences have accumulated as new information was added (e.g., the latitude and longitude of schools), classification schemes were updated (e.g., to include more racial and ethnic categories), and data formats were modernized (e.g., making data files easier to import into statistical software by posting them in multiple formats).

The data have improved, but the cost of this progress is that analysts need to invest significant time and effort to track changes, harmonize variables so the data are comparable over time, and present unified data documentation. This harmonization work is essential for describing overall trends and for measuring the effects of policy changes.
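
To make the harmonization step concrete, here is a minimal Python sketch of collapsing two year-specific classification schemes into one common set of categories. Everything in it is hypothetical and chosen only for illustration: the codes, the cutover year, the category labels, and the function name `harmonize` do not come from any actual federal file layout.

```python
from collections import defaultdict

# Hypothetical codebooks: an older scheme with numeric codes and a newer
# scheme with letter codes and more detailed categories.
OLD_SCHEME = {1: "white", 2: "black", 3: "hispanic", 4: "asian_pacific_islander"}
NEW_SCHEME = {
    "WH": "white",
    "BL": "black",
    "HI": "hispanic",
    "AS": "asian_pacific_islander",  # folded back into the older, broader group
    "PI": "asian_pacific_islander",
    "TR": "two_or_more",             # has no counterpart in the older scheme
}
CUTOVER_YEAR = 2010  # hypothetical year in which the scheme changed


def harmonize(records):
    """Sum enrollment by (year, common category) across both schemes."""
    totals = defaultdict(int)
    for rec in records:
        scheme = OLD_SCHEME if rec["year"] < CUTOVER_YEAR else NEW_SCHEME
        category = scheme[rec["race_code"]]
        totals[(rec["year"], category)] += rec["enrollment"]
    return dict(totals)


# Made-up records, used only to exercise the function.
sample = [
    {"year": 2005, "race_code": 4, "enrollment": 10},
    {"year": 2012, "race_code": "AS", "enrollment": 7},
    {"year": 2012, "race_code": "PI", "enrollment": 3},
]
harmonized = harmonize(sample)
```

Collapsing the newer, finer categories into the older, broader ones keeps the series comparable over time, at the price of discarding detail that only the later years record.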

How we are tackling this problem

We began with datasets describing the universe of educational institutions annually from the 1980s to the present. The two primary datasets we prioritized in this initial release are the Common Core of Data for K–12 schools and districts and the Integrated Postsecondary Education Data System (IPEDS) for colleges. We will continue to add data to the portal, eventually including information that looks beyond the classroom to capture the full picture of students’ lives, such as neighborhood-level income and health outcomes.

At the heart of these efforts is an application programming interface (API) that provides direct access to this rich repository of information. Programmers and web developers can access data directly through the API or create tools that rely on the data. Our colleagues on Urban’s technology and data science team have gotten the ball rolling with education data packages in Stata and R. These packages automatically read user-selected subsets of data and label all variables and values so that users do not have to extensively refer to separate documentation sources. In future Data@Urban posts, we will walk through the process of building the API, the documentation site, and the statistical programming packages.

Now, our hypothetical analyst who wants to track graduation rates by race or ethnicity and year can obtain that information with a single line of Stata or R code in minutes. She can also easily get, for example, fourth-grade enrollment data dating back to 1987. Depending on the project, having the data provided clean and ready to use can save the analyst hours or days of data work, time that can be spent gleaning insight from the data rather than wrangling them.
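
For readers who would rather call the API directly than use the Stata or R packages, the Python sketch below shows roughly what building one request per year might look like for the fourth-grade enrollment example. The base URL, the path layout, and the helper names `build_endpoint` and `enrollment_urls` are assumptions made for illustration, not details stated in this post; check the portal's documentation for the actual endpoints. The sketch only constructs URLs and does not send any requests.

```python
# Assumption: the base URL and the level/source/topic/year/filter path
# layout are illustrative and should be verified against the API docs.
BASE_URL = "https://educationdata.urban.org/api/v1/"


def build_endpoint(level, source, topic, year, *filters):
    """Return the request URL for one year of one topic (hypothetical layout)."""
    parts = [level, source, topic, str(year), *filters]
    return BASE_URL + "/".join(parts) + "/"


def enrollment_urls(first_year, last_year, grade):
    """Return one URL per year, inclusive, for enrollment in a single grade."""
    return [
        build_endpoint("schools", "ccd", "enrollment", year, f"grade-{grade}")
        for year in range(first_year, last_year + 1)
    ]


# Fourth-grade enrollment for the first three years mentioned in the text.
urls = enrollment_urls(1987, 1989, 4)
for url in urls:
    print(url)
```

Each URL could then be fetched with any HTTP client, and the yearly results stacked into a single table.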

What’s next?

We are excited to see early uses of the data portal. Shortly after we launched the beta version, an analyst used the educationdata R package to explore the relationship between college enrollment and the number of applications received.

Our efforts so far put a lot of data at the fingertips of technically savvy users, but we know that not everyone who can benefit from using education data knows how to write code. We don’t want the programming interface to be a barrier, so we are also building a user-friendly, point-and-click interface so that users without a programming background can easily access the data they need. We envision that with a few clicks, a local journalist will be able to access a list of area elementary schools along with selected pieces of information from different datasets pertaining to those selected schools.

Education data will have the greatest impact if everybody can access them. Urban’s Education Data Portal aspires to make that a reality.

-Matthew Chingos

-Erica Blom


