The 50 Best Public Policy Datasets for Practicing Data Science

Ryan Williams
Published in MetaPolicy
Jul 23, 2018

How do you practice data science? Data analysis? Data wrangling? Anyone can buy a book, consult documentation, copy code samples, or memorize commands. Finding a way to deliberately practice the fancy new techniques you read about is a more challenging proposition.

For students of public policy and policy professionals, the problem is even trickier. At the moment, policy education struggles to show students what data can do in the policy space. I think it is unfair to expect policy people to practice on dataset after dataset of labeled cat images or real estate listings.

So whether you are a public policy professional or simply want to work with more meaningful data, please enjoy the 50 best public policy datasets for practicing data science.

Foreign Policy and National Security

Exploratory Data Analysis

German Federal Elections, 2017

Russian Presidential Election Data, 2018

ACLED African Conflicts, 1997–2017

Global Terrorism Database

U.S. Hourly Precipitation Data — Great Climate Change Data

World Bank DataBank: dozens of global development and demographic indicators
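
If you are not sure where to start with any of the datasets above, a first pass usually means counting rows, checking for missing values, and comparing group averages. Here is a minimal sketch of that idea using only the Python standard library. The column names (region, year, events), the sample rows, and the summarize helper are all made up for illustration; swap in the real columns of whichever file you download.

```python
import csv
import io
from statistics import mean

# Placeholder rows standing in for a downloaded CSV. The column names
# and values are invented for illustration, not taken from any dataset.
SAMPLE_CSV = """region,year,events
North,2015,12
North,2016,18
South,2015,7
South,2016,
East,2016,30
"""


def summarize(csv_text, group_col, value_col):
    """Return the row count, missing-value count, and per-group means."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    missing = sum(1 for r in rows if not r[value_col].strip())
    groups = {}
    for r in rows:
        raw = r[value_col].strip()
        if raw:  # skip missing values rather than treating them as zero
            groups.setdefault(r[group_col], []).append(float(raw))
    return {
        "n_rows": len(rows),
        "n_missing": missing,
        "group_means": {g: mean(v) for g, v in groups.items()},
    }


summary = summarize(SAMPLE_CSV, "region", "events")
print(summary)
```

For a real file, you would read the text from disk instead of SAMPLE_CSV. Missing values are skipped here rather than counted as zero, which is a choice you should revisit for each dataset.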

Machine Learning

Russian Troll Tweets

Global Terrorism Database

Global Database of Events, Language and Tone (GDELT)

World Climate Data

NOAA Climate Data

World Bank DataBank: dozens of global development and demographic indicators

NLP

Russian Troll Tweets

Global Database of Events, Language and Tone (GDELT)
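
For text-heavy datasets like these, a reasonable first NLP exercise is simply counting which terms show up most often. The sketch below assumes nothing about the real files: the three sample sentences, the tiny stopword list, and the tokenize and term_counts helpers are all invented for illustration.

```python
import re
from collections import Counter

# Placeholder documents, invented for illustration only.
DOCS = [
    "The committee voted on the new data policy.",
    "Data policy debates continue in the committee.",
    "Voters asked for open data.",
]

# A deliberately tiny stopword list; a real project would use a fuller one.
STOPWORDS = {"the", "on", "in", "for", "a", "an", "of", "and"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in STOPWORDS]


def term_counts(docs):
    """Count how often each remaining term appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(tokenize(doc))
    return counts


counts = term_counts(DOCS)
print(counts.most_common(3))
```

Raw counts like these are only a starting point, but they are a quick way to see what a corpus is about before trying anything fancier.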

Health and Economic Policy

Exploratory Data Analysis

US Public Assistance for Women and Children

School Shootings 1990-Present

40 Years of Federal Payroll Records

UN Data: there is an entire universe's worth of data here. I won’t attempt to catalog it. But seriously, go check it out.

National Center for Education Statistics Data Lab

CDC National Center for Health Statistics

FDA Datasets

The Atlas of Economic Complexity

DB.Nomics — The world’s economic database

Machine Learning

US Public Assistance for Women and Children

National Center for Education Statistics Data Lab

CDC National Center for Health Statistics

FDA Datasets

Quandl — Financial, economic, and alternative datasets

USA Spending — USAspending.gov database, which includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more.

Tech Policy

Machine Learning

Bulk US Patent Office Data

Open Observatory of Network Interference — A free software, global observation network for detecting censorship, surveillance and traffic manipulation on the internet.

NLP

FCC Net Neutrality Comments

Bulk US Patent Office Data

Ryan Williams
MetaPolicy

Antidisciplinarian. Studies Global Policy at the LBJ School of Public Affairs.