The 50 Best Public Policy Datasets for Practicing Data Science
How do you practice data science? Data analysis? Data wrangling? Anyone can buy a book, consult documentation, copy code samples, or memorize commands. Finding a way to deliberately practice the fancy new techniques you read about is a more challenging proposition.
For public policy students and professionals, the problem is even trickier. Policy education currently struggles to show students what data can do in the policy space, and I think it is unfair to expect policy people to practice on dataset after dataset of labeled cat images or real estate listings.
So whether you are a public policy professional or simply want to work with more meaningful data, please enjoy the 50 best public policy datasets for practicing data science.
Foreign Policy and National Security
Exploratory Data Analysis
German Federal Elections, 2017
Russian Presidential Election Data, 2018
ACLED African Conflicts, 1997–2017
U.S. Hourly Precipitation Data — Great Climate Change Data
World Bank DataBank: dozens of global development and demographic indicators
Machine Learning
Global Database of Events, Language and Tone (GDELT)
World Bank DataBank: dozens of global development and demographic indicators
NLP
Transportation Policy
Exploratory Data Analysis
GraphHopper Open Traffic Collection
Bureau of Transportation Statistics — Dozens of datasets here
Machine Learning
Uber Data for Many Global Cities
New York City Taxi and Limousine Commission (TLC) Trip Record Data
Open Flights: airport, airline, and route data
Health and Economic Policy
Exploratory Data Analysis
US Public Assistance for Women and Children
40 Years of Federal Payroll Records
UN Data: there is an entire universe’s worth of data here. I won’t attempt to catalog it all, but seriously, go check it out.
National Center for Education Statistics Data Lab
CDC National Center for Health Statistics
The Atlas of Economic Complexity
DB.Nomics — The world’s economic database
Machine Learning
US Public Assistance for Women and Children
National Center for Education Statistics Data Lab
CDC National Center for Health Statistics
Quandl — Financial, economic, and alternative datasets
USA Spending — USAspending.gov database, which includes data on all spending by the federal government, including contracts, grants, loans, employee salaries, and more.
Immigration Policy
Exploratory Data Analysis
Yearbook of Immigration Statistics 2016
Mexican Migration Project — data are available for 161 communities in 24 states in Mexico
Dataset of Global Immigration Policies
Border Patrol Undocumented Alien Apprehensions 1960–2017
US Immigration Enforcement 1925–2015
Machine Learning
US Domestic Politics
Exploratory Data Analysis
Police Officer Deaths in the United States
Trump Score: How often do congresspeople vote with or against Trump?
State Election Results 1971–2012
World Motor Vehicle Production, Selected Countries
Machine Learning
State Election Results 1971–2012
NLP
Tech Policy
Machine Learning
Open Observatory of Network Interference — A free software, global observation network for detecting censorship, surveillance and traffic manipulation on the internet.