Datasets Digest | Spring 2017
Spring was an exciting time here at data.world, with featured datasets ranging from SXSW Twitter Traffic and March Madness Predictions to World Happiness Reports for the United Nations. We hope you’ll get a chance to dive in and draw your own conclusions from the data that makes us tick.
Sports
NCAA Men’s March Madness — Men’s March Madness historical results, 1985–2015
2017 March Madness Predictions — Forecast data for the 2017 Men’s and Women’s March Madness tournaments, plus team rankings
NCAA Tournament Results — Every NCAA tournament game result since 1985 (when the tournament was expanded to the 64-team bracket)
Major Sports Venues Usage — Represents teams or events that are associated with 12 major sports leagues
Toughest Sport by Skill — 60 sports ranked across 10 skill categories by an ESPN panel to determine the most difficult sport
NBA Salaries — Salaries of NBA players from 1990 to 2016
Science & The Environment
HD6D LIDAR High Speed Descent — NASA’s HD6D LIDAR for High Speed Descent Mapping Project
Global Footprint Network National Footprint Accounts — National Footprint Accounts (NFAs) measure the ecological resource use and resource capacity of nations from 1961 to 2013
Marches for Science, Domestic Crowd Sizes — Estimated crowd sizes for marches in approximately 200 cities in the U.S.
US National Parks Visitation 1904–2016 — All United States National Parks from 1904 to 2016 with geographical boundary lines and visitation numbers by year
Chlamydia by State — This dataset shows rates of Chlamydia by state, 2000–2015
Society
Open Sourcing Mental Illness — Data on prevalence and attitudes towards mental health among tech workers
Social influence on shopping — Survey of 2,676 millennials: What social platform has influenced your online shopping the most?
Teen fake news poll on After School — BuzzFeed partnered with After School to ask 39,000 high school students about their opinions on fake news. Here’s what they said.
Cat vs. dog popularity in the U.S. — Population and ownership by household of dogs and cats broken down by state via the American Veterinary Medical Association
2017 SXSW Twitter Traffic — A collection of all tweets that mention #sxsw or SXSW
Missing children in the US — The National Center for Missing and Exploited Children (NCMEC) list of missing children across the US
J.K. Rowling tweets and retweets — 10,159,892 identifiers for tweets and retweets sent by or to J. K. Rowling, @jk_rowling
Every Donald Trump tweet — Whether you’re politically on the right or on the left, dig into the data for this challenge and tell us what you think!
Stand-up on Comedy Central — All episodes in 15 seasons of Comedy Central Presents, a stand-up comedy series that featured 260 comedians
Barks for Beers — Chronicling visits to 30 Austin breweries for Divine Canines’ “Barks for Beers” fundraiser
Mock presidential election poll for teens — A mock 2016 presidential election poll taken mid-October by over 100,000 teens in the United States on the After School App
Washington Post police shootings — The Washington Post is compiling a database of every fatal shooting in the United States by a police officer
Houston email metadata — Email address metadata from the City of Houston obtained by FOIA by @chaps on the Sketch City Slack channel
Data for Democracy
Datasets built out by the Data for Democracy community, a diverse group on a mission to democratize data.
Election Transparency — This project analyzes elections in an effort to identify trends, outliers, and/or anomalies to enable insight and transparency into the democratic voting process
Drug spending — This group is finding ways to make Medicare drug spending data more consumable
Internal displacement — This project aims to classify, tag, analyze and visualize news articles about internal displacement, and is based on a challenge from the IDMC
ProPublica — campaign spending — Analyzing campaign spending data to support the non-profit investigative journalism publication, ProPublica
ProPublica — foreign travel — A web scraping/data engineering project around foreign travel expenditures
ProPublica — house expenditures — A dataset on House Office expenditures
Economy
United Airlines Data — The data has been selected and analyzed to present a view of the industry and its important trends, as well as to identify fundamental drivers of success and, in some cases, the early signs of potential failure
Beer data — US brewery production of beers & cans, kegs & barrels, and taxes determined
Growth rates of industries through history — Comparison of growth rates of industries, startups and public stocks during times of industry disruption
Special economic zones by country — Creating the world’s first database of all special economic zones: their location, value, and size
International
Population, growth rates and population density São Paulo — Population parameters including total amount, density, and growth rates, broken out by district in the city of São Paulo, Brazil
The CNS North Korea Missile Test Database — North Korean missile tests since 1984
Indian retail prices — Retail prices of key commodities in India from 1997 to 2015
World Happiness Report — The first World Happiness Report was published in April 2012, in support of the UN High Level Meeting on happiness and well-being
Mines in Africa — Number of mineral mines (total and by commodity) for 5,835 African ADM2 units
Stay tuned for our next Datasets Digest compilation! If you liked this Digest summary, we encourage you to subscribe to our weekly Datasets Digest email and share your favorite datasets with friends, family, and data enthusiasts alike.
Data work is much easier when everyone can contribute to it. Learn how to use data.world to collaborate with your professional teammates on your data projects.