Data Digest | Summer’s end 2017
As the seasons change, try practicing your exploratory analysis skills on the featured datasets and Data Projects that kept us cool through the summer! Gather insights on the Trump administration, compare literacy rates in India’s Telangana, and collaborate with others on a Bigfoot sightings Data Project. The volleyball is in your court.
Sports
Liverpool English League Matches — Liverpool Football Club’s English League results from 1893 to 2016
Indian Premier League Matches — IPL cricket data exported to CSV files from SQL server (577 matches up to season 9)
NASCAR Champion History (1949 — Present) — 67 years of NASCAR Champion Season History via Wikipedia
Does Pro-athlete Fame Correlate to Results? — Data Project to calculate win/loss ratios for players and compare them to the various stats in the ESPN Fame 100
NFL Data — Data Project to identify trends for fantasy football purposes
Media
Podcasts Dataset — Podcast episodes published between 2007 and 2016
Michael Phelps vs. a Shark — Data Project to compare shark speed times with Phelps’s Olympic times, visualize and synthesize the data, and predict a final victor for this year’s Shark Week race
Do the Best Movies on Netflix Pass the Bechdel Test? — A movie passes the Bechdel test if it has two female characters who talk to each other about something other than men. This project evaluates the likelihood of movies passing the Bechdel test across various factors (budget, ratings, genre, etc.)
Fox News Facebook Shares vs. Likes — This study analyzes the impact a single Facebook share has on a Facebook post
Rolling Stone’s 100 Greatest Metal Albums of All Time — Rolling Stone magazine’s all-encompassing list of the greatest metal albums of all time
International
Literacy Rates in Telangana — The number of literates among males and females and their literacy rates in each of the districts
Suicides in India — Number of suicides in India by state from 2000 to 2012. Includes details on the social status, education status, and professional profile of those who died
US Immigration Enforcement — Yearly numbers of immigrants apprehended, removed, or returned by US DHS (CBP, ICE) from 1925 to 2015
Economy
The Essential Landscape of Enterprise A.I. Companies — Companies that also use a wide range of AI and machine learning technologies, ranging from computer vision to NLP / NLU
Stock Facts — Stock Market Facts Combined with Board Members
Fortune 500 Diversity — Every Fortune 500 company’s 2017 diversity data, or lack thereof
Occupations by State and Likelihood of Automation — 702 SOC (Standard Occupational Classification) jobs, their likelihood of automation, and the number of jobs per State
Post-school Earnings Summary — Over 7,700 rows that detail college name, race percentage, median income, etc.
US County Economic Data Compiler — Data Project to organize, reformat, and create intuitive geographic displays of U.S. county economic data
Politics
White House Salaries — CSV scraped from the 16-page PDF detailing the salaries of Trump administration employees
Party Representation — Data Project investigating “How well does our government represent the people based on their party affiliation?”
Healthcare
Medical Discharge Rates by State — Selected medical discharge rates by state from 1992 to 2015 via The Dartmouth Atlas
Fentanyl Dispensations in New Jersey — Fentanyl dispensations made by New Jersey pharmacies from 2011 through early 2017
NJ statewide overdose deaths 1999 to 2016 — Includes total deaths, heroin deaths, and fentanyl deaths
Other
Bigfoot Sightings — Full text and geocoded sighting reports from the Bigfoot Field Researchers Organization (BFRO)
ANSUR II — @datamil’s ANSUR II database contains 3D whole body, foot, and head scans of soldier participants. The data from this survey are used for a wide range of equipment design, sizing, and tariffing applications within the military
Homelessness Point-in-Time Estimates — National Point-in-Time (PIT) estimates of homelessness, national estimates of homelessness by state, and estimates of chronic homelessness from 2007 to 2016
NICS Firearm Background Checks — Monthly data from the FBI’s National Instant Criminal Background Check System, converted from PDF to CSV
Are dog size and intelligence linked? — Data Project getting to the bottom of a very urgent question
LARA Hotel Reviews — A LARA (latent aspect rating analysis) of Datafini’s open hotel review data
Federal Food Desert Programs — Data Project combining USAspending data with other datasets to identify communities that need support
2017 Total Solar Eclipse Map and Shapefiles — Shows the path of the Moon’s umbral shadow during the total solar eclipse on August 21, 2017
Sunsquatch Challenge — “There are no more eclipse maps to make”… the internet accepted the challenge
Future Asteroids — All known future asteroids poised to pass near Earth, some being potentially hazardous objects
Tutorials
Python Data Wrangling Tutorial — 5 useful data wrangling techniques using Python Pandas and data.world
SPARQL Tutorial — Learn SPARQL by practicing with data about twelve important people from George R. R. Martin’s Game of Thrones
Titanic Disaster Dataset — Data for exploratory analysis and building binary classification models to predict survival among Titanic passengers
What to put in data.world — A Data Project with examples of the four types of data and context you can put in data.world
Introduction to SQL functions and GROUP BY — Introduces SQL functions and then performs aggregations via the GROUP BY clause
Stay tuned for our next Data Digest compilation! If you liked this Digest summary, we encourage you to subscribe to our weekly Data Digest email and share your favorite datasets with friends, family, and data enthusiasts alike.
Data work is much easier when everyone can contribute to it. Learn how to use data.world to collaborate with your professional teammates on your data projects here.