5 Data Science Projects Waiting For You

Start a project today!

Zoshua Colah
Data Science Library


Dubstech, the largest tech community at the University of Washington, recently hosted UW’s first Datathon, a data science hackathon for both beginner and advanced data science students. My team and I were responsible for writing the prompts, so we searched the web (mostly Kaggle) for interesting data science problems that would challenge both beginners and pros and could be scaled into full-blown projects after the Datathon.

Here are the 5 data science project prompts we came up with. We hope you find an interesting project, and we would love for you to comment with a link to your project once you complete it!

Disclaimer: All of these are mock prompts and are not endorsed by any of the companies or organizations mentioned below.

Airbnb in Seattle & Boston

Seattle and Boston are two of the biggest business and innovation hubs in the country, attracting heavy traffic from tourists and professionals alike. The cities draw people from all walks of life ranging from computer scientists to business owners to startup specialists to tourist groups to college freshmen. Airbnb senses an opportunity to improve their rental programs in these cities and would like to hear your suggestions on how to do so.

Prompt:

Airbnb wants you to conduct a study on how they can improve their current rental programs for tourists and visiting professionals in either or both of these cities.

About the Data:

Sourced from: https://www.kaggle.com/airbnb/seattle, https://www.kaggle.com/airbnb/boston

Get Airbnb Seattle & Boston Datasets: https://goo.gl/jcHuwG

  1. Listings: details about each rental property available to customers
  2. Calendar: when each listing is available and at what price
  3. Reviews: reviews left by customers
  4. Additional Datasets: Demographics, Econ State, Real Estate Prices, Venues

Possible Questions to Explore/Ideas:

These questions are for your guidance. We encourage you to look at the data and come up with questions of your own.

  • Is there an upward trend in new Airbnb listings and total Airbnb visitors to Seattle or Boston?
  • What is the predicted income of an Airbnb listing for the next 3 years?
  • What are the expected demand and supply for Airbnb rental properties in Seattle/Boston over the next 3 years?
  • Could prices/amenities be improved to help increase customers for a property?
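As a starting point for the pricing questions above, here is a minimal sketch that averages the listed price per month from the Calendar data. It assumes each calendar row has `date` (as `YYYY-MM-DD`), `available` (`"t"` or `"f"`), and `price` (a string like `"$85.00"`) fields. Those field names and formats are assumptions for illustration, so check them against the actual files. The sample rows are made up.

```python
from collections import defaultdict
from statistics import mean


def parse_price(text):
    """Turn a price string such as "$1,250.00" into a float."""
    return float(text.replace("$", "").replace(",", ""))


def average_price_by_month(calendar_rows):
    """Average listed price per month, using only days marked as available.

    Assumes each row is a dict with 'date' (YYYY-MM-DD), 'available'
    ('t' or 'f') and 'price' keys. These names are assumptions.
    """
    prices = defaultdict(list)
    for row in calendar_rows:
        if row["available"] != "t" or not row["price"]:
            continue  # skip unavailable days and rows without a price
        month = row["date"][:7]  # keep the "YYYY-MM" prefix
        prices[month].append(parse_price(row["price"]))
    return {month: mean(values) for month, values in sorted(prices.items())}


# Made-up rows, only to show the expected shape of the input.
sample_calendar = [
    {"listing_id": "1", "date": "2016-01-04", "available": "t", "price": "$85.00"},
    {"listing_id": "1", "date": "2016-01-05", "available": "f", "price": ""},
    {"listing_id": "2", "date": "2016-01-04", "available": "t", "price": "$115.00"},
    {"listing_id": "2", "date": "2016-07-04", "available": "t", "price": "$1,250.00"},
]

monthly_prices = average_price_by_month(sample_calendar)
print(monthly_prices)
```

With the real data you could feed in rows from `csv.DictReader` instead of the sample list. Keep in mind that an unavailable day is not necessarily a booked day, so treat any occupancy or income estimate built on this as rough.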

Metrics for European Soccer Leagues

The last 10 years of European soccer have been extremely exciting, ranging from transfer records being broken on multiple occasions to underdogs showcasing extremely high skill to teams dominating on the field. As part of their efforts to assist clubs and pundits, Optasports and UEFA are currently building a set of metrics to be used for player, team and league evaluations.

Prompt:

Your challenge is to investigate the available data, develop one or more metrics, and present a results report that demonstrates how they are used.

About the Data:

  1. Top 5 European Leagues: Statistics and betting odds for the last 10 years of league matches
  2. FIFA Ratings: Statistics of each player for the years 2017, 2018, and 2019
  3. European Database: Detailed records of each player / team / league | Learn more on how to use

Note: Some of these datasets will have more columns added over time.

Possible Questions to Explore/Ideas

  • What are the most exciting players/teams/leagues in Europe?
  • What are the trends in the players of teams and leagues?
  • Can we predict the growth of a player based on the league and team they are a part of?
  • Which league is the most suitable for a particular type and age group of player?

We recommend reading the column names before developing your metrics.
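To make "develop a metric" a bit more concrete, here is a minimal sketch of one very simple candidate: average goals per match for each league, as a rough proxy for how exciting a league is. The field names `league`, `home_goals`, and `away_goals` are hypothetical and the rows are made up, so map them to whatever columns the real datasets provide.

```python
from collections import defaultdict


def goals_per_match(matches):
    """Average total goals per match, grouped by league.

    Assumes each match is a dict with 'league', 'home_goals' and
    'away_goals' keys. These names are hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # league -> [goals, match count]
    for match in matches:
        entry = totals[match["league"]]
        entry[0] += int(match["home_goals"]) + int(match["away_goals"])
        entry[1] += 1
    return {league: goals / count for league, (goals, count) in totals.items()}


# Made-up results, only to show the expected shape of the input.
sample_matches = [
    {"league": "League A", "home_goals": "2", "away_goals": "1"},
    {"league": "League A", "home_goals": "0", "away_goals": "0"},
    {"league": "League B", "home_goals": "3", "away_goals": "3"},
]

league_scores = goals_per_match(sample_matches)
print(league_scores)
```

A real metric would likely combine several signals (for example, how close the matches are or how often the underdog wins), but a one-number baseline like this gives you something to compare against.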

Celebrating 120 Years of the Olympics

Held every four years, the Olympic Games are considered the world’s foremost sports competition, with athletes from more than 200 nations participating in a variety of sporting events. As the oldest and grandest sporting event, the Games have generated a large amount of data over their history.

Prompt:

As part of their 120-year celebration, the Olympic committee would like you to publish a mini case study that highlights significant insights and makes recommendations for future events.

About the Data:

Get Data Here: https://goo.gl/dcxmRD

This is a historical dataset on the modern Olympic Games, including all the Games from Athens 1896 to Rio 2016. Note that the Winter and Summer Games were held in the same year up until 1992. After that, they were staggered: the Winter Games have been held every four years starting with 1994, and the Summer Games every four years starting with 1996. The two CSV files are:

  • Athlete Events: Data on each athlete entry for each competition for each year
  • NOC Regions: The National Olympic Committee code and its region (for map visualizations)

Columns Available: Name, Sex, Age, Height, Weight, Team, NOC (country code), Games, Year, Season, City, Sport, Event, Medal, Region Name, Notes

Possible Questions to Explore/Ideas:

These questions are for your guidance. We encourage you to look at the data and come up with questions of your own.

  • How has the representation of males and females evolved over time?
  • Does location affect the performance of competitors? (i.e. “home field advantage”)
  • What combinations of height and weight show the best results in different sports?
  • Idea: Develop a metric to evaluate the most exciting Olympic event & country progress
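For the question about male and female representation, a minimal sketch could compute the share of female athlete entries per year from the `Sex` and `Year` columns listed above. It assumes `Sex` is coded as `"M"` or `"F"`, which is an assumption to verify against the file. The sample rows are made up.

```python
from collections import defaultdict


def female_share_by_year(rows):
    """Share of athlete entries per year where Sex is 'F'.

    Counts entries, not unique athletes: someone who competes in three
    events in one year is counted three times.
    """
    totals = defaultdict(int)
    females = defaultdict(int)
    for row in rows:
        year = int(row["Year"])
        totals[year] += 1
        if row["Sex"] == "F":  # assumed coding: "M" / "F"
            females[year] += 1
    return {year: females[year] / totals[year] for year in sorted(totals)}


# Made-up rows, only to show the expected shape of the input.
sample_rows = [
    {"Name": "Athlete A", "Sex": "M", "Year": "1900"},
    {"Name": "Athlete B", "Sex": "M", "Year": "1900"},
    {"Name": "Athlete C", "Sex": "F", "Year": "1900"},
    {"Name": "Athlete D", "Sex": "F", "Year": "2016"},
    {"Name": "Athlete E", "Sex": "M", "Year": "2016"},
]

shares = female_share_by_year(sample_rows)
print(shares)
```

To count unique athletes instead of entries, you could first collect a set of `(Name, Year)` pairs and run the same calculation on that.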

The Age of Kickstarter

Kickstarter is a funding platform where creators can share and gather interest in a particular creative project they’d like to launch. It’s entirely driven by crowdfunding, where the general public and their money are what send these projects into production. Every project is independently crafted, while friends, fans and total strangers offer to fund them in return for rewards or the finished product itself.

Prompt:

Kickstarter wants you to create a study that provides significant insights and helps expose them to the projects and categories they should pay attention to in the upcoming year.

About the Data

Get data here: https://goo.gl/3qASjX

This dataset contains information about over 300,000 Kickstarter projects, with information such as category, goals, and pledges.

Possible Questions to Explore/Ideas

  • Is there a correlation between the goal of the project and its success?
  • How can this data help individuals and startups that wish to launch their idea on Kickstarter?
  • Are there certain types of media more prone to success on the platform?
  • What is the forecast of new projects and funders by category for the upcoming year?
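For the first question, one quick way to look at the relationship between goal and success is to group projects by the order of magnitude of their goal and compare success rates. This is only a sketch: the field names `goal` and `state`, and the value `"successful"`, are assumptions about the dataset, and the sample projects are made up.

```python
from collections import defaultdict


def success_rate_by_goal_magnitude(projects):
    """Success rate per goal bucket, where a bucket is a power of ten.

    A goal of 5,000 falls in the 1000 bucket (1,000 to 9,999).
    Assumes each project is a dict with 'goal' and 'state' keys and that
    funded projects have state == 'successful'. These are assumptions.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [successes, total]
    for project in projects:
        goal = int(float(project["goal"]))
        if goal < 1:
            continue  # skip missing or nonsensical goals
        bucket = 10 ** (len(str(goal)) - 1)  # largest power of ten <= goal
        counts[bucket][1] += 1
        if project["state"] == "successful":
            counts[bucket][0] += 1
    return {b: wins / total for b, (wins, total) in sorted(counts.items())}


# Made-up projects, only to show the expected shape of the input.
sample_projects = [
    {"goal": "500", "state": "successful"},
    {"goal": "800", "state": "failed"},
    {"goal": "5000", "state": "successful"},
    {"goal": "7000", "state": "failed"},
    {"goal": "9000", "state": "failed"},
    {"goal": "50000", "state": "failed"},
]

rates = success_rate_by_goal_magnitude(sample_projects)
print(rates)
```

Bucketing by magnitude is a design choice to keep a handful of very large goals from dominating the picture. A proper correlation test or a logistic regression would be a natural next step.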

Sentiment Analysis of the World Cup 2018

The FIFA World Cup 2018, the most prestigious association football tournament and the most widely viewed and followed sporting event in the world, was frequently one of the top trending topics on Twitter while it was underway.

Prompt:

Twitter wants a sentiment analysis study that investigates and highlights the different emotions people experienced during the World Cup between the Round of 16 and the Final.

The Dataset:

Get Data: https://www.kaggle.com/rgupta09/world-cup-2018-tweets/home

This dataset contains a random collection of 530k tweets from the Round of 16 through the World Cup Final, which took place on 15 July 2018 and was won by France.

Possible Questions/Ideas to Explore:

  • Common Patterns and Trends in sentiments expressed for each match
  • Visualization of how sentiments changed between the Round of 16 and the Final
  • What kind of sentiments get the most retweets?
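If you have never done sentiment analysis before, a word-list approach is an easy baseline to start from. The sketch below scores a tweet between -1 (all negative words) and 1 (all positive words). The two word lists are tiny and made up purely for illustration. A real study would use an established sentiment lexicon or a trained model.

```python
import re

# Tiny illustrative word lists. These are made up, not a real lexicon.
POSITIVE = {"love", "great", "amazing", "win", "brilliant"}
NEGATIVE = {"hate", "awful", "terrible", "lose", "robbed"}


def sentiment_score(text):
    """Score text from -1.0 (negative) to 1.0 (positive).

    Returns 0.0 when no word from either list appears.
    """
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    if positives + negatives == 0:
        return 0.0
    return (positives - negatives) / (positives + negatives)


# Made-up tweets, only to show how the function is called.
sample_tweets = [
    "What an amazing goal, love it!",
    "Terrible refereeing, we were robbed",
    "Kickoff at 5pm",
]

scores = [sentiment_score(tweet) for tweet in sample_tweets]
print(scores)
```

Once every tweet has a score, you can average the scores per match or per day to see how the mood shifted over the tournament.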

Did you find an interesting project? Did you pursue any of them? Comment down below and follow Data Science Library for more such projects.
