10 Kaggle Datasets For Learning Python And Data Science

David Miller
2 min read · Sep 12, 2022

Photo by Joshua Sortino on Unsplash

I changed my career from accounting to data science without ever using Kaggle.

This was stupid.

Don’t make my mistake. Use these 10 awesome datasets, each with a few ideas for what you can do with it.

1 — Amazon Reviews

  • Calculate basic product analytics
  • Use clustering algorithms to group products
  • Endless NLP use cases: sentiment analysis, keyword extraction, summarization
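
To make "basic product analytics" concrete, here is a minimal sketch that computes review count and average rating per product. The rows and the column names (`product_id`, `rating`) are made up for illustration; the real Kaggle files may be laid out differently, and in practice you would probably reach for pandas instead of plain dictionaries.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical sample rows; real column names may differ.
reviews = [
    {"product_id": "A1", "rating": 5},
    {"product_id": "A1", "rating": 3},
    {"product_id": "B2", "rating": 4},
]

def product_stats(rows):
    """Return {product_id: (review_count, average_rating)}."""
    ratings = defaultdict(list)
    for row in rows:
        ratings[row["product_id"]].append(row["rating"])
    return {pid: (len(r), mean(r)) for pid, r in ratings.items()}

stats = product_stats(reviews)
print(stats)
```

From here, sorting products by review count or filtering to low-rated ones is a one-liner.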

2 — IBM Credit Card Fraud

  • Develop framework for imbalanced data
  • Build supervised ML model to predict fraud
  • Use clustering algorithms to group consumers
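
One simple starting point for imbalanced data is random undersampling: keep every fraud row and only a matching number of non-fraud rows. The sketch below assumes a binary label stored under a hypothetical `is_fraud` key. It is one option among several (class weights and oversampling are others), not the recommended approach for this dataset.

```python
import random

def undersample(rows, label_key, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    positives = [r for r in rows if r[label_key] == 1]
    negatives = [r for r in rows if r[label_key] == 0]
    minority, majority = sorted([positives, negatives], key=len)
    balanced = minority + rng.sample(majority, len(minority))
    rng.shuffle(balanced)
    return balanced

# Hypothetical toy data: 2 fraud rows among 100 transactions.
transactions = [{"amount": i, "is_fraud": int(i % 50 == 0)} for i in range(100)]
balanced = undersample(transactions, "is_fraud")
print(len(balanced), sum(r["is_fraud"] for r in balanced))
```

Resample only the training split. The test set should keep the original class balance so your metrics reflect reality.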

3 — Soccer

  • Create dashboard of advanced analytics
  • Build a supervised ML model to predict outcomes
  • Use clustering algorithms to group players

4 — World Food Facts

  • Create dashboard of advanced analytics
  • Build a time-series model to forecast prices
  • Use clustering algorithms to group products

5 — Spotify Song Lyrics

  • Create unique analytics using audio features
  • Build neural networks to predict hits
  • Use clustering algorithms to create sub-genres

6 — AMEX Default Prediction

  • Build pipeline to handle large data
  • Design sampling approach for imbalanced data
  • Build supervised ML model to predict defaults

Use what you learned on IBM’s dataset and apply it to big data!
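
When a file is too big to load at once, a common trick is to stream it and keep only running totals. Here is a minimal sketch using the standard `csv` module. The `target` column name and the tiny in-memory "file" are assumptions for illustration; swap in `open("your_file.csv", newline="")` for a real file.

```python
import csv
import io

def default_rate(file_obj, label_column):
    """Stream a CSV row by row and return the share of rows labeled 1."""
    total = positives = 0
    for row in csv.DictReader(file_obj):
        total += 1
        positives += int(row[label_column])
    return positives / total if total else 0.0

# Stand-in for a large file on disk; column names are hypothetical.
sample = io.StringIO("customer_id,target\n1,0\n2,1\n3,0\n4,0\n")
rate = default_rate(sample, "target")
print(rate)
```

Only one row is held in memory at a time, so the same function works on a file of any size.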

7 — Home Prices

  • Calculate suite of analytics
  • Create geospatial dashboard
  • Build supervised ML model to predict prices

8 — FourSquare Location Matching

  • Manipulate unstructured geospatial data
  • Build geospatial analytics
  • Use advanced clustering to group locations
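
Most geospatial analytics start with the distance between two points. A minimal sketch of the haversine formula, which gives the great-circle distance between two latitude/longitude pairs, assuming a spherical Earth with a mean radius of 6,371 km:

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = sin(dlat / 2) ** 2 + cos(lat1) * cos(lat2) * sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))

# One degree of latitude is roughly 111 km.
print(round(haversine_km(0, 0, 1, 0), 1))
```

You can use this distance as a feature (how far apart are two candidate records?) or as the metric for a clustering algorithm.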

9 — H&M Fashion

  • Build pipeline to handle large data
  • Calculate advanced analytics
  • Create recommendation algorithm
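
A recommendation algorithm does not have to start complicated. The sketch below is a simple "customers who bought this also bought" baseline built on item co-occurrence counts. The baskets are invented for illustration, and this is just one baseline, not the method used by any particular competition solution.

```python
from collections import Counter, defaultdict
from itertools import permutations

def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co

def recommend(co, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co[item].most_common(k)]

# Hypothetical purchase history, one list per customer basket.
baskets = [
    ["jeans", "t-shirt", "socks"],
    ["jeans", "t-shirt"],
    ["jeans", "belt"],
    ["t-shirt", "socks"],
]
co = build_cooccurrence(baskets)
top = recommend(co, "jeans", k=1)
print(top)
```

A baseline like this gives you a score to beat before you try anything fancier.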

10 — Crypto Market

  • Calculate advanced analytics
  • Build time-series forecasting model
  • Build supervised ML model to predict prices
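
Before building a serious forecasting model, it helps to have a naive baseline to compare against. A minimal sketch: predict the next price as the average of the last few observations. The closing prices below are made up, and the window size of 3 is an arbitrary choice.

```python
def moving_average_forecast(prices, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(prices) < window:
        raise ValueError("need at least `window` observations")
    return sum(prices[-window:]) / window

# Hypothetical daily closing prices.
closes = [100.0, 102.0, 101.0, 105.0, 107.0]
forecast = moving_average_forecast(closes)
print(round(forecast, 2))
```

If your time-series or supervised model cannot beat this, it is not adding much yet.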


David Miller

Accountant → Data Scientist | Writing about the business of data science. Helping you create impact with data and machine learning.