10 Kaggle Datasets For Learning Python And Data Science
Sep 12, 2022
I changed my career from accounting to data science without ever using Kaggle.
This was stupid.
Don’t make my mistake — use these 10 awesome datasets (including thoughts on what you can do with them).
1 — Amazon Reviews
- Calculate basic product analytics
- Use clustering algorithms to group products
- Endless NLP use cases: sentiment analysis, keyword extraction, summarization
- Develop framework for imbalanced data
- Build supervised ML model to predict fraud
- Use clustering algorithms to group consumers
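If "framework for imbalanced data" sounds abstract, here is a minimal sketch of one common starting point, random undersampling. It uses only the standard library. The `is_fraud` column name and the toy rows are assumptions made up for illustration, not fields from any particular Kaggle dataset.

```python
import random
from collections import Counter


def undersample(rows, label_key="is_fraud", seed=0):
    """Randomly drop rows until every class is as small as the rarest class."""
    rng = random.Random(seed)

    # Group the rows by their class label.
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)

    # Keep the same number of rows from each class.
    target = min(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(rng.sample(group, target))

    rng.shuffle(balanced)
    return balanced


# Toy data (hypothetical): 95 legitimate rows and 5 fraudulent ones.
rows = [{"amount": i, "is_fraud": 0} for i in range(95)]
rows += [{"amount": 1000 + i, "is_fraud": 1} for i in range(5)]

balanced = undersample(rows)
print(Counter(row["is_fraud"] for row in balanced))
```

Undersampling throws data away, so on a real dataset it is worth comparing against oversampling or class weights before settling on an approach.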
3 — Soccer
- Create dashboard of advanced analytics
- Build a supervised ML model to predict outcomes
- Use clustering algorithms to group players
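Clustering shows up in almost every idea on this list, so here is a rough sketch of k-means written from scratch. The player features (goals per game, passes per game) and their values are invented for illustration. In practice you would probably reach for a library implementation instead.

```python
import math
import random


def kmeans(points, k, iterations=20, seed=0):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)

    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))

    return labels, centroids


# Hypothetical features per player: (goals per game, passes per game).
players = [
    (0.10, 60), (0.20, 65), (0.15, 62),  # pass-heavy players
    (0.80, 20), (0.90, 25), (0.85, 22),  # goal-heavy players
]
labels, centroids = kmeans(players, k=2)
print(labels)
```

Notice that passes per game dominates the distance here because its scale is much larger. With real data, scale the features before clustering.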
4 — World Food Facts
- Create dashboard of advanced analytics
- Build a time-series model to forecast prices
- Use clustering algorithms to group products
- Create unique analytics using audio features
- Build neural networks to predict hits
- Use clustering algorithms to create sub-genres
- Build pipeline to handle large data
- Design sampling approach for imbalanced data
- Build supervised ML model to predict fraud
Use what you learned on IBM’s dataset and apply it to big data!
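A "pipeline to handle large data" can start very small. The sketch below streams a CSV one row at a time and keeps only running totals in memory. The `merchant` and `amount` columns and the `transactions.csv` file name are hypothetical placeholders, not the schema of the real dataset.

```python
import csv
import io
from collections import defaultdict


def stream_totals(lines, key_col, value_col):
    """Aggregate a CSV row by row so the whole file never sits in memory."""
    totals = defaultdict(float)
    for row in csv.DictReader(lines):
        totals[row[key_col]] += float(row[value_col])
    return dict(totals)


# Stand-in for a large file. With real data you would pass something like
# open("transactions.csv", newline="") instead (hypothetical file name).
fake_file = io.StringIO(
    "merchant,amount\n"
    "A,10.0\n"
    "B,2.5\n"
    "A,4.5\n"
)

totals = stream_totals(fake_file, "merchant", "amount")
print(totals)
```

The same pattern works for counts, minimums, and maximums. Anything that needs the full dataset at once, such as a median, calls for a different approach.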
7 — Home Prices
- Calculate suite of analytics
- Create geospatial dashboard
- Build supervised ML model to predict prices
8 — FourSquare Location Matching
- Manipulate unstructured geospatial data
- Build geospatial analytics
- Use advanced clustering to group locations
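Most geospatial analytics start with one question: how far apart are two points? A minimal sketch using the haversine formula is below. It assumes a spherical Earth with a radius of 6371 km, which is an approximation.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    earth_radius_km = 6371.0  # mean radius; the Earth is not a perfect sphere
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlambda = math.radians(lon2 - lon1)

    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


# Approximate city-centre coordinates for Paris and London.
paris_to_london = haversine_km(48.8566, 2.3522, 51.5074, -0.1278)
print(round(paris_to_london), "km")
```

A distance function like this can feed a clustering step or a "are these two listings the same place?" rule for location matching.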
9 — H&M Fashion
- Build pipeline to handle large data
- Calculate advanced analytics
- Create recommendation algorithm
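A recommendation algorithm does not have to begin with deep learning. Here is a sketch of a simple "bought together" recommender based on co-occurrence counts. The baskets are made up for illustration and this is one possible baseline, not the approach any particular competition solution used.

```python
from collections import Counter, defaultdict
from itertools import permutations


def build_cooccurrence(baskets):
    """Count how often each pair of items appears in the same basket."""
    co = defaultdict(Counter)
    for basket in baskets:
        for a, b in permutations(set(basket), 2):
            co[a][b] += 1
    return co


def recommend(co, item, k=2):
    """Return up to k items most often bought together with `item`."""
    return [other for other, _ in co[item].most_common(k)]


# Hypothetical purchase history: one list per customer basket.
baskets = [
    ["jeans", "t-shirt", "socks"],
    ["jeans", "t-shirt"],
    ["jeans", "belt"],
    ["t-shirt", "socks"],
]

co = build_cooccurrence(baskets)
print(recommend(co, "jeans", k=1))
```

Once a baseline like this works, it gives you something concrete to beat with a fancier model.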
10 — Crypto Market
- Calculate advanced analytics
- Build time-series forecasting model
- Build supervised ML model to predict prices
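Before building any forecasting model, it helps to have a baseline to compare against. The sketch below forecasts the next value as the average of the last few observations and scores that baseline with a walk-forward mean absolute error. The prices are invented numbers, not real market data.

```python
def moving_average_forecast(series, window=3):
    """Forecast the next value as the mean of the last `window` observations."""
    if len(series) < window:
        raise ValueError("series is shorter than the window")
    return sum(series[-window:]) / window


def walk_forward_mae(series, window=3):
    """Forecast each point from the points before it and average the absolute errors."""
    errors = [
        abs(moving_average_forecast(series[:t], window) - series[t])
        for t in range(window, len(series))
    ]
    return sum(errors) / len(errors)


# Hypothetical daily closing prices.
prices = [100.0, 102.0, 101.0, 105.0, 107.0]

next_price = moving_average_forecast(prices)
baseline_mae = walk_forward_mae(prices)
print(round(next_price, 2), round(baseline_mae, 2))
```

If a more complex model cannot beat this error on held-out data, the extra complexity is probably not paying for itself.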