Top 100 Open Source Datasets for Data Science
--
Datasets by category: Computer Vision, NLP, Reinforcement Learning, Deep Learning, and more.
1. Quandl
Quandl is a large repository of economic and financial data. Most of the datasets are free, but some must be purchased.
2. Academic Torrents
Academic Torrents hosts the data behind published scientific research papers. It offers a wide variety of datasets, all free to download.
3. Data.gov
Data.gov offers a variety of datasets from US government agencies. Domains include education, climate, food, and chronic disease, among others.
Link: https://www.data.gov/
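For readers who would rather search the catalog from code than from the browser, here is a minimal sketch. It assumes the Data.gov catalog exposes the standard CKAN Action API at catalog.data.gov; the helper names `build_search_url` and `search_titles` are made up for this example and are not part of any official client.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the Data.gov catalog serves the standard CKAN Action API here.
CKAN_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(query, rows=5):
    """Return a CKAN package_search URL for a free-text query."""
    return CKAN_SEARCH_URL + "?" + urlencode({"q": query, "rows": rows})


def search_titles(query, rows=5):
    """Fetch the titles of matching datasets (needs network access)."""
    with urlopen(build_search_url(query, rows), timeout=30) as response:
        payload = json.load(response)
    # CKAN nests the search hits under result -> results.
    return [dataset["title"] for dataset in payload["result"]["results"]]
```

Calling `search_titles("climate")` should return a handful of dataset titles, provided the endpoint behaves as assumed.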
4. UCI Machine Learning Repository
This repository is hosted by the University of California, Irvine. It holds more than 400 datasets aimed at the machine learning community.
5. Google Public Datasets
Google hosts a large number of datasets through Google Public Datasets, which is part of its Cloud Platform. You can browse and query the collection using BigQuery. The first 1 terabyte of query data processed each month is free.
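To make the free allowance concrete, here is a small sketch that checks how much of it a batch of queries would leave. It assumes the allowance resets monthly and that 1 TB means 2**40 bytes. The query sizes are invented for illustration, and `remaining_free_bytes` is a hypothetical helper, not a BigQuery API.

```python
# Assumption: the free query allowance is 1 TB per month, taken here as 2**40 bytes.
FREE_BYTES_PER_MONTH = 2**40
GIB = 2**30


def remaining_free_bytes(bytes_scanned_per_query):
    """Return the free bytes left after the given queries (never negative)."""
    used = sum(bytes_scanned_per_query)
    return max(FREE_BYTES_PER_MONTH - used, 0)


# Hypothetical month: three queries scanning 200 GiB, 300 GiB, and 100 GiB.
queries = [200 * GIB, 300 * GIB, 100 * GIB]
print(remaining_free_bytes(queries) / GIB)  # prints 424.0
```

Queries against large public tables can scan hundreds of gigabytes each, so it is worth checking how much data a query will process before running it.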
6. Datasets on GitHub
This GitHub repository lists a wide variety of public datasets, including climate data, time series data, and plane crash data. Feel free to dig in.
Link: https://github.com/awesomedata/awesome-public-datasets
7. Socrata
Socrata hosts cleaned datasets across domains such as government data, radiation data, and workplace-related data.
8. Kaggle datasets
Kaggle is by now a household name among data professionals. Kaggle hosts massive open source…