Ultimate DataSet Resource Hub

Start Playing with Data Today!

Zoshua Colah
Data Science Library
3 min read · Nov 4, 2018


Source: https://journeyofanalytics.files.wordpress.com/2016/01/cloud-1.png?w=676&h=434

Kaggle

Kaggle hosts a platform where people can contribute datasets, and other community members can vote on them and run Kernels (scripts) against them.

World Bank

Open data from the World Bank. The platform provides several tools, such as the Open Data Catalog, world development indices, and education indices.

Five Thirty Eight Datasets

The datasets used by Five Thirty Eight in their stories. Each dataset includes the data, a dictionary explaining the data, and a link to the story that Five Thirty Eight published from it.
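When a dataset ships with a data dictionary, a quick first step is to check that every column in the data is actually explained. The snippet below is only a minimal sketch of that idea: the sample CSV, the dictionary entries, and the helper names `load_rows` and `undocumented_columns` are all invented for illustration and do not come from any real Five Thirty Eight dataset.

```python
import csv
import io

# Invented sample rows standing in for a downloaded CSV file.
SAMPLE_CSV = """team,wins,losses
Alpha,10,2
Beta,7,5
"""

# Invented data dictionary: column name -> plain-language explanation.
DATA_DICTIONARY = {
    "team": "Team name",
    "wins": "Games won in the season",
    "losses": "Games lost in the season",
}


def load_rows(csv_text):
    """Parse CSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def undocumented_columns(csv_text, dictionary):
    """Return columns that appear in the data but not in the dictionary."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [name for name in (reader.fieldnames or []) if name not in dictionary]


if __name__ == "__main__":
    print(load_rows(SAMPLE_CSV))
    print(undocumented_columns(SAMPLE_CSV, DATA_DICTIONARY))
```

With a real download you would read the CSV from disk instead of a string; the check itself stays the same.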

Amazon Web Services (AWS) datasets

Amazon provides a few large datasets, which can be used on its platform or on your local computer.

Google datasets

Google provides a few datasets as part of its BigQuery tool. These include baby names, data from GitHub public repositories, and all stories and comments from Hacker News.

YouTube Labeled Video Dataset

A few months back, the Google Research group released a labeled YouTube dataset consisting of 8 million YouTube video IDs with associated labels drawn from 4,800 visual entities. It comes with pre-computed, state-of-the-art vision features extracted from billions of frames.

Quandl

Quandl provides financial, economic, and alternative data from various sources through its website, its API, or direct integration with a few tools. Its datasets are classified as Open or Premium.

Driven Data

Driven Data finds real-world challenges where data science can be used to create a positive social impact.

Data Packaged Core DataSets

Important, commonly used datasets in a high-quality, easy-to-use, and open form, published as data packages.
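A data package bundles its data files with a JSON descriptor, so a first look at one can be as simple as reading that descriptor. The following is a minimal sketch, assuming the descriptor follows the common Data Package layout (a `resources` list whose entries carry a `name`, a `path`, and optionally a `schema` with `fields`). The descriptor contents and the `list_resources` helper are hypothetical and not taken from any real dataset.

```python
import json

# Hypothetical descriptor text, shaped like a datapackage.json file.
# The package name, path, and fields are invented for this example.
SAMPLE_DESCRIPTOR = """
{
  "name": "example-package",
  "resources": [
    {
      "name": "observations",
      "path": "data/observations.csv",
      "schema": {
        "fields": [
          {"name": "year", "type": "integer"},
          {"name": "value", "type": "number"}
        ]
      }
    }
  ]
}
"""


def list_resources(descriptor_text):
    """Return (resource name, path, column names) for each resource."""
    descriptor = json.loads(descriptor_text)
    summary = []
    for resource in descriptor.get("resources", []):
        # The schema is optional, so fall back to an empty field list.
        fields = resource.get("schema", {}).get("fields", [])
        columns = [field["name"] for field in fields]
        summary.append((resource.get("name"), resource.get("path"), columns))
    return summary


if __name__ == "__main__":
    for name, path, columns in list_resources(SAMPLE_DESCRIPTOR):
        print(name, path, columns)
```

The listed paths tell you which files hold the actual data, and the field names give you the expected columns before you open a single CSV.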

Archive Team

Old archives of websites that no longer exist. Includes data on the affinities of 60,000+ Reddit users.

Reddit DataSets

Datasets and requests for datasets

Dataportals.org

Provides a comprehensive list of open data portals.

Awesome Public Data Sets

A GitHub repo of clean datasets, with lots of variety.

R Bloggers

Datasets to practice data mining on

DataHub

A great dataset resource.

Enigma

The world’s broadest collection of public data.

Global Open Data Index

The Global Open Data Index (GODI) is the annual global benchmark for the publication of open government data.

Knoema

Access data and statistics instantly with smart search.

Open DataSoft

A comprehensive list of 2600+ Open Data portals around the world

Linking Open Government Data

IGSR: The International Genome Sample Resource

The 1000 Genomes Project ran between 2008 and 2015, creating the largest public catalogue of human variation and genotype data.

Country Datasets coming up

data.gov

This is the home of the U.S. Government’s open data. The site contains more than 190,000 data points at time of publishing. The datasets cover climate, education, energy, finance, and many other areas.

data.gov.in

This is the home of the Indian Government’s open data. Find data on various industries, climate, health care, and more.

https://data.gov.uk/

Find data published by the United Kingdom central government, local authorities, and public bodies to help you build products and services.

https://data.europa.eu/euodp/en/home

The European Union Open Data Portal (EU ODP) gives you access to open data published by EU institutions and bodies.

If you liked this article, please give us a clap. Thank you for reading.
