
Where Can I Find Open Datasets?

Huseyin Elci
Published in Analytics Vidhya
7 min read · Apr 16, 2021


So here is a list of tons of awesome Open Data sources

What is Open Data/Dataset?

In 2017, The Economist argued that data had become the new oil of our age. So what does it mean for a commodity as valuable as oil to be ‘open’?

In simple terms, Open Data is data that anyone can access, modify, share, and reuse free of charge.

Governments, private agencies, and independent organizations have come forward to publish more and more open data for free and easy access.

Why Is Open Data Important?

The world is growing increasingly data-driven. But if access to and use of data are restricted, the promise of data-driven business and governance cannot be realized.

Therefore, open data is very important for scientific and statistical studies.

Open data also matters because it allows a fuller understanding of global problems and universal issues. It benefits businesses as well, because it makes data easier to access.

Where can I find Open Datasets?

There is intense interest in the data science and big data fields, so I want to share my whole archive of dataset sites at once.

Feel free to add the sources you know and like.

Open Data sources list

Data scientists and machine learning practitioners need open data, and the pages below are the ones I have bookmarked in my browser. So here is a list of tons of awesome Open Data sources:

  1. World Bank Open Data : As a repository of the world’s most comprehensive data on what is happening in different countries, World Bank Open Data is a vital source of Open Data. It offers over 3,000 datasets and 14,000 indicators encompassing microdata, time series statistics, and geospatial data, which you can download in formats such as CSV, Excel, and XML.
  2. WHO (World Health Organization): WHO’s Open Data repository contains specific health statistics provided by 194 Member States.
  3. European Union Open Data Portal : You can access all the open data that EU institutions, agencies, and other organizations publish on a single platform, the European Union Open Data Portal.
  4. Google Public Data Explorer : Google Public Data Explorer can help you explore vast amounts of public-interest datasets. You can visualize and communicate the data for your own purposes.
  5. FiveThirtyEight : It shares data from its various sources across a variety of topics such as politics, sports, science, and economics. You can download the data as well.
  6. U.S. Census Bureau : The U.S. Census Bureau is the largest statistical agency of the federal government. It stores and provides reliable facts and data about the people, places, and economy of America. The Census Bureau sees its mission as serving as the most reliable provider of quality data.
  7. Open Data on AWS : This registry exists to help people discover and share datasets that are available via AWS resources.
  8. Data.gov : Data.gov is the treasure-house of the US government’s open data. The decision to make government data freely available is a relatively recent one.
  9. DBpedia : DBpedia aims to extract structured content from the valuable information created in Wikipedia. With DBpedia, you can semantically search and explore the relationships and properties of Wikipedia resources, including links to other related datasets.
  10. freeCodeCamp Open Data : It is an open source community. It matters because it enables you to learn to code, build pro bono projects for nonprofits, and land a job as a developer.
  11. Yelp Open Datasets : The Yelp dataset is a subset of Yelp’s own businesses, reviews, and user data for use in personal, educational, and academic pursuits. It includes 5,996,996 reviews, 188,593 businesses, and 280,991 pictures across 10 metropolitan areas.
  12. UNICEF Dataset : Since UNICEF concerns itself with a wide variety of critical issues, it has compiled relevant data on education, child labor, child disability, child mortality, maternal mortality, water and sanitation, low birth-weight, antenatal care, pneumonia, malaria, iodine deficiency disorder, female genital mutilation/cutting, and adolescents.
  13. Kaggle : Kaggle is great because it promotes the use of different dataset publication formats. Even better, it strongly recommends that dataset publishers share their data in an accessible, non-proprietary format.
  14. LODUM : It is the Open Data initiative of the University of Münster. This initiative makes it possible for anyone to access public information about the university in machine-readable formats, which you can easily reuse as you need.
  15. UCI Machine Learning Repository : It serves as a comprehensive repository of databases, domain theories, and data generators that the machine learning community uses for the empirical analysis of machine learning algorithms. At the time of writing, it maintains 463 datasets as a service to the machine learning community. The Center for Machine Learning and Intelligent Systems at the University of California, Irvine hosts and maintains it. David Aha originally created it as a graduate student at UC Irvine.
  16. Dataverse : The Dataverse Project is an open source web application to share, preserve, cite, explore, and analyze research data. It facilitates making data available to others, and allows you to replicate others’ work more easily. Researchers, journals, data authors, publishers, data distributors, and affiliated institutions all receive academic credit and web visibility.
  17. Open Data Monitor : As a platform, it gives visitors an overview of available open data resources, allowing them to analyse and visualise existing data catalogues using innovative technologies.
  18. CKAN : CKAN is a powerful data management system that makes data accessible by providing tools to streamline publishing, sharing, finding, and using data.
  19. Open Data Impact Map : The Open Data Impact Map is a public database of organizations that use open government data from around the world. Open Data is publicly available data that can be accessed and reused by anyone free of charge.
  20. Awesome Public Datasets : A GitHub repository that compiles hundreds of open datasets.
  21. Data.gov.in : It is the open data site of the Government of India.
  22. Quandl : How about continuously pulling data through APIs? By creating a free account on Quandl, you can set up your API connection and build real-time streaming projects.
  23. Socrata : It is a site where US data, like that on Data.gov, is shared, and you can access many different datasets there. For example, you can analyze the salaries of White House employees.
  24. Datahack by Analyticsvidhya : You can join worldwide data hackathons, review past hackathons, and download their data.
  25. Academic Torrents : It contains datasets that are used in academic articles. You can find a variety of data there, such as 5 TB of UAV flight data or 1 TB of photos.
  26. Data Is Plural : Once you subscribe, it regularly sends datasets directly to your e-mail address, so it effectively decides what you will be working on that week.
  27. Dataset by Reddit : It is the section of Reddit where datasets are shared or requested.
  28. Dataset by Stanford University : It is the site of Stanford University projects and datasets on social networks.
  29. Network Repository : It is a large repository of datasets on social networks and web graphs.
  30. Data Europa : It is the official website of the European Union with datasets, applications and images in many fields collected from European Union countries.
  31. Dataset by IMF : The International Monetary Fund (IMF) publishes a lot of data, such as the financial status of countries, debt ratios, currency reserves, and commodity prices.
  32. Lion Bridge AI : It offers many free and paid machine learning datasets in areas such as voice and cryptocurrency.
  33. Dataset by Open Intro : An organization founded by a group of educators that offers the datasets used in its textbooks and more, although the categorization is a bit complicated.
  34. News Dataset by BuzzFeedNews : BuzzFeedNews makes the datasets mentioned in its news articles available via GitHub.
  35. Dataset by NASA : Are you a space science enthusiast? You can work with data from NASA’s open data portal.
  36. Text Dataset By Wikimedia : Would you like to work with data from Wikipedia, our modern digital encyclopedia? If you are looking for a multilingual dataset for your NLP project, this is the website for you.
  37. DataWorld : Data World, which defines itself as ‘the social network for data people’, is a dataset search engine where you can search and download datasets.
  38. Wunder Ground : Do you need real-time weather data? You can get historical or forecast weather data either through an API or in CSV format.
  39. Data from YouTube : The YouTube-8M Segments dataset is an extension of the YouTube-8M dataset with human-verified segment annotations. In addition to annotating videos, its creators aim to temporally localize the entities in the videos, i.e., to find out when the entities occur.
  40. Your Personal Data on Facebook : Would you like to analyze your personal Facebook data? First, download your data from here, then check out the example here.
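
Several of the sources above offer their data as CSV downloads. As a minimal sketch of a first step with such a file, the Python snippet below parses CSV text with the standard library and averages one column per group. The column names (`country`, `year`, `value`), the sample rows, and the helper `mean_by_group` are made up for illustration and do not come from any of the listed portals; a real export will have its own columns.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a CSV file downloaded from an open data portal.
SAMPLE_CSV = """country,year,value
A,2019,10.0
A,2020,14.0
B,2019,3.0
B,2020,
"""


def mean_by_group(csv_text, group_col, value_col):
    """Return {group: mean of value_col}, skipping rows whose value is missing."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(value_col) or "").strip()
        if not raw:
            continue  # open datasets often contain gaps; ignore empty cells
        totals[row[group_col]] += float(raw)
        counts[row[group_col]] += 1
    return {group: totals[group] / counts[group] for group in totals}


averages = mean_by_group(SAMPLE_CSV, "country", "value")
print(averages)  # {'A': 12.0, 'B': 3.0}
```

For a file on disk, you could read its contents with `open(path, newline="", encoding="utf-8").read()` and pass the text to the same function.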

Conclusion

Open data is the force of the day. The world has gradually started moving towards open systems. Why don’t you take advantage of them?

Author

Huseyin ELCI | GitHub | Kaggle | LinkedIn
