The Open Data Revolution — What It Is and Where You Can Find It

Jasmin Kareem
The Outlier by Pattern
8 min read · May 7, 2021


You may or may not have heard, but there’s an open data revolution happening right now, and it could help data scientists solve a range of societal problems. In this article I will talk about what open data is and its value to society. I’ll also mention where you can find it, in particular in the Netherlands and the EU. Lastly, I will show you some open data examples and use cases, and explain what you, as a data scientist, can do to make an impact in this open data revolution!

In the field of data science, we often talk about the value of big data. The phrase “data is the new oil” has probably been used in every introductory lecture on data science for the past decade or so. And well, you can’t really blame lecturers for using it so often. Data has been rocket fuel for the field of Artificial Intelligence, which in turn has led to amazing advancements in cancer detection, agriculture, combating plastic pollution, and much, much more.

Whilst the ability to use data to innovate and solve societal problems has been amazing, it has also been limited to the data available to researchers and innovators. More data is being collected than ever before, but access to that data is not always possible. This lack of access keeps data from reaching its full potential and thus hinders innovation and progress. This is where the idea of open data becomes important.

What is ‘Open Data’?

According to the definition written on the EU’s data portal website, open data is data that anyone can access, use and share without restrictions. This definition seems quite broad at first, but in reality, there are many ways data can be open and, therefore, a broad definition makes sense. To be clearer, open data can be crowdsourced, but it can also come from governments, research, or private organizations. An example of crowdsourced open data would be OpenStreetMap, which provides map data and is built by a community of contributors. In science, open data is stored in portals such as DataverseNL, which includes research data from all Dutch universities. Open data is less common in the private sphere, but examples do exist, such as Facebook’s Data for Good initiative, which includes anonymized datasets on social connectedness.

Most of the time when people talk about open data they actually mean Open Government Data which refers to information collected, produced, or paid for by public bodies. Since open government data directly comes from the public and is paid for by the public, it’s easy to see why these datasets could be useful to publish. For example, in the Netherlands, the Central Bureau of Statistics releases annual data on topics like the economy, the environment, and education.

So what’s the value?

At the beginning of this article, I briefly mentioned the ability to solve societal problems as being an advantage of giving people access to data. But there are many more reasons why open data has value. For one, open government data that is transparent and clear can help citizens make better-informed decisions in their daily lives and at election time, leading to a healthier democracy. Open data also contributes to the economic growth of a country! According to a 2020 report from the EU, the open data market size (which is the market size of products, services, and content improved or enabled by open data) is forecast to reach between €199.51 and €334.21 billion in 2025. This is partially attributed to the re-use of open data in organizations. Open data can also help startups thrive by giving them a level playing field on which to start.

The inner skeptic in you may wonder if open data (being sourced from individuals and being accessible to all) is an antagonist to individual privacy, but that’s not the case. Privacy and data protection are necessary aspects of open data if it is to exist. Moreover, there are ways to ensure that the privacy of the individual is protected, such as aggregating or anonymizing the data. These techniques do not necessarily have to decrease the value of the data! High-value datasets that are both open and privacy-preserving do exist, such as the datasets from the Dutch Ministry of Education on the number of graduates per year and per demographic, or the Uber Movement datasets that cover travel times, speeds per street, and mobility of people in cities.
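To make the idea of aggregation a bit more concrete, here is a minimal Python sketch. It assumes a handful of invented individual-level records and a made-up threshold of 3: groups smaller than that are hidden rather than published. The function name, the records, and the threshold are all hypothetical, and real publishers may well use different and stricter methods.

```python
from collections import Counter

# Hypothetical individual-level records: (graduation year, field of study).
# These rows are invented purely for illustration.
records = [
    (2019, "Physics"), (2019, "Physics"), (2019, "Physics"),
    (2019, "History"),
    (2020, "Physics"), (2020, "Physics"), (2020, "Physics"), (2020, "Physics"),
    (2020, "History"), (2020, "History"), (2020, "History"),
]


def aggregate_with_suppression(rows, min_count=3):
    """Count rows per group and replace counts below min_count with None."""
    counts = Counter(rows)
    return {
        group: (n if n >= min_count else None)
        for group, n in counts.items()
    }


# Only the per-group totals are published, never the individual rows.
published = aggregate_with_suppression(records)
for (year, field), n in sorted(published.items()):
    print(year, field, n if n is not None else "suppressed")
```

The single History graduate from 2019 would be easy to identify in a published table, so that cell is suppressed while the larger groups keep their counts. This kind of small-group suppression is only one simple safeguard and does not by itself guarantee anonymity.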

Whilst there is a lot of value in embracing open data, there are still some challenges ahead. Not all open datasets have a high value. If you’re a data scientist reading this you probably know all too well the horrors of working on incomprehensible datasets. The rigorous process of data cleaning can be a chore. Not only this, but some datasets only have so much to tell you (those ones are usually stored in a very strangely formatted Excel file). For this reason, best practices on creating open data must be shared amongst organizations to maximize the value that open data does have.

Another challenge is that whilst data may be openly available, it is often hard for the average person to find, since open data can be scattered across the internet. And even when citizens do find it, interpreting the data can be difficult. But this can be improved with clear, easy-to-use, and transparent data visualizations, dashboards, and open data portals!

An open data dashboard on home relocation flows from the Central Bureau of Statistics Netherlands

The Open Data Revolution

Now that we’ve covered what open data is and the value of open data, we can also discuss all the fun and impactful applications of open data and what has recently been happening with open data in Europe.

I’ve mentioned the EU a few times already, and its recent developments have been quite important. In its 2020 data strategy, the EU announced that its vision for the future is to create a single market for data: a single European data space in which open data can exist. If successful, this would be great for innovation in the region, as more data would be accessible and that data would be more standardized and of higher quality. In some ways, the EU has already made a start with its data portal, which gathers openly available datasets from all member states on one platform.

Open data initiatives are also coming from the private sector. Previously I mentioned examples from Facebook and Uber, but recently Microsoft also announced its Open Data Campaign to help address the “data divide” so that small-to-medium-sized organizations have greater access to data and then can also contribute to finding solutions to societal problems with that data. This move by both governments and private organizations towards the opening of data shows that they believe in the value it adds for society and the potential for the future.

Alright so with that in mind, let’s look at some real-life use cases of open data. One that comes to mind is BlindSquare which uses OpenStreetMap data to aid the blind and visually impaired in getting around by using voice controls to give detailed descriptions of points of interest, intersections, and streets to ensure reliable and safe travel for the user.

What’s crazy is that BlindSquare is really able to make a difference in people’s lives using open data. Another example I found that demonstrates the potential of open data is a Medium article about visualizing the crowdedness of Dutch trains using only open data.

Kepler visualization of train occupancy from Leo van der Meulen

I won’t go too much into this application, but if you’re interested in the code you can read the article for yourself. This is another great advantage of open data! Not only can anyone access the data and come up with their own way of using it to gain insight into a topic, but they can also share the code for their implementations with others. In this case, the datasets used were from NDOV Loket and openOV.

There are many more use cases and applications to be found online that you can look up in your own time to get a better idea of what’s possible with open data. If you’re interested in being a part of this open revolution, there are many places you can go and things you can do to make an impact yourself!

What you can do to make an impact

There are plenty of ways you, someone with data science expertise, can make an impact. The simplest thing you can do is browse the datasets available online to get inspired. UNICEF and the World Bank offer a lot of open datasets relevant to solving societal problems. Like I said earlier, datasets are scattered across the internet, so it may take a while to get an idea of where you can get what. Open data portals such as the EU data portal and the Dutch data.overheid are also great for finding data, but you won’t always find everything there. More topic-specific portals such as Systema Naturae (a portal dedicated to open wildlife data) can make it easier to find data on that topic. OpenStreetMap is also a versatile open database and can be used in many different applications.
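To give a rough idea of how OpenStreetMap data could be pulled into a project, here is a small Python sketch that builds a query for the Overpass API, a read-only query service for OpenStreetMap data. Treat it as an illustration under assumptions rather than a recommended setup: the public endpoint URL, the `amenity=pharmacy` tag, the bounding box (roughly central Amsterdam), and the helper names `build_overpass_query`, `build_request_url`, and `fetch_pois` are all my own choices for this example.

```python
import json
import urllib.parse
import urllib.request

# Assumption: a public Overpass API instance; other instances exist as well.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_overpass_query(key, value, bbox, timeout=25):
    """Build an Overpass QL query for nodes tagged key=value inside bbox.

    bbox is (south, west, north, east) in decimal degrees.
    """
    south, west, north, east = bbox
    return (
        f"[out:json][timeout:{timeout}];"
        f'node["{key}"="{value}"]({south},{west},{north},{east});'
        "out body;"
    )


def build_request_url(query):
    """URL-encode the query as the 'data' parameter of a GET request."""
    return OVERPASS_URL + "?" + urllib.parse.urlencode({"data": query})


def fetch_pois(query):
    """Send the query and return the list of matching map elements.

    This performs a network request, so it is not called in this sketch.
    """
    with urllib.request.urlopen(build_request_url(query), timeout=60) as resp:
        return json.load(resp)["elements"]


# Illustrative bounding box around central Amsterdam (values are approximate).
query = build_overpass_query("amenity", "pharmacy", (52.35, 4.85, 52.40, 4.95))
print(build_request_url(query))
```

Calling `fetch_pois(query)` would then return the matching points of interest as a list of dictionaries. Public instances are shared community resources, so keep queries small and check the usage policy of the instance you pick.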

Another great way to make an impact is to participate in hackathons. The EU is organizing its first EU Open Data Days to bring people together around open data. There is also an EU-wide Datathon centered around solutions using open data!

There is one more point I would like to make before I end this article. I believe that the idea of open data really comes down to building a strong community. I’ve already mentioned OpenStreetMap being crowdsourced (and that of course relies on a strong community), but this combination of open data and community is almost essential for open data’s success. Say you’ve completed an open data project and now want to work on another one: what will make sure that your first project still makes an impact and doesn’t just die in your GitHub repo? The best way to keep open data solutions and projects alive is through a community! In the case of the Netherlands, data.overheid recently created a data communities forum to do just that. What’s great about it is that it’s sort of like Kaggle, only Dutch and more ‘gezellig’, as the Dutch would say.

And speaking of Kaggle (this one’s probably obvious to you), have a look there as well. If you are interested in making an impact through data science projects, make a habit of checking every once in a while which new datasets have appeared. And maybe even ask some friends to join in your open data endeavor. Making an impact alone can be daunting, but together in a community, anything is possible!
