Open Data and its potential
This is a series of blog posts from assignments for a class I’m taking — “Leading Trends in Information Technology”. The content has been slightly modified for a blog post.
Amy Tong, in her talk, mentioned that in an effort to use technology to better serve Californians, the government of California is joining the Open Data movement with more than 200 Open Data sets. I had heard of Open Source and understood it well. However, even though I had heard of Open Data, I did not understand the subtleties and complexities that accompany it.
What is Open Data?
The Open Definition describes it as follows: “Open data is data that can be freely used, reused and redistributed by anyone, subject only, at most, to the requirement to attribute and sharealike.”
This is basically the idea that data should be freely available for everyone to use and republish as they wish, without restrictions. Not surprisingly, the philosophy is not new, but the term “Open Data” itself is recent. It has gained popularity with the rise of big data and the success of other Open movements like Open Source.
History of Open Data
“Open Data” as a term was coined in 1995. But long before being given a name, the ideology of Open Data existed in the scientific community.
In 1942, Robert K. Merton explained in his thesis the importance of sharing the results of research and making them accessible to everyone. Scientific researchers were the first to share their data freely. The norm is that a researcher must contribute to the existing set of data and allow that data to be reused without restrictions, to encourage innovation and further research and so enable the progress of scientific knowledge.
Although Open Data has its roots in the scientific community, the FOSS (free and open source software) movement helped define Open Data as we know it today.
Open Data in Governance
Open Data addresses various political issues. Data.gov, the United States’ Open Data website, describes its impact this way: “Open government data is important because the more accessible, discoverable, and usable data is, the more impact it can have. These impacts include, but are not limited to: cost savings, efficiency, fuel for business, improved civic services, informed policy, performance planning, research and scientific discoveries, transparency and accountability, and increased public participation in the democratic dialogue” [4]. There is a lot of potential good that Open Data can do for a nation. Over 70 countries have put out some form of Open Data, and that number is growing.
Beyond the political and public benefits, Open Data from the government can have huge economic potential. Zillow and Garmin are built on Open Data from governments and are huge businesses.
There is a lot of concern about how the data is opened up. It should be released in such a way that sensitive and private information cannot be traced back to an individual.
Challenges and Future
Although a large amount of government Open Data has been put out since the initiative took off in 2007, it hasn’t had as much of an impact as expected. There are various reasons for this. One is that much of the data is not useful: it might contain nothing worth inferring. Another is that some of the metadata is missing. In some cases, the data quality is too poor for the data to be used. There is also a shortage of good data analysts in the market.
While all of these help explain why Open Data has not been as revolutionary as envisioned, privacy remains the biggest concern. Even if the data is anonymized, it can be cross-referenced with other Open Data to deanonymize it. For example, Latanya Sweeney from Carnegie Mellon University linked the anonymized GIC database (which retained the birthdate, sex, and ZIP code of each patient) with voter registration records, and was able to identify the medical record of the governor of Massachusetts [5]. If this were to happen on a large scale, it could lead to a severe backlash against Open Data. Research into statistical methods such as Differential Privacy is helping to alleviate some of these problems.
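To make the cross-referencing idea concrete, here is a minimal Python sketch of that kind of linkage. Everything in it is made up for illustration: the records, the field names, and the `reidentify` helper are my own assumptions, not the real GIC or voter data, and not necessarily how Sweeney did it. It only shows why keeping birthdate, sex, and ZIP code in an "anonymized" dataset can be enough to put a name back on a record.

```python
from collections import defaultdict

# Hypothetical "anonymized" medical records: names are removed, but the
# quasi-identifiers (birthdate, sex, ZIP code) are still present.
medical_records = [
    {"birthdate": "1950-01-15", "sex": "M", "zip": "02139", "diagnosis": "condition A"},
    {"birthdate": "1982-07-04", "sex": "F", "zip": "02139", "diagnosis": "condition B"},
    {"birthdate": "1982-07-04", "sex": "F", "zip": "02139", "diagnosis": "condition C"},
]

# Hypothetical public voter roll: names plus the same three fields.
voter_roll = [
    {"name": "Person One", "birthdate": "1950-01-15", "sex": "M", "zip": "02139"},
    {"name": "Person Two", "birthdate": "1982-07-04", "sex": "F", "zip": "02139"},
    {"name": "Person Three", "birthdate": "1982-07-04", "sex": "F", "zip": "02139"},
    {"name": "Person Four", "birthdate": "1975-03-30", "sex": "M", "zip": "02140"},
]

QUASI_IDENTIFIERS = ("birthdate", "sex", "zip")


def quasi_key(record):
    """Return the tuple of quasi-identifier values for a record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def reidentify(medical, voters):
    """Pair a name with a diagnosis when exactly one voter shares the key."""
    names_by_key = defaultdict(list)
    for voter in voters:
        names_by_key[quasi_key(voter)].append(voter["name"])

    matches = []
    for record in medical:
        names = names_by_key.get(quasi_key(record), [])
        # A single candidate means the "anonymous" record points at one person.
        if len(names) == 1:
            matches.append((names[0], record["diagnosis"]))
    return matches


print(reidentify(medical_records, voter_roll))
# prints [('Person One', 'condition A')]
```

In this toy data, the two records that share a birthdate, sex, and ZIP code stay ambiguous, while the one with a unique combination is re-identified. Removing names is not enough if the remaining fields are rare enough to single someone out.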
Despite these difficulties, Open Data has done a lot of good. Government has been made leaner and more cost effective. Open Data can be used to fight corruption. Bureaucrats are working with entrepreneurs and Open Data activists to solve some of the challenges that Open Data faces. Open Data has huge potential and it will be interesting to see how it is applied in the future.
I recently heard of a startup that used Open Data about wind patterns, rains, and population, in tandem with Deep Learning, to predict where the next outbreak of malaria will be. What a time to be alive!