The Role of Open Source and Open Data in Responsible Data Stewardship

Andrew Young
Oct 23 · 2 min read

Data stewards across industries have used a wide array of operational and technical means to enable responsible re-use of private-sector data and to create public value through data collaboratives. In fact, a key function of data stewardship is identifying a fit-for-purpose approach to partnership and community engagement.

Some data collaboratives are highly cooperative and maintain restrictive data access controls. One example is the Gender Gaps in Urban Mobility project, launched by The GovLab, UNICEF, Universidad del Desarrollo, Telefónica R&D Center, ISI Foundation, and DigitalGlobe with support from Data2X. Among other datasets, that project used call detail records (CDRs), which were analyzed in a secure environment by specific, pre-selected partners. This approach was the most responsible and feasible pathway to gain insight into the gendered aspects of urban mobility in Santiago, Chile, and to assess how those insights could inform more inclusive urban planning.

Other data collaboratives are built on less sensitive data and do not require equally restrictive data access controls. A recent post from Hal Varian, Google’s Chief Economist, reflects on these more open approaches available to data stewards for unlocking the public value of their company’s private-sector data. Varian specifically describes Google’s efforts to make its research, data, and code as “universally accessible and useful” as possible.

From the post, Open Source and Open Data:

“There’s currently an ongoing debate about the value of data and whether internet companies should do more to share their data with others. At Google we’ve long believed that open data and open source are good not only for us and our industry, but also benefit the world at large.

Our commitment to open source and open data has led us to share datasets, services and software with everyone. For example, Google released the Open Images dataset of 36.5 million images containing nearly 20,000 categories of human-labeled objects. With this data, computer vision researchers can train image recognition systems. Similarly, the millions of annotated videos in the YouTube-8M collection can be used to train video recognition.”

Read more here.

Data Stewards Network

Responsible Data Leadership to Address the Challenges of the 21st Century
