The hacks, works and perks of open data: what I learned from csv,conf,v2
Necessity has always been the mother of invention. But what is less often mentioned is the role that openness plays in fostering innovation. Exposure is, more often than not, the genesis of most of man’s Eureka moments. Cold and dark? Hello fire. Viruses? Hello vaccines. Bad governance? Hello need for accountability!
At csv,conf,v2 in Berlin in early May, a conglomeration of technologists, scientists and data nerds shared their spreadsheet-based hacks and their open data and open science work, highlighting the perks and opportunities that other fans of comma-separated values (CSV) files can take up and run with.
I had the opportunity to talk about #DodgyDoctors, a simple, spreadsheet-based tool that Code For Africa built in partnership with The Star, Kenya’s largest blue-collar newspaper, to help citizens make life and death health decisions. The tool has been replicated in Nigeria by Sahara Reporters. You can read all about it here and here.
It is impossible to replicate the conference experience in a blog post, but what follows is a selection from the sessions I attended. A more detailed breakdown is available here on Twitter.
Here are a few tips and neat tricks on working with CSVs:
- Using d3.js and C++ to create data visualizations by Princiya Marina
- Feather, the new file format for Python and R, powered by Apache Arrow and courtesy of Wes McKinney
- Using Wikidata to create Wikipedia articles for under-represented languages
- A beginner’s guide to data packages in Python
- A .csv by the United States Department of Labor, detailing workplace deaths and explaining, in a sentence, how each one happened
- Using open data to map public transportation - including unofficial routes - in Cairo, Egypt by Mohammed Hegazy
- Using CSVs as the master spreadsheet by Richard Jones from cottagelabs.com
- Using Embulk to fight against Chaotically Separated Values by Sadayuki Furuhashi
Here are a few tools that were mentioned at the conference that make working with CSVs a lot easier:
- Data Permissions Catalogue - a set of simple user interface patterns for sharing personal data
- Content Mine - a command-line tool for getting metadata and URLs matching given search queries from science papers
- Structured stories - a tool that allows users to represent events and news with varying levels of detail
- CSV Lint which helps you ensure that your .csv is human-readable
- Stencila Sheets, which is, simply, ‘a spreadsheet file format for humans’. It combines the power of languages like Python and R with the benefits of spreadsheet interfaces
- Binder, which turns GitHub repositories into a collection of interactive notebooks
- Lightning Viz which allows you to turn your data into interactive visualizations using a programming language of your choice
- PySpread - The most Pythonic spreadsheet
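To give a flavour of what a CSV checker looks for, here is a minimal sketch in Python, using only the standard library, that flags rows whose number of fields differs from the header row. It is not how CSV Lint or any of the tools above is implemented; the function name `find_ragged_rows` and the sample data are made up purely for illustration.

```python
import csv
import io


def find_ragged_rows(text, delimiter=","):
    """Return (line_number, field_count) for rows whose width differs from the header.

    Hypothetical helper for illustration only; not part of any tool named above.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return []  # empty input: nothing to check
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            # reader.line_num is the number of source lines read so far
            problems.append((reader.line_num, len(row)))
    return problems


# Invented sample: the third line is missing its last field.
sample = "id,name,city\n1,Alice,Berlin\n2,Bob\n"
print(find_ragged_rows(sample))  # [(3, 2)]
```

A real validator checks far more than this (encoding, quoting, column types), but ragged rows are one of the most common ways a .csv goes wrong.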
A lot of resources were shared at the conference. Here are but a few of the curated lists:
- Wikipedia Tools for Google Spreadsheets by Thomas Steiner
- Jenny Bryan’s notes on hacks, tools and notables from the conference
- Zara Rahman’s slides from her excellent keynote on bridging the gap between tech and activism
- Aurelia Moser’s own notes from her insightful talk on maps
- A summary of The Freeman Lab’s work around open science
By the end of the conference, two things were clear:
a) Two days is not nearly enough to interact with, learn from and exchange knowledge, skills and contacts with great minds from around the world.
b) Openness exposes gaps that exist in our world. It does more than present humanity with a need that has to be met: it inspires and challenges communities such as the one at csv,conf to write better code, build better tools, write more concrete stories and open up more data for the good of all.
Challenges posed at csv,conf,v2