How I aim to use my fellowship to develop an open-source ecosystem of tools for data journalism

I applied for the JSK Fellowship with the following proposal: How might we grow an open-source ecosystem of tools to help data journalists collect, analyze and publish the data underlying their stories?

My starting point for this question was my Datasette open-source project. Datasette is a tool for exploring and publishing data. It provides an interface for exploring small or large datasets, an API for integrating that data into custom applications and a collection of tools for publishing that data to the internet.
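To make the "API for integrating that data into custom applications" concrete, here is a minimal sketch of a client that runs a SQL query against a Datasette instance. It assumes Datasette's `/<database>.json?sql=...` endpoint and its `_shape=array` option, which returns a plain JSON list of row objects. The host `datasette.example.com`, the database `ads`, and the table `ads` are hypothetical placeholders, and `datasette_sql_url` and `fetch_rows` are illustrative helpers, not part of Datasette itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def datasette_sql_url(base_url: str, database: str, sql: str) -> str:
    """Build a Datasette JSON API URL for a read-only SQL query.

    Assumes the /<database>.json?sql=... endpoint; _shape=array asks for
    a bare JSON list of row objects instead of the default wrapped shape.
    """
    params = urlencode({"sql": sql, "_shape": "array"})
    return f"{base_url.rstrip('/')}/{database}.json?{params}"


def fetch_rows(url: str) -> list:
    """Fetch the URL and decode the JSON list of rows (needs network access)."""
    with urlopen(url) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical instance and database; only builds and prints the URL.
    url = datasette_sql_url(
        "https://datasette.example.com", "ads", "select * from ads limit 5"
    )
    print(url)
```

Keeping URL construction separate from fetching means the query logic can be checked offline; pointing `fetch_rows` at a real Datasette deployment would return rows as Python dictionaries.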

I designed Datasette based on my experience working with the Datablog team at the Guardian from…

Facebook Ads posted by the Russian Internet Research Agency mentioning “cops” ordered by US dollar spend

Two interesting data sources have emerged in the past few weeks concerning the Russian impact on the 2016 US elections.

FiveThirtyEight published nearly 3 million tweets from accounts associated with the Russian “Internet Research Agency” (see my article and searchable tweet archive).

Separately, the House Intelligence Committee Minority released, as a set of redacted PDF files, 3,517 Facebook ads reported to have been bought by the Russian Internet Research Agency.

Exploring the Russian Facebook Ad spend

The initial data was released as zip files full of PDFs, one of the least friendly formats you can use to publish data.

Ed Summers took…

Example test run showing classes that are missing their documentation

Keeping documentation synchronized with an evolving codebase is difficult. Without extreme discipline, it’s easy for documentation to fall out of date as new features are added.

One thing that can help is keeping the documentation for a project in the same repository as the code itself. This allows you to construct the ideal commit: one that includes the code change, the updated unit tests AND the accompanying documentation all in the same unit of work.

When combined with a code review system (like Phabricator or GitHub pull requests) this pattern lets you enforce documentation updates as part of the review process: if…

