A Data Science Journey with D4D

Or: 5 Things I’ve Learned from D4D (That You Can Too)

George Richardson
Data for Democracy
Mar 7, 2017


I recently recorded an interview for the podcast, Partially Derivative, about the Internal Displacement project in Data for Democracy. The process made me think more about the group and the way that it is shaping my own transition into data science.

Like many of the group’s members, until recently I wouldn’t have considered myself a “data” person, though I did spend my time working with data. Last year I finished a PhD on photonic structures, which involved lots of information from optical measurements and microscopy techniques. Somewhere along the line, I realised that I would rather be working on more immediate, real-world problems, but I also liked plotting data and transforming it into insights. I started to hear about this data science thing, got excited about what it could do for the world, and spent the next few months after graduation diving into an exciting new toolkit.

I always suspected that there were other data scientists looking to use their skills for impactful projects, but I’d never managed to stumble across them. Then D4D came along. Growing almost by the hour, the group is an inspiring assembly of talented data professionals and academics, domain experts, and spirited beginners, all looking to use data in a genuinely useful and considerate way for humanity.

Within a few weeks of joining, encouraged by the supportive nature and openness of the group, I volunteered to lead the Internal Displacement project, with the aim of building a tool to track online reports of internally displaced people (as a contribution to the #IDETECT challenge). It’s been enlightening and informative, and below are a few highlights of the things I’ve learned in the short space of time since.

Reflections

1. One’s Company
Between a few collaborative projects and supervising undergraduate students, my graduate research mostly involved making samples on my own, sitting in a room with lasers on my own, analysing data on my own and then figuring out why it was all going wrong, mostly on my own. While the ability to work independently should be encouraged, working in the right team brings diverse approaches to problems and better productivity, and can just be a lot more enjoyable. D4D in general and the Internal Displacement project group have reinforced that whatever I do next in my career, I’d like it to be part of a team effort. Having the opportunity to work and brainstorm with other people is fun and motivating, and it accelerates my learning.

2. Data, Data, Everywhere
Ironically for someone who spent over four years painstakingly manipulating small pieces of glass in the hope of getting interesting optical data to study, I was under the impression that when you became a data scientist, there would be data lining up at the door just to get some of your analytical action. The way data science is often portrayed makes it sound as if we are floating in a sea of information brimming with insights. We are, but when you want to work on a really cool data project, you probably have to actually go out there and turn that information into useable data. Like much of the work in D4D, the Internal Displacement project relies pretty heavily on data engineering, pipelines and database preparation before actually getting to any analysis.

3. Got to Git it On
I started using git seriously for my personal projects after someone told me “you are your own best collaborator, but when something breaks, you from 6 months ago is terrible at answering emails”. Before leading a project, I had little experience of the git workflow beyond version control for small bits of individual work. Surely doing the same thing in a team of remote collaborators would be simple! Cue gitastrophe.

For the first week of the Internal Displacement project, the history of our GitHub repository closely resembled a pit of snakes. The result: learning a collaborative GitHub workflow and discovering some best practices the hard way.

4. The Curse of Middle Management
I get energised by coordinating projects and facilitating other people’s great work, but it can sometimes make it harder to find time to contribute to the actual codebase myself. Unfortunately, less actual coding can lead to losing track of the technical path that the project is taking, which in turn makes the project more difficult to organise. I’m definitely still learning how to balance these roles and be effective on both fronts.

5. Boarding a Moving Vehicle
The energy unleashed by D4D has been like opening a box of tightly coiled springs. New members are continually joining projects with skills and zeal. It is crucial to harness their abilities with a clear, concise but detailed overview of the project’s aims and methodology, while ensuring that the core project activities continue. We’re still learning how to do this in Internal Displacement, but getting it right feels really important as it’s an issue that crops up in many organisations with transient communities of workers and volunteers.

Data for the Future

Technology will only ever solve a part of humanity’s challenges, but the work that is being done inside D4D, and the way it is being driven by the community, is inspiring and will surely provide some of the solutions. It is one of those elusive bodies of people that come together in their free time and somehow create the perfect conditions for a hive of productivity, lively discussion and innovative thinking. I hope that the group manages to transform this wave of initial enthusiasm into long-term successes and partnerships, and in turn expands the movement of data-driven work into the gaps where the world really needs it.

Failing that, at the very least it will be remembered as having the finest selection of emojis ever used among an online alliance of data nerds this side of the galaxy.
