D4D’s First Remote Hackathon

Or: How You Can Get Started with Data For Democracy

Data For Democracy is a large community of several hundred collaborators working remotely across numerous projects. Diving in may initially seem daunting, but we’re always creating structured, welcoming opportunities for new people to get involved in our work! One of our recent successes on this front was a remote hackathon organized by the Assemble team.

The Assemble team’s primary goal is to help other data science projects develop the tools and infrastructure they need to accomplish their exciting work. Several projects fit under the Assemble umbrella, one of which is collect-social.

We’ve seen a lot of interesting social media projects stumble at the data collection phase. Collect-social is our attempt to avert this by standardizing everything from content collection to storage, so that our analysts can focus on research instead of wasting time on repetitive tasks like setting up databases, fiddling with social media APIs, or writing boilerplate code and configuration. We have a long way to go, but so far we’ve helped to build a few collections of social media data, including tweets from #NoBanNoWall protests and the #womensmarch on Washington.

As we’ve grown, a real challenge has been figuring out how best to communicate next steps, define requirements, and create digestible tasks. We are a group of volunteers working remotely, who come and go as life and other obligations get in the way. While many people are willing to contribute, it can be difficult to tell exactly where to jump in. Recently, we resolved some of these questions through a spontaneous event, which we retroactively named a “remote hackathon.”

This event came together when one of our ambassadors, Jorge, started a channel to help people with the Twitter issues we were working on at the time. After several people joined the channel and asked how they could help, we quickly realized the first step was to define exactly what we wanted to work on. As is the case with many open-source projects, documentation was a little sparse and our requirements were not well defined. The group worked collectively to produce a high-level design document, which was then published to everyone involved.

Once that was complete, one of our maintainers, Michael, took this design document and broke it down into a kanban board of smaller tasks. People volunteered to take on an issue whenever they had a few hours to spare. Using Slack, volunteers kept the team updated on their progress, asked questions, and gave each other help. By Sunday night, we had reached our goal of closing all our issues.

While this flow may sound obvious to those who have experience with focused sprints in a corporate setting, it’s somewhat more challenging to get started when the team is a group of remote volunteers — we have no boss, no business, and no obligation for any of us to work on any of it!

Despite these hurdles, the weekend was a great success. We had 22 code commits from 6 different people, and our contributors ranged from experienced developers to open-source first-timers.

The GitHub stats

One first-time D4D contributor, Caitlin, quietly knocked out three substantial tasks, and only later did we realize it was her first week in D4D! Given how quickly she dove straight in, we decided to get her advice on the best ways for newcomers to get started. (We also made her a project ambassador to help with onboarding new people interested in Assemble — this kind of empowerment is good practice for any smart open-source project.)

Q: How did you find out about D4D and what initially drew you to the assemble/collect-social project?

A: I had been stewing quite a bit since the election and had been trying to figure out how to best do my part for democracy. I’m not a huge “phone-talker” or letter-writer, but I do love data. One of my friends from grad school forwarded me the link to D4D, and it sounded like the perfect place for me to, as cliched as it sounds, “make a difference.”

I was drawn to the assemble/collect-social project because the first step to making some data-driven insights is, of course, collecting good data! With the huge role social media has been playing in the [inter]national conversation lately, and the vast amount of currently untapped data out there, it seemed like a perfect place to start.

Q: What was the hardest part about making your first contribution?

A: Brushing up on git. It had been a while since I used it, and I somehow always end up with merge conflicts! I came in with some GitHub PTSD, admittedly. But D4D’s tutorials over at the “github-playground” repo were really helpful. Plus not having write privileges on the main project repo ensured that someone else would be checking my code before merging, thus averting any potential disasters.

Q: What advice do you have for first-time contributors?

A: Talk to people (like me! I’m an Assemble project ambassador now!). Don’t be afraid to ask questions. Use the tutorials. Take your time. As far as I can tell, the people at D4D are super friendly and eager to help out!

Q: Any calls to action or other thoughts you have about what our community can do to support each other, especially first-time contributors?

A: The extensive documentation on GitHub was a huge help to me; I was able to read about all of the projects, go through tutorials, and generally learn just enough to pretend that I knew what I was doing during the hackathon. As the community grows, this kind of documentation (for projects, D4D general infrastructure, best practices, tutorials, etc.) will be essential. The constant barrage of messages on Slack can sometimes be overwhelming! I think having more little “hackathons”/breakouts will help people get to know each other better as they work on concrete, well-defined goals. It’s easier to digest as a newbie!

If you’re interested in future events of this sort, there will be more — join the D4D Slack team and keep an eye out for announcements!