Insight Lane — Seven Steps to an Effective Volunteer Data Science Project

Terry Franklin
Published in Data for Democracy
Aug 21, 2018

The team behind Insight Lane: the D4D Crash Model recently released version 1.2 of the project, a significant upgrade that addresses a number of foundational issues in the project’s technology stack and will allow for faster and better development in the future.

While we’re all very proud and excited about the specifics of what we’ve achieved in this release, the aspect that I have enjoyed most is seeing the team work productively and collaboratively using a well-structured plan and effective project management techniques. Planning and project management are sometimes viewed as inconveniences or blockers to getting “real work” done in data science, but our experience has shown them to be fundamental to the success of the project and enjoyment of the team.

Volunteer tech projects have a tendency to follow a well-established arc: an individual or group starts off with a compelling idea. There is great enthusiasm to begin with: chat channels are established and a constant stream of big ideas and interesting articles is shared. Code repositories are created and there is a flurry of initial commits. Sure, not much thought has been put into how the project will develop and eventually reach its goals, but hey, we’ve got enthusiasm and we’re building!

Over time though, that enthusiasm starts to wane. Soon the daily code commits become weekly, then monthly. Difficult decisions get put off because they involve too much work and affect too many areas of the project, which often further slows progress. Existing team members start to drift away and new members aren’t successfully onboarded, because they don’t know where to begin or how to get up to speed.

The Insight Lane team has been active for over a year now and in that time the nature of the project has changed significantly. Originally begun as a collaboration with just one US city, it is now being developed as a tool that can help any city in the world with their mission to achieve safer roads. This change in direction has brought about a need to consider each issue in a much broader context — cities record and store their data in a multitude of different ways, coverage of some data sources is inconsistent and the general complexity of the project has increased dramatically. Our eventual goal is to enable all participating cities to share their insights with each other in advancing the cause of road safety, harnessing the power of collective intelligence rather than operating in isolated silos.

To address the challenges of this new approach while remaining focused and productive, the team has implemented a number of different project management techniques, which may be of use to new teams looking to kickstart their own project. Common to each of them are the ideas of promoting effective communication and continuous progress towards clearly defined goals.

1. Regular Meetings

The Insight Lane team meets every Wednesday at 6pm Eastern for one hour, using Google Hangouts. This time is convenient in that it allows members from very different timezones to attend. We have a rotating schedule for chairing the meeting, which gives everyone a chance to ensure their ideas are heard and discussed. The format can adapt to whatever needs to be addressed that week but always includes important fixtures, such as reviewing tasks in progress and updating on discussions with external groups.

2. Task Tracking

A project is ultimately the sum of its parts, and in a data science project there are bound to be a lot of moving parts! If something is worth discussing then it is usually worth actioning, and trying to organize tasks without a structured process quickly becomes unmanageable. Task tracking is essential for knowing what work is currently in progress, who is working on it and what still needs to be done. Our team has chosen to use GitHub Issues, which integrates nicely with our project code repository and other GitHub features such as pull requests. Tasks are stored in logical lists (To Do, In Progress, Done and On Hold) and a full communication history is easily visible for each.
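
To make the idea of logical lists concrete, here is a minimal sketch in Python of grouping tasks by status so that "what is in progress and who owns it" can be answered at a glance. The `Task` class, the `group_by_status` helper and the sample tasks are hypothetical illustrations, not part of the Insight Lane codebase or the GitHub API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# The four lists named in the article.
STATUSES = ("To Do", "In Progress", "Done", "On Hold")


@dataclass
class Task:
    """Hypothetical task record: a title, a status list and an optional owner."""
    title: str
    status: str
    assignee: Optional[str] = None


def group_by_status(tasks: List[Task]) -> Dict[str, List[Task]]:
    """Place each task in its status list, rejecting unknown statuses."""
    board: Dict[str, List[Task]] = {status: [] for status in STATUSES}
    for task in tasks:
        if task.status not in board:
            raise ValueError(f"Unknown status: {task.status!r}")
        board[task.status].append(task)
    return board


# Illustrative data only.
tasks = [
    Task("Document the city onboarding steps", "In Progress", "alex"),
    Task("Add tests for the data pipeline", "To Do"),
    Task("Evaluate a new data source", "On Hold"),
]
board = group_by_status(tasks)
in_progress = [(t.title, t.assignee) for t in board["In Progress"]]
```

Keeping every status present in the result, even when its list is empty, means a weekly review can walk the same four lists each time.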

3. Project Milestones

Organizing our project into release milestones with a clear scope has been a big win for ensuring that we’re always working on things that add value. Again we’ve chosen to go with GitHub for this process, using its “Projects” feature. Often as we approach the end of a release’s scope of work we’ll set a timeframe for completion, which can really help motivate the team to complete their tasks and not delay the release.

4. Free-form Communication

While most of the big issues facing the project are discussed in the regular meetings and using our task tracking software, often a lot of progress can be made in a short period of time between team members working on the same task by using a real-time chat service. New ideas can also be shared to stimulate further discussion about where the project might go next. We have a dedicated project channel via DataForDemocracy’s Slack environment which we use for diving into technical issues, sending test files and general project chat. This is also an important method of fostering a sense of team spirit amongst the group, which is crucial in maintaining engagement and involvement.

5. New User Onboarding Process

Every team can benefit from the arrival of new members, who bring fresh ideas and perspectives as well as additional capacity. Unfortunately once a project reaches a certain level of complexity, it can become difficult for new members to get involved without a lot of time being spent bringing them up to speed, which often leads to a high drop-off rate. To mitigate this, the Insight Lane team invites every new member who joins our channel to onboard a new city using our established pipeline process, developed back in version 1.1. This gets the new member reading our documentation and running the code for themselves, often against their own city’s data, which provides an interesting and relevant starting point. It also helps us discover unknown edge cases in our assumptions because every city is different, as well as identify areas where our documentation isn’t as clear as we thought it was.

6. Code Management & Review

A data science team is made up of individuals, each with their own level of experience and familiarity with the technologies being used. There will be different expectations about what standard of work is acceptable, so using some industry practices and tools can be helpful in establishing a project baseline. Our team has a practice of not merging code without it first being reviewed, as well as using Travis CI for continuous integration to ensure existing functionality isn’t negatively affected. In addition, we’ve recently started using Codecov, a utility designed to provide a simple metric on how much of our code is covered by the automated tests. We try to ensure that this number doesn’t decrease as a result of adding code, by matching new functionality with new tests.
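
The "coverage shouldn’t decrease" rule can be expressed as a small check. The sketch below is an illustration of the idea only: the function names, the `tolerance` parameter and the percentages are assumptions, and it does not reproduce how Codecov or Travis CI implement their own status checks.

```python
def coverage_ok(baseline_pct: float, current_pct: float, tolerance: float = 0.0) -> bool:
    """Return True when coverage has not dropped by more than `tolerance` points."""
    return current_pct >= baseline_pct - tolerance


def describe(baseline_pct: float, current_pct: float, tolerance: float = 0.0) -> str:
    """Build a one-line summary suitable for a review comment or CI log."""
    delta = current_pct - baseline_pct
    status = "ok" if coverage_ok(baseline_pct, current_pct, tolerance) else "coverage decreased"
    return f"{status}: {baseline_pct:.1f}% -> {current_pct:.1f}% ({delta:+.1f})"


# Hypothetical numbers: the main branch is at 80.0% and a pull request reports 82.5%.
summary = describe(80.0, 82.5)
```

A small non-zero `tolerance` is one way to avoid blocking a change over a fractional dip, at the cost of allowing slow erosion over many merges.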

7. Using Appropriate Technologies

Data science teams need to be aware of what new tools are available to ensure they’re working in the best way to achieve their goals. But the technical landscape changes quickly these days and there is no end of cool languages, apps and services available, meaning it’s easy to get pulled into rabbit holes that don’t necessarily move the project forward. Technologies and languages that are in widespread use and have a proven track record are good choices, as are those that allow the reuse of existing modules. Given the data-driven nature of our project Python has been a great choice, with a number of machine learning and data analysis modules helping out. Team members’ computers and setups are likely to be a mixture of standards, so finding ways to work consistently across different environments is also a good idea. Docker containers have helped us a lot with this.

Seven Steps, One Goal

At first glance these techniques all focus on improving the efficiency of the project, but on a more important level they work to promote the enjoyment of the team members. Volunteer projects can be hard work. The modern world constantly pulls at your attention, and finding time to work for free in a role that is often in addition to regular employment or study can be difficult. Having a sense of progress, accomplishment and team spirit can go a long way to making a project enjoyable, which I believe is an important reason why Insight Lane has remained stable and productive for as long as it has. Ultimately any project is only as good as its people, and people are at their best when they are engaged and feel valued.
