Which Data Science Event is Right for You?

Radiant Earth
Radiant Earth Insights
Mar 30, 2020

By Anne Hale Miglarese, Founder & CEO, Radiant Earth Foundation

As data-driven approaches become more integrated with global development missions like food security, practitioners need to stay up to date with data science methodologies and best practices. Trying to keep up with this rapidly evolving field can feel overwhelming. Luckily, there are numerous capacity-building data science events that provide a great way to stay abreast of new trends and developments.

Data science events all share a similar participant-driven structure, but each type has key differences and lends itself to distinct skill sets and goals. In this article, we’ll differentiate between these skill sets and highlight the different kinds of data science events that could benefit your mission.

Code Sprints

Also known as design sprints, code sprints are time-boxed periods in which groups of software developers complete a set of tasks, such as writing code, documenting APIs, and other technical writing. Developers typically aim to have a new functioning prototype ready for release by the end of the sprint. Sprints originate in Scrum, an agile software development framework, and have since been adopted in a wide variety of disciplines. Designed to encourage teams to self-organize, collaborate, and solve complex problems (Figure 1), this time-boxed process allows developers to home in on issues and expedite releases.

Figure 1: A Basic Sketched Structure of the Scrum Framework for Collaborating on Complex Projects (credit).

The duration of a sprint varies by project and data domain. Each sprint can range from two days to a month, has a definitive end goal, and is part of an iterative development cycle outlined in Figure 2. Sprints can be internally run within an organization or can be open to external participants.

Figure 2: Iterative Sprint Structure (credit).

Skills needed to participate in sprints

Since sprints are formal events, participants need to be technically proficient in software development and version control systems like Git. Participants should also be familiar with the theme of the sprint and its data domain. While sprints are primarily a time for experts to work on a product, they can also be an excellent way for new community members to dive headfirst into the project.

What will you get out of participating in a code sprint?

Sprints are a great place to become familiar with the ins and outs of your product and gain extensive experience in issue tracking, version control, and implementation. Additionally, you will learn how to collaborate and work under a hard deadline. Sprints are also a useful way to practice breaking down complex ideas and tasks into manageable pieces. Examples of sprints focusing on Earth observation (EO) include the OGC API Sprints and the SpatioTemporal Asset Catalog (STAC) Sprints, the most recent of which produced the STAC 0.9.0 release.

Hackathons

A combination of “hacking” and “marathon,” hackathons are sprint-style events in which participants work around the clock within the allotted time. Participants at a hackathon break off into small teams that each focus on one aspect of the overarching project. The goal is for each team to create a functioning deliverable to present at the end of the hackathon. These events generally center on either a single theme, such as a programming language, API, or data domain, or a specific problem.

Figure 3: Basic Hackathon Structure (credit).

Hackathons are typically more open-ended and casual than sprints, and they attract more participants because they welcome people of all skill levels and disciplines. Hackathon teams work on multiple aspects of a project simultaneously, whereas sprint participants address one facet of the project at a time.

Skills needed to participate in hackathons

Hackathons are open to data scientists and programmers of all levels, including beginners. Many hackathons even pair first-time participants with more experienced ‘hackers’ to foster mentorship within the community. Additionally, you do not need to be a coder to participate: some hackathons bring in participants from other disciplines, such as graphic designers and business developers, to provide a different perspective. The only entry criteria are an interest in the topic and a desire to learn!

What will you get out of participating in a hackathon?

Hackathons are a great way to build practical skills: working in a team, generating ideas, working under a deadline, sharpening your problem-solving abilities, and mastering the art of presenting. Additionally, participating in a hackathon is a great way to apply your data science skills in a real-world setting and collaborate with your peers. For organizations, hackathons provide an opportunity to crowdsource applications. Examples of hackathons focusing on EO include Earth Hacks, which focuses on finding solutions to the climate crisis; the Copernicus Hackathons, which bring together developers and thematic experts to collaborate on new software; and the Dubrovnik Inspire Hackathon 2020, which focuses on improving the interoperability between methods for sharing in-situ and citizen-sourced data.

Mapathons

A portmanteau of “mapping” and “marathon,” these events are time-boxed, coordinated mapping efforts. Mapathons are a form of public participation GIS often used in digital humanitarianism efforts. Digital humanitarianism refers to mobilizing volunteers online to use big data and crowdsourcing in support of philanthropic and disaster response efforts around the world.

In the first stage of a mapathon, volunteers gather either in one place or virtually and use satellite imagery to remotely map areas with inadequate coverage on an online base map using platforms such as OpenStreetMap. Next, volunteers in the newly mapped areas go into the field to validate the remote mapping and to add additional local information. Once completed, these maps are an incredible resource for decision-makers who can use them to support things like disaster risk assessment and relief.

Figure 4: Steps in a Mapathon Workflow (credit).

Skills needed to participate in mapathons

Mapathons have the lowest barrier to participation of any data event. Mapping interfaces are typically user-friendly, and most mapathons begin with training. To participate, volunteers only need rudimentary computer skills, a desire to learn, and a drive to help people. No prior GIS or mapping skills are required!

What will you get out of participating in a mapathon?

Participants can use mapathons to develop spatial literacy skills and become familiar with GIS and remote sensing technology. Additionally, participants get to connect with other people who are interested in humanitarianism, GIS, and the mapping region. These events usually draw a diverse crowd and often spark compelling discussions about map generation. Humanitarian OpenStreetMap coordinates in-person mapathons and hosts online platforms that volunteers can use to contribute to ongoing mapping efforts on their own time. The USGS National Map Corps, YouthMappers and Missing Maps also offer online crowdsourcing mapping platforms.

Dataset Labeling and Validation

Labeled training datasets are the basis for supervised machine learning algorithms. In some cases, these datasets must be sorted into specific categories and checked for accuracy. For mapping applications based on optical satellite imagery, only people can, and should, complete these classification and validation processes. Data for mapping applications based on microwave and radar satellite sensors can be labeled without human validation.

There are two kinds of data labeling and validation events: ongoing online projects and in-person “labelathon” events. When labeling, participants examine the dataset and identify the features of interest; during validation, participants assess the accuracy of previously labeled features. In-person events, like the 2019 Radiant Earth Validation Event, function similarly to mapathons: participants gather to work together for a set time and receive training on the platform used to validate labeled categories. Ongoing labeling occurs online and is an easy way for participants to get involved with citizen science, either as volunteers or as paid workers completing microtasks. There are pros and cons to running labeling and validation events on a volunteer basis versus a paid one.
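To make the validation step more concrete, here is a minimal Python sketch of one common way to combine several participants' labels: a majority vote with an agreement threshold. Everything in it is an assumption made for illustration (the function name `consolidate_labels`, the feature IDs, the crop classes, and the 0.6 threshold); it does not describe how any platform mentioned in this article works.

```python
from collections import Counter


def consolidate_labels(votes, min_agreement=0.6):
    """Majority-vote a list of labels for one feature.

    Returns (label, agreement). The label is None when the share of
    votes for the most common label falls below min_agreement.
    """
    if not votes:
        return None, 0.0
    label, count = Counter(votes).most_common(1)[0]
    agreement = count / len(votes)
    if agreement < min_agreement:
        return None, agreement
    return label, agreement


# Hypothetical votes from volunteers, keyed by an illustrative feature ID.
feature_votes = {
    "field_001": ["maize", "maize", "maize", "sorghum"],
    "field_002": ["maize", "sorghum", "cassava"],
}

results = {fid: consolidate_labels(v) for fid, v in feature_votes.items()}

for fid, (label, agreement) in results.items():
    status = label if label is not None else "needs expert review"
    print(f"{fid}: {status} (agreement {agreement:.2f})")
```

Features on which volunteers disagree are flagged rather than forced into a class, which is one simple way to route hard cases to a domain expert.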

Skills needed to participate in dataset labeling and validation

As with mapathons, participants do not need prior experience in data science to label or validate training datasets. An interest in the data domain or the machine learning pipeline can make the experience more meaningful, as these events deal with the minutiae of training data.

What will you get out of participating in labeling and validating data events?

Labeling and validating training datasets is a great way to gain insight into the fundamentals of machine learning. You also gain an appreciation for how precise and accurate training data must be to produce sound results. Examples of platforms that host EO-related data labeling and validation events are Zooniverse, Clickworker, and Amazon Mechanical Turk. Other platforms, such as the Sentinel-Hub Classification App, provide customized solutions for labeling EO data, including water detection.

Data Competitions

In data challenges, participants compete to create the best solution to a problem outlined by the organization that provides the data. These competitions are opportunities for data scientists to tackle real-world issues, hone their data science skills, receive feedback, and in many cases win prizes. Each challenge addresses a specific problem, provides all the required datasets, and specifies the methodology and evaluation criteria. Competition themes can span the whole field of data science, from data visualization and data analysis to predictive modeling.
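As an illustration of what an evaluation criterion can look like in practice, the following Python sketch ranks submissions by root mean squared error (RMSE) against held-out ground truth. RMSE is only one possible metric, and the team names and values are invented for the example; each real competition defines its own scoring rules.

```python
import math


def rmse(predicted, actual):
    """Root mean squared error between two equal-length sequences."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(sum(squared_errors) / len(actual))


# Hypothetical held-out values that only the organizer can see.
ground_truth = [0.0, 1.0, 2.0, 3.0]

# Hypothetical predictions submitted by two teams.
submissions = {
    "team_a": [0.0, 1.0, 2.0, 4.0],
    "team_b": [1.0, 2.0, 3.0, 4.0],
}

# Lower RMSE is better, so sort in ascending order of score.
leaderboard = sorted(
    ((name, rmse(preds, ground_truth)) for name, preds in submissions.items()),
    key=lambda entry: entry[1],
)

for rank, (name, score) in enumerate(leaderboard, start=1):
    print(f"{rank}. {name}: RMSE = {score:.3f}")
```

Because every team is scored with the same metric on the same hidden data, results are directly comparable, which is what makes a public leaderboard possible.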

Skills needed to participate in data competitions

Since these competitions require participants to design and build models and/or apps, one needs technical expertise relevant to the prompt of each contest. That said, one does not need to be an expert to participate in data challenges.

What will you get out of participating in data challenges?

These events are an opportunity to use data science skills in a real-world setting, get feedback, and learn new skills and best practices. For organizations, they are a cost-effective way to test an application. Data challenges encourage participants to think creatively about problems and gain insight into the data science field, making them an excellent way to experiment, test your knowledge, and keep up with this rapidly evolving field. Examples of current data challenges you can join include the SpaceNet 6 Challenge, Zindi’s Flood Prediction Challenge, and Kaggle’s Rainfall Prediction with Satellite Images Challenge.
