Open Reproducible Science: Workflow Structure

Learn to create a rigorous workflow structure with Python

Cale Kochenour
Mar 29 · 4 min read
Looking across a field toward mountains in Colorado, United States
Credit: Image by Galyna_Andrushko via Envato Elements

Overview

One key to open reproducible science is rigorous organization of all workflow code. This matters not only when you send your project to someone else. Your future self will also benefit when you return to an organized workflow after some time away. A workflow can be organized into the following stages:

  • User-Defined Variables
  • Data Acquisition
  • Data Preprocessing
  • Data Processing
  • Data Postprocessing
  • Data Visualization
  • Data Export
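As a rough illustration of how these stages might map onto a single Python script, here is a minimal sketch. It is not the author's template: the function names, the sample records, and the `SCALE_DIVISOR` setting are all hypothetical, and the "visualization" is a plain-text bar chart so the example runs with the standard library alone.

```python
"""Skeleton of a staged workflow script (all names and data are illustrative)."""
import csv
import io

# -------------------------- USER-DEFINED VARIABLES --------------------------
# Hypothetical setting: raw values are stored as integers scaled by 10.
SCALE_DIVISOR = 10


# ----------------------------- DATA ACQUISITION -----------------------------
def acquire():
    """Return raw records; stands in for reading files or calling an API."""
    return [
        {"site": "a", "raw": 120},
        {"site": "b", "raw": None},  # missing measurement
        {"site": "c", "raw": 80},
    ]


# ---------------------------- DATA PREPROCESSING ----------------------------
def preprocess(records):
    """Drop records with missing values and convert the raw units."""
    return [
        {"site": r["site"], "value": r["raw"] / SCALE_DIVISOR}
        for r in records
        if r["raw"] is not None
    ]


# ------------------------------ DATA PROCESSING -----------------------------
def process(records):
    """Compute the mean of the cleaned values."""
    values = [r["value"] for r in records]
    return sum(values) / len(values)


# ---------------------------- DATA POSTPROCESSING ---------------------------
def postprocess(records, mean_value):
    """Attach an attribute flagging values above the mean."""
    return [dict(r, above_mean=r["value"] > mean_value) for r in records]


# ----------------------------- DATA VISUALIZATION ---------------------------
def visualize(records):
    """Build a plain-text bar chart, one row per site."""
    return "\n".join(f"{r['site']} {'#' * round(r['value'])}" for r in records)


# -------------------------------- DATA EXPORT -------------------------------
def export(records):
    """Serialize the records as CSV text (a real script would write a file)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["site", "value", "above_mean"])
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def main():
    """Run every stage in order."""
    clean = preprocess(acquire())
    flagged = postprocess(clean, process(clean))
    print(visualize(flagged))
    return export(flagged)


if __name__ == "__main__":
    main()
```

Keeping each stage in its own clearly labeled section means a reader can jump straight to the step they care about, and the user-defined variables sit at the top where they are easy to find and change.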

Workflow Stages

Below I have provided my definitions of each workflow stage. I based the definitions on my experience with geospatial remote sensing projects. Note that the preprocessing, processing, and postprocessing stages may overlap, depending on the nature of the data, the specific software platform, and personal preference. For example, calculating index layers such as the Normalized Difference Vegetation Index (NDVI) in remote sensing could be considered either a preprocessing or a processing step.

  • Renaming elements of the data
  • Converting units
  • Masking data
  • Replacing Not a Number (NaN) or NoData values
  • Creating satellite image index layers
  • Applying kernels/filters to satellite imagery
  • Change detection
  • Spatial analyses
  • Applying kernels/filters
  • Assigning attributes
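To make a few of the tasks listed above concrete, the sketch below replaces NoData values, converts units, and calculates NDVI as (NIR - Red) / (NIR + Red). It makes several assumptions that do not come from the article: the `-9999` NoData sentinel, the reflectance scale divisor of 10000, and the function names are all hypothetical. Plain Python lists are used so the example runs without extra packages; real imagery would normally be handled as arrays.

```python
import math

NODATA = -9999  # hypothetical NoData sentinel; check your data's metadata


def replace_nodata(values, nodata=NODATA):
    """Replace the NoData sentinel with NaN."""
    return [math.nan if v == nodata else v for v in values]


def scale_reflectance(values, divisor=10000):
    """Convert scaled integer values to reflectance (assumed divisor)."""
    return [v / divisor for v in values]


def ndvi(nir, red):
    """Return (NIR - Red) / (NIR + Red), with NaN where it is undefined."""
    result = []
    for n, r in zip(nir, red):
        total = n + r
        if math.isnan(total) or total == 0:
            result.append(math.nan)  # missing input or division by zero
        else:
            result.append((n - r) / total)
    return result


# Hypothetical pixel values for the near-infrared and red bands.
nir_band = scale_reflectance(replace_nodata([5000, -9999, 0]))
red_band = scale_reflectance(replace_nodata([1000, 2000, 0]))
index = ndvi(nir_band, red_band)
```

Whether the NDVI call belongs under preprocessing or processing is a matter of preference; what matters is that each task sits in a clearly labeled stage.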

Workflow Documentation

Documenting a script helps anyone who uses your code (including yourself). The docstring I place at the beginning of a script covers the following:

  • Data references or links
  • Required data file inputs
  • User-defined variables that need to be set
  • Output data files created
  • Other information or settings relevant to the script
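One possible layout for such a docstring is sketched below. This is an illustration of the items listed above rather than the author's exact format; the script purpose, file paths, and variable names are hypothetical placeholders.

```python
"""
Summarize field temperature readings (hypothetical example script).

Data references
    Citation or link for the data source goes here.

Required inputs
    data/readings.csv

User-defined variables
    INPUT_PATH   Path to the input CSV file.
    OUTPUT_PATH  Path for the summary file this script creates.
    UNITS        Temperature units to report, "celsius" or "fahrenheit".

Outputs
    output/summary.csv

Other information
    Notes on software versions, settings, or assumptions go here.
"""

# -------------------------- USER-DEFINED VARIABLES --------------------------
# These match the variables documented in the docstring above.
INPUT_PATH = "data/readings.csv"
OUTPUT_PATH = "output/summary.csv"
UNITS = "celsius"
```

Placing the user-defined variables directly under the docstring keeps the documentation and the settings it describes within a single screen of each other.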

Workflow Philosophies

There is no single definitive way to structure your code. It depends on the nature of the workflow as well as personal preference. Organizational or group guidelines can also be a factor if you are creating workflows as part of a team.

Wrap Up

Hopefully this will help you to create organized workflows. Here are the additional resources I created to complement this article:

Geospatial Talent Stack

A Medium publication dedicated to building your geospatial talent stack.

Geospatial Talent Stack

This publication provides articles dedicated to building your geospatial talent stack. Topics include Python programming, open reproducible science, remote sensing, geographic information systems, earth data science, and more. Articles span all stages of the scientific workflow.

Written by Cale Kochenour

Scientific programmer. Interested in all things remote sensing.
