Open source + open data helps Capstone project visualize Puget Sound restoration
Lack of interoperability, dispersed data, and inconsistent data formats are common issues across information science. Curators of public data are often faced with the additional challenge of reconciling open-data needs with proprietary data formats.
In the Open Data Literacy (ODL) Capstone project “Opening Up the Data: Visualizing the effectiveness of Puget Sound restoration efforts,” team members tackled these common issues in the realm of environmental restoration. Numerous environmental restoration projects have been undertaken throughout the Puget Sound region of Washington state, but connecting investments in these restoration projects to co-located indicators of habitat viability is challenging. Additionally, existing analyses use proprietary software such as ArcGIS, with its open-but-proprietary ESRI spatial data format.
To address these matters, team members Katrina Gertz, Tim Blankemeyer, and Emma Clarke — all MLIS students at the University of Washington’s Information School — collaborated with the Puget Sound Partnership, the Governor’s Salmon Recovery Office, South Sound Spatial, and other project partners. The team leveraged open, found data and open-source tools to build a scalable, sustainable data processing pipeline and an interactive, web-based visualization prototype.
Given the project timeline, the prototype's scope was limited to the Hood Canal region of Puget Sound and to selected data sets. The underlying data fell into three categories: inputs, outcomes, and geospatial.
- Inputs: Environmental restoration project data were sourced from the Washington State Department of Ecology and the Washington State Recreation and Conservation Office.
- Outcomes: Indicators of habitat viability, namely summer chum salmon counts and two water quality measures (turbidity and total suspended solids, or TSS), were sourced from the Washington State Department of Fish and Wildlife and the Washington State Department of Ecology.
- Geospatial: Hydrologic Unit Code (HUC) boundaries (denoting watersheds and subwatersheds) were sourced from the U.S. Geological Survey’s Watershed Boundary Dataset (WBD).
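WBD hydrologic unit codes are hierarchical: each level adds two digits, so a 12-digit subwatershed code begins with the 10-digit code of the watershed that contains it. The sketch below shows how restoration investments tagged with a 12-digit HUC could be rolled up to either level. It is written in Python rather than the team's R, and it assumes that each project record carries a 12-digit HUC and a dollar amount and that the two aggregation levels are HUC-10 and HUC-12. The project IDs, codes, and amounts are made up for illustration.

```python
from collections import defaultdict

# Hypothetical restoration-project records: (project_id, huc12, investment_usd).
# IDs, codes, and amounts are invented for illustration only.
projects = [
    ("P1", "171100180101", 250_000),
    ("P2", "171100180102", 120_000),
    ("P3", "171100180205", 90_000),
]

def parent_huc(huc: str, digits: int) -> str:
    """Return the enclosing hydrologic unit by keeping the leading `digits` characters.

    Each WBD level appends two digits, so valid lengths are 2, 4, ..., 12.
    """
    if digits not in (2, 4, 6, 8, 10, 12) or len(huc) < digits:
        raise ValueError(f"cannot derive a {digits}-digit HUC from {huc!r}")
    return huc[:digits]

def total_investment_by_huc(records, digits: int) -> dict[str, int]:
    """Sum investment per hydrologic unit at the requested level."""
    totals: dict[str, int] = defaultdict(int)
    for _project_id, huc12, amount in records:
        totals[parent_huc(huc12, digits)] += amount
    return dict(totals)

print(total_investment_by_huc(projects, 12))  # one total per subwatershed
print(total_investment_by_huc(projects, 10))  # P1 and P2 share a watershed
```

Truncating string codes keeps the roll-up independent of the boundary geometry. A spatial join against the WBD polygons would be needed only for records that lack a HUC.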
The team used the statistical programming language R, the interactive mapping library Leaflet, and the interactive web application framework Shiny to implement a meta-analytic measure of effect size, Cohen's d, and to build a prototype that visualized connections between restoration investments and habitat viability indicators, aggregated at two subwatershed levels. This interactive prototype can help our project partners better understand and tell the story of Puget Sound restoration efforts.
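Cohen's d expresses the difference between two group means in units of their pooled standard deviation, which makes effects comparable across indicators measured on different scales (fish counts versus turbidity, for example). The article does not say which groups the team compared. The sketch below assumes a hypothetical before-versus-after comparison of one indicator within a single subwatershed, uses Python in place of the team's R, and relies on invented counts.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: (mean_b - mean_a) divided by the pooled standard deviation.

    Uses sample variances (n - 1 denominators) and the usual pooled estimate
    s_p = sqrt(((n_a - 1) * s_a**2 + (n_b - 1) * s_b**2) / (n_a + n_b - 2)).
    A positive result means group_b has the larger mean.
    """
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    if pooled_var == 0:
        raise ValueError("pooled standard deviation is zero; d is undefined")
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)

# Invented annual counts for one hypothetical subwatershed.
before = [120, 95, 140, 110]
after = [160, 150, 175, 145]
print(round(cohens_d(before, after), 2))
```

Because d is unitless, values computed for different indicators and subwatersheds can share a single color scale on a map, although sparse data (few observations per group) make individual estimates noisy.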
This Capstone project illustrated many of the challenges around data, particularly open data. The team found that environmental restoration project data, though public, were not truly open, because accessing them required a log-in. Water quality data were sparser than expected and messy, reflecting a lack of data governance around consistent columns and controlled vocabulary. And summer chum count site location data were not available, so sites had to be placed manually with guidance from domain experts (project partners have an effort underway to address this shortcoming).
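One common remedy for inconsistent labels is to map every observed variant onto a small controlled vocabulary before analysis. The Python sketch below is a hypothetical illustration rather than the team's pipeline: the raw labels and the mapping table are invented, and it assumes units appear in parentheses and can be dropped from the label.

```python
import re

# Hypothetical mapping from normalized labels to controlled terms.
# The variants are invented examples, not labels from the actual data.
CONTROLLED_TERMS = {
    "turbidity": "turbidity",
    "turb": "turbidity",
    "total suspended solids": "tss",
    "suspended solids total": "tss",
    "tss": "tss",
}

def normalize_label(raw: str) -> str:
    """Lowercase, drop parenthesized units, unify separators, then map to a controlled term."""
    key = re.sub(r"\(.*?\)", " ", raw.lower())      # drop units such as "(NTU)"
    key = re.sub(r"[^a-z0-9]+", " ", key).strip()   # underscores, commas -> spaces
    try:
        return CONTROLLED_TERMS[key]
    except KeyError:
        # Fail loudly so gaps in the vocabulary are noticed, not silently kept.
        raise ValueError(f"unmapped parameter label: {raw!r}") from None

for label in ["Turbidity (NTU)", "TURB", "Total_Suspended_Solids", "Suspended Solids, Total"]:
    print(label, "->", normalize_label(label))
```

Raising an error on unknown labels, instead of passing them through, is a deliberate choice: each new variant is added to the mapping explicitly, which keeps the vocabulary documented in one place.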
Despite those issues, the project produced a key outcome: open-source tools can help data curators meet open-data needs.
See the prototype: https://ejclarke.shinyapps.io/capstone/
- CAVEAT: The project prototype shows what a web-based analysis tool might look like. The underlying water quality and salmon data were sourced from public websites. Data and results have not been vetted or approved; that is the next step.
See the project GitHub: https://github.com/katger4/psp-tek
See the iSchool Capstone site: https://ischool.uw.edu/capstone/projects/2017/opening-data-visualizing-effectiveness-puget-sound-restoration-efforts