Catalyzing Data Maturity to Understand Climate Change Effects on Fisheries in México

The Centro para la Biodiversidad Marina y la Conservación, A.C.'s (CBMC) Accelerator grant partnership, sponsored by the International Community Foundation, supports the development of spatially explicit models that will serve as critical inputs to an open-source platform designed to monitor extreme environmental events and advise the fishing sector on appropriate adaptive resource management strategies.

by Fabio Favoretto and Marisol Plascencia de la Cruz

The Centro para la Biodiversidad Marina y la Conservación (CBMC) was founded in 2012 to support Mexican researchers in developing ambitious research projects and to allow young scientists to continue their scientific training. We work to generate scientific knowledge and facilitate its integration into conservation and resource management processes. We believe that knowledge should contribute to protecting vital marine and coastal ecosystems and the species that inhabit them. Our research covers topics relevant to Mexican society, such as fisheries management, biodiversity protection, and the contributions that the use of natural capital makes to social wellbeing. Our work is based on innovative multidisciplinary methodologies that combine new technology with traditional methods. In addition, we promote collaborative science to facilitate the participation of resource users, citizens, and other interested people. This philosophy has allowed us to carry out research comprehensively and efficiently.

Our dataLab processes the scientific information our team generates through close collaborations with fishing communities, NGOs, and government agencies. These data are analyzed collectively to improve our knowledge of the ecological and social processes that shape the use, protection, and management of natural resources in Mexico, and ultimately to understand how climate change will influence catches, fishing behavior, and the spatial use of fisheries. To achieve these goals, our dataLab is improving the data infrastructure within the CBMC, making our data pipeline more efficient, and stepping up our analytical and predictive capabilities with machine learning and neural network modeling. The greatest challenge is not the advanced analysis at the end of the data pipeline but the data preparation and infrastructure setup that precede it. While this might sound counterintuitive, the computing limitations of personal computers make a cloud infrastructure necessary to provide the storage and analytical power the full pipeline demands.

Organizations like ours rarely have trained in-house personnel dedicated to cloud infrastructure, so our dataLab team had a lot to catch up on. Navigating the many options modern cloud platforms offer for storing data and making it analysis-ready can be daunting, but it is game-changing for any organization. Whether you choose SQL databases or unstructured serverless storage, it is ultimately a trial-and-error endeavor, which can be frustrating for any team. Still, the investment pays off in the long term through faster data access and a sharp reduction in tedious data wrangling, in favor of what matters most: the final analysis. Any investment of time and money in building infrastructure that improves data maturity is therefore well worth it, even for a small organization that aims to generate more impact.

Process
To understand how climate change might affect fisheries, we first need to know where and when vessels are fishing. Using the Mexican government's Vessel Monitoring System (VMS), we can model vessel tracks to identify fishing areas and associate them with environmental conditions. However, the pipeline to wrangle and analyze these data was time-consuming, so our first task was to create a tool that facilitates access, wrangling, and modeling. This tool is still under development but is already available to everyone as an open-source package written in the R programming language. We then began constructing the datasets to merge with the vessel tracks: each vessel's metadata, containing the species and quantities it declared to have caught upon reaching port, its fishing permits, and its fishing gear. This data wrangling creates a database of semantic trajectories, i.e., vessel tracks enriched with the contextual information needed to interpret the results (see the sketch below).
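
As a rough illustration of this enrichment step, here is a minimal sketch in R. The file names, column names, and the speed threshold used to flag likely fishing activity are hypothetical assumptions for illustration; they do not reflect the official VMS schema or the internals of our package.

```r
# Minimal sketch: classify VMS pings and build semantic trajectories.
# All file and column names below are hypothetical placeholders.
library(dplyr)
library(sf)

vms <- read.csv("vms_pings.csv")      # raw VMS pings (placeholder file)
landings <- read.csv("landings.csv")  # port declarations: species, catch,
                                      # permits, gear (placeholder file)

tracks <- vms %>%
  arrange(vessel_id, timestamp) %>%
  # a common VMS heuristic: pings at slow, trawling-like speeds are
  # treated as likely fishing (the 1-4 knot window is illustrative)
  mutate(state = if_else(speed_knots >= 1 & speed_knots <= 4,
                         "fishing", "steaming"))

# enrich the tracks with declared catches, permits, and gear to obtain
# semantic trajectories that remain interpretable after modeling
semantic <- left_join(tracks, landings, by = c("vessel_id", "trip_id"))

# keep likely-fishing pings as spatial points delimiting fishing grounds
fishing_pts <- semantic %>%
  filter(state == "fishing") %>%
  st_as_sf(coords = c("lon", "lat"), crs = 4326)
```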

All data are then stored in an Amazon Web Services S3 bucket linked to an EC2 instance running the Cloudera Data Platform, which makes it easy to deploy and customize machine learning workspaces, among other valuable data science tools. The S3 bucket is also linked to the Athena service, which automatically creates a SQL table from the data stored in the bucket and allows it to be queried seamlessly (see the first sketch below). In the same S3 bucket, we also store raster images describing the environmental conditions of the fishing areas (e.g., temperature, chlorophyll concentration, salinity), both historically and in the future, using available climate change models. On top of all these data, we are building a predictive model that overlays fishing areas with environmental conditions to understand how climate change will impact them, and specifically which vessel categories and fishery targets (i.e., fish species) will be affected (see the second sketch below). Once our models are optimized, we will use an interactive visualization platform to show the results to the fishing industry, obtain feedback, and work together to improve large-scale fishery management.
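
For instance, the Athena table can be queried directly from R with a DBI backend such as noctua, and environmental values can be sampled at ping locations with terra. The bucket, staging directory, table, and file names below are placeholders, not our actual setup.

```r
# Hypothetical sketch: query VMS pings from Athena, then attach an
# environmental variable from a raster layer. Names are placeholders.
library(DBI)
library(noctua)  # DBI backend for AWS Athena
library(terra)

con <- dbConnect(noctua::athena(),
                 s3_staging_dir = "s3://example-bucket/athena-results/",
                 region_name = "us-west-2")

pings <- dbGetQuery(con, "
  SELECT vessel_id, ts, lon, lat
  FROM vms_db.pings
  WHERE year = 2021
")
dbDisconnect(con)

# sample sea-surface temperature at each ping location
sst <- rast("sst_monthly.tif")  # placeholder environmental raster
pings_env <- cbind(pings, extract(sst, pings[, c("lon", "lat")]))
```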

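To make the modeling step concrete, here is one plausible shape of such a predictive model, sketched with a random forest. The variable names and the choice of learner are illustrative assumptions, not our production model.

```r
# Illustrative sketch: relate observed catch to environmental conditions,
# then predict catches under projected climate-model conditions.
library(ranger)  # fast random forest implementation

# hypothetical table of fishing-ground observations: declared catch plus
# temperature, chlorophyll, and salinity sampled at each location
fishing_env <- read.csv("fishing_env.csv")

fit <- ranger(catch_kg ~ sst + chlorophyll + salinity,
              data = fishing_env, num.trees = 500)

# the same variables extracted from climate-projection rasters
future_env <- read.csv("future_env.csv")
future_env$predicted_catch <- predict(fit, data = future_env)$predictions
```
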
All this practical work was first conceptualized in a "tactical roadmap" created with the help of experts from the Patrick J. McGovern Foundation (PJMF). During our weekly technical syncs, we were introduced to the basics of cloud infrastructure and received advice on how to structure our data pipeline. We then created a document containing all the information needed to manage the infrastructure and to understand the overall concept, goals, and timeline. To convey this complex information, we drew particular inspiration from the C4 model for software architecture, which we adapted to our project; it has become a powerful communication tool within our team.

Conclusion
Our dataLab is a team of marine ecologists who, for some reason, decided to learn to code instead of diving all the time. We started this Accelerator grant partnership with a shared Dropbox folder, a shared laptop (the only one with enough RAM), and a shared goal. Thanks to the support of PJMF and the insights of their always-available technical team, we are now running a state-of-the-art data science infrastructure. What progress! Our team currently manages servers, cloud computing instances, and serverless databases. Our data analysis pipeline is optimized, and we are applying what we learned to all the other projects in our organization. Within the Accelerator program, we are tackling a problem of worldwide importance: how fisheries will react to climate change. We apply semantic trajectory modeling to the satellite geolocations of industrial fishing vessels to delineate the fleet's critical fishing grounds. We then model the fishing data together with ocean variables and predict future fishing effort and catches on those grounds under climate change models. As temperatures rise, marine productivity will change, and the fishing industry and the millions of people living in coastal areas will need to adapt to maintain their livelihoods and extract marine resources sustainably. Our model outputs will better inform the fishing industry about these future climatic scenarios.
