gsfpy — The story of a pip package

Luke Marsden
UK Hydrographic Office
2 min read · Jul 6, 2020
Visualization of a bathymetric survey. Noise is shown in red, valid soundings in blue.

Back at the beginning of 2020, the UKHO Data Engineering team embarked on a project to create a tool for cleansing bathymetric survey data. The tool is built around a Deep Learning model previously developed by the Data Science team and described in this academic paper.

Bathymetry is the study of depth in water bodies and can be an esoteric field of work. The sonar-derived data come in a multitude of different formats, many of which are proprietary and some sparsely documented. Development of the tool required us to devise a way to pass this data through the model in large quantities.

One attempt to overcome the problem of heterogeneous data formats has been the creation of the Generic Sensor Format (GSF), an initiative started by the U.S. Navy in the 1990s and still alive today. It is a free and open format, it is supported by a wide range of tools, and its source code is made available in C. We concluded that, if we were to support just one bathymetric file format, this would be the one.

Our default choice of programming language in the team is Python, primarily due to the excellent data and machine learning ecosystems it provides, so we began to search for Python libraries that would allow us to work with GSF. To our surprise, there was nothing out there in the public domain. So we set to work creating a Python package that would wrap the GSF C library*, and which would act as a foundation for our cleansing tool. As an added bonus, it would be something we could contribute back to the community to fill the gap we had found.

Fast-forward a few months, and that package is now open source and available to the wider world both as source code on UKHO’s GitHub page and as a pip package on PyPI.

The UKHO has open-sourced code before, but this is the first time we have published a package. We have been pleased with the uptake: it has been out in the wild for just over two weeks at the time of writing and is reporting several hundred downloads already. If you work with GSF data, give it a go and let us know what you think. It’s as simple as pip install gsfpy!

*Python’s inbuilt ctypes module was ideal for the purpose of wrapping the GSF C library. It provided the low-level features required to reflect the complex hierarchies of structures and pointers that exist in GSF.
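
To give a flavour of what that looks like, here is a minimal sketch of how ctypes can mirror a C structure that nests another structure and holds a pointer to an array. The names ExampleTime, ExamplePing, make_ping and read_depths are hypothetical and invented for illustration; they are not the gsfpy API or the real GSF record layout, and no C library is loaded here.

```python
from ctypes import POINTER, Structure, c_double, c_int, c_long


class ExampleTime(Structure):
    """Hypothetical nested struct: a seconds/nanoseconds timestamp."""
    _fields_ = [("tv_sec", c_long), ("tv_nsec", c_long)]


class ExamplePing(Structure):
    """Hypothetical, simplified record: one sonar ping with per-beam depths."""
    _fields_ = [
        ("ping_time", ExampleTime),    # struct nested by value
        ("number_beams", c_int),       # length of the depth array
        ("depth", POINTER(c_double)),  # C-style pointer to an array of doubles
    ]


def make_ping(seconds, depths):
    """Build a ping in Python, much as a C library would fill one in."""
    ping = ExamplePing()
    ping.ping_time = ExampleTime(seconds, 0)
    ping.number_beams = len(depths)
    # Assigning a ctypes array to a pointer field stores its address;
    # ctypes keeps the array alive for as long as the structure exists.
    ping.depth = (c_double * len(depths))(*depths)
    return ping


def read_depths(ping):
    """Follow the pointer and copy the depths into a plain Python list."""
    return [ping.depth[i] for i in range(ping.number_beams)]


ping = make_ping(1594036800, [10.5, 11.0, 12.25])
print(ping.ping_time.tv_sec, read_depths(ping))
```

In a real wrapper, structures like these would be passed by reference to functions loaded from the compiled C library, which is where the pointer handling really earns its keep.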
