Technical issues in most of the popular GIS Python packages

Mostafa Farrag, published in Hydroinformatics, May 4, 2023


Most of the popular GIS packages in Python accumulate dependencies that duplicate each other's functionality. For example, xarray uses the netCDF4 library for NetCDF files and rasterio (which itself wraps GDAL) for GeoTIFF files, yet GDAL alone can handle both NetCDF and GeoTIFF. In this article, we will go through some technical issues in some of the popular GIS Python packages and propose an alternative package.

Popular GIS Python packages and data formats

Article Outline

In this article, we will go through the following:

  • The technical issues in most of the current GIS packages (API, dependencies, and conversion between data formats).
  • GDAL (raster drivers, vector drivers, and what is wrong with GDAL: why is it not used as widely as these packages?).
  • How the Pyramids package solves the API, dependency, and conversion problems.

Issues with current GIS packages

API issue

The main problems with most of the GIS Python packages that handle rasters (xarray, rasterio, rasterframes, …) are their dependencies and their API.
Unlike GeoPandas, the most popular package for handling vector data, these packages do not offer a straightforward API for accessing cell values, reprojecting, or performing spatial operations on rasters (merging, cropping, overlaying, …).
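To give a sense of what "accessing the values" involves when a package does not wrap it for you, here is a minimal sketch of looking up the cell under a map coordinate using a GDAL-style geotransform (the six-number tuple GDAL stores with every raster). The function name value_at and the plain nested-list grid are illustrative assumptions, not part of any of the packages mentioned, and the sketch only handles north-up rasters.

```python
import math
from typing import Sequence


def value_at(grid: Sequence[Sequence[float]], gt: Sequence[float],
             x: float, y: float) -> float:
    """Return the value of the cell that contains the map coordinate (x, y).

    `gt` is a GDAL-style geotransform:
    (x_origin, pixel_width, row_rotation, y_origin, column_rotation, pixel_height).
    Only north-up rasters (both rotation terms equal to 0) are handled here.
    """
    x_origin, pixel_width, rot_a, y_origin, rot_b, pixel_height = gt
    if rot_a != 0 or rot_b != 0:
        raise ValueError("rotated geotransforms are not handled in this sketch")
    # floor (not int) so points just left of / above the origin map to -1
    col = math.floor((x - x_origin) / pixel_width)
    # pixel_height is negative for north-up rasters, so the row index grows southwards
    row = math.floor((y - y_origin) / pixel_height)
    if not (0 <= row < len(grid) and 0 <= col < len(grid[0])):
        raise IndexError(f"({x}, {y}) falls outside the raster extent")
    return grid[row][col]


grid = [[1, 2, 3],
        [4, 5, 6],
        [7, 8, 9]]
geotransform = (100.0, 10.0, 0.0, 200.0, 0.0, -10.0)
print(value_at(grid, geotransform, 125.0, 185.0))  # 6
```

Every raster package has to do this bookkeeping somewhere; the question the article raises is how much of it the API hides from the user.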

Dependency and supported data types Issues

Another issue is the set of data formats these packages support. xarray, for example, handles NetCDF data (so it needs the netCDF4 library as a dependency) and also needs rasterio to handle GeoTIFF files. As a result, there is no package that handles most of the common data formats on its own, without pulling in other packages as dependencies.
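The contrast with GDAL is that a single library sits behind all of these formats, and choosing a format comes down to choosing a driver name. A minimal sketch, assuming you only need to map a file extension to a GDAL driver short name; the helper driver_for and the small mapping are illustrative, and GDAL ships many more drivers than the handful listed here.

```python
from pathlib import Path

# A small sample of GDAL (raster) and OGR (vector) driver short names, i.e. the
# names accepted by `-of` in the GDAL command-line utilities.
GDAL_DRIVERS = {
    ".tif": "GTiff",
    ".tiff": "GTiff",
    ".nc": "netCDF",
    ".asc": "AAIGrid",
    ".img": "HFA",
    ".gpkg": "GPKG",
    ".shp": "ESRI Shapefile",
    ".geojson": "GeoJSON",
}


def driver_for(path: str) -> str:
    """Pick the GDAL/OGR driver short name for a file from its extension."""
    suffix = Path(path).suffix.lower()
    try:
        return GDAL_DRIVERS[suffix]
    except KeyError:
        raise ValueError(f"no driver registered for {suffix!r}") from None


print(driver_for("rainfall.nc"))  # netCDF
```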

Conversion between Data formats

Converting from one data format to another (from GeoTIFF to NetCDF, for example), where it is supported at all, requires a lot of code, and there is no way to do it without having a NetCDF library as a dependency alongside another package that handles GeoTIFF.
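With GDAL alone, the same GeoTIFF-to-NetCDF conversion is a single call to its gdal_translate utility. The sketch below only assembles that call as an argument list and prints the equivalent terminal command; the file names and the helper name translate_args are placeholders, and actually running it assumes the GDAL command-line tools are installed.

```python
import shlex


def translate_args(src: str, dst: str, driver: str = "netCDF") -> list:
    """Arguments for gdal_translate to rewrite `src` into the `driver` format."""
    # -of selects the output format by its GDAL driver short name
    return ["gdal_translate", "-of", driver, src, dst]


# The equivalent command you would type in a terminal
print(shlex.join(translate_args("rainfall.tif", "rainfall.nc")))
# gdal_translate -of netCDF rainfall.tif rainfall.nc
```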

GDAL

GDAL Package

All of these packages have one thing in common: GDAL, which is actually the answer to all of the previous issues. GDAL handles almost all raster and vector data formats (NetCDF, GeoTIFF, ASCII, …): about 160 formats, roughly 50 of which GDAL can create, while the rest are read-only.

Available raster data format drivers in GDAL (image from GDAL documentation)

For vectors, GDAL includes a library called OGR, which GeoPandas, Fiona, and most other popular packages that handle vector data depend on.

Available vector data format drivers in GDAL (image from GDAL documentation)

Then what is wrong with GDAL? Why is it not used as widely as these packages?

GDAL is written in C and C++, with bindings for Python and other languages, and many of its most useful operations (reprojecting, cropping, converting between formats) are documented mainly as command-line utilities. You either run them in a terminal or build the terminal command as a string and execute it with subprocess from a Python script, which is not easy for most Python users who just want to crop a raster with a polygon on the fly. The Python bindings do expose some of these utilities as functions (for example gdal.Warp and gdal.Translate), but their options mirror the command-line flags. Even for users who decide to go that way, the GDAL documentation is among the hardest to navigate, because the C, C++, and Python APIs are all documented in the same place, and you can spend hours just figuring out how to use a single function.
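To make the subprocess workflow concrete, here is roughly what cropping a raster with a polygon looks like when driven through GDAL's gdalwarp utility from a Python script. This is a sketch that assumes gdalwarp is installed and on the PATH; the file names and the helper names crop_args and crop_with_polygon are placeholders.

```python
import shutil
import subprocess


def crop_args(raster: str, polygon: str, output: str) -> list:
    """gdalwarp arguments that clip `raster` to the shape(s) in `polygon`."""
    return [
        "gdalwarp",
        "-cutline", polygon,   # vector file holding the clipping polygon(s)
        "-crop_to_cutline",    # shrink the output extent to the polygon's extent
        raster,
        output,
    ]


def crop_with_polygon(raster: str, polygon: str, output: str) -> None:
    """Run gdalwarp in a subprocess; raises if the tool is missing or fails."""
    if shutil.which("gdalwarp") is None:
        raise RuntimeError("gdalwarp not found; install the GDAL command-line tools")
    # Passing a list (not one string) avoids shell-quoting problems with paths
    subprocess.run(crop_args(raster, polygon, output), check=True)


# Example call (needs real files and GDAL installed):
# crop_with_polygon("dem.tif", "basin.shp", "dem_basin.tif")
```

Nothing here is hard, but each step (flag names, quoting, error handling) is knowledge the user has to bring, which is the friction described above.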

Pyramids GIS package

Pyramids is a GIS package that handles both raster and vector data in most of the formats supported by GDAL. The main reason Pyramids was created is to solve the dependency problem found in most of the current GIS Python packages. Beyond dependencies, Pyramids adopts the same API as GeoPandas to benefit from the popularity and ease of use of the GeoPandas API. The backend of Pyramids is built solely on GDAL, which makes adding new data drivers, implementing new functionality, and maintaining the package very flexible.

Overall Pyramids (raster and vector) data model

API issue solution

The main idea of Pyramids is to take advantage of the fact that almost everyone who works with GIS uses GeoPandas, and to offer the same API for rasters: the method you use to reproject a GeoDataFrame (to_crs) is the same one you use to reproject a raster.

The API for the Dataset class in the Pyramids package.

Dependency issue solution

Besides the easy API, Pyramids relies on GDAL (plus GeoPandas for vector data) for reading and writing data, so it can handle every format GDAL supports without needing a separate NetCDF library to handle NetCDF files.

The main dependencies of the pyramids package are as follows:

channels:
- conda-forge
dependencies:
- python >=3.9,<3.11
- numpy >=1.24.1
- hpc >=0.1.1
- pip >=22.3.1
- gdal >=3.6.2
- pandas >=2.0.0
- geopandas >=0.12.2
- Shapely >=1.8.4,<2
- pyproj >=3.4.0
- PyYAML >=6.0
- loguru >=0.6.0

Conversion between data types/formats solution

Converting from rasters to vectors is as easy as:

dataset.to_polygon()
# another method is
dataset.to_geodataframe()

and the opposite, from a vector to a raster:

featurecollection.to_dataset()
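Conceptually, vectorizing a raster means turning cells into polygons that carry the cell value. The sketch below is a deliberately simplified, hypothetical helper (cells_to_polygons is not a Pyramids function) that emits one square polygon per valid cell, assuming a north-up GDAL-style geotransform; GDAL's own gdal.Polygonize goes further and merges adjacent cells with the same value into a single polygon.

```python
from typing import Optional, Sequence


def cells_to_polygons(grid: Sequence[Sequence[float]], gt: Sequence[float],
                      no_data: Optional[float] = None) -> list:
    """Return one closed square ring of (x, y) tuples per valid cell.

    Hypothetical helper for illustration: neighbouring cells with equal
    values are NOT merged, unlike a real polygonize operation.
    """
    x_origin, dx, _, y_origin, _, dy = gt
    features = []
    for row, values in enumerate(grid):
        for col, value in enumerate(values):
            if value == no_data:
                continue  # no-data cells produce no polygon
            left, top = x_origin + col * dx, y_origin + row * dy
            right, bottom = left + dx, top + dy  # dy < 0, so bottom < top
            ring = [(left, top), (right, top), (right, bottom),
                    (left, bottom), (left, top)]
            features.append({"value": value, "polygon": ring})
    return features


features = cells_to_polygons([[1, -9999], [3, 4]], (0.0, 1.0, 0.0, 2.0, 0.0, -1.0), no_data=-9999)
print([f["value"] for f in features])  # [1, 3, 4]
```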

Pyramids Overview (Dataset class)

The figure below shows the data model of the Dataset object which represents the raster data type.

Raster data model as represented by the Dataset object in the Pyramids package
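As a rough illustration of the kind of information a raster data model has to carry (cell values per band, a geotransform, a coordinate reference system, and a no-data value), here is a minimal, hypothetical sketch. The class RasterModel and its attribute names are assumptions made for illustration and are not the Pyramids Dataset API; the derived properties assume a north-up geotransform.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Grid = List[List[float]]


@dataclass
class RasterModel:
    """Hypothetical minimal raster model; not the Pyramids Dataset class."""
    bands: List[Grid]                 # one 2D grid (rows of values) per band
    geotransform: Tuple[float, ...]   # GDAL-style, north-up (pixel_height < 0)
    epsg: int                         # coordinate reference system as an EPSG code
    no_data_value: Optional[float] = None

    @property
    def band_count(self) -> int:
        return len(self.bands)

    @property
    def rows(self) -> int:
        return len(self.bands[0])

    @property
    def columns(self) -> int:
        return len(self.bands[0][0])

    @property
    def cell_size(self) -> float:
        return self.geotransform[1]

    @property
    def bounds(self) -> Tuple[float, float, float, float]:
        """(min_x, min_y, max_x, max_y) of the raster extent."""
        x0, dx, _, y0, _, dy = self.geotransform
        return (x0, y0 + self.rows * dy, x0 + self.columns * dx, y0)


raster = RasterModel(bands=[[[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]],
                     geotransform=(100.0, 10.0, 0.0, 200.0, 0.0, -10.0),
                     epsg=32636)
print(raster.bounds)  # (100.0, 180.0, 130.0, 200.0)
```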
  • For examples of the main attributes and methods of the Dataset class in the pyramids package, check out the following Jupyter notebooks

and for the spatial operation methods

In the next article, I will go through all the functionality provided by the dataset module.
