Geospatial data science made easy

Granular Engineering
Aug 22, 2018

By Granular Data Science

Granular is pleased to announce the release of pyspatial: an open-source, BSD-licensed Python library for simplifying geospatial data science. Here at Granular, geospatial data is an integral part of our software products, whether it is our growers' fields, sensor data from tractors and combines, weather, remote sensing, or soil information. The Granular Data Science Team uses this information to provide analyses and build models that help our growers make real-world decisions on the farm.

TL;DR: Check out the project on GitHub and the examples.

MOTIVATION

The ecosystem for working with spatial data is rich, but it is somewhat fragmented. There are various tools for querying, manipulating, analyzing, and visualizing spatial data, but the workflows are not well integrated. We often find ourselves needing to switch between environments (command line, Python, QGIS, JavaScript, and sometimes even *gasp* R) for basic data exploration and analysis. This involves not only figuring out the I/O nuances of connecting these tools together, but also managing the complexity associated with spatial projections.

Since we work with a lot of heterogeneous data sources, we need to explicitly keep track of the spatial projection of each dataset. This has been a real challenge, since many of the existing tools are projection-agnostic.

Current libraries also do not provide adequate abstractions for collections. Often we associate a collection of objects with a logical grouping (e.g. all the fields for a single grower), and would like to perform operations on this grouping.

Finally, a primary use case for our workflows is to query rasters for the pixels associated with vector data. While some of this functionality exists, it does not support tiled datasets, nor does it link the data structures used for working with vector data to the queries used for raster data.
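To make the raster/vector query concrete, here is a minimal sketch in plain Python. It is not pyspatial's implementation or API: the function names, the list-of-lists raster, and the rule of selecting pixels by their center point are all assumptions chosen for illustration, and it presumes the raster and the polygon already share one coordinate system.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def point_in_polygon(x: float, y: float, ring: Sequence[Point]) -> bool:
    """Even-odd ray casting test against a simple polygon ring."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_in_polygon(
    raster: List[List[float]],
    origin: Point,
    pixel_size: float,
    ring: Sequence[Point],
) -> List[float]:
    """Return the values of pixels whose centers fall inside the polygon.

    raster: rows ordered north to south (hypothetical in-memory layout)
    origin: (x, y) of the raster's upper-left corner
    """
    x0, y0 = origin
    values = []
    for row, line in enumerate(raster):
        for col, value in enumerate(line):
            cx = x0 + (col + 0.5) * pixel_size  # pixel center, x
            cy = y0 - (row + 0.5) * pixel_size  # pixel center, y (rows go south)
            if point_in_polygon(cx, cy, ring):
                values.append(value)
    return values


# A 4x4 raster covering x in [0, 4], y in [0, 4], and a square "field".
raster = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]
field = [(1.0, 1.0), (3.0, 1.0), (3.0, 3.0), (1.0, 3.0)]
selected = pixels_in_polygon(raster, origin=(0.0, 4.0), pixel_size=1.0, ring=field)
print(selected)  # [6, 7, 10, 11]
```

A brute-force scan like this visits every pixel. In practice the scan would be restricted to the polygon's bounding box, and a tiled dataset would only read the tiles that intersect it.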

LIBRARY HIGHLIGHTS

  • Battle tested: we use it for our day-to-day work, and for processing all the data behind AcreValue. In fact, all of our PostGIS workflows have been migrated to pyspatial.
  • Read/write both raster and vector data (including support for HTTP/S3 sources). Also convert to/from shapely/gdal/ogr/numpy objects seamlessly.
  • Fast spatial queries, since it leverages GDAL and libspatialindex/RTree. For extracting vector data from a raster, the library is 60x to 100x faster than R.
  • Integration of vector/raster data structures to make interoperation seamless.
  • Pandas-like API for working with collections of geometries.
  • First-class support for spatial projections. The data structures are spatial projection aware, and allow you to easily transform between projections.
  • When performing operations between data sources, the data will automatically be reprojected intelligently. No more spatial projection management!
  • Integrated interactive visualization within IPython (via Leaflet). Plots markers, geometries, and choropleths too!
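The idea behind projection-aware data structures can be sketched in a few lines of plain Python. This is an illustration only, not pyspatial's API: the `ProjectedPoints` class and its methods are hypothetical, and only two projections (EPSG:4326 longitude/latitude and EPSG:3857 spherical Web Mercator) are handled. The point is that each object carries its projection, so an operation between two objects can reproject one of them without the caller having to track anything.

```python
import math
from dataclasses import dataclass
from typing import Tuple

EARTH_RADIUS_M = 6378137.0  # sphere radius used by Web Mercator (EPSG:3857)


def lonlat_to_web_mercator(lon: float, lat: float) -> Tuple[float, float]:
    """Forward spherical Mercator: degrees to metres."""
    x = EARTH_RADIUS_M * math.radians(lon)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat) / 2))
    return x, y


def web_mercator_to_lonlat(x: float, y: float) -> Tuple[float, float]:
    """Inverse spherical Mercator: metres to degrees."""
    lon = math.degrees(x / EARTH_RADIUS_M)
    lat = math.degrees(2 * math.atan(math.exp(y / EARTH_RADIUS_M)) - math.pi / 2)
    return lon, lat


@dataclass(frozen=True)
class ProjectedPoints:
    """Hypothetical container: coordinates plus the projection they are in."""

    coords: Tuple[Tuple[float, float], ...]
    crs: str  # "EPSG:4326" or "EPSG:3857" in this sketch

    def to_crs(self, crs: str) -> "ProjectedPoints":
        if crs == self.crs:
            return self
        if (self.crs, crs) == ("EPSG:4326", "EPSG:3857"):
            convert = lonlat_to_web_mercator
        elif (self.crs, crs) == ("EPSG:3857", "EPSG:4326"):
            convert = web_mercator_to_lonlat
        else:
            raise ValueError(f"unsupported transform: {self.crs} -> {crs}")
        return ProjectedPoints(tuple(convert(x, y) for x, y in self.coords), crs)

    def distance_to(self, other: "ProjectedPoints") -> float:
        """Planar distance between the first points, in this object's projection."""
        other = other.to_crs(self.crs)  # reproject automatically if needed
        (x1, y1), (x2, y2) = self.coords[0], other.coords[0]
        return math.hypot(x2 - x1, y2 - y1)


# Two datasets in different projections: the caller never converts by hand.
a = ProjectedPoints(((0.0, 0.0),), "EPSG:4326").to_crs("EPSG:3857")
b = ProjectedPoints(((1.0, 0.0),), "EPSG:4326")
dist = a.distance_to(b)  # one degree of longitude at the equator, about 111.3 km
print(round(dist / 1000, 1), "km")
```

A real library would delegate the transforms to a projection engine and support arbitrary coordinate systems. The sketch only shows why carrying the projection on the object removes the bookkeeping.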


A place where Granular software developers talk about software. Granular is changing the future of farming by helping farms become more valuable businesses.