Exploring the design of a focused geospatial library
I have been developing an introduction to geospatial data structures and algorithms. The main roadblock I was facing was building a minimalist library to assist with the introduction. I wanted the introduction to be hyper-focused on learning the principles of geospatial computation, foregoing any complexities of data formats and map projections. Though important, they are not necessary to learn the principles of data structures or algorithms.
I also failed to find any course built solely for educational purposes. Tutorials cover using specific and awesome tools… for production. I needed a minimalist library designed for learning the core tenants of geospatial. I thought I would need to build it myself. I attempted the first implementation of the library in Python and the second one in Golang. The task turned out to be too time-consuming to do with my other projects.
I was reintroduced to Shapely at my work. I had used Shapely a couple of years back, before my computer science degree. After reading its description, I was pretty excited to have a library that seemed to match my exact needs.
- Python programmers should be able to perform PostGIS type geometry operations outside of an RDBMS
- Persistence, serialization, and map projections are not the core of geospatial data structures and algorithms
- Python, as a language, can be a tool of choice for geospatial data science
Before committing to Shapely, I wanted to dig a little deeper into its design choices. User docs are great for getting up and running, but they rarely give an architecture overview. Shapely’s documentation is no exception. I walked through how Shapely calculates the euclidean distance between two points. I went down the rabbit hole seeking to understand the structure of the library.
Quick Tip: Most code editors will let you jump around to the definition of methods with shortcuts. While hovering over a method in Visual Studio Code, you may right-click and go-to definition (F12).
Calculating the distance between two points required less than 6 lines of code:
Using the aforementioned quick tip, I went to the definition of the Point to check out the distance algorithm. The Point class does not define a distance algorithm. I kept digging deeper and deeper to finally understand where the distance algorithm is actually defined. Here’s a look at the Point class, notice the absence of any methods beyond an initializer or getters and setters:
The short answer: Shapely does not define any spatial algorithms. It relies on GEOS bindings.
Before going into too many details at length, it can be helpful to have an overview of the class diagrams. I am visual and need the diagram to make any sense of the following explanations. Numbers match the different diagram sections with their explanation. Interestingly enough, Shapely never actually defines any of the spatial algorithms it uses.
- The Point class does not define a distance method directly
- The Point class is derived from the BaseGeometry class which has a distance method defined
- BaseGeometry uses its implementation variable (impl) to defer defining the specific algorithm that must be used to calculate the distance
- The implementation variable is an initiated variable of the class GEOSImpl that takes into account the GEOS version of the current environment
- The implementation variable inherits the values map from the BaseImpl class, the map is composed of tuples of the format (Predicate, name)
- Predicates define their function call with the __call__ method and in this call method, they reference the fn data member of Delegating, their abstract class
- Finally, the fn data member is actually a function that calls lgeos.methods[name] which actually calls the GEOS DLL function
A few side notes on the diagram:
- Abstract classes have their class names in italic
- Concrete classes have a normal font
- Full arrowheads and lines represent inheritance
- Full arrowheads and dashed lines represent function calls
- Empty arrowheads and full lines represent a class composition
- The enumeration of class methods and variables is non-exhaustive
- Some details will be easier to understand by actually diving into the code
- Understanding factory methods helps grasps some of the seemingly complex design
The Big Picture
What you should remember from the diagrams is that Shapely is designed to serve as a python binding to the geos library. It does not define anything more than the levels of abstraction to effectively call the geos functions.
If you would be interested in following a curated and logically sequenced course to learn the principles behind those operational geospatial tools you use every day, reach out on LinkedIn or Twitter (@ geosimple1)