Overcoming Sisyphus: Automating Deployment of Geospatial Python Environments in Windows

Alec Brazeau
Resilience Solutions Tech Blog
Oct 30, 2019

For all of the tools utilized in this post, see geo-env on the Dewberry GitHub page.

In Greek mythology, Sisyphus was condemned by the gods to push a boulder up a mountain, only to have it roll back down once he reached the top. For those who have set up a Python environment for geospatial processing on Windows, the mundanity and repetition of Sisyphus’s task may sound all too familiar: every package update breaks the development environment. Just when you’ve pushed your project to the top of the hill, back down you must go to reinstall and retest packages to make sure your environment still functions. Unfortunately for Sisyphus, no quick fix could free him from his eternal task. For the rest of us, however, there is an elegant tool that can turn Sisyphus’s mountain into a walk in the park.

Outside of a GIS platform like ArcGIS or QGIS, geospatial processing and analytics can be done using Python. Geospatial development in Python usually requires some combination of five main modules: geopandas, shapely, fiona, rasterio, and gdal.

While these modules aren’t always in use and are often paired with others for specific programs, it is essential to have them in working order and in a single environment for the sake of development.

Setting up these modules is a cinch in a Linux environment: you can pip install each one of them. In a Windows environment, it is a nightmare. This is where package and environment management systems come into play. Since my team uses Jupyter Notebooks quite a bit for both development and deliverables, we defer to Anaconda or Miniconda to manage Python packages on our local machines (for development) as well as on AWS EC2 Instances and Google VMs (for production).

Pluvial/Fluvial probabilistic flood risk for the Washington, DC area, created using Dask, rasterio, gdal, and h5py.

Installing and configuring these five modules on Windows is surprisingly difficult, and I often find myself troubleshooting the installation process at colleagues’ desks. I have become all too familiar with fiona throwing “ImportError: DLL load failed: The specified module could not be found.” just as a deadline is fast approaching.

My usual fix is to:

  1. Fully uninstall gdal, shapely, fiona, rasterio, pyproj, and rtree (pyproj and rtree are underlying dependencies).
  2. Head over to Christoph Gohlke’s Unofficial Windows Binaries for Python Extension Packages site and download the wheel files for gdal, shapely, fiona, rasterio, pyproj, and rtree.
  3. Install the wheels with pip: pip install <WHL PATH>
  4. Now that its dependencies have been successfully installed, pip install geopandas.
  5. Verify all modules were correctly installed by opening the Python interpreter and importing each one.

However, this was getting repetitive.

Like most programmers, I get bored when doing repetitive tasks. The fewer the keystrokes and mouse clicks, the better! To automate the installation of these wheel files, I developed a command line tool that does it all for you. The script, install_wheels.py, will detect your system architecture and Python version, search Christoph Gohlke’s website for a matching wheel file, then install it with pip without your having to download it manually. To use it, you need bs4, requests, and lxml installed, all of which are simple to install using pip.

Using the tool is simple. Call it from the command line with Python, giving the package name as the first argument and (optionally) the package version as the second. For example: python install_wheels.py gdal 2.4.1
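To make the "detect your system architecture and Python version" step concrete, here is a minimal sketch of how a wheel filename could be matched against the running interpreter. It relies only on the standard wheel filename convention (name-version-pythontag-abitag-platformtag.whl); the function names wheel_tags and matches are hypothetical, and the actual install_wheels.py in geo-env may well do this differently.

```python
import struct
import sys


def wheel_tags(version_info=None, pointer_bits=None):
    """Return (python_tag, platform_tag) for a CPython wheel on Windows.

    Both arguments default to the running interpreter; they are exposed
    only so the logic can be exercised without a Windows machine.
    """
    if version_info is None:
        version_info = sys.version_info
    if pointer_bits is None:
        # Size of a C pointer in bits: 64 on 64-bit Python, 32 on 32-bit.
        pointer_bits = struct.calcsize("P") * 8
    python_tag = f"cp{version_info[0]}{version_info[1]}"
    platform_tag = "win_amd64" if pointer_bits == 64 else "win32"
    return python_tag, platform_tag


def matches(filename, package, version=None, tags=None):
    """True if a wheel filename fits the package, optional version, and tags."""
    if not filename.endswith(".whl"):
        return False
    parts = filename[: -len(".whl")].split("-")
    if len(parts) < 5:
        return False
    python_tag, platform_tag = tags or wheel_tags()
    name, wheel_version = parts[0], parts[1]
    # The last three fields are always python tag, ABI tag, platform tag.
    wheel_python, wheel_platform = parts[-3], parts[-1]
    if name.lower() != package.lower():
        return False
    if version is not None and wheel_version != version:
        return False
    return wheel_python == python_tag and wheel_platform == platform_tag
```

With these helpers, a list of wheel filenames scraped from a download page could be filtered down to the one that fits the current interpreter, for example matches("GDAL-2.4.1-cp37-cp37m-win_amd64.whl", "gdal", "2.4.1") on 64-bit Python 3.7.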

Production

While this script removes some hassle from getting these modules set up and ready to use, it does not do the entire job from start to finish. Getting a working, reproducible environment set up across our team as well as deploying these modules in programs and workflows on the cloud is where Miniconda really shines.

Miniconda is a bare-bones, lightweight install of the Anaconda platform. It is easy to download and install from the Windows command line, which lends itself to easy deployment through a Windows batch script. To make this deployment automated and reproducible, I put together a batch script that will download and install Miniconda, create a new conda environment, install the fickle geospatial packages, then install all other relevant Python packages that are needed. This script can easily be adjusted by adding or removing modules as needed.
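The order of operations described above can be sketched as a small planner that builds the commands without running them. This is an illustration only, not the batch script from geo-env: the function plan_deployment, the install prefix, the environment name, and the package lists are all assumptions, and it assumes the geospatial packages are installed by calling install_wheels.py with the new environment's interpreter.

```python
from pathlib import PureWindowsPath

# Installer published by Anaconda; downloading it is left to the caller.
MINICONDA_URL = (
    "https://repo.anaconda.com/miniconda/Miniconda3-latest-Windows-x86_64.exe"
)


def plan_deployment(prefix, env_name, python_version, wheel_packages, pip_packages):
    """Return the deployment steps as a list of argument lists (a dry run)."""
    prefix = PureWindowsPath(prefix)
    conda = str(prefix / "Scripts" / "conda.exe")
    env_python = str(prefix / "envs" / env_name / "python.exe")

    steps = [
        # Silent Miniconda install; /D= must be the last argument.
        ["Miniconda3-latest-Windows-x86_64.exe", "/S", f"/D={prefix}"],
        # Fresh environment pinned to one Python version.
        [conda, "create", "--name", env_name, f"python={python_version}", "--yes"],
        # Prerequisites of install_wheels.py (beautifulsoup4 provides bs4).
        [env_python, "-m", "pip", "install", "beautifulsoup4", "requests", "lxml"],
    ]
    # The fickle geospatial packages, one wheel at a time.
    steps += [[env_python, "install_wheels.py", pkg] for pkg in wheel_packages]
    # Everything else comes straight from PyPI.
    if pip_packages:
        steps.append([env_python, "-m", "pip", "install", *pip_packages])
    return steps
```

Printing the result of plan_deployment(r"C:\Miniconda3", "geo-env", "3.7", ["gdal", "fiona"], ["geopandas"]) shows the full sequence before anything is installed, which makes it easy to review what adding or removing a module changes.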

The end of this batch script calls another short Python script to test importing all of the modules in the new environment. It outputs a text file (import_log.txt) that reports the result of attempting to import every module listed in the output of the pip freeze command. This log can be used to verify that each and every module was successfully installed and can be imported.
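A minimal sketch of such an import check is shown below. The function names (freeze_names, import_report, write_import_log) are hypothetical and the real script in geo-env may differ. One caveat worth stating loudly: pip freeze lists distribution names, which do not always match import names (GDAL is imported from osgeo, beautifulsoup4 as bs4), so the name guess here is deliberately crude and a FAILED line may need a second look.

```python
import importlib
import subprocess
import sys


def freeze_names(freeze_output):
    """Extract distribution names from the text printed by `pip freeze`."""
    names = []
    for line in freeze_output.splitlines():
        line = line.strip()
        # Skip blanks, comments, and editable installs.
        if not line or line.startswith(("#", "-e ")):
            continue
        # Lines look like "name==1.2.3" or "name @ file:///path/to.whl".
        names.append(line.split("==")[0].split(" @ ")[0].strip())
    return names


def import_report(names):
    """Try to import each name and return one log line per attempt."""
    lines = []
    for name in names:
        # Assumption: the import name is the distribution name with
        # hyphens replaced by underscores. This is not always true.
        module = name.replace("-", "_")
        try:
            importlib.import_module(module)
            lines.append(f"{name}: OK")
        except Exception as exc:  # DLL load failures surface as ImportError
            lines.append(f"{name}: FAILED ({type(exc).__name__}: {exc})")
    return lines


def write_import_log(path="import_log.txt"):
    """Run pip freeze in this interpreter and write the import report."""
    frozen = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=True,
    ).stdout
    with open(path, "w") as log:
        log.write("\n".join(import_report(freeze_names(frozen))) + "\n")
```

Calling write_import_log() from the new environment's interpreter would produce the log; searching it for FAILED gives a quick list of modules to investigate.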

Setting up a geospatial Python environment on a Windows machine is no walk in the park. Geopandas, shapely, fiona, gdal and rasterio are typically the modules that give me the most installation issues. Please feel free to share your experience when trying to get these up and running. Any recommendations regarding the process are also welcome. If you would like to contribute to building out a simplified geospatial Python environment deployment process, pull requests and issues are always encouraged.

Alec Brazeau
Resilience Solutions Tech Blog

Geospatial Programmer with Dewberry’s Resilience Solutions Tech Team.