Three Reasons to Use Jupyter Notebooks as a GIS User
Jupyter Notebook is a powerful tool that allows Python users to create and share documents containing live code, visualizations, explanatory text, and equations.
The term “notebook” is very applicable, since the tool allows you to write snippets of self-contained executable code (named “cells”), note each procedure, and even visualize data you are working with at any step of the way.
Why should I use a Jupyter Notebook?
Jupyter Notebooks have gained tremendous popularity in the Python data science community over the past several years for a variety of reasons. As a GIS user, I have personally found Jupyter Notebooks to be extremely useful for the following three reasons:
1. Prototyping of Python Workflows
Jupyter Notebooks are extremely useful when you do not have a defined final process and are still in the prototyping phase of your scripted workflow. This is mainly because code is written in cells that you can run one at a time, while the variables created by cells you have already run stay in memory. This allows a Python user to quickly test a specific step in a sequential workflow without re-executing code from the beginning of the script.
Many Integrated Development Environments (IDEs) allow you to do this in several ways, but I’ve found Jupyter Notebook’s concept of a “code cell” to be the most intuitive approach for prototyping logic and sequential code.
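As a rough illustration of that cell-by-cell workflow, the sketch below uses comments to mark where each cell would begin. The parcel records and field names are made up for the example. In a notebook, once the first two cells have run, you could edit and re-run only the third cell without loading or filtering the data again.

```python
# Cell 1: load the data once (hypothetical parcel records)
parcels = [
    {"id": 1, "zone": "R1", "acres": 0.25},
    {"id": 2, "zone": "C2", "acres": 1.50},
    {"id": 3, "zone": "R1", "acres": 0.40},
]

# Cell 2: filter step; reuses `parcels` created by Cell 1
residential = [p for p in parcels if p["zone"] == "R1"]

# Cell 3: summary step; tweak and re-run this cell alone while prototyping
total_acres = sum(p["acres"] for p in residential)
print(f"{len(residential)} residential parcels, {total_acres:.2f} acres")
```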
2. Visualizing Pandas DataFrames
Pandas (Python Data Analysis Library) provides high-performance, easy-to-use data structures that let you work with large amounts of data quickly. The core data object is the DataFrame, which is essentially an in-memory table that supports powerful indexing operations.
Jupyter Notebook allows you to visualize these tables at any point in your notebook. This is extremely useful because you can view the state of your data (and the effect of all the actions your code is performing on your data) as each step of your logic executes. This capability reinforces the use of Jupyter Notebook in a prototyping workflow, when you are trying to confirm that your workflow is doing what it needs to do at each step of the way.
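Here is a minimal sketch of that inspect-as-you-go habit. The road segment table and its column names are hypothetical, and print() stands in for the notebook display so the snippet also runs as a plain script.

```python
import pandas as pd

# Hypothetical attribute table for a handful of road segments
roads = pd.DataFrame({
    "segment_id": [101, 102, 103, 104],
    "road_class": ["local", "arterial", "local", "highway"],
    "length_m": [320.0, 1250.0, 410.0, 5300.0],
})

# Step 1: add a derived column, then look at the result
roads["length_km"] = roads["length_m"] / 1000
print(roads.head())

# Step 2: filter, then look again to confirm the filter did what was intended
local_roads = roads[roads["road_class"] == "local"]
print(local_roads)
```

In a notebook, ending a cell with just the variable name (for example, roads) renders the DataFrame as a formatted table, so the print() calls are not needed there.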
So why are Pandas DataFrames such a big deal?
As a GIS user, the first foray into working with Python and GIS data management typically uses some mix of arcpy’s “CalculateField”, “SearchCursor”, and “UpdateCursor”. Most examples teach you to use these operations, and they are all completely functional, but they suffer from the same process-intensive issue: they all need to iterate over every record of your data to perform a data management operation.
In other words: Imagine that you are a director of a movie in production, and you find out that to change the lighting in a scene, you need to watch the movie from the very beginning… for every change. This would take forever!
Operating on a Pandas DataFrame solves this with powerful indexing that allows effective querying and array-wide operations. You essentially find the specific scene of the movie that you need to fix, and skip to that scene. Once my GIS data analysis workflows started integrating Pandas DataFrames into heavy data operations, I saw dramatic improvements in performance.
Visualizing these DataFrames and seeing the effects of my code in each dataset became a crucial component of working efficiently.
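To make the contrast concrete, here is a small sketch with a made-up parcel table. The first half imitates the cursor pattern with a plain pandas loop (it is a stand-in, not arcpy code), and the second half does the same update with a boolean mask. The column names and the "review" status value are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical parcel table
parcels = pd.DataFrame({
    "parcel_id": [1, 2, 3, 4],
    "zone": ["R1", "C2", "R1", "I1"],
    "acres": [0.25, 1.50, 0.40, 3.00],
})

# Cursor-style approach: visit every row and decide what to do with each one
looped = parcels.copy()
looped["status"] = "unchanged"
for idx, row in looped.iterrows():
    if row["zone"] == "R1":
        looped.at[idx, "status"] = "review"

# Pandas approach: select the matching rows with a boolean mask
# and update them in a single step
vectorized = parcels.copy()
vectorized["status"] = "unchanged"
vectorized.loc[vectorized["zone"] == "R1", "status"] = "review"

# Check that both approaches produce the same table
print(vectorized)
print(looped.equals(vectorized))
```

On four rows the difference is invisible, but the mask-based version hands the work to pandas instead of running a Python loop over every record, which is where the speedup on large tables tends to come from.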
3. Integration with ArcGIS
The newest (and most exciting) reason is the integration of Jupyter Notebooks with the ArcGIS Platform. My two main production tools had long been the ArcGIS Platform and Jupyter Notebook. When Esri announced that the ArcGIS API for Python would provide support for geographic visualizations, organization administration, and even access to the most powerful analytical capabilities of the platform within Jupyter Notebooks, I literally could not stop smiling.
The new ArcGIS API for Python renders each Jupyter Notebook an extension of your distributed GIS. Among several other capabilities, you can:
- Set up a notebook that connects to your Portal, provides detailed reports on each user’s content, groups, and statistics, and backs up all the content in a Portal based on user group. Free yourself from administration tasks to explore and analyze.
- Create integrated maps and data operations that are connected to code cells in your notebook. All the prototyping benefits mentioned above are now part of your spatial analysis workflow.
- Leverage GeoAnalytics tools and other geoprocessing operations on data workflows that you are already working with in your Jupyter Notebook. The most powerful new tools are already incorporated into the API.
Even with all these benefits, coming up to speed with Jupyter Notebooks as a GIS user can be a daunting task. Stay tuned for a few tips on how to navigate and operate Jupyter Notebooks…