PostGIS tips: Where to get started
The City of Boston’s Analytics Team recently implemented a new data warehouse platform built with Civis Analytics. We pushed for Postgres as the backend of the warehouse so that we could leverage its PostGIS extension in our data pipelines and analysis.
In an effort to help my colleagues get acquainted with PostGIS, I put together the introduction materials below. The materials cover what PostGIS is, when/why to use it in our work, and examples of various functionality.
As Postgres and PostGIS are open source, and we work in the public domain, we wanted to share this resource for others to leverage as well. Although the content below is largely Boston-focused, the lessons should be extendable for other places and uses.
Because we cover a lot, we’ve split the content up into three parts. This first section gives a high level introduction to PostGIS — covering when/why to use it, how to use it with the City’s existing toolset, and a general introduction into projection systems.
Part 2 gets into more detail around storing geospatial data and dealing with projection systems in PostGIS.
Part 3 covers a more advanced functionality with an overview of spatial indexing and various examples of spatial joins, buffering, finding the nearest, and clustering.
What is PostGIS
PostGIS is an open source extension for PostgreSQL. It brings three things to PostgreSQL that are crucial to working with spatial data:
- Support for spatial data types (points, lines, polygons, rasters)
- Spatial functions that allow you to ask questions like “what City Council District is this public bathroom in?” of your data in a SQL query
- Spatial indexing so your spatial queries run quickly
Why PostGIS and when to use it
For the City, the main thing PostGIS gets us is the ability to automate spatial queries in our data pipelines.
If a department asks for some ad hoc GIS analysis, ArcGIS Desktop (ArcGIS Pro or ArcMap) will likely be the best tool for you to use. You’ll be able to easily analyze the data and share the results with the department via ArcGIS Online. In addition, if you’re helping someone with spatial data collection in the field, an ArcGIS product like Survey123 or Collector would probably be best.
On the other hand, if a department needs a report to be updated with some frequency (daily, weekly, monthly, etc.) that has a spatial component to it (e.g. a table showing the nearest bathroom to each music venue), PostGIS might be the tool to reach for.
Example PostGIS pipelines
We get new data every month for the Vision Zero Crash Map. Because 911 data gets geocoded to addresses, mapping the raw data means it looks like crashes are happening on/in buildings. Ideally they would be mapped on the street they occurred on.
We use PostGIS functions to move the crash points to the closest location on the street which they occurred. We then use the ArcGIS Python API to overwrite feature services hosted on ArcGIS Online. We display those services on the web map using Mapbox, a web mapping library.
311 Cases near City property
When a 311 case is opened on a city-owned parcel, the responsibility to address the issue may fall on the department responsible for managing the parcel, not other departments who usually deal with that kind of case. As a result, we were interested in seeing all the 311 cases opened within 30 feet of any City-owned parcel. We wanted a table showing all of these cases and generic information about each (time opened, description, etc.).
Using PostGIS, we buffered the City-owned parcels by 30 feet, asked which 311 cases fell inside those areas, and created a script that would re-do that calculation twice a day. We then built a Tableau dashboard displaying just a table (without a map), with these cases listed.
This section could also be called “the worst part of GIS” or “all maps are lies”. Unfortunately we have to think about projection systems because we live on an odd shaped rock that we pretend is a nice sphere, that we then try to flatten into a rectangle when we map it. All the pretending leads to inaccuracies that we account for using projection systems.
Different projection systems are good for different things. Us and pretty much everyone else uses the Web Auxiliary Mercator Projection (EPSG:3857) for web mapping. The long/lat coordinates we generally think of when we think “map coordinates” (e.g. -71.057961, 42.360343 for Boston City Hall) are in the World Geodetic System of 1984 (WGS84, EPSG:4326). These projection systems are great because we recognize them and they make your data look like everyone else’s.
The bad thing about these projections is that they distort information in ways that make them untrustworthy for things like measurements. To get around that, we use MA State Plane Feet (EPSG:2249) for most of our data (e.g. 911 data, address data, etc.). This coordinate system is much more accurate for Boston, MA so is really important to use when you’re doing things like buffering, measure distance between things, routing, etc.
- EPSG 4326 vs EPSG 3857 (projections, datums, coordinate systems, and more)
- Choosing the right projection system
- long, lat vs lat, long inconsistency between systems
- Geographic vs Projected coordinate systems
- The Mercator projection is a lie! (article explaining thetruesize.com)
- THE scene from The West Wing
- This tool that shows you the difference between a bunch of global projections
Making sure your queries work
When running spatial queries, we validate them by looking at the result on a map to confirm the spatial relationships are being calculated correctly. To do that, you’ll use a GIS desktop software — either QGIS or ArcGIS Desktop — to visually check the results are correct (e.g. “Clearway St. really does intersect city council district 7”).
Connecting Postgres to QGIS or ArcGIS
ArcGIS Desktop and QGIS are very similar tools — they are both Desktop applications that allow you to connect to, view, and analyze spatial data. The main difference is that QGIS is open source while ArcMap/ArcGIS Pro are proprietary tools. The good thing is we at the City have access to both, so you can use whichever one you are most comfortable with.
To connect to our Postgres database in QGIS, navigate to the PostGIS section in the browser pane. Right click on PostGIS and select “New Connection”.
In ArcGIS, to connect to a database, click on “Add Database Connection” under “Database Connections” in the ArcCatalog pane.
In both systems, you’ll have to enter a bunch of information in the dialog box that pops up about your database and connection credentials to it.
Using the Database Manager in QGIS
QGIS has a Database Manager that allows you to write and run SQL queries just like any other SQL editor. This tool’s beauty lies in the “Load as a Layer” button. It allows you to load the output of a select statement to the map and immediately check its accuracy.
Clicking on “Database” in the top toolbar of QGIS should bring up a dropdown with the “DB Manager in it”. Clicking that should bring up a new window where you can see database connections you have. You can open up the SQL window by clicking on the small “SQL Window” icon.
From there, you can start writing SQL queries in the window just as you would in the Postgres query pane and run them. After clicking “Execute”, make sure the “Load as new layer” and “Geometry Column” are checked, select your correct geometry column, and click “Load”.
After clicking “Load”, a “Query Layer” will be added to your map showing the results of your query. This means you can verify the spatial relationships immediately. For example, in the query below, we can see we successfully selected only the bathrooms in city council district 7.
Importing geodata into Postgres
You can use the QGIS using the Database Manager outlined above. Clicking on “Import Layer/File” will open a window prompting you to select the file and a few options related to it.
To import the file:
- Select the schema you’re sending it to from the drop down (you must have write permissions to that schema)
- Input what you want the table name to be
- Select the “Geometry column” option and set the name you want it to be. For us at the City, we name geometry columns geom + the SRID for the coordinate system the data is in:
- Make sure the “Source SRID” and “Target SRID” fields are accurate.
- Select “Convert field names to lowercase”. This is only to comply with our governance policy at the City that all field names use lowercase.
- Select “Create spatial index”. See Spatial Indexing section for more reference.
- After clicking “Okay”, you may have to refresh the schema for the new table to show up.
Sharing your work
If your end goal is to make a map you can share with other people, you’ll likely want to export the data you’ve manipulated to ArcGIS Online. If you’re doing a one-time analysis, you can export the dataset and manually upload it to ArcGIS Online.
If you’re working on a pipeline or creating a dataset that will periodically get updated, you can use the ArcGIS Python API to interact with our ArcGIS Online instance in python. The API allows you to do things like overwrite a hosted feature service with new data or append new records to an existing dataset. From there, you can add that feature service to a web map/app and share it.
Future state: Maps in Tableau
Tableau has announced plans to support geometry columns in their maps in future versions. This means you could connect Tableau to our Postgres database and create a map directly from any table that has a geometry column. That map would update as often as Tableau refreshed and your data updated.
If you don’t need a map, like in the 311 cases, you can still create tables/charts with any spatial data in Tableau.
Thanks for reading! Part 2 has examples of how to add a geometry column to a dataset, check the projection of a dataset and reproject it, and get x/y values from your data in PostGIS.