PostGIS tips: Working with geometry columns and projection systems

This is Part 2 in a three-part series on using PostGIS at the City of Boston.

I put these docs together in an effort to help my colleagues at the City of Boston get acquainted with PostGIS. You can find Part 1 here.

This section walks through adding a geometry column to a dataset, dealing with projection systems, and getting x/y values from a geometry column.

Adding and Populating a Geometry Column

PostGIS stores a table’s spatial information in geometry columns. Usually these columns have the word “geom” floating around in their name. If you select * on a table with a geometry column, the geom column will look like a long series of nonsense.

Frequently, long/lat values get stored in columns in a database or spreadsheet. To make these into a geometry column that PostGIS understands, we:

  1. Add a geometry column
  2. Update that column with geometry values using our lat/long fields

Adding a geometry column

The AddGeometryColumn function adds an empty geometry column to your table. It needs:

  • the schema your table is in
  • the table name
  • the name of your geom column
  • the EPSG code of the projection system your data is in
  • the type of spatial data (usually POINT)
  • and the dimensions of your data (usually 2 unless you’re working with 3D)

Quick Tip: Figuring out what projection system your coordinates are in

Usually, you’ll get coordinates in WGS84, or your normal lat/long values. Sometimes, you may get values from a different coordinate system and the person who gave you them may not know what projection system they are in. When this happens, it can be helpful to know what the coordinates for that system might look like so you have a place to start guessing.

After guessing the coordinate system, its best to map your data and a run a query that tests its spatial relationship to another layer. If you get the expected results from that query, you’ve gotten your data in the correct projection system. 👍

Updating a geometry column with existing lat/longs

We use two functions to make points out of our latitude/longitude numbers: ST_MakePoint and ST_SetSRID.

ST_MakePoint takes the x and y values you’re using to make the point. For us, longitude is our x, and latitude is our y. The function requires these numbers to be in the type double precision. If yours aren’t already you can cast them in the statement as we do below.

St_SetSRID sets the spatial reference system for the point we make. It takes the point and the EPSG code your setting it to. This EPSG code needs to be the same as what you set your geom column to when you added it.

update sandbox.public_bathrooms
set geom = ST_SetSRID(
"Longitude"::double precision,
"Latitude"::double precision
), 4326)

Good Reads

Checking what projection your data is in

Find_SRID will return the EPSG code of the projection system the geom column you’ve passed to it. (SRID stands for Spatial Reference IDentifier). It takes the schema your table is in, the name of the table, and the name of the geometry column in that table.

select Find_SRID('sandbox', 'public_bathrooms', 'geom')

Quick tip: The geometry_columns table

PostGIS stores information about all the geometry columns in the geometry_columns table. You can find information about the type of geometry stored in any database table and the EPSG code for the spatial reference system it is in (SRID).

Output from geometry_columns table.

A thing to note

You’ll see “Multipolygon”, “Multiline”, or “Multipoint” as the geometry type in some cases. These types are used when a row in your dataset may actually be referring to multiple separate polygons/lines/points.

A good example of a multipolgyon dataset is our City Council districts layer. District 3 (below) is made of Dorchester and the Harbor Islands. This district is one row in our table, but made up of four distinct polygons when we map it, making it a “multipolygon”.

Example of a dataset with a geometry type of “multipolygon”.

Transforming to a different projection

In most GIS systems, reprojecting a dataset means creating a new copy of the dataset in a different projection system. This leads to lots of slightly different copies of the same thing which is confusing. PostGIS is hugely helpful here because it allows us to:

  1. Reproject data inline in a SQL query
  2. Have two geometry columns associated with one dataset that are in different projections

Reprojecting inline

ST_Transform is a simple and extremely powerful function. It takes only the name of a geometry column and the EPSG code of the new projection system. It handles all the math needed to successfully change the projection of your data.

You can use ST_Transform directly inside a SQL query you are running making switching between projection systems and comparing layers in different projection systems easier.

ST_transform(geom, 2249) as geom_2249
from sandbox.public_bathrooms

Two geom columns on one dataset

Transforming inline can slow down queries that are doing lots of spatial calculations. In those scenarios, it might be best to add another geometry field to your data that is in a different projection system. You can then use the second geometry column for any operation you need, just as you would the first.

select AddGeometryColumn('sandbox', 'public_bathrooms', 'geom2249', 2249, 'POINT', 2)
update sandbox.public_bathrooms 
set geom2249 = ST_transform(geom, 2249)

After adding the new geom column, you’ll see two entries for the table in QGIS so you can choose to map the data in either projection.

QGIS displays all the geom columns in a dataset.

Good Reads

Getting x/y values from geometry column

Often, it’s helpful to get x and y values from a geom column as they are more readable by other humans and systems. For example, when we post the Vision Zero crash records on Analyze Boston, we want the latitude and longitude of the crash after it has been snapped to the street centerline.

ST_x and St_y get those values for you from a PostGIS geometry.

-- Add new columns to dataset
alter table sandbox.public_bathrooms add column x_coord_2249 numeric
alter table sandbox.public_bathrooms add column y_coord_2249 numeric

-- populate with x and y values
update sandbox.public_bathrooms
set x_coord_2249 = ST_x(ST_transform(geom, 2249))

update sandbox.public_bathrooms
set y_coord_2249 = ST_y(ST_transform(geom, 2249))

Thanks for reading! Part 3 has an overview of functionality like spatial indexing, spatial joins, finding the nearest, and clustering in PostGIS.