Hands-on PostGIS, exploring the Geospatial capabilities
I recently worked on a problem, the solution of which required geospatial querying and transformations. I initially thought of doing everything in-memory using a JavaScript library called turf. This was going fine until I used the actual dataset (I had around 24,000 Polygons having vertices ranging from 50 to 5,000), this is where I guess both the time and space complexities of the operations I was doing via turf became too large, and it just gave up.
This is the point where I had to move to PostGIS in order to do those operations.
Someone might argue that out there is so much dedicated GIS software like QGIS, ArcGIS, why not use them. The answer is simple, it’s not only the GIS work that I am doing on my database, but there are also lots of regular CRUD operations as well, and PostGIS is just an extension to Postgres, which makes it perfect for this kind of use cases. And if you are thinking of suggesting me Mongo, I am gonna stop you right there, Mongo supports few geospatial queries but has no geo-transformation capabilities at all.
In case you are worried to set up Postgres in replication mode and all, you can always use an RDS instance. Recently I have been hearing a lot of good things about Amazon Aurora as well.
But let’s define the problem statement first
Problem Statement
Let’s say that you want to set up a fast-food chain in New York. But you want to strategically place your restaurants to get the max customers, hence you decided these criteria
- No restaurant should be near 205 meters of a KFC
- All restaurants should be in 300 meters of KFC
- All restaurants should be inside New York
You have to also generate the result in a GeoJSON file so that, that can be drawn onto a map for better visualization.
We are going to use GeoJSON to represent geometries throughout
Simple right? 😏 Let’s dive into code.
Environment setup
- I am going to do this in node.js, hence download and install that.
- I am not going to install
postgres
and use adocker image
instead, so download and install docker. - Running these two commands in the image below consecutively will pull the docker image for PostGIS and start a container.
I am going to use
node.js
for this, don’t worry if you don’t understandJavaScript/ TypeScript
, I am not going to use any language-specificObject Relational Mapper
, I will execute raw PostGIS queries from code, which will make it language agnostic.
Available Data
Let’s assume that we already have the boundary information of New York and all the KFCs there. For the sake of simplicity, I will not use the actual data but use a Polygon to represent NY and multiple Polygons to represent KFCs.
This is our imaginary NY (in gray), and KFCs (in red)
These Polygon boundaries are available in .json
files, which you can find here.
Project Setup
I am going to set up a very simple node.js
+ TypeScript
console application. If you are following along, you don’t have to again do that, you can download the template project from here. This already has all the bootstrap
and boilerplate
code -> Branch_link
I am going to divide the entire problem into multiple small tasks and solve it step by step.
I will share certain branches of my repo after each task, which contains the solution to that particular task. So don’t get baffled by a code screenshot. This is done so that you try to write the code yourself, which adds an active learning kind of a flavor. Also, the complete solution would also be provided at the end of the article. Try to follow as much as you can, in case you fail, you can always check out the branch for that task.
If you didn’t understand this, then you probably need to learn Git first, which I will not explain, but you can learn it from here.
Task 1: Insert the NY Polygon and KFC Polygons into DB
Ok, so we need to first insert all the related data into the DB so that we can query/ operate on it.
To do so, I ended up writing these two files.
- The first is the
postgresPool.ts
file
Which basically instantiates the postgres connection pool
, which you can use to query the DB.
2. And the second is the index.ts
file
I know, I know it’s long, but let me explain, it’s pretty simple actually. The flow is like this
- Create a table with the name
ny_boundary
, having 2 columns, id, and geom. - Read the geojson file
ny.json
and insert that into this table. - Create a table with the name
kfc_boundaries
, having 2 columns, id, and geom. - Read the geojson file
kfc.json
and insert that into this table. - buildTableCreationQuery and buildInsertionQuery are basically 2 helper methods that generate the query for you given the data.
This is how the create table
query would look if you run it as a SQL command
And the insert
SQL query
We are using the ST_GeomFromGeoJSON function, cause the data we have as input is in geojson format.
And that concludes the task1, woot woot
🕺
And in case you could not follow along, here’s the -> branch_link that I promised. The boundary geojson files are at /src/input
Task 2: Expand KFC boundaries by 205 meters and merge them if overlapping
Now this task has 2 sub-tasks.
- Expand the KFC boundaries by 205 meters, this will give us the area where we should not put our restaurants.
- We need to merge the expanded boundaries if there’s an overlap between any of them. Cause merged polygons look much better than overlapping polygons when rendered on a map. There are other benefits as well as it reduces the data size, which can matter when dealing with huge data.
Ok, the code to achieve
Again, let me explain. I am doing two things.
- Creating a
level1_boundaries
table. - Expanding all
kfc_boundaries
by 205 meters, merging them, and then inserting them into thelevel1_boundaries
table.
I know the query part for the second operation might look a bit complex, a lot of things are going on there. So I will break down the parts of the query and try to explain what’s going on.
This is the query that we basically ran.
select st_buffer(geom::geography, 205 )::geometry from kfc_boundaries
st_buffer -> This is the function which does the expansion operation, it takes two params, one is the geometry/ geography object, and the other is the radius for expansion. Which are geom and 205
for our case.
geom::geography -> This is typecasting operation. We are taking the value of the geom column from kfc_boundaries and typecasting it to a geography
object. We need to this so so that the function considers the geom
value as a EPSG:4326 geometry and hence it will consider the 205 as meters. If you don’t do this typecasting, the st_buffer function will consider the value 205 meters to be 205 degrees.
::geometry
We are again converting the result of the buffer operation to a geometry
object, as st_union
can only operate on geometry
objects.
st_union(array(...))
st_union -> This function merges the geometries returned from the st_buffer function, if there are overlapping polygons it merges them into a single polygon, if the polygons are disjoint it creates a multipolygon out of them.
array -> As the kfc_boundaries table has multiple rows, the select st_buffer ...
query will return an array, so to specify that it’s an array input, we are using this.
insert into level1_boundaries (geom) ...
This basically inserts the result of the st_union operation into the level1_boundaries
table.
Putting it together, this is how it looks now
Blue polygons -> KFC boundaries expanded by 205 meters
Red Polygons -> KFC boundaries
That’s the completion of task 2, and here’s the -> branch_link
Task 3: Repeat step 2 but for a distance of 300 meters
Here’s the code
Nothing new, and here’s everything rendered on a map.
Light green Polygons -> KFC boundaries expanded by 300 meters
Here’s the -> branch_link for this task.
Task 4: Subtract level 1 boundaries from level 2 boundaries to find out the green zone
Now we have 2 MultiPolygons
- Boundary + 205 meters -> level 1
- Boundary + 300 meters -> level 2
We need to do
level2 - level1
To find out the green zone, where we can set up our restaurants.
Here’s the code
The flow is like this
- Create a table named
boundary_difference
- Find the level1 boundary (the table has 1 row, hence for simplicity, I am using the first row only)
- Find the level2 boundary (the table has 1 row, hence for simplicity, I am using the first row only)
- Find the difference between these two objects using the
st_difference
function. It takes two geometries, finds out the difference between them, and gives back the difference as a geometry object. - Store the difference in
boundary_difference
table.
And here’s the outcome drawn on a map
Here’s the -> branch_link for this task.
Task 5: Find the intersection of NY boundary and green zones
You can see that some parts green zone which we just found out is going outside the NY boundary, and we do not want to set up our restaurants outside NY. Hence what we have to do now is to find out the intersection of the NY boundary and the green zone.
Here’s the code
Flow
- Get the NY boundary and Difference calculated before
- Find out the intersection between them using
st_intersection
, which has a similar signature asst_difference
, takes two geometries and returns the intersecting area as a geometry object. - Convert the result into
geojson
usingst_asgeojson
- Print the result in the console
And here’s the final picture, rendered on a map.
You can set-up you dream restaurant chain now 😆
And that my friend, concludes this tutorial on using some of the basic functionalities of PostGIS to some pretty cool geospatial tasks.
You can download the complete solution from here.