Why should you care about PostGIS? — A gentle introduction to spatial databases

A week of flights in Brazil. Original file was just a csv with origin and destination coordinates. I loaded the data to PostGIS, created point geometries from the coordinates, then created lines between the points and eventually visualized the data with QGIS Time Manager.

Post… what?

That’s a database. In IT architecture a database is represented as a cylinder. It’s a place where you can store your data.
  1. Spatial data types such as point, line, and polygon. Familiar to most working with spatial data;
  2. Multi-dimensional spatial indexing is used for efficient processing of spatial operations;
  3. Spatial functions, posed in SQL, are for querying of spatial properties and relationships.
SELECT some_columns FROM some_data_source WHERE some_condition;

Part 1: A new way of working

22 million points of ship GPS locations rendered from PostGIS with QGIS. Can you see where the ships are moving on rivers and where they are at open sea?

Part 2: The magical world of spatial SQL

A database related blog post always has to have a barchart comparing processing times. PostGIS = very fast. Barcharts don’t lie.
SELECT ST_GeneratePoints(geom, he_vakiy) from paavo.paavo
  • Routing. With pgRouting and road data you can find optimal routes and do different network analytics;
  • Polygon skeletonization. This function enables you to build the medial axis of a polygon on the fly;
  • Geometry subdivision. Dividing your geometries for further processing can significantly speed up your processes;
  • Clustering. Find clusters and patterns from your data. With the AI hype at peak, the k-means might be even more interesting for some than before…

PostGIS for dataviz

Train voronoi lines. Oddly satisfying.
INSERT INTO trains.voronoipoints 
SELECT '2018–01–15 09:00:00' AS t,
geom
FROM (SELECT St_centroid(St_collect(geom)) AS geom,
trainno
FROM (SELECT geom,
trainno
FROM trains.week
WHERE time > '2018–01–15 09:00:00'
AND time < '2018–01–15 09:01:00') AS a
GROUP BY trainno) AS b
  • You can see some of the normal elements of a SQL query (INSERT INTO, SELECT, AS, FROM, WHERE, AND, GROUP BY)
  • geom, trainno and time are column names in my week table in the schema called trains
  • The subquery a returns all GPS points which have been tracked within the requested timeframe.
  • Because I select all GPS points tracked inside one minute, I might get several points for each train. I only wanted one, so that the voronoi lines would look more sensible. That’s why I use ST_Collect to group the points together and to create a multipoint geometry from them. ST_Centroid replaces the multipoint geometry with a single point located at the centroid (subquery b) and the data is grouped by train numbers.
SELECT t, ST_VoronoiLines(geom) from trains.voronoipoints
Hey look! I made a SQL meme!

How to get started?

A screenshot from QGIS DB Manager.
  1. Just download the installers and install PostGIS on your local machine. Follow the instructions in the tutorials;
  2. Load some data in there. Start with a single Shapefile using QGIS DB Manager or chech for example this tutorial on how to get Natural Earth data to PostGIS;
  3. Start playing around with SQL. Start with the basics (selecting, filtering and modifying the data) and slowly you will see what kind of benefits it could bring in to your workflow.

Conclusions

I had New York bikeshare data with start and end points. With GraphHopper I calculated the optimal routes between the origin and the destination, I loaded thousands of resulting gpx files to PostGIS with ogr2ogr. In PostGIS I created lines from the points and visualized the data with QGIS.

Want to learn more? Sources for this blog post and further PostGIS reading

--

--

--

I mostly put lines, points and polygons on maps 🌍

Love podcasts or audiobooks? Learn on the go with our new app.

Recommended from Medium

Traffic & Covid

Essential Skills for every Data Specialist

Machine Learning

Quickstart for creating simple Interactive Maps in Jupyter Notebooks using ipyleaflet

Working with Missing Data in Machine Learning

What you should know before you hire a data scientist

Survey of learning from only a few examples with computer vision applications

Essential applications and advantages of data science in online store

Video Games Sales Analysis

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Topi Tjukanov

Topi Tjukanov

I mostly put lines, points and polygons on maps 🌍

More from Medium

Geonames dataset: What is it?

Geonames Alternate Names

2022 British Grand-Prix Data Visualizations

Web Scraping with Python in Indonesian E-Commerce (Tokopedia) Part 2: Present the Data as a CSV…

Launch a Data Science Environment on Oracle Cloud Infrastructure