Graph and Geospatial Analysis with PgRouting

How to configure and use a geospatial database with PgRouting

Lamberjack

Many graph-analysis problems can be solved using the capabilities of SQL databases. In fact, there is a particular category of database designed to store and manage graph data: the graph database. These databases are designed to provide efficient and flexible access to graph data, and they typically include support for a query language that allows users to traverse the graph and perform complex queries and analysis.

Another interesting category is that of the geospatial database, which is specifically designed to store, manage, and analyze geospatial data. Geospatial databases are optimized for storing and querying data that represents objects defined in a geometric space, such as vector and raster data.

In this article we are going to show how to configure and use a local PostGIS database with the pgRouting extension. Two different configurations and procedures will be discussed:

  • How to import a graph data set and perform some basic operations on it.
  • How to import a real portion of a map and analyze the related geospatial data.

What is PostGIS

PostGIS is an extension of PostgreSQL that adds support for storing, indexing, and querying geographic objects, turning it into a fully featured geospatial database. As if that wasn't enough, there is a further extension built on top of PostGIS: pgRouting. The pgRouting extension provides a set of functions and algorithms for performing network analysis tasks on geospatial data.

Here are some of the tasks that can be performed using the functionality provided by pgRouting:

  • Shortest path analysis: Finding the shortest path between two vertices on a graph. This is a common task in transportation planning, logistics, and navigation.
  • Vehicle routing: Finding the optimal route for a vehicle to visit a set of locations. This is a common task in delivery and transportation planning.
  • Network analysis: Analyzing the structure and properties of a graph, such as the degree distribution, betweenness centrality, and clustering coefficient.

Prerequisites

Before diving into the technical details, you should have the following software installed on your computer:

  • QGIS Desktop (version 3.30) — Quantum GIS is a free and open-source geographic information system (GIS) software application. It is used for creating, editing, visualizing, analyzing, and publishing geospatial data and maps. Here it's used to render and view your graph and geospatial data. You can download QGIS for free: QGIS
  • Docker (version 20.10.21) — Docker is a popular containerization platform used to build, package, and deploy applications. It allows you to create and configure the database locally. Docker is open-source software and you can install it for free: Docker

PgRouting with Docker

How can we set up and configure a local database quickly and in a standardized way?

For many readers the answer could be obvious (or maybe not): “work with Docker”. If you are familiar with some basic Docker concepts and notations, what we are going to explain shouldn’t be complicated to understand.

Docker is a popular containerization platform used to build, package, and deploy applications. It allows developers to create lightweight, portable containers that can run on any platform, regardless of the underlying hardware or software environment.

Some of the features of Docker which make it suitable for our purpose:

  • Building and testing applications in a consistent, reproducible environment.
  • Creating portable development environments that can be shared across teams.

Using Docker, you can have the same version and configuration of the PostGIS database that we are going to see. Furthermore, the local Docker environment could be easily adapted to other contexts such as a remote server or a cloud infrastructure.

You can find the source code that we are going to explain on GitHub.

Startup and Initialize the Database

To handle a multi-container application, Docker provides a very useful tool called docker-compose. It allows you to start up an entire application environment (usually composed of more than one container):

docker-compose up -d

When you want to stop your application and remove all the related containers:

docker-compose down

Images and volumes of your environment aren't removed by the previous command, so you have to delete them manually with the appropriate commands (for example, docker-compose down -v also removes the named volumes).

Using docker-compose.yml, you start up two containers:

  1. pgRouting database: a container running an instance of a pgRouting database, which is under the hood a PostgreSQL database. It creates and configures the ‘gis’ database with some dummy graph data and with the geospatial data of an .osm map.
  2. pgAdmin client: a container running pgAdmin, a web client to interact easily with the pgRouting database.

version: '3.6'

services:

  postgis:
    build: .
    image: pgrouting/osm2:1.0.0
    container_name: postgis_local
    volumes:
      - pgis-data:/var/lib/postgresql/data
      # copy the script to initialize the tables
      - ./sql-scripts/init-db.sql:/docker-entrypoint-initdb.d/1-init-db.sql
      # populate the table
      - ./sql-scripts/insert-rows.sql:/docker-entrypoint-initdb.d/2-insert-rows.sql
      # configure graph topology
      - ./sql-scripts/create-topology.sql:/docker-entrypoint-initdb.d/3-create-topology.sql
      # copy the osm data
      - ./osm-data/map.osm:/usr/share/map.osm
      # copy the osm style configuration
      - ./osm-data/custom.style:/usr/share/osm2pgsql/custom.style
    environment:
      POSTGRES_PASSWORD: ${POSTGRES_PASSWORD}
      POSTGRES_DB: ${POSTGRES_DB}
    ports:
      - '5433:5432'

  pgadmin:
    image: dpage/pgadmin4:6.21
    container_name: pgadmin4_local
    restart: always
    depends_on:
      - postgis
    ports:
      - "5051:80"
    environment:
      PGADMIN_DEFAULT_EMAIL: ${PGADMIN_EMAIL}
      PGADMIN_DEFAULT_PASSWORD: ${PGADMIN_PASSWORD}
    volumes:
      - pgadmin-data:/var/lib/pgadmin

volumes:
  pgis-data:
  pgadmin-data:

The first container, called ‘postgis_local’, starts the local database; its image is built from a Dockerfile.

FROM pgrouting/pgrouting:13-3.1-3.1.3

RUN apt update \
    && apt install -y \
    osm2pgsql \
    nano

Starting from the official pgRouting image, it installs osm2pgsql, the OpenStreetMap tool needed to handle .osm files.

It creates a local database ‘gis’ that is accessible through the other container, pgAdmin (or with any database management client), with the following credentials:

User: postgres

Password: postgres

The second container, ‘pgadmin4_local’, is created from the pgAdmin image and you can access it directly from your browser:

http://localhost:5051/

With the following credentials:

User: user@email.com

Password: admin

Nevertheless, you are not bound to use the ‘pgadmin4_local’ container as a client; you can alternatively choose any database client you prefer.

Using the ‘postgis_local’ container we can access our database; now let's go deeper into its configuration!

Graph Data Initialization

The initialization of the database, which is required for the representation of the graph data, is defined within the volume configuration of the docker-compose.yml file and is fully automated at container startup.

Inside the volumes of the database container, three SQL scripts are copied from the local /sql-scripts folder to the container filesystem. In particular, they are copied into /docker-entrypoint-initdb.d, the folder whose scripts are executed during database initialization.

Those 3 scripts are designed to write and configure the graph data directly inside the database during its initialization and they are executed exactly in this order:

1) init-db.sql:


CREATE EXTENSION POSTGIS;

CREATE EXTENSION PGROUTING;

DROP TABLE IF EXISTS ways;

-- create the ways table, used for storing the graph edges
CREATE TABLE ways
(
    gid bigserial NOT NULL,
    class_id integer NOT NULL,
    name text,
    source bigint,
    target bigint,
    x1 double precision,
    y1 double precision,
    x2 double precision,
    y2 double precision,
    cost double precision,
    priority double precision DEFAULT 1,
    the_geom geometry(LineString,4326),
    CONSTRAINT ways_pkey PRIMARY KEY (gid)
);

DROP TABLE IF EXISTS route_analysis;

-- create the route_analysis table, used for storing paths
CREATE TABLE route_analysis
(
    id bigserial NOT NULL,
    from_node integer NOT NULL,
    to_node integer NOT NULL,
    length double precision,
    the_geom geometry(LineString,4326),
    CONSTRAINT routing_pkey PRIMARY KEY (id)
);

Inside the gis database it installs the PostGIS and pgRouting extensions. Then it creates two tables:

  • ways: used to store information about the arcs of the graph and the related coordinates.
  • route_analysis: used to store path information, in particular the geometry data that allows a visual representation.

2) insert-rows.sql: contains a group of insert statements that populate the ways table; every instruction inserts an arc of the graph. In this case the arcs are links between Italian cities. A hypothetical example of one such statement is sketched below.
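To give an idea of the shape of these rows, here is a minimal sketch of what one insert might look like (the city pair, coordinates, and cost are purely illustrative and are not taken from the actual script; source and target are left empty because create-topology.sql fills them later):

-- hypothetical arc between two Italian cities (illustrative values only)
INSERT INTO ways (class_id, name, x1, y1, x2, y2, cost, the_geom)
VALUES (
    1, 'Milano - Bologna',
    9.19, 45.46,   -- x1, y1: longitude/latitude of the origin
    11.34, 44.49,  -- x2, y2: longitude/latitude of the destination
    201.0,         -- cost of traversing the arc
    ST_SetSRID(ST_MakeLine(ST_MakePoint(9.19, 45.46), ST_MakePoint(11.34, 44.49)), 4326)
);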

3) create-topology.sql:

SELECT * INTO vertices
FROM pgr_extractVertices('SELECT gid as id, the_geom as geom FROM ways');

WITH out_going AS (
    SELECT id AS vid, unnest(out_edges) AS eid
    FROM vertices
)
UPDATE ways
SET source = vid
FROM out_going WHERE gid = eid;

WITH in_coming AS (
    SELECT id AS vid, unnest(in_edges) AS eid
    FROM vertices
)
UPDATE ways
SET target = vid
FROM in_coming WHERE gid = eid;

This script creates the topology of the graph, adding and updating some information that will be essential when we query the graph data set.

  • It creates the vertices table, which contains all the vertices of the graph with the related coordinates, incoming edges, and outgoing edges. For this operation it uses the pgRouting function pgr_extractVertices, which extracts all the vertices of the graph's arcs from the ways table.
  • It updates the source and target columns of the ways table with the ids of the vertices connected by each edge (a quick verification query is sketched below).
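To check that the topology was built correctly, you can verify that every arc now has both endpoints resolved; a minimal sketch:

-- arcs whose endpoints were not resolved (this should return no rows)
SELECT gid, name, source, target
FROM ways
WHERE source IS NULL OR target IS NULL;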

Graph Data Analysis

Once we have successfully initialized our database, we are ready to find out some details about our graph data set. pgRouting offers many more features for graph analysis; here only a small subset of them is shown, collected inside the ways-graph-queries.sql file.

  • Select the connected components of the graph including the related arcs:
SELECT w.source, w.target, w.name, c.component
FROM ways AS w
JOIN pgr_connectedComponents(
    'SELECT gid as id, source, target, cost FROM ways'
) AS c ON w.source = c.node
ORDER BY c.component;
  • Find the shortest path between two nodes using the Dijkstra algorithm:
SELECT * FROM pgr_dijkstra(
'SELECT gid as id, source, target, cost, cost as reverse_cost FROM ways',
<origin-node>, <destination-node>
);

Finally, we are going to use the route_analysis table to store information about every calculated route.

  • Calculate a path between two nodes and insert the origin/destination node and total cost of a path:
INSERT into route_analysis (from_node, to_node, length)
SELECT distinct <origin-node>, <destination-node>, agg_cost FROM pgr_dijkstra(
'SELECT gid as id, source, target, cost, cost as reverse_cost FROM ways',
<origin-node>, <destination-node>
) where edge = -1;
  • Update the previously calculated path information with the geometry data of the route between the two nodes:
UPDATE route_analysis r 
set (the_geom) = (SELECT st_makeline(pgr.the_geom)
FROM (SELECT w.the_geom FROM pgr_dijkstra(
'SELECT gid as id, source, target, cost, cost as reverse_cost FROM ways',
<origin-node>, <destination-node>
) as d left join ways as w on d.edge = w.gid) as pgr)
where r.from_node = <origin-node> and r.to_node = <destination-node>;

Now you can replace <origin-node> and <destination-node> with ids from the vertices table and do your own tests. Furthermore, if you want a representation that helps you to visualize what you have stored, QGIS is one of the best allies.
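If you are not sure which ids to use, a quick look at the vertices table helps; a minimal sketch, assuming the default output columns of pgr_extractVertices (vertex id, coordinates, and incident edges):

-- list the available vertices with their coordinates and incident edges
SELECT id, x, y, in_edges, out_edges
FROM vertices
ORDER BY id;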

QGIS Representation of the Graph Tables

After you have downloaded and installed QGIS, you can use it to render your database data. Its configuration is quite intuitive:

1) Add the OpenStreetMap layer to view the world geographic map.

2) Connect QGIS to your local PostgreSQL database. You can create a new connection and use the following credentials:

Host: localhost

Port: 5433

Database: gis

Then, with a double click on each table, its data will appear on the map.

Map Data Initialization

This second procedure shows how to store and analyze some geospatial data from a .osm file. Files with this extension come from OpenStreetMap and represent a real portion of a map. You can have files containing nations, regions, or portions of them. Obviously, the larger the portion of the map contained in the file, the bigger its size and the more time and resources it will require for processing.

In this example a file with a small portion of an Italian highway is used.

Once we have our map.osm file, it is copied during container initialization from /osm-data/map.osm to the container path /usr/share/map.osm.

Initially an .osm file is full of information, and often much of it can be ignored and discarded; in this way we obtain two big advantages:

1) Focus attention on the data we are really interested in.

2) Improve the performance of loading and analysis.

In fact, thanks to the osm2pgsql software that has been installed in our container (see the Dockerfile), we are able to import the .osm file and specify which of its information we want to store inside the database.

The configuration file /osm-data/custom.style is copied inside the postgis container at /usr/share/osm2pgsql/custom.style. In this example the content of this file is tailored to highway data, but you can configure it to import everything you want. Installing osm2pgsql also provides a default configuration file, /usr/share/osm2pgsql/default.style, which includes all the information that can be imported from an .osm file.

Once you have chosen your .osm file and defined your own configuration, you are ready to start the import process! As said before, this step can take a long time and strain the resources of your PC, depending on your hardware performance and the size of your file.

Launch the following command:

docker exec -it postgis_local osm2pgsql -c -d gis -U postgres --password -H 127.0.0.1 --style /usr/share/osm2pgsql/custom.style /usr/share/map.osm

It executes the osm2pgsql import command inside the postgis_local container; the command takes both the .osm file and the configuration file and creates the related tables:

1) planet_osm_roads: Contains the information regarding roads.

2) planet_osm_polygon: All the buildings and structures of the map are stored inside it.

3) planet_osm_point: Stores the points of the map; a point can be everything from a ‘STOP’ signal to a streetlamp.

4) planet_osm_line: Every link of the map that is not a road can be a line (footpath, cycleway, etc.).

Due to our configuration file, all the data in these tables is related to the highway context. You can take a quick look at what was imported with the query sketched below.
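A minimal sketch, assuming your custom.style keeps the name and highway tags (as the default style does):

-- preview the imported roads and the SRID of their geometries
SELECT osm_id, name, highway, ST_SRID(way) AS srid
FROM planet_osm_roads
LIMIT 10;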

Map Data Analysis

Now we have geospatial data inside the database, and we can perform complex analysis and studies on it. Let's take a look at a simple graph analysis of the roads table.

We focus only on the planet_osm_roads table because its structure maps easily onto a graph.

First, we have to connect the arcs of the roads table:

ALTER TABLE planet_osm_roads ADD COLUMN "source" integer;
ALTER TABLE planet_osm_roads ADD COLUMN "target" integer;

select pgr_createTopology('planet_osm_roads', 0.0001, 'way', 'osm_id');

The source and target columns are added to the planet_osm_roads, and then we use pgr_createTopology function to connect every arc of the graph.

This pgRouting function uses the geometry data of the column way to connect the start and the end of every road. In fact, it creates the related table planet_osm_roads_vertices_pgr with the vertices information and updates the source and target columns of the planet_osm_roads table.
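To confirm that the topology was created, a couple of sanity checks can help; a minimal sketch:

-- number of vertices extracted from the roads
SELECT count(*) FROM planet_osm_roads_vertices_pgr;

-- roads whose endpoints could not be connected (ideally none)
SELECT osm_id FROM planet_osm_roads
WHERE source IS NULL OR target IS NULL;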

Working on real data means that we can obtain real distances between the points:

ALTER TABLE planet_osm_roads ADD COLUMN "length" double precision;

UPDATE planet_osm_roads SET length = st_length(way);

With the st_length function we add the length of each road, relying on the geospatial data of the way column; since the data is imported in EPSG:3857 (Web Mercator), the value is expressed in projection units, which approximate meters but are distorted away from the equator. Thus, every arc of the graph will have its own cost in the length column.
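If you need lengths in true meters instead of projection units, one option is to compute them on the geography type; a minimal sketch (slower on large tables, but more accurate):

-- recompute lengths in meters using the geography type
UPDATE planet_osm_roads
SET length = ST_Length(ST_Transform(way, 4326)::geography);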

In the previous graph data analysis we stored the path information in the route_analysis table. Although we could modify that table to store the new path data, the creation of a brand-new table is cleaner and clearer.

CREATE TABLE road_analysis
(
    id bigserial NOT NULL,
    from_node integer NOT NULL,
    to_node integer NOT NULL,
    length double precision,
    the_geom geometry(LineString,3857),
    CONSTRAINT road_routing_pkey PRIMARY KEY (id)
);

The difference between the old route_analysis and the new road_analysis is subtle but essential: the column the_geom stores a different kind of geometry. In fact, every geometry column must be declared with its own SRID (Spatial Reference ID). The SRID of a geometry is a numerical code that identifies the spatial reference system (SRS) used to represent the spatial data.

The column the_geom will store path data built from the way column of the planet_osm_roads table, so their SRIDs must match. You can verify the SRID of the imported data (3857 in this case) with:

SELECT ST_SRID(way) FROM planet_osm_roads LIMIT 1;

Now we are ready to perform path calculations on the planet_osm_roads table, and the queries will be almost the same as in the graph path analysis.

  • Calculate a path between two nodes and insert the origin/destination nodes and total cost of the path (the nodes are the vertices of the planet_osm_roads_vertices_pgr table; a helper query for picking vertex ids is sketched after this list):
INSERT into road_analysis (from_node, to_node, length)
SELECT distinct <origin-node>, <destination-node>, agg_cost FROM pgr_dijkstra(
'SELECT osm_id as id, source, target, length as cost FROM planet_osm_roads',
<origin-node>, <destination-node>
) where edge = -1;
  • Update the previously calculated path information with the geometry data of the route between the two nodes:
UPDATE road_analysis r 
set (the_geom) = (SELECT st_makeline(pgr.way)
from (SELECT w.way FROM pgr_dijkstra(
'SELECT osm_id as id, source, target, length as cost FROM planet_osm_roads',
<origin-node>, <destination-node>
) as d left join planet_osm_roads as w on d.edge = w.osm_id) as pgr)
where r.from_node = <origin-node> and r.to_node = <destination-node>;
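To pick <origin-node> and <destination-node>, one convenient option is to select the vertex closest to a given longitude/latitude pair; a minimal sketch (the coordinates are purely illustrative and are transformed into the SRID of the imported data, 3857):

-- find the id of the vertex nearest to a given longitude/latitude
SELECT id
FROM planet_osm_roads_vertices_pgr
ORDER BY the_geom <-> ST_Transform(ST_SetSRID(ST_MakePoint(11.34, 44.49), 4326), 3857)
LIMIT 1;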

Now let's see how this data appears in QGIS.

QGIS Representation of the OSM Map Tables

We can use QGIS in the same way as before to render the OSM Map data and the related calculated path.

Once QGIS has been connected to the local database, we can see the information related to the tables we are interested in.

Conclusions

We aimed to show the potential and the main capabilities of a technology that can be very useful in a GIS application. In fact, inside the GIS ecosystem a geospatial database with the pgRouting extension can bring many benefits, especially thanks to the standardization of its functions. Furthermore, being a Postgres extension, it can be easily integrated with an existing database.

In this article we tried to standardize some initial configurations and settings, with the purpose of providing a concrete starting point for the data model of a real application. By performing a simple path analysis on graph data and on an .osm file, we have only started to scratch the surface of the hundreds of functionalities provided by the pgRouting extension.
