Seamless coastal boundaries with PostGIS

The OpenStreetMap (OSM) project provides our favourite base data for a wide range of applications. Most data displayed on top of OSM does not align well with existing geometries. Often this data should not be included in the OSM source itself but geometries should be aligned to coast lines and national boundaries to form seamless presentation layer.

This article outlines a process to create seamless coastal geometries with PostGIS. I will be using a subset of admin geometries from the 2011 Portuguese census.

Small area census geometries in Portugal.

Portuguese census geometries intersect the coastline throughout. We want to exclude from the census polygons the intersection with water and in turn union the clipped land surface. All geometries are geographic in the EPSG:4326 system. I run the process on a laptop with PostgreSQL 9.5 + PostGIS 2.1. Coastline datasets are provided as split water and land polygons. These datasets are downloaded as shp files and loaded into PostGIS with shp2pgsql. I use QGIS 2.18 for the map display together with the QGIS changeDataSource plugin since table schema are modified frequently.

Census geometries over split OSM water and land polygons.
All geometry fields need to be spatially indexed.
CREATE INDEX sidx_osm_land_geom ON osm_land USING GIST (geom);

I ST_Dump() single geometries into new tables for processing. I use a distance of 0.0005 degree from the coastline to identify polygons which need to be processed. Single geometries are required as input for the PostGIS split function. gid and glx are unique identifiers.

SELECT
osm_water.gid,
(ST_Dump(osm_water.geom)).geom :: GEOMETRY(polygon, 4326)
INTO water_single
FROM osm_water, pt
WHERE ST_DWithin(osm_water.geom, pt.geom, 0.0005)
GROUP BY osm_water.gid, osm_water.geom;

CREATE INDEX sidx_water_single_geom
ON water_single USING GIST (geom);


SELECT
pt.glx,
(ST_Dump(pt.geom)).geom :: GEOMETRY(polygon, 4326)
INTO pt_single
FROM pt, water_single
WHERE ST_DWithin(pt.geom, water_single.geom, 0.0005)
GROUP BY pt.glx, pt.geom;

CREATE INDEX sidx_pt_single_geom ON pt_single USING GIST (geom);


SELECT
osm_land.gid,
(ST_Dump(osm_land.geom)).geom :: GEOMETRY(polygon, 4326)
INTO land_single
FROM osm_land, pt_single
WHERE ST_Intersects(osm_land.geom, pt_single.geom)
GROUP BY osm_land.gid, osm_land.geom;

CREATE INDEX sidx_land_single_geom ON land_single USING GIST (geom);
water_single (blue), pt_single (pink), land_single (green)

We turn the water polygons into a linestring dataset (water_line). These lines are used as blades to ST_Split() the admin polygons (pt_single). The snapping is processed in a PL/pgSQL function. There may be a more elegant solution which uses PostgreSQL recursive queries. I put this question to the GIS StackExchange community.

SELECT
(ST_Dump(ST_Boundary(geom)).geom :: GEOMETRY(linestring, 4326)
INTO water_line
FROM water_single;

CREATE INDEX sidx_water_line_geom ON water_line USING GIST (geom);

CREATE TABLE pt_split AS SELECT geom FROM pt_single;

CREATE INDEX sidx_pt_split_geom ON pt_split USING GIST (geom);

CREATE TEMP TABLE cursor_tmp (glx varchar, geom GEOMETRY(POLYGON, 4326));

DO $$
DECLARE
_geom GEOMETRY(LINESTRING, 4326);
_curs CURSOR FOR SELECT geom FROM water_line;

BEGIN
OPEN _curs;
LOOP
FETCH
_curs INTO _geom;
EXIT WHEN NOT
FOUND;

TRUNCATE TABLE cursor_tmp;

INSERT INTO cursor_tmp
SELECT
glx,
(ST_Dump(
ST_CollectionExtract(
ST_MakeValid(ST_Split(geom, _geom))
, 3))).geom :: GEOMETRY(POLYGON, 4326)
FROM pt_split
WHERE ST_Intersects(_geom, geom);

DELETE FROM pt_split WHERE st_intersects(_geom, geom);

INSERT INTO pt_split SELECT
* FROM cursor_tmp;

END LOOP;
CLOSE _curs;
END $$
;
pt_split (pink), water_line (blue)

We create point geometries on the surface of the pt_split layer and use the location of the points within the water_single areas to identify which clips should be removed from the admin boundaries.

ALTER TABLE pt_split
ADD COLUMN geomcntr GEOMETRY(point, 4326);

UPDATE pt_split SET geomcntr = ST_PointOnSurface(geom);

CREATE INDEX sidx_pt_split_geomcntr
ON pt_split USING GIST (geomcntr);

ALTER TABLE pt_split ADD COLUMN water BOOLEAN DEFAULT FALSE;

UPDATE pt_split SET water = TRUE
FROM
water_single
WHERE ST_DWithin(water_single.geom, pt_split.geomcntr, 0);
pt_split (pink) filtered by “water” IS FALSE

Using PL/pgSQL we split land_single with pt_line as linestring table which is created by dumping the boundaries from pt_single. The output of the function is land_split.

SELECT
(ST_Dump(
ST_Boundary(geom)
)).geom :: GEOMETRY(linestring, 4326)
INTO pt_line
FROM pt_single;

CREATE INDEX sidx_pt_line_geom
ON pt_line USING GIST (geom);

CREATE TABLE land_split AS SELECT geom FROM land_single;

CREATE INDEX sidx_land_split_geom
ON land_split USING GIST (geom);

DROP TABLE cursor_tmp;
CREATE TEMP TABLE cursor_tmp (geom GEOMETRY(POLYGON, 4326));

DO $$
DECLARE
_geom GEOMETRY(LINESTRING, 4326);
_curs CURSOR FOR SELECT geom FROM pt_line;

BEGIN
OPEN _curs;
LOOP
FETCH
_curs INTO _geom;
EXIT WHEN NOT
FOUND;

TRUNCATE TABLE cursor_tmp;

INSERT INTO cursor_tmp
SELECT (ST_Dump(
ST_CollectionExtract(
ST_MakeValid(ST_Split(geom, _geom))
, 3))).geom :: GEOMETRY(POLYGON, 4326)
FROM land_split
WHERE ST_Intersects(_geom, geom);

DELETE FROM land_split WHERE st_intersects(_geom, geom);

INSERT INTO land_split SELECT
* FROM cursor_tmp;

END LOOP;
CLOSE _curs;
END $$
;

We create point geometries on the surface of the land_split layer and use the location of the points within the pt admin areas to identify which clips should added to the admin boundaries.

ALTER TABLE land_split
ADD COLUMN geomcntr GEOMETRY(point, 4326);

UPDATE land_split SET geomcntr = ST_PointOnSurface(geom);

CREATE INDEX sidx_land_split_geomcntr ON land_split USING GIST (geomcntr);

ALTER TABLE land_split
ADD COLUMN admin BOOLEAN DEFAULT FALSE;

UPDATE land_split SET admin = TRUE
FROM
pt WHERE ST_DWithin(pt.geom, land_split.geomcntr, 0);
UPDATE land_split SET admin = TRUE
FROM
pt WHERE ST_Contains(land_split.geom, pt.geom);
land_split (green) filtered by “admin” IS FALSE

Overlapping land_split polygons need to be joined. These occur where clips are created where land_single polygons overlap. This is the case in the split OSM land polygon dataset.

ALTER TABLE land_split ADD COLUMN overlap BOOLEAN DEFAULT FALSE;

UPDATE land_split
SET overlap = TRUE
FROM
(SELECT *
FROM land_split
WHERE land_split.admin IS FALSE) same
WHERE land_split.admin IS FALSE
AND
ST_Overlaps(land_split.geom, same.geom);

SELECT
land_split.admin,
(ST_Dump(
ST_CollectionExtract(
ST_MakeValid(
ST_Union(
land_split.geom,
same.geom))
, 3))).geom :: GEOMETRY(POLYGON, 4326)
INTO land_split_overlap
FROM land_split, (SELECT *
FROM land_split
WHERE land_split.admin IS FALSE) same
WHERE land_split.admin IS FALSE
AND
ST_Overlaps(land_split.geom, same.geom);

CREATE INDEX sidx_land_split_overlap_geom
ON land_split_overlap USING GIST (geom);

DELETE FROM land_split where overlap IS TRUE;
land_split_overlap (yellow)

We identify how many filtered pt_split geometries are intersected by each filtered land_split geometry.

ALTER TABLE land_split ADD COLUMN touch INTEGER DEFAULT 0;

UPDATE land_split
SET touch = Q.touch
FROM
(SELECT
land_split.geom,
count(land_split.geom) touch
FROM land_split, pt_split
WHERE
ST_Intersects(
land_split.geom,
pt_split.geom
)
AND land_split.admin IS FALSE
AND
pt_split.water IS FALSE
GROUP BY
land_split.geom
) Q
WHERE land_split.geom = Q.geom;


SELECT
Q.glx,
ST_CollectionExtract(
ST_MakeValid(
ST_Union(
geom
)
)
, 3) geom
INTO pt_union
FROM (
SELECT
pt_split.glx,
ST_CollectionExtract(
ST_MakeValid(
ST_Union(
pt_split.geom,
land_split.geom
)
)
, 3) geom
FROM pt_split, land_split
WHERE
ST_Intersects(
pt_split.geom,
land_split.geom
)
AND pt_split.water IS FALSE
AND
land_split.touch = 1
) Q
GROUP BY glx;

CREATE INDEX sidx_pt_union_geom ON pt_union USING GIST (geom);

The land_split_pts layer is created with ST_DumpPoints() from the land_split layer. We identify the number of intersections from the filtered pt_split layer.

ALTER TABLE land_split
ADD COLUMN clip_id SERIAL NOT NULL PRIMARY KEY;

SELECT
clip_id
,
(ST_DumpPoints(geom)).geom
INTO land_split_pts
FROM land_split where touch > 1;

CREATE INDEX sidx_land_split_pts_geom
ON land_split_pts USING GIST (geom);

ALTER TABLE land_split_pts
ADD COLUMN qid SERIAL NOT NULL PRIMARY KEY;

ALTER TABLE land_split_pts ADD COLUMN touch INTEGER DEFAULT 0;

UPDATE land_split_pts
SET touch = Q.touch
FROM (
SELECT
land_split_pts.qid,
count(land_split_pts.geom) touch
FROM land_split_pts, pt_split
WHERE
ST_DWithin(
land_split_pts.geom,
pt_split.geom, 0.000001
)
AND pt_split.water IS FALSE
GROUP BY
land_split_pts.qid
) Q
WHERE land_split_pts.qid = Q.qid;
land_split_pts with no interction (blue) and more than one intersection (orange).

By using a function to break a linestring into segments by Paul Ramsey we are able dump the line segments which make up the filtered land_split polygons into the land_line_coast layer. We add the ST_Centroid() of the line segments as a geometry field to the same table and check whether the centroids are within 0.000001 degree of water_single geometries.

WITH segments AS (
SELECT
clip_id
,
ST_MakeLine(lag((pt).geom, 1, NULL)
OVER (
PARTITION BY clip_id
ORDER BY clip_id
, (pt).path), (pt).geom) geom
FROM (SELECT
clip_id
,
ST_DumpPoints(geom) AS pt
FROM land_split
WHERE admin IS FALSE AND touch > 1) AS dumps
)
SELECT clip_id, FALSE touch, geom
INTO land_line_coast
FROM segments
WHERE geom IS NOT NULL;

CREATE INDEX sidx_land_line_coast_geom
ON land_line_coast USING GIST (geom);

ALTER TABLE land_line_coast
ADD COLUMN geomcntr GEOMETRY(point, 4326);

UPDATE land_line_coast SET geomcntr = ST_Centroid(geom);

CREATE INDEX sidx_land_line_coast_geomcntr
ON land_line_coast USING GIST (geomcntr);

UPDATE land_line_coast
SET touch = TRUE
FROM
water_single
WHERE ST_Dwithin(land_line_coast.geomcntr, water_single.geom, 0.000001);
land_line_coast (blue) filtered by “touch” IS TRUE

We use ST_ClosestPoint() to find the closest point on land_line_coast (grouped by their respective clip_id) and create the layer land_line_coast_pts.

SELECT
land_split_pts.qid,
land_split_pts.clip_id,
ST_ClosestPoint(land_line_coast.geom, land_split_pts.geom) geom
INTO land_line_coast_pts
FROM
(SELECT
clip_id
, ST_Union(geom) geom
FROM land_line_coast
WHERE touch IS TRUE
GROUP BY clip_id
) land_line_coast
, land_split_pts
WHERE land_split_pts.touch > 1
AND land_line_coast.clip_id = land_split_pts.clip_id;

CREATE INDEX sidx_land_line_coast_pts_geom
ON land_line_coast_pts USING GIST (geom);

INSERT INTO land_line_coast_pts
SELECT qid, clip_id, geom
FROM land_split_pts
WHERE touch > 1;
CREATE INDEX sidx_land_split_pts_geom
ON land_split_pts USING GIST (geom);
land_line_coast_pts (orange)

We connect the land_split_pts points to create the land_split_blade layer. A function is used to extend the blades by 0.0001 degrees into each direction.

SELECT
qid
,
clip_id,
ST_MakeLine(geom) geom
INTO land_split_blade
FROM land_line_coast_pts
GROUP BY qid, clip_id;

CREATE INDEX sidx_land_split_blade_geom
ON land_split_blade USING GIST (geom);

UPDATE land_split_blade
SET geom = ST_MakeLine(
ST_TRANSLATE(Q.a, sin(Q.az1) * Q.len, cos(Q.az1) * Q.len),
ST_TRANSLATE(Q.b, sin(Q.az2) * Q.len, cos(Q.az2) * Q.len)
)
FROM (
SELECT
qid
,
a,
b,
ST_Azimuth(a, b) AS az1,
ST_Azimuth(b, a) AS az2,
ST_Distance(a, b) + 0.0001 AS len
FROM (
SELECT
qid
,
ST_StartPoint(geom) AS a,
ST_EndPoint(geom) AS b
FROM land_split_blade
) sub
) Q
WHERE land_split_blade.qid = Q.qid;
land_split_blade

We copy geometries from land_split into land_split_ and use a split function to cut the polygons along the blade linestrings in land_split_blade.

CREATE TABLE land_split_
AS SELECT
geom,
admin,
touch,
clip_id
FROM
land_split;

CREATE INDEX sidx_land_split__geom
ON land_split_ USING GIST (geom);

DROP TABLE cursor_tmp;
CREATE TEMP TABLE cursor_tmp (
geom GEOMETRY(POLYGON, 4326),
admin BOOLEAN,
touch INTEGER,
clip_id INTEGER
);

DO $$
DECLARE
_geom GEOMETRY(LINESTRING, 4326);
_curs CURSOR FOR SELECT geom FROM land_split_blade;

BEGIN
OPEN _curs;
LOOP
FETCH
_curs INTO _geom;
EXIT WHEN NOT
FOUND;

TRUNCATE TABLE cursor_tmp;

INSERT INTO cursor_tmp
SELECT
(ST_Dump(
ST_CollectionExtract(
ST_MakeValid(ST_Split(geom, _geom))
, 3))).geom :: GEOMETRY(POLYGON, 4326),
admin,
touch,
clip_id
FROM land_split_
WHERE ST_Intersects(_geom, geom) AND admin IS FALSE;

DELETE FROM land_split_ WHERE ST_Intersects(_geom, geom) AND admin IS FALSE;

INSERT INTO land_split_ SELECT
* FROM cursor_tmp;

END LOOP;
CLOSE _curs;
END $$
;
land_split_ (green)

We create point geometries on land_split_ using ST_PointOnSurface(). Using a query which orders the filtered pt_split geometries by ST_Distance() we can assign the glx from neighbouring geometries. Grouping the land_split_ polygons by their glx identifier allows us to generate the pt_union layer.

ALTER TABLE land_split_
ADD COLUMN qid SERIAL NOT NULL PRIMARY KEY;

ALTER TABLE land_split_ ADD COLUMN glx VARCHAR;

ALTER TABLE land_split_
ADD COLUMN geomcntr GEOMETRY(point, 4326);

UPDATE land_split_ SET geomcntr = ST_PointOnSurface(geom);

CREATE INDEX sidx_land_split__geomcntr
ON land_split_ USING GIST (geomcntr);

WITH T AS (
SELECT
qid
,
(
SELECT glx
FROM
pt_split
WHERE water IS FALSE
ORDER BY
ST_Distance(land_split_.geomcntr, pt_split.geom)
LIMIT 1
)
FROM land_split_
WHERE admin IS FALSE
)
UPDATE land_split_
SET glx = T.glx
FROM
T
WHERE land_split_.qid = T.qid;

UPDATE land_split_
SET glx = pt_split.glx
FROM
pt_split
WHERE st_dwithin(pt_split.geom, land_split_.geomcntr, 0)
AND pt_split.water IS FALSE;

SELECT
glx
,
ST_CollectionExtract(
ST_MakeValid(st_union(geom))
, 3) geom
INTO pt_union
FROM land_split_
GROUP BY glx;

CREATE INDEX sidx_lpt_union_geom
ON pt_union USING GIST (geom);
pt_union

Let’s finally update the geometries in our original pt table which holds the census geometries.

UPDATE pt
SET geom = st_multi(pt_union.geom)
FROM pt_union
WHERE pt.glx = pt_union.glx;
Seamless administrative boundaries!

I can imagine that there are some edge cases where this process might fail. Please let me know if you find such cases. The whole process could also be optimised and put into a stored procedure.

Building on this I would also like to look at matching polygons to OSM admin boundaries in the future.

Please check out GEOLYTIX website if you are interested in this kind of work. We provide a range of clean open data as well as commercial data sets for location analytics.