Pathfinding from HK Road Network Data

5 min readMay 2, 2022

Previously we have briefly explored HK road network data, this time I will demonstrate how to use Python lxml and NetworkX library to build simple pathfinding function. Given start and end point (nodes, from intersections data), suggest the shortest route.

Pathfinding solution visualized in Python Folium

At the very beginning, I had think of building algorithm like Dijkstra / A* from scratch, but “reinventing the wheel” is quite time-wasting, so I use Python NetworkX library to complete the pathfinding. The main task in this article is to convert GML into NetworkX-compatible graph model.

Syntax of NetworkX is very easy to understand, I will use the basic functions which similar to the following code:

import networkx as nx# Road is directional, so use directed graph
G = nx.DiGraph()# How to import the nodes with "key" and data
G.add_nodes_from([
    (4, {"color": "red"}),
    (5, {"color": "green"}),
])# How to import the edges with nodes "key" and data, including weight
G.add_edge(4, 5, weight=4.7 )# one-line shortest path function, it will return lists of nodes involved in the suggested path
sp = nx.shortest_path(G, 4, 5, weight='weight')

Note that the raw road data is not ready for import. For example, the add_edge function requires the start node and end node as argument, but the centerline (edge) data does not mention which intersection (node) it connected. We have to process and organize the data before import into the graph model.

The first step is import networkX and lxml library, and use lxml to parse the GML into XML tree.

from lxml.etree import XMLParser, parse
import networkx as nxDATA_DIRECTORY = ""i_path = f"{DATA_DIRECTORY}/INTERSECTION.gml"
cl_path = f"{DATA_DIRECTORY}/CENTERLINE.gml"parser = XMLParser(huge_tree=True)i_tree = parse(i_path, parser=parser)
cl_tree = parse(cl_path, parser=parser)# namespaces for easier xml data extraction
namespaces = {"core": "http://www.opengis.net/citygml/2.0",
    "gen": "http://www.opengis.net/citygml/generics/2.0",
             "gml": "http://www.opengis.net/gml"}

Then declare dictionary to store data.

# Dictionary to store intersections and centerlines by its primary key
intersections = {}
centerlines = {}# for each intersection save its rounded coordinates as key, primary key INT_ID as value
# e.g. key: "813766-826149", value: 10707 
intersections_by_coordinates = {}# for each centerline, save its start and end intersection ID as key, primary key ROUTE_ID as value
centerlines_by_intersection_id = {}

Although centerline and its intersections can be deduced by coordinates, I found that the value of coordinates are not numerically identical, direct “==” comparison is infeasible. After several testing, round the coordinates to its nearest integer is proper for matching. For example intersection point ID “10707”, its coordinates is (813766.015000001, 826148.584000003). I round and combine it into “813766–826149” and store as key in intersections_by_coordinates, and value is ID “10707”. This dict is saved for querying from centerlines.

NetworkX seems empathize nodes more than edges. In the pathfinding solution, it returns the nodes it pass through, not the edges it passes. However, the edges (road / road name) are indeed important for our solution. Therefore I additionally store centerline data in centerlines_by_intersection_id, the key is “{start_intersection_id}-{end_intersection_id}”, the value is centerline id. This dict will be used in pathfinding solution, to find the edge (road) data by nodes (intersections) ID.

def chunks(lst, n):
    """Yield successive n-sized chunks from lst."""
    for i in range(0, len(lst), n):
        yield lst[i:i + n]
        
def round_float_string(digit_in_string):
    """Convert a string float to a rounded string int"""
    return str(round(float(digit_in_string)))
    
def gml_tag_properties(tree):
    """ Extract string, int, double attributes inside a GML tree and return as dict.
    Extracted tags including: gen:stringAttribute, gen:intAttribute, gen:doubleAttribute
    """
    properties = {}    
    for t in tree.findall("./gen:stringAttribute", namespaces=namespaces):
        k = t.attrib['name']
        v = t.find("./gen:value", namespaces=namespaces).text
        properties[k] = v if v else None
    for t in tree.findall("./gen:intAttribute", namespaces=namespaces):
        k = t.attrib['name']
        v = t.find("./gen:value", namespaces=namespaces).text
        properties[k] = int(v) if v else None
    for t in tree.findall("./gen:doubleAttribute", namespaces=namespaces):
        k = t.attrib['name']
        v = t.find("./gen:value", namespaces=namespaces).text
        properties[k] = float(v) if v else None
    return properties

Above is the helper functions to convert data, used in the following iteration of intersections and centerlines.

# Parsing Intersections
for gco in i_tree.getroot().findall("./core:cityObjectMember/gen:GenericCityObject", namespaces=namespaces):
    properties = gml_tag_properties(gco)
    lod4Geometry = gco.find("./gen:lod4Geometry", namespaces=namespaces)
    point_tag = lod4Geometry.find(".//gml:pos",namespaces=namespaces)
    point_string = point_tag.text
    
    # Save rounded coordinates as another key of the intersection:
    # "834790.161200001 815110.907500003" -> "834790-815111"
    properties['point_string'] = "-".join([round_float_string(s) for s in point_string.split(" ")])
    
    # Save point coordinates in float list
    properties['coordinates'] = [float(s) for s in point_string.split(" ")]
    
    # Save the centerline ID into list
    properties["routes"] = []
    
    # for routes query its own start and end point by string-coordinates
    intersections_by_coordinates[ properties['point_string'] ] = properties['INT_ID']
    
    # For all non-None RD_ID_ connected to this intersection:
    for k,v in properties.items():
        if k.startswith("RD_ID_") and v:
            # append ID into "routes"
            properties["routes"].append(v)
    
    # save the intersection data in intersections
    intersections[properties['INT_ID']] = properties

Then we can import the data into directed graph:

# Instantiate NetworkX's directed graph
G = nx.DiGraph()# Import all nodes (i.e. intersections)
G.add_nodes_from(intersections.items())# Iterate all centerlines
for k,v in centerlines.items():
    # add edge into graph
    G.add_edge(v['start_at'], v['end_at'], data=v, length=v['SHAPE_Length'])
    
    # path_code = f"{v['start_at']}-{v['end_at']}"
    # paths[path_code] = v
    
    # if the centerline is bi-directional, add the edge of reversed direction
    if v['TRAVEL_DIRECTION'] == 1:
        G.add_edge(v['end_at'], v['start_at'], data=v, length=v['SHAPE_Length'])
    
    # if it is dead-end road with single direction, add the edge of opposite direction
    if v["TRAVEL_DIRECTION"] == 3 and len(intersections[v["end_at"]]['routes']) == 1:
        G.add_edge(v['end_at'], v['start_at'], data=v, length=v['SHAPE_Length'])

Finally we try pathfinding of two points: “24551” to “141311”, two intersection points in Kowloon City. And print the road name which the path pass through.

# from intersection "24551" to "141311", weight by edge (centerline) SHAPE_LENGTH (length)
sp = nx.shortest_path(G, '24551', '141311', weight='SHAPE_Length')# name of street which pass through
streets_pass_through = []# sp is a list of points (intersections) which the path pass through
for i in range(0, len(sp)-1):
    path_code = f"{sp[i]}-{sp[i+1]}"
    # for opposite direction of bi-directional centerline
    if path_code not in centerlines_by_intersection_id:
        path_code = f"{sp[i+1]}-{sp[i]}"route_id = centerlines_by_intersection_id[path_code]
    street_name = centerlines[route_id]['STREET_ENAME']
    if len(streets_pass_through) == 0 or streets_pass_through[-1] != street_name:
        streets_pass_through.append(street_name)
        print(street_name)

Result:

MA TAU CHUNG ROAD
HANG WAN ROAD
SUNG WONG TOI ROAD
SHING KAI ROAD
MUK ON STREET

Visualize the path in folium:

(Visualization in Folium will be introduced in another article)

Complete code for pathfinding

Please note that the usability of this pathfinding function is quite low, because “Restricted turn movement” data is not imported for pathfinding consideration, as restriction logic is too complicated for simple graph model to handle.

Pathfinding from HK Road Network Data

Written by Tommy