Modelling the Public Transport Network — Part II

Yuanruo Liang, Software Engineer
Ng Wing Yiu, Associate Data Scientist

This article is about the data science algorithms behind Reroute, the bus planning simulator built by GovTech Open Data Products to improve the public transport system. If you haven’t, check out Part I here.

Transport models are a systematic representation of complex real-world choices. There are plenty of methods in the literature to represent and model a public transport network, such as agent-based models, the classical four-step model, and graph models.

Our team decided to use a directed graph model, as it conferred significant advantages over other methods for modelling transport network changes:

Data as a resource We can build, simulate and validate the graph model using copious amounts of granular, real-world farecard data.

Ease of modelling Unlike agent-based models which is difficult to tune, interpret & converge, graph models are easily interpreted by setting edge weights to the expected travel duration between two nodes.

Easy to specify Each state and transition in a commuter’s journey can be modelled by the inclusion of the corresponding vertex or edge type. It is also easy to specify different networks for different time periods of the day.

After careful consideration, we decided to model the public transport network as a graph consisting of six node types — entrance, exit, bus stop, fare gate, on the bus, and on the MRT. The entrance and exit nodes are special marker nodes which serve as the start or end of a path, and allows Reroute to consider nearby bus stops and MRT stations when conducting a path search.

An example schematic showing a commuter traversing each part of the network is shown below in Figure 1.

Figure 1 An example schematic showing the nodes traversed on the network when travelling from bus stop 53269 to bus stop 43051.

The state transitions on the transport network can be summarised in Table 1:

Table 1 Summary of valid transitions in the graph. Each state transition is a type of edge in the network graph. Red nodes are for entry/exit, yellow nodes denote boarding/alighting vehicles and green nodes denote on-vehicle journeys. Bold indicates non-zero edge weights.

This approach generates a network consisting of ~33,000 nodes and ~84,000 edges.

To estimate the travel times, we used farecard data and optimisation techniques to estimate edge weights between bus stops and MRT stations. For example, the travel times between different MRT stations can be solved via Non-negative Least Squares (NNLS):


is a Boolean matrix of m trips traversing over n edges, x is the vector of coefficients for each edge weight to be interpreted as the travel time, and y is the observed travel time from farecard data.

To be more realistic, we also incorporated the following into the model:

Simulate Peak Periods Travel time on a vehicle varies depending on if there is congestion, whether it is a weekend and many other factors. We modelled on-vehicle edges based on morning peak hours to simulate worst-case scenarios.

Boarding & Alighting Time Waiting for the bus or getting off the train takes time too, and we also used NNLS to estimate the time taken. We used bus headway data to model the expected waiting times for each service.

First and Last Mile Farecard data does not capture the start and end of journeys. To be accurate, we relied extensively on the domain knowledge and on-ground experience of bus planners to estimate realistic weights for walking edges. Incorporating walking helps to model not only transfers but also commuter’s preferences.

Efficiently Routing Commuter Choices

To compare commuter travel patterns, Reroute first creates an unmodified transport graph, then applies the specified simulation plan to create a second, modified graph. Commuter trips are routed through both graphs and the results compared to determine commuter impact.

However, there is a snag — in real life, commuters have combinatorial route choices when travelling, each of which has some variance in travelling time. We needed to model the variance in choice in the aggregate while retaining the principle that commuters are rational in their individual preferences. To accomplish this, each journey is routed through the graph not only once for the shortest path, but for K shortest paths instead.

The value of K is estimated with the diversity index, determined by weighing every unique combination of route choices for the same origin and destination.

Additionally, given the size of our graph (33,000 nodes, 84,000 edges), using Yen’s algorithm from the popular NetworkX library on a single bus route proved to be computationally challenging, with the runtime exceeding 8 hours even on a small Spark cluster. After investigating further, we noted that the algorithm is unable to reuse results from previous calls to Dijkstra’s algorithm.

To speed up our calculations, we implemented a deviation path algorithm by Martins, Pascoals and Santos, which we have open-sourced here. Unlike Yen’s algorithm, the MPS algorithm does not guarantee a lower bound on algorithmic complexity — but is able to cache intermediate results, which in practice had the effect of speeding up calculations by more than 100x. This enabled the simulation to run on just a single machine — demonstrating again that a solid algorithm always beats throwing expensive hardware at a problem.


In this article, we shared a method to model the public transport network realistically by constructing a weighted graph using farecard data to estimate the impact of modifying bus routes.

If you’ve enjoyed this article, do check out Part III on the engineering behind Reroute here.

The Open Data Products team aims to combine data science with software engineering to deliver analytics solutions for the public good. If you’re interested to join us, click here to apply.


A.I. Platforms, Government Technology Agency
Chong Hon Fah, Senior Data Scientist

Open Data Products, Government Technology Agency
Laura Lee, Associate Data Scientist



Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store