# Modelling the Public Transport Network — Part II

*Yuanruo Liang, Software EngineerNg Wing Yiu, Associate Data Scientist*

*This article is about the data science algorithms behind Reroute, the bus planning simulator built by GovTech Open Data Products to improve the public transport system. If you haven’t, check out Part I **here**.*

Transport models are a systematic representation of complex real-world choices. There are plenty of methods in the literature to represent and model a public transport network, such as agent-based models, the classical four-step model, and graph models.

Our team decided to use a directed graph model, as it conferred significant advantages over other methods for modelling transport network changes:

**Data as a resource **We can build, simulate and validate the graph model using copious amounts of granular, real-world farecard data.

**Ease of modelling **Unlike agent-based models which is difficult to tune, interpret & converge, graph models are easily interpreted by setting edge weights to the expected travel duration between two nodes.

**Easy to specify **Each state and transition in a commuter’s journey can be modelled by the inclusion of the corresponding vertex or edge type. It is also easy to specify different networks for different time periods of the day.

After careful consideration, we decided to model the public transport network as a graph consisting of six node types — entrance, exit, bus stop, fare gate, on the bus, and on the MRT. The entrance and exit nodes are special marker nodes which serve as the start or end of a path, and allows Reroute to consider nearby bus stops and MRT stations when conducting a path search.

An example schematic showing a commuter traversing each part of the network is shown below in Figure 1.

The state transitions on the transport network can be summarised in Table 1:

This approach generates a network consisting of ~33,000 nodes and ~84,000 edges.

To estimate the travel times, we used farecard data and optimisation techniques to estimate edge weights between bus stops and MRT stations. For example, the travel times between different MRT stations can be solved via Non-negative Least Squares (*NNLS*):

where

is a Boolean matrix of *m* trips traversing over *n* edges, *x* is the vector of coefficients for each edge weight to be interpreted as the travel time, and *y* is the observed travel time from farecard data.

To be more realistic, we also incorporated the following into the model:

**Simulate Peak Periods **Travel time on a vehicle varies depending on if there is congestion, whether it is a weekend and many other factors. We modelled on-vehicle edges based on morning peak hours to simulate worst-case scenarios.

**Boarding & Alighting Time** Waiting for the bus or getting off the train takes time too, and we also used *NNLS* to estimate the time taken. We used bus headway data to model the expected waiting times for each service.

**First and Last Mile **Farecard data does not capture the start and end of journeys. To be accurate, we relied extensively on the domain knowledge and on-ground experience of bus planners to estimate realistic weights for walking edges. Incorporating walking helps to model not only transfers but also commuter’s preferences.

# Efficiently Routing Commuter Choices

To compare commuter travel patterns, Reroute first creates an unmodified transport graph, then applies the specified simulation plan to create a second, modified graph. Commuter trips are routed through both graphs and the results compared to determine commuter impact.

However, there is a snag — in real life, commuters have *combinatorial* route choices when travelling, each of which has some variance in travelling time. We needed to model the variance in choice in the aggregate while retaining the principle that commuters are rational in their individual preferences. To accomplish this, each journey is routed through the graph not only once for the shortest path, but for *K* shortest paths instead.

The value of *K* is estimated with the diversity index, determined by weighing every unique combination of route choices for the same origin and destination.

Additionally, given the size of our graph (*33,000 nodes, 84,000 edges*), using Yen’s algorithm from the popular NetworkX library on a single bus route proved to be computationally challenging, with the runtime exceeding 8 hours even on a small Spark cluster. After investigating further, we noted that the algorithm is unable to reuse results from previous calls to Dijkstra’s algorithm.

To speed up our calculations, we implemented a deviation path algorithm by Martins, Pascoals and Santos, which we have open-sourced here. Unlike Yen’s algorithm, the MPS algorithm does not guarantee a lower bound on algorithmic complexity — but is able to cache intermediate results, which in practice had the effect of speeding up calculations by more than **100x**. This enabled the simulation to run on just a single machine — demonstrating again that a solid algorithm always beats throwing expensive hardware at a problem.

# Summary

In this article, we shared a method to model the public transport network realistically by constructing a weighted graph using farecard data to estimate the impact of modifying bus routes.

*If you’ve enjoyed this article, do check out Part III on the engineering behind Reroute **here**.*

*The Open Data Products team aims to combine data science with software engineering to deliver analytics solutions for the public good. If you’re interested to join us, click **here** to apply.*

# Acknowledgments

*A.I. Platforms, Government Technology Agency*

*Chong Hon Fah, Senior Data Scientist*

*Open Data Products, Government Technology Agency*

*Laura Lee, Associate Data Scientist*