DATA STORIES | GEOGRAPHIC ROUTING | KNIME ANALYTICS PLATFORM

Bus Route Optimization in KNIME

Make student mobility more efficient by reducing average route time and total mileage

Ricardo Auerbach
Low Code for Data Science

--

Photo by Thomas Park on Unsplash.

Use Case: Improve student mobility

Lincoln Parish School Board oversees bus transportation for about 3000 students across 7 schools covering grades K-12. Bus routes, in their current state, are at the discretion of each bus driver, they have established routes through experience and familiarity with the area. Drivers are expected to have routes established by the first day of school and gain familiarity with routes over time.

Challenge: Cope with drivers’ discretion to chose bus route

The main problem with the current system is variability among routes. Some drivers will stop in front of each student’s house, but there are several drivers that have established stops to unload multiple students at a time. There are regulations for how far from a student’s house a bus can unload. Drivers are aware of this rule and a majority choose to abide by stopping in front of every house. Once a stop is established, it cannot be changed for the remainder of the school year, according to the Lincoln Parish School Board bylaws.

Goal: Reduce route time and distance travelled

Our objective was to develop a system that could create concentrated bus stops in accordance with current regulations and find the shortest distance from stop to stop to limit the amount of time a student spends on the bus. We collected data from 2021–2022 bus routes to use as a control group as well as develop a method for generating more efficient bus routes. This data included forms filled out by the bus drivers indicating bus stop locations, the time they arrived at each stop, number of students, and total route time. We used these real-world data points to determine the coefficient of time assigned to each student boarding the bus to ensure that our generated routes would be replicable in the real world.

A no-code solution with KNIME

To tackle our challenge and come up with a solution that makes bus routing more efficient, we relied on KNIME Analytics Platform — the free and open source software for data science.

We developed two KNIME workflows operating in the following steps:

1. Generate coordinates from students’ street addresses by querying a Google Maps API.

2. Generate bus stop locations that are within 200 yards of the student they are assigned to. To do that, we opted for the k-Means clustering algorithm using the dedicated node.

3. Test for fastest route through bus stops using a “Drivetime and Distance Query” component with Tom-Tom API and a parameter optimization loop to shuffle the order n-times and store the fastest order.

4. Recall the fastest route and save it to an Excel file.

Workflow 1: Bus Stop Generator

This workflow inputs student addresses and uses the “k-Means” clustering algorithm to generate bus stop locations within walking distance of the students. Increasing the number of clusters generated by the node results in shorter walking distances for the student.

Fig. 1. This workflow determines adequate bus stop locations based on proximity to student addresses.

One limitation we encountered was that k-Means operates via Euclidean distance, or “as the crow flies.” Manual input is required if the model generates a stop where a bus cannot reach, for example a property line shared by two students on two adjacent streets. In this case, we ran the model without those students and created stops manually for them.

Fig. 2. A comparison between the map view of students’ addresses, and their respective bus stops.

The result is an Excel list of addresses for each student, a column with the coordinates of each student’s bus stop, and a column for the total walking distance for each student. Fig. 2. provides a comparison between the map view of students’ addresses, and their respective bus-stops.

Workflow 2: Bus Route Optimizer

This workflow takes the bus-stops generated by the first workflow and generates the faster route through the stops by using a parameter optimization loop and a “Drivetime and Distance Query” component that queries the Tom-Tom API.

Fig. 3. This workflow performs bus route optimization.

Data Preparation

First, we import the list of bus-stop addresses and convert to coordinates using the Google Address Geocoder node. Some nodes did not work well with/without comma separated coordinates, so there are a lot of instances where we had to convert between the two in this workflow.

Fig. 4. Data cleaning workflow to convert addresses to coordinates (lat/lon).

In a parallel branch, we import the Excel file of bus-stop locations based on coordinates and include a Map Viewer node to check that all coordinates were generated correctly. We also filter out duplicate addresses — they indicate multiple students FROM one house that ride that bus from that stop. We conducted a regression analysis that showed a negligible impact on total route time when multiple students boarded the bus from one address.

Fig. 5. Filtering duplicate addresses and viewing bus stop location on a map.

Note. The geospatial nodes used in this project are third-party nodes. As such, KNIME cannot guarantee smooth execution and/or careful maintenance. KNIME’s recommended nodes for geospatial analysis are available in the Geospatial Analytics Extension for KNIME, co-developed with the Center for Geographic Analysis at Harvard University.

Route Optimization Loop

After the data preparation step, we can move on to calculate the total route times and find the fastest possible route.

Fig. 6. This is the part of the workflow where we calculated the total route times and shuffle the order until the fastest possible route is identified and then stored.

We start by using the Parameter Optimization Loop Start node to generate random seeds for the Shuffle node. We increase or decrease the number of steps in the loop depending on the number of proposed stops, since a route with less stops will require less attempts to find the fastest route, as opposed to a route with more proposed stops. After shuffling we append the coordinates columns onto themselves starting with the second row, and removing the last row. This prepares the coordinates for the “Drivetime and Distance Query” component which uses a Tom-Tom API to generate route data for start and end points across all the rows in the route as shown below:

Fig. 7. Results from “Drivetime and Distance Query” component.

After summing the total route time and multiplying against the coefficient determined by our regression analysis we finish the loop with two outputs, a list of all random route times with their corresponding seed, and the fastest route time along with the seed that generated that route.

Fig. 8. Outputs of the Parameter Optimization Loop End. The best parameter random seed will be converted into a flow variable and used in the final portion of the workflow.

After determining the fastest randomly obtained route order, we extract the best random seed and use it to shuffle the stops into the fastest order found in the parameter optimization loop. We calculate the route data one last time in a series of nodes very similar to that found in the loop.

Fig. 9. By using the most optimal random seed we can insert it as a flow variable into the “Shuffle” node to get the exact same route order again.
Fig. 10. After generating the route data the total time and distance is written to an Excel file.

Summary of Results

We used KNIME Analytics Platform to develop concentrated bus stops and perform route optimization to reduce the average route time from 49 to 29.5 minutes and the total mileage reduced from 430 to 210 miles. Our goal was to have the average time on the bus under 30 minutes and to reduce fuel expenses by 10–15%. We succeeded in reducing the time and, by limiting total mileage, lowered the total fuel cost by 51%. The pilot test was run entirely through KNIME Software due to rules regarding altering established bus routes within a school year. The changes made to routes can only be made for the next academic school year, 2023–2024, if agreed upon by the elected school board.

We accomplished this as part of a training program to earn our Six Sigma Black Belt Certifications that pairs groups of students with outside organizations with the goal of implementing what we have learned in our curriculum in the real world, while producing performance improvements for the partnering organization. This took place within a two-month period, with no funding, and alongside our full-time school schedule. With more time and funding, we believe that more could be achieved with regards to optimizing Lincoln Parish School Board’s Bus Routes.

--

--