Improving Operations with Route Optimization

Contributors: Feiko Lai, Michal Szczecinski, Winnie So, Miguel Fernandez

Kamil Bujel
Published in GOGOX Technology
Jun 21, 2018 · 10 min read


Please find our academic paper on the topic here.

Every day, GOGOVAN drivers arrive at warehouses across Asia to pick up thousands of orders that our business partners have asked us to deliver to their clients.

These orders can be a range of things — from that long-awaited new phone, to an anniversary present that was ordered last minute. All of them will be of different sizes, shapes and weights. For each one of them, there will be a person waiting and hoping this time the courier company gets there on time…

This is why at GOGOVAN, we do everything in our power to ensure smooth and timely deliveries with quality of service that will amaze our customers. Every delivery route is carefully planned manually and double-checked by our Operations Team, to make sure we never fail.



Yes, that is correct.

In the past, Operations Team would have to manually sort out the delivery routes, usually on the morning of pickup, and ensure we meet all the delivery time requirements for that day. As you might imagine, that is not a particularly exciting or easy task :)

It took one person approximately 1 hour to create a sub-optimal route for 100 waypoints. For requests bigger than that, this time grew exponentially.

We instantly realised that this process was simply begging for some automation.

Not only did we feel sorry for the Operations Team, who had to do such mundane work early each morning, but we also knew that as our order volume grew, the task would slowly become mission impossible. We saw it as an opportunity to develop cutting-edge technology that would become a core component of GOGOVAN’s Data Science stack.

How did we start?

We are very client and driver-centric. Consequently, we always try first to analyse a problem from their perspectives in order to understand how our solution could impact and benefit them. After a lot of brainstorming, these are the goals we came up with:

  • All orders need to be delivered on time.
  • Ensure drivers are not rushed to make it on time by using buffer times and real-time distance.
  • Save fuel by reducing the distance driven.
  • Minimise idle time for drivers — no one likes waiting with a trunk full of packages.
  • Improve vehicle utilisation.
  • Fully automate the process.
  • The algorithm needs to be able to grow with us — supporting different types of deliveries, vehicles and countries.

Having determined our main objectives, we decided to explore the world of academia and open source — there’s no point in rediscovering the wheel. We realised that the problem we faced was widely known as the Vehicle Routing Problem.

What is the Vehicle Routing Problem?

The Vehicle Routing Problem (VRP) can be described as the problem of creating a set of optimal routes from one or many depots to multiple customers, subject to a set of constraints. The objective is to deliver goods to all customers while minimising the cost of the routes and the number of vehicles.
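To make the objective concrete, here is a minimal sketch in Python that scores a candidate set of routes against a distance matrix (the matrix, routes and depot index are toy numbers of our own, purely for illustration):

```python
# Symmetric distance matrix: node 0 is the depot, nodes 1-4 are customers.
DISTANCES = [
    [0, 4, 7, 3, 8],
    [4, 0, 2, 5, 6],
    [7, 2, 0, 4, 3],
    [3, 5, 4, 0, 9],
    [8, 6, 3, 9, 0],
]

def route_cost(route, depot=0):
    """Cost of driving depot -> customers in order -> depot."""
    stops = [depot] + route + [depot]
    return sum(DISTANCES[a][b] for a, b in zip(stops, stops[1:]))

def plan_cost(routes):
    """Total distance of a plan; each inner list is one vehicle's route."""
    return sum(route_cost(r) for r in routes)

# Two vehicles covering all four customers.
plan = [[1, 2, 4], [3]]
print(plan_cost(plan))  # → 23
```

A VRP solver searches over all such plans for the one minimising this cost while respecting the constraints (capacities, time windows, and so on).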

This problem is NP-hard, as proven by Lenstra and Rinnooy Kan¹. Exact solution methods exist, using branch-and-bound² or dynamic programming³; however, as described in the papers above, they only seem to work for up to about 150 waypoints.

Currently, state-of-the-art solutions are obtained using metaheuristics: Genetic Algorithms⁴, Tabu Search⁵ and Ant Colony Optimization⁶. These are the methods mainly used in the field nowadays.

For an in-depth review of the field, we recommend this brilliant thesis⁷ by Xiaoyan Li.

Our solution

With VRP being a widely recognised problem, there are indeed a lot of companies out there who seem to be tackling the problem.

However, we somehow did not feel satisfied with their solutions…

We just knew that if we combined our operations know-how, data science and research expertise, large volumes of data, and state-of-the-art open-source contributions, we could arrive at a robust in-house solution that:

  • Is more up-to-date and performant, with customisable algorithms and iteration logic.
  • Is cheaper, more efficient and more scalable.
  • Allows developing tangible intellectual property assets and building competitive advantage around it.
  • Allows us to guarantee to our clients that their delivery data will go no further than GOGOVAN.

Having thought about it for quite some time, we decided that if we wanted to be the leaders in the field, we needed to do it our way, not use some black-box solution.

So we got going…

First algorithm

But we didn’t dive into academic papers straightaway!

First, we focused on coming up with our own approaches, and evaluating them. This process allowed us to get a better understanding of the field from the very beginning and experience some of the common problems for ourselves (we didn’t take anything for granted!). What’s more, this helped us later on, as we could easily see pros and cons of different methods presented in academic papers and come up with strategies to combine them.
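As an illustration of the kind of simple baseline one can start with in this field (a sketch of our own for this article, not necessarily one of the exact approaches we tried), a greedy nearest-neighbour heuristic builds a route by always driving to the closest unvisited waypoint:

```python
def nearest_neighbour_route(dist, depot=0):
    """Greedy construction: always visit the closest unvisited waypoint.

    dist is a square matrix of pairwise travel costs.
    """
    unvisited = set(range(len(dist))) - {depot}
    route, current = [], depot
    while unvisited:
        nxt = min(unvisited, key=lambda j: dist[current][j])
        route.append(nxt)
        unvisited.remove(nxt)
        current = nxt
    return route

dist = [
    [0, 4, 7, 3],
    [4, 0, 2, 5],
    [7, 2, 0, 4],
    [3, 5, 4, 0],
]
print(nearest_neighbour_route(dist))  # → [3, 2, 1]
```

Such a heuristic runs in O(n²) and is trivial to implement, but it can easily lock itself into poor orderings, which is exactly the weakness the metaheuristics mentioned earlier are designed to escape.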

This process was key to understanding straightaway how this brilliant package — Google Optimization Tools — worked. We realised that the folks at Google had saved us the months we would have spent coding all the different algorithms we wanted to test. They allowed us to get down to the most interesting part immediately!

We spent a lot of time playing with that library, testing different scenarios and seeing for ourselves which strategies worked when.

We liked it so much we decided to build our tool around it!

It had all we needed — the transparency, the ability to experiment, the flexibility and the support.

The first algorithm was ready. We deployed it.

Figure 1: Visualisation of one of our first Route Optimization assignments

Rapid growth — how to handle more orders?

The first version of Route Optimization turned out to be a great success.

The volume of orders submitted to the Route Optimizer quickly increased from 500 items per warehouse to 1000+. Theoretically, we should have been fine.

But we were not.

Our algorithm’s runtimes and memory usage jumped incredibly quickly, from 1 minute and 500 MB to 10 minutes and 5 GB. As we tested it at higher and higher volumes, we finally reached the maximum: for 2000 waypoints the module used up 25 GB of RAM.

It was unacceptable.

Basically, we had two options:

  • Build a completely new Route Optimizer that would be able to support such large volumes by default
  • Create a new algorithm that would run on top of the current implementation — probably a method that would combine orders into smaller batches to be submitted to the main optimisation algorithm

As we are pragmatic (and also love building on top of the great work we have already done), we decided to proceed with the second option.

How do we create smaller batches?

Let’s start with a renowned clustering algorithm — DBSCAN⁸.

What we have is a state-of-the-art method that groups together geographical points. However, it has a downside: a single neighbourhood radius applies to the whole dataset, so every cluster is formed at the same density threshold.

This is not what we want for one simple reason: one cluster of 1km radius in Tsim Sha Tsui may contain 1000 orders, while other 1km clusters in Pok Fu Lam and Waterfall Bay may each contain only 3 orders.

These clusters would be really inefficient and uneven…

In Tsim Sha Tsui, the cluster size would be too big — we would have too many orders in one cluster. But at the same time, in some other regions this radius would not be enough — the clusters would be too small and the orders that are relatively close together would be in separate clusters.
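The effect is easy to reproduce with a standard DBSCAN implementation. In this sketch (synthetic coordinates of our own standing in for the districts above), one fixed eps lumps the dense area into a single huge cluster while an isolated order is dropped as noise:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.RandomState(0)
dense = rng.normal(loc=[0.0, 0.0], scale=0.1, size=(50, 2))  # busy district
remote = np.array([[5.0, 5.0], [5.2, 5.2], [9.0, 9.0]])      # outlying orders
points = np.vstack([dense, remote])

# One radius for everything: eps=0.3 in these toy units.
labels = DBSCAN(eps=0.3, min_samples=2).fit(points).labels_

print(len(set(labels[:50])))  # the 50 dense points collapse into one cluster
print(labels[-1])             # the isolated order is labelled -1 (noise)
```

No single eps fixes both problems at once: shrinking it splits the dense area into usable batches but strands even more remote orders, which is what pushed us towards varying the radius per region.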

That is why we decided to use a modified approach — called “recursive-DBSCAN”.


It builds on the brilliance of DBSCAN, but at the same time allows us to dig deeper into high waypoint-density regions while grouping remote orders together.

For a list of orders, we aim to find the radius for which the average number of waypoints per cluster is largest (while keeping the number of clusters above min_no_clusters). We find this radius using a simple binary search.

Once we have found the optimal radius, we “enter” the clusters that are too big and apply the same logic recursively, until every cluster contains fewer than max_len_cluster waypoints.

Then, for each cluster, we run the Route Optimization algorithm we developed using Google Optimization Tools. Hopefully, this gives us a similar result more quickly, using less RAM.

The pseudocode is as follows:
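In outline (a sketch using the parameter names above; the helper names are ours):

```
recursive_dbscan(points):
    # binary-search the radius that maximises the average cluster size,
    # while keeping at least min_no_clusters clusters
    eps = binary_search_eps(points, min_no_clusters)
    clusters = dbscan(points, eps)
    routes = []
    for cluster in clusters:
        if len(cluster) > max_len_cluster:
            # descend into oversized clusters with the same logic
            routes += recursive_dbscan(cluster)
        else:
            # solve each small batch with Google Optimization Tools
            routes += route_optimize(cluster)
    return routes
```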


We were very curious about how our method would perform, but at the same time we were worried that the recursion might run for very long, consequently making our algorithm no better than the baseline method.

That is why we first decided to have a look at the runtimes:

It turned out that the recursive-dbscan algorithm greatly outperformed the Google Optimization Tools method. At the same time, its runtimes did not differ much from those of the dbscan method.

We were only able to run dbscan for a maximum of 2000 orders and Google Optimization Tools for 1500 orders due to RAM usage: both methods crashed when the memory required exceeded 25 GB.

Runtimes are important, but we were also interested in how our new algorithm performs in terms of total distance and the number of vehicles used, compared to the baseline method. These two graphs show that:

As we see, the recursive method closely follows the Google Optimization Tools method in terms of both distance and number of vehicles. At the same time, it outperforms the dbscan method.

It means that our new algorithm is quicker than the baseline while the quality of solutions found is just as good. Also, the maximum RAM used is “only” 1 GB!

We got it!

DBSCAN vs Recursive-DBSCAN

We also wanted to show you how exactly recursive-dbscan works better for remote waypoints.

Figure 5: Comparison of DBSCAN and Recursive-DBSCAN routes. We are aware that they are still not perfect!

Above, on the left-hand side we have a map showing an assignment found using the normal DBSCAN algorithm. We can see that many of the drivers deliver only one order, as these orders are the only ones in their batches.

On the right-hand side we see that the recursive method handles this issue quite well: by using different radii for different regions, it manages to find a solution that delivers all orders using only 3 vehicles!

This is a perfect visualization of how the recursive-dbscan method is better for our use case and why we chose to use it.

Conclusion (aka TL;DR)

In this article, we have presented our approach to the Capacitated Vehicle Routing Problem with Time Windows for a large number of waypoints (up to 5000). By using the recursive-dbscan method we were able to significantly reduce runtimes and memory usage while maintaining quality of results similar to the baseline Google Optimization Tools method.

This algorithm is of great help to our Operations Team, reducing hours of mundane manual work to a few minutes of CPU time (plus a human double-checking the results).

The future

We are aware that our tool is not perfect.

One of the main problems is that it is still a static method: once run, it will not update itself if a better route becomes available or if the road situation changes. We have a few options here, one being implementing geohashing like Lyft, and another coming from our partner, the Research Facility in Big Data Analytics at PolyU Hong Kong.

Our goal is to improve Route Optimization so that it constantly monitors the drivers and sends alerts if they are at risk of not delivering some parcels on time — all to make our customers happier with our service.

Hopefully, this article has provided you with some good insight into the problems that we tackle at GOGOVAN. If that sounds interesting to you, or you are just interested in finding out more, please do not hesitate to get in touch.

There is naturally a lot of ground for future improvement, but we hope that sharing some of our approaches will spark discussion and progress in the fascinating area of optimising on-demand logistics operations.

If you want to find out more about our Data Team, please see our Head of Data’s article here.

We are always looking for top-notch Applied Operations and ML Research talent. Please get in touch if interested! (Onsite and remote)


[1] Lenstra, J. K. and Kan, A. H. (1981), Complexity of vehicle routing and scheduling problems. Networks, 11: 221–227. doi: 10.1002/net.3230110211

[2] Fukasawa, R., Longo, H., Lysgaard, J. et al. Math. Program. (2006) 106: 491.

[3] Baldacci, R. & Mingozzi, A. Math. Program. (2009) 120: 347.

[4] Nagata Y. (2007) Edge Assembly Crossover for the Capacitated Vehicle Routing Problem. In: Cotta C., van Hemert J. (eds) Evolutionary Computation in Combinatorial Optimization. EvoCOP 2007. Lecture Notes in Computer Science, vol 4446. Springer, Berlin, Heidelberg

[5] Bräysy, O. & Gendreau, M. Top (2002) 10: 211.

[6] Tan X., Zhuo X., Zhang J. (2006) Ant Colony System for Optimizing Vehicle Routing Problem with Time Windows (VRPTW). In: Huang DS., Li K., Irwin G.W. (eds) Computational Intelligence and Bioinformatics. ICIC 2006. Lecture Notes in Computer Science, vol 4115. Springer, Berlin, Heidelberg

[7] Xiaoyan Li (2015) Capacitated Vehicle Routing Problem with Time Windows: A Case Study on Pickup of Dietary Products in Nonprofit Organization

[8] M. Ester, H. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proc. 2nd Int. Conf. Knowledge Discovery and Data Mining (KDD’96), 1996, pp. 226–231.