Secret for a simple yet elegant matching algorithm made from scratch
Operational Research and optimization with Python
Matching customer needs with the most relevant suppliers is crucial in many industries. This is particularly true for platforms which put in touch independent workers with clients. Companies like Uber or Lyft put in a lot of efforts to efficiently match drivers with riders. This is key for user satisfaction on both sides.
But what is the secret behind such a complex problem ? Let me answer this question by presenting you a simple yet elegant solution I have implemented from scratch.
Keep scalability in mind
When it comes to matching, it all goes down to optimization. The main goal of any matching algorithm will be to find the best combination leading to the overall lowest cost, highest client value or smallest environmental impact.
To make it simple, you would say, let’s try every possible permutations of client/driver duos and keep the cheapest one. In essence that is the brute force idea. But the total number of permutation to asses increases steeply. If you start with 5 drivers for 2 clients requests, you have 20 permutations. If your business grow by 10 folds (50 drivers to be match with 20 clients), you end up with more than 10³² permutations ! Definitely not the definition of scalability.
So let’s be more clever … and that is where you open the door to an amazing, but not renowned, field of mathematics and data: Operation Research or Combinatorial optimization.
Application to car delivery service
Nowadays, you can buy anything online, even your brand new car. Car manufacturers (such as Renault, Tesla…) or car dealers (such as Aramis Auto, Auto1, and all manufacturers retailers) are dematerializing their selling client journey.
They rely on car delivery companies such as hiflow.com which are offering a premium delivery service where the car is driven by an independent professional driver from store location to the end-client.
As any logistics company, building a more intelligent network leading to cost optimization is fundamental. The cost of such car transfer can be broken-down to :
- Driver transportation costs from home to car pick-up point (mostly public transport)
- Driving costs (petrol, toll, insurance, …)
- Driver transportation costs from car delivery point back to home (mostly public transport)
- Driver’s contract price proposed for the overall trip
Our main optimization goal will be to reduce overall transportation costs by performing an intelligent matching. The easiest solution would be to match the drivers the closest to pick-up points for each car transfer request. But don’t forget many local optimums do not always lead to global optimum.
Let’s build our optimization model
It is time to get our hands dirty and do some math to define our optimization model. Let’s lay down a few essential parameters :
Our goal will be to minimize the overall cost :
Without any additional constraints, we will end up with a trivial solution where no matching is proposed (all x = 0). We need to add a minimum number of matches for each transfer requested (we don’t want any client to be left behind) :
Finally, we need to limit the number of matches proposed to each driver otherwise at some point it will become irrelevant to receive dozens of job propositions :
Theses constraints can be adjusted depending on the situation of each driver or transfer. For example, a transfer located in the middle of nowhere, should be proposed several times to find a match. Knowing this, you can raise the minimum number of matches proposed for this particular transfer.
Define what drives your cost
This Cij cost matrix is key for our model. Building a comprehensive definition of it will lead to a successful matching.
The primary criteria which will be used is the distance between driver’s home and transfer pick-up point. This is the criteria most correlated to transfer direct costs.
There are several ways to compute this estimation (distances are as the crow flies, public transport network nodes, accurate multi-modal transportation costs, machine-learning regression). Improving this cost estimation is a never-ending challenge. The secret recipe you will use mostly depends on what you want to improve. In the peer-to-peer ridesharing example, that will be the client waiting time for a driver to reach him/her.
Boost certain matches
Cost estimation is a great base, but sometimes, you want to boost certain matches. In the ridesharing model, you might want to boost riders or clients with good reviews. The simplest way to do it is to reduce the Cij cost estimation. It is like applying a discount on an item which will boost its sales. Here the matching algorithm will boost the selection of cheaper matches. The simple solution is to apply a boost coefficient (between 0 and 1) on the cost base.
In our case, to reduce driver churn, we decided to boost those who have done few transfers in the last couple of weeks. So for each of them we define a coefficient (Bi) which is applied to the cost estimation as such :
Boost for a given transfer or for a specific driver/transfer duo can also be applied. At the end of the day, this is a matching proposition, drivers remain always in control on which transfer they would like to make an offer.
Define your constraints
On the other hand, there might be some incompatible matches. For example, if your preference in a dating app goes toward women, it will not match you with a man.
In our case, we have the same problematic. Some drivers which are not certified for a given client transfer process will not be matched accordingly. Thus, an exclusion matrix id defined and a new constraint is added to our model :
Time to solve the model
The mathematical problem we have laid is called a Mixed-Integer Linear Programming (MILP). You can find a lot of different tools to cleverly find an optimum cost and avoid the 10³² brute force testing.
The most accessible, and one that you may know is the Excel Solver. This is basic but can help you quickly solve a problem without any line of code. Obviously, this is not an option applicable in production.
The easiest solution we found to solve this model in a reliable manner is by using a Python programming language based library named PuLP. With a few lines of code, PuLP allows you to generate any Linear Programming model. Once the model generated, you can select among many different solvers (such as Open Source GLPK or CBC, or a paid one like Gurobi) to solve it.
I will go into the details of the code programming of these solvers and how to use them in production in a following article. In the meantime, I recommend this article :
This simple model from Operational Research can be a very relevant approach in many cases. It is simple to implement and very flexible. It has been recently implemented at Hiflow and we are expecting significant performance improvements by reducing the time spent by our team to find the best driver/transfer match.
Anytime you are facing a matching problem, you should consider this solution, whether for ridesharing or dating.
To know more about Uber matching :