Designing impactful algorithms
We laid out Virtuo’s general approach to data science in this article a few months ago. Today, we’ll detail how it translates into the design of our algorithms, through a few simple principles.
The first question we ask ourselves when we consider creating an algorithm concerns the type of phenomenon we are trying to predict or optimize. We need to make sure the phenomenon the algorithm will deal with respects the following principles.
1. Only analyze and predict phenomena you can leverage
Let’s start with an example: since Virtuo owns its fleet, predicting demand is useful because we can then optimize the fleet sizing accordingly. The time and resources we spend on demand prediction algorithms are justified by the concrete, significant actions (anticipated, quantified decisions) those predictions enable.
Before all else, we need to identify our action levers: can the fleet vary? By how much? When? These become the adjustment variables of our fleet optimization algorithm. We take into account operational constraints on purchasing and reselling cars, take a demand prediction as input, and then maximize our operational margin by adjusting the fleet plan, as in the sketch below.
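To make this concrete, here is a minimal sketch of such an optimization written as a linear program. The demand series and the unit economics (PRICE, COST, MAX_DELTA) are made-up illustrative figures, not Virtuo’s actual numbers, and the real algorithm is richer than this; the sketch only shows the structure: the fleet plan is the decision variable, week-to-week changes are bounded to reflect purchasing and reselling constraints, and the objective is the operational margin given a predicted demand.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical weekly demand prediction and purely illustrative unit economics.
demand = np.array([80, 95, 120, 140, 130, 100, 85], dtype=float)
T = len(demand)
PRICE, COST = 60.0, 25.0   # revenue per rented car-week, holding cost per car-week
MAX_DELTA = 15.0           # operational limit on weekly fleet variation

# Decision variables: fleet f_1..f_T and rented cars r_1..r_T.
# Maximize sum(PRICE*r - COST*f), i.e. minimize sum(COST*f - PRICE*r).
c = np.concatenate([np.full(T, COST), np.full(T, -PRICE)])

# Cars rented cannot exceed the fleet: r_t - f_t <= 0.
A_rent = np.hstack([-np.eye(T), np.eye(T)])
b_rent = np.zeros(T)

# Purchase/resale speed constraint: |f_{t+1} - f_t| <= MAX_DELTA.
D = (np.eye(T, k=1) - np.eye(T))[:-1]   # week-to-week difference operator
A_delta = np.vstack([np.hstack([D, np.zeros((T - 1, T))]),
                     np.hstack([-D, np.zeros((T - 1, T))])])
b_delta = np.full(2 * (T - 1), MAX_DELTA)

# 0 <= f_t, and 0 <= r_t <= predicted demand.
bounds = [(0, None)] * T + [(0, d) for d in demand]

res = linprog(c, A_ub=np.vstack([A_rent, A_delta]),
              b_ub=np.concatenate([b_rent, b_delta]),
              bounds=bounds, method="highs")
fleet_plan = res.x[:T]
print("fleet plan:", np.round(fleet_plan, 1))
print("operational margin:", round(-res.fun, 1))
```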
Similarly, we’ve put robust tracking of searches and quotes in place in the app. It makes sense because we control most of our distribution, and therefore do not suffer from selection bias; but first and foremost, because we control our pricing policy, we can optimize our prices according to the customer price elasticity we measure.
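As an illustration, here is a toy sketch of one way such an elasticity can be estimated from quote-level data. Everything here is synthetic (the prices, demand curve and conversion logic are invented for the example); it simply shows the classic log-log regression, whose slope approximates the price elasticity.

```python
import numpy as np

rng = np.random.default_rng(0)
prices = rng.uniform(30, 90, 5_000)                     # quoted daily prices
true_elasticity = -1.8                                  # ground truth to recover
conv_prob = 0.9 * (prices / 30.0) ** true_elasticity    # synthetic demand curve
booked = rng.random(5_000) < conv_prob

# Bucket quotes by price band to obtain observable conversion rates,
# then regress log(conversion) on log(price).
bins = np.linspace(30, 90, 13)
band = np.digitize(prices, bins)
log_p, log_c = [], []
for b in np.unique(band):
    mask = band == b
    rate = booked[mask].mean()
    if rate > 0:
        log_p.append(np.log(prices[mask].mean()))
        log_c.append(np.log(rate))

elasticity, _ = np.polyfit(log_p, log_c, 1)
print(f"estimated price elasticity: {elasticity:.2f}")  # close to -1.8
```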
2. Only predict phenomena you already monitor well
When we try to predict or optimize anything, we need to know it well.
In practice, it means two things:
- The data scientists developing the algorithms are the ones who did the feature engineering of the input data beforehand. If they know the time series of the performance indicators they’re dealing with well (orders of magnitude, seasonality, implicit calculation conventions, etc.), they can predict them and prescribe how to optimize them.
- The operational teams that receive the algorithms’ results (for instance, logistics and fleet teams for the fleet algorithm) will use them if and only if they already monitor the same figures on a day-to-day basis and consider those figures trustworthy and relevant.
Most of the added value of data is found first in the analysis of past and present data. Devoting energy to predicting the same metrics in the future only brings an increment of added value. Sometimes this step ahead is a game changer, and we hope it is most of the time, but in all cases we need a robust pillar of data analysis on which we can then build prediction and optimization tools.
As a result, when we work on feature engineering to preprocess data before training models on it, it’s likely that the feature-engineered data can also be analyzed as such: we make it available in our business intelligence tools instead of keeping it isolated inside an algorithm.
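Here is a minimal sketch of that habit, with made-up rental data and a local SQLite file standing in for a real data warehouse; the table and column names are hypothetical.

```python
import sqlite3
import numpy as np
import pandas as pd

# Made-up rental data; the engineered table feeds both the model and BI.
rng = np.random.default_rng(0)
raw = pd.DataFrame({
    "pickup_date": pd.date_range("2021-01-01", periods=90, freq="D"),
    "rentals": rng.poisson(40, 90),
})
features = raw.assign(
    weekday=raw["pickup_date"].dt.dayofweek,
    rentals_7d_avg=raw["rentals"].rolling(7).mean(),
)

# Expose the engineered data to BI tools (a local SQLite file stands in
# for a warehouse here) instead of keeping it inside the training pipeline.
with sqlite3.connect("analytics.db") as conn:
    features.to_sql("rental_features", conn, if_exists="replace", index=False)
```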
When a phenomenon respects the two principles above and we decide to create an algorithm for it, we follow two more principles.
3. Focus on differentiating features, and only add complexity if necessary
The following assertions may sound trivial or tautological, but such trivial points sometimes need stating: algorithms must be appropriate to the problems we want to solve. We must design them in the simplest and most transparent fashion possible, with the sole purpose of keeping a low effort-to-added-value ratio. We must also adapt them to the nature of the problem and to the dimensions of the data (rows, columns).
Most of the effort must go into what cannot be automated or externalized (what is not already in existing libraries): tailor-made cleaning, feature engineering, and defining and modelling the cost functions we want to optimize.
For instance, we have developed a prototype algorithm predicting the damage risk of our users. Instead of using a standard F-score (the harmonic mean of precision and recall, the usual error measure for imbalanced classification problems), we use a weighted harmonic mean of the recall (how many damages could be avoided) and the positivity rate (how many users are predicted as risky, whether rightly or wrongly, which is directly linked to the shortfall we would experience if we deterred them from renting at Virtuo), and we weight those two measures according to the relative costs of damages and of shortfall in lifetime value.
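Below is a minimal sketch of such a metric, under one plausible reading of the weighting: recall should be high and the positivity rate low, so the positivity rate enters through its complement, F-beta style. The beta value is a placeholder, not our actual cost-based weighting.

```python
import numpy as np

def damage_risk_score(y_true, y_pred, beta=0.5):
    """Weighted harmonic mean, F-beta style, of recall and the complement
    of the positivity rate. beta encodes the relative cost of damages
    versus lifetime-value shortfall; 0.5 is an arbitrary placeholder."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    recall = (y_true & y_pred).sum() / max(y_true.sum(), 1)
    kept = 1.0 - y_pred.mean()   # complement of the positivity rate
    denom = beta ** 2 * kept + recall
    return 0.0 if denom == 0 else (1 + beta ** 2) * recall * kept / denom

# Toy usage: 2 actual damages, 1 caught, 1 user in 5 flagged as risky.
print(damage_risk_score([1, 0, 0, 1, 0], [1, 0, 0, 0, 0]))
```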
Similarly, in another algorithm, which prescribes marketing expenditures, we model the impact of expenditure on customer acquisition, and we penalize complex recommendations, namely recommendations with large variations from one week to the next.
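A sketch of that idea, assuming a purely hypothetical response curve, seasonality profile and penalty weight: the objective rewards acquisitions with diminishing returns on spend, and subtracts a term proportional to the squared week-to-week variation of the recommended spend.

```python
import numpy as np
from scipy.optimize import minimize

WEEKS, BUDGET, LAMBDA = 8, 80_000.0, 2.0
seasonality = np.array([0.8, 0.9, 1.0, 1.3, 1.5, 1.2, 1.0, 0.9])

def acquisitions(spend):
    # Illustrative diminishing-returns response curve, not a fitted model.
    return seasonality * 120 * np.log1p(spend / 1_000.0)

def objective(spend):
    value = acquisitions(spend).sum()
    complexity = np.sum(np.diff(spend) ** 2) / 1e6  # week-to-week swings
    return -(value - LAMBDA * complexity)

x0 = np.full(WEEKS, BUDGET / WEEKS)
res = minimize(objective, x0, bounds=[(0, None)] * WEEKS,
               constraints=[{"type": "eq", "fun": lambda s: s.sum() - BUDGET}])
print(np.round(res.x))   # recommended weekly spend, smoothed by the penalty
```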
Finally, in our fleet sizing algorithm, most of the effort goes into modelling the utility function: the operational margin resulting from a predicted demand and a candidate fleet plan, as in the sketch above.
4. Algorithm results are not useful if they cannot be analyzed and explained
When it comes to recommendation algorithms driving business decisions, a good algorithm is an interpretable algorithm. Not only must its outputs be auditable, but so must its explanatory factors or drivers. Even when the algorithm is a linear regression (the simplest machine learning algorithm), let’s once again state the obvious: we must collect both the results and the most significant factors. The same applies to more complex algorithms.
In all of our algorithms, in particular prediction algorithms, we save both the predictors themselves (theta for regressions, perceptrons and applicable thresholds for classification algorithms) and the results, along with confidence intervals for the results.
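A minimal sketch of that practice on a plain linear regression, with synthetic data; the artifacts dictionary only illustrates what gets archived, not our actual storage format.

```python
import numpy as np
import statsmodels.api as sm

# Synthetic data: y depends on the first and third features only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 2.0 * X[:, 0] - 0.5 * X[:, 2] + rng.normal(scale=0.3, size=200)

fit = sm.OLS(y, sm.add_constant(X)).fit()
artifacts = {
    "theta": fit.params,                            # the predictor itself
    "conf_int": fit.conf_int(),                     # 95% confidence intervals
    "drivers": np.flatnonzero(fit.pvalues < 0.05),  # significant factors
    "predictions": fit.fittedvalues,
}
print(artifacts["theta"], artifacts["drivers"])
```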
We also run sensitivity analyses and counterfactual scenarios. When we optimize the fleet or the marketing expenditure, we don’t compare the resulting optimized operational margin to the actual one, but to a simulation of reality, in order to quantify and isolate precisely the added value of the optimization. We also run uncertainty models and post-mortems: optimizing the past both with what actually happened and with what we predicted at the time, in order to distinguish room for optimization from prediction error.
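Here is a toy sketch of such a post-mortem, with entirely made-up demand figures and the same illustrative margin logic as in the fleet sketch above: optimizing the past twice, once against actual demand and once against the demand we had predicted, separates optimization headroom from the cost of prediction error.

```python
import numpy as np

rng = np.random.default_rng(1)
actual_demand = rng.poisson(100, 30).astype(float)
predicted_demand = np.clip(actual_demand + rng.normal(0, 15, 30), 0, None)

PRICE, COST = 60.0, 25.0   # illustrative unit economics, as above

def margin(fleet_size, demand):
    # Rent at most min(fleet, demand); pay holding cost on the whole fleet.
    return (np.minimum(fleet_size, demand) * PRICE - fleet_size * COST).sum()

def best_fleet(demand):
    # Toy optimizer: best constant fleet size over a grid.
    sizes = np.arange(0, 201)
    return sizes[np.argmax([margin(s, demand) for s in sizes])]

actual_plan = 90.0   # hypothetical plan that was actually executed

hindsight = margin(best_fleet(actual_demand), actual_demand)
as_predicted = margin(best_fleet(predicted_demand), actual_demand)
baseline = margin(actual_plan, actual_demand)

print("optimization headroom:", round(hindsight - baseline, 1))
print("cost of prediction error:", round(hindsight - as_predicted, 1))
```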
Finally, we archive every result, even non-significant ones, to compare results run after run and to better understand how sensitive the algorithms are to new data or to changes in their assumptions.
These principles help us avoid some classic data science mistakes and stay focused on the business impact of our algorithms.
Alexandre Journo, Head of Data Science Virtuo - The New Generation of Car Rental
Alexandre Dubourg, Data Scientist Virtuo - The New Generation of Car Rental