Interpretable vs Powerful Predictive Models: Why We Need Them Both
I recently came across an interesting read and stumbled upon this thought-provoking statement:
Unfortunately, the predictive models that are most powerful are usually the least interpretable.
I fully agree with this idea, all the more so because I experienced it personally during many DataScience.net and Kaggle data science challenges: the best models were often blends of several predictors, which made them complex and hard to explain. This is in fact a general observation: “most of the prize-winning models are ensembles of multiple models”.
Until just a few months ago, as a Kaggle-born data scientist, I thought the contest between interpretability and performance was already settled: model explainability was not worth spending time on, as long as the predictive power was there. I must admit that my opinion has since evolved, and things are actually more subtle than that.
We live in a real world (not a Kaggle world)
As Julia Evans put it:
Machine learning isn’t Kaggle competitions
This is indeed what I experience daily, now that I practice data science not only on virtual cases but also at work.
In real organizations, people need dead-simple storytelling: Which features are you using? How do your algorithms work? What is your strategy? And so on. If your models are not parsimonious enough, you risk losing your audience’s confidence. Convincing stakeholders is a key driver of success, and people trust what they understand.
What’s more, at the end of the day, the ultimate goal of data science work is to put a model in production. If your model is too complicated, this will turn out to be impossible or, at least, very difficult. As John Foreman pointed out:
What’s better: A simple model that’s used, updated, and kept running? Or a complex model that works when you babysit it but the moment you move on to another problem no one knows what the hell it’s doing?
Even Netflix faced this issue, and admitted:
Additional accuracy gains that we measured did not seem to justify the engineering effort needed to bring the complex models into a production environment
So what do we need? Interpretable or powerful models?
Actually, we need them both!
Powerful models are useful for gauging where the state-of-the-art limits are. Even if they are not suited for production, they help us set a performance benchmark. They stimulate the data science team and train the scientists to use top-notch machine learning techniques. If you want to attract the best talent and build a competitive advantage, you must let people work on complex problems and foster a creative environment. Otherwise, you risk losing the team or stifling it.
But on the other hand, you have to remain pragmatic, as discussed above. Never forget that:
Whether or not you increase complexity for additional accuracy is not a data science decision. It’s a business decision
and also that:
The underlying problem is not ‘which model do we choose’ but ‘what action do we take’
Finally, to wrap up, let’s say:
Complex models are to data science what “haute couture” is to the clothing industry: they are not made for daily use, but they are necessary