Occam’s Razor in ML practice
Occam’s razor, in essence, states that a model should be as simple as possible, but no simpler.
In this post we explore the Occam’s razor principle in ML practice.
There are two questions we have to answer:
What do we mean when we say a model m1 is simpler than m2?
We can consider the number of learned coefficients to be a proxy for model complexity: when comparing models, count the learned coefficients; the fewer, the simpler.
Depending on the application, other metrics can also be considered: training/inference latency, memory required for training/inference, etc.
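As a minimal sketch of this complexity proxy, the following counts learned coefficients across a model’s parameter arrays. The function name and the linear-vs-MLP example are illustrative, not from the original post; parameters are assumed to be held as NumPy arrays.

```python
# A minimal sketch: count learned coefficients as a complexity proxy.
# Assumes model parameters are available as a list of NumPy arrays.
import numpy as np

def num_coefficients(params):
    """Total number of learned coefficients across all parameter arrays."""
    return sum(np.asarray(p).size for p in params)

# Hypothetical example: a linear model vs. a small two-layer network,
# both mapping 10 input features to a single output.
linear = [np.zeros(10), np.zeros(1)]                           # weights + bias
mlp = [np.zeros((10, 32)), np.zeros(32), np.zeros((32, 1)), np.zeros(1)]

print(num_coefficients(linear))  # 11
print(num_coefficients(mlp))     # 385
```

By this proxy, the linear model (11 coefficients) is simpler than the MLP (385), so under Occam’s razor it should be preferred unless the MLP’s performance gain justifies the added complexity.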
How do we know if model m1 is better than m2?
A simpler model has a better chance of being right; this is directionally correct, since more complex models are more prone to overfitting. Still, the best way to compare models is through performance metrics. We describe a bootstrap pairwise test below to compare m1 vs. m2.
Bootstrap Pairwise Test:
Terminology:
- labels and predictions denote the true-label array and the prediction arrays.
- eval_dataset is the test corpus on which m1 and m2 can be compared.
- primary metric, e.g. prAUC (it can be any metric).
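The test above can be sketched as follows. This is a minimal, illustrative implementation (the function and parameter names are assumptions, not from the post): resample eval_dataset with replacement, score both models on the *same* resampled indices each time, and look at the distribution of metric differences.

```python
# A minimal sketch of a bootstrap pairwise test, assuming labels and the two
# prediction arrays are NumPy arrays of equal length. The primary metric is
# any callable metric(labels, predictions) -> float (e.g. prAUC).
import numpy as np

def pairwise_bootstrap_test(labels, preds_m1, preds_m2, metric,
                            n_resamples=1000, seed=0):
    """Bootstrap the difference metric(m1) - metric(m2) on eval_dataset.

    Returns the observed difference and a one-sided bootstrap p-value:
    the fraction of resamples in which m1 does not beat m2.
    """
    rng = np.random.default_rng(seed)
    n = len(labels)
    observed = metric(labels, preds_m1) - metric(labels, preds_m2)
    diffs = np.empty(n_resamples)
    for i in range(n_resamples):
        # Resample the SAME indices for both models -- the test is pairwise,
        # so both models are always scored on an identical bootstrap sample.
        idx = rng.integers(0, n, size=n)
        diffs[i] = (metric(labels[idx], preds_m1[idx])
                    - metric(labels[idx], preds_m2[idx]))
    p_value = np.mean(diffs <= 0.0)
    return observed, p_value
```

A small p-value suggests m1 genuinely outperforms m2 on this corpus; if the difference is not significant, Occam’s razor says to keep the simpler model. Note that rank-based metrics like prAUC require each bootstrap sample to contain both classes, which may need a check for small or highly imbalanced eval sets.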
Conclusion:
- Report model complexity alongside performance metrics.
- Use bootstrap pairwise test to compare model performance.
- Use Occam’s razor as a guiding heuristic when selecting models.
Further reading:
[1] https://en.wikipedia.org/wiki/Occam_learning
[2] Learning from Data [http://amlbook.com/]