CODEX
The Strategy That Increases Model Accuracy, Every Time, Guaranteed
I’ve been fairly active on Kaggle (a data science learning platform), especially in reading the kernels (code notebooks) people create. Whether it’s for a competition or a simple analysis, I’ve noticed one thing the vast majority of people do: they use only the data they’re given.
Granted, some Kaggle competitions don’t allow external data, but most of those are Playground or Research competitions. The competitions that offer large cash prizes come from companies that don’t care what the process is, as long as the model delivers results.
In this article, I’ll demonstrate this simple strategy on a Kaggle competition dataset to show a third-place solution on the first try, and give some guidance on how best to apply it along the way.
The Pitch
People care a lot about finding the best algorithm, and a suitable algorithm does matter. But that part is fairly easy: run several candidate models through an evaluation method (k-fold cross-validation, for example), then select the one with the best score or combine them in an ensemble. At a certain point, you’ve found an algorithm that works well with the data. Have you…
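The model-selection step described above can be sketched in a few lines. This is a minimal, dependency-free illustration, not the article's own code: the synthetic data, the two toy models (`MeanModel`, `LinearModel`), and the helper names `k_fold_indices` and `cross_val_mse` are all assumptions made up for the example.

```python
import random
import statistics


def k_fold_indices(n, k, seed=0):
    """Shuffle the row indices 0..n-1 and deal them into k folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


class MeanModel:
    """Baseline (hypothetical): always predicts the training mean."""

    def fit(self, xs, ys):
        self.mean = statistics.fmean(ys)
        return self

    def predict(self, xs):
        return [self.mean for _ in xs]


class LinearModel:
    """One-feature ordinary least squares (hypothetical toy model)."""

    def fit(self, xs, ys):
        mx, my = statistics.fmean(xs), statistics.fmean(ys)
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, xs):
        return [self.intercept + self.slope * x for x in xs]


def cross_val_mse(make_model, xs, ys, k=5, seed=0):
    """Average held-out mean squared error over k folds."""
    fold_scores = []
    for held_out in k_fold_indices(len(xs), k, seed):
        held = set(held_out)
        train = [i for i in range(len(xs)) if i not in held]
        model = make_model().fit([xs[i] for i in train], [ys[i] for i in train])
        preds = model.predict([xs[i] for i in held_out])
        errors = [(p - ys[i]) ** 2 for p, i in zip(preds, held_out)]
        fold_scores.append(statistics.fmean(errors))
    return statistics.fmean(fold_scores)


# Synthetic data, invented purely for illustration: y = 3x + 2 + noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(200)]
ys = [3 * x + 2 + rng.gauss(0, 1) for x in xs]

# Score every candidate with the same folds, then keep the lowest error.
candidates = {"mean baseline": MeanModel, "linear regression": LinearModel}
scores = {name: cross_val_mse(cls, xs, ys) for name, cls in candidates.items()}
best = min(scores, key=scores.get)

print(scores)
print("selected:", best)
```

In practice, scikit-learn's `cross_val_score` plays the role of `cross_val_mse` for real estimators. The point of the sketch is that the selection loop is mechanical once the candidates and the scoring method are fixed.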