Four ways teams win on Kaggle
Frameworks, multiprocessing, the cloud, and bigger teams
Kaggle is home to the best and worst of machine learning. It’s the battleground where thousands of teams vie to build the best models and solve real-world problems on a deadline, with hundreds of thousands of dollars at stake. Training these world-class models on large datasets often takes hundreds of compute hours. In this post, we’ll explore some of the techniques winning teams use and how they speed things up.
#1 — Use top-tier frameworks
Some Kaggle competitions result in models that are impossible to reproduce or are outright fraudulent. But others set a new bar for the entire industry. Kaggle puts machine learning in the spotlight, and it’s where huge strides of progress are made. The world’s best model-building frameworks were inspired by and built around Kaggle competitions.
To win, teams squeeze every last decimal of performance out of their tools. When the tools aren’t good or fast enough, better tools get built. The popular gradient boosting libraries are prime examples (a minimal usage sketch follows the list):
- 2014 — XGBoost — [github] [paper] — of the 29 winning solutions from 2015 Kaggle competitions, 17 used XGBoost.
- 2017 — LightGBM (LGBM) — [github]…
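To ground the point, here is a minimal sketch of what using one of these frameworks looks like, via XGBoost’s scikit-learn interface. The synthetic dataset and hyperparameters below are placeholders for illustration, not settings from any actual winning solution.

```python
# Minimal sketch: training a gradient-boosted classifier with XGBoost.
# The data and hyperparameters are illustrative stand-ins.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

# Synthetic stand-in for a competition dataset.
X, y = make_classification(n_samples=10_000, n_features=50, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, random_state=0)

# A common starting point: many shallow trees with a modest learning rate.
# n_jobs=-1 spreads tree construction across every CPU core.
model = XGBClassifier(n_estimators=500, max_depth=6, learning_rate=0.05, n_jobs=-1)
model.fit(X_train, y_train)

print("validation accuracy:", model.score(X_valid, y_valid))
```

Even at this level, small choices like n_jobs=-1 (use every CPU core) hint at the kind of speedups the rest of this post is about.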