As a former Kaggle number 1 who has spent many years in industry in senior data science and management positions while finishing a PhD in machine learning, and who has developed software applications for predictive analytics, I disagree with the intent of this article.
It makes a few good points about how common “Kaggle” problems are in the real world, and it lists some good data science resources. But I would say Kaggle problems are in principle more difficult than real-world problems, because they are subject to competition. In that sense, being able to do them competitively is NOT a waste of time; it is going the extra mile and staying up to date, perfecting your data science skills and enlarging your ML arsenal.
You may assume a good Kaggler would thrive in real-world data science applications (unless softer skills are needed, which cannot be assessed from competitions, kernels or discussions alone). Kaggle experience definitely counts on a CV (I can confirm this both as a data scientist and as a manager in the field), much more than Andrew Ng’s course (which, by the way, is really good) and other programming courses, because:
- It shows the mentality to get your hands dirty and actually put together some workable code that exploits the principles of feature engineering, model selection, hyperparameter optimization, avoiding over-fitting while tackling under-fitting, minimization (or maximization) of cost functions, and more.
- It shows the ability to do that competitively. Any way you put it, a Kaggle competition is not an easy task: it is an analytical problem to solve, and doing well against thousands of competitors reflects well on a CV, especially for beginners who are trying to enter the job market and have not had the chance yet.
I had already developed my own analytics software (http://www.kazanovaforanalytics.com/software.html) prior to joining Kaggle, yet I found my skills improved exponentially while competing there, much more than from reading books, academic papers or my PhD itself, because some of the best data scientists choose this platform to compete and share their approaches with the community. Kaggle has a great data science community and lots of innovation takes place on its platform. I would say this is its strongest part.
More specifically, after 100 competitions I have learnt:
- To differentiate images of dogs from images of cats
- To identify a bird from its sound
- To pass an 8th grade science exam, building a model based on the whole English Wikipedia
- To identify, within a pair of series, which one is causing the other (e.g. is it high temperature that causes an increase in ice cream sales, or vice versa?)
- To predict the NCAA!
- To predict default
- To predict marketing response
- To predict sentiment from text
- To predict what score a user will give to a store
- To predict the Higgs boson!
- And so on…
To sum up, Kaggle has been a huge step forward in my data science journey (both knowledge-wise and career-wise), and if Kaggle is not the home of data science, then nothing is. I recommend you at least give it a try.
Forgot to say: it’s lots of fun too!