How should a beginner approach Kaggle?

Mehul Gupta
Data Science in your pocket
3 min read · Jun 15, 2019


Having been part of the Kaggle community for a year now, I want to lay out what Kaggle has on offer for young data scientists like me. Most beginners believe that a couple of MOOCs and some interview questions are enough and, ta-da, they will land a job in Data Science. But as the saying goes, practical implementation and theoretical knowledge are worlds apart. So I will walk through the dos and don'ts for a newbie Kaggler.

One must remember that Kaggle is not just a competition-hosting platform, but a lot more.

  • To start off, pick a problem, either Titanic (classification) or House Prices (regression), as plenty of kernels (scripts and IPython notebooks in both R and Python) are available for them. Instead of jumping straight into solving it, explore a few solutions with a decent score whose titles look relevant to you, such as "Titanic using KNN" or "Titanic for Beginners" (kernels built around XGBoost, LGBM, etc. can confuse you in the beginning). Use them to understand how to approach a Data Science problem and which different models can be used for the same task.

Don't try to rank higher by copying others' solutions or by using a black-box trick you don't understand, as it is never going to help. Even in interviews, you are asked about the approach you took to the problem rather than your rank. Try to understand every step you take and reason it out yourself: Why fill NaN values? Why should a distribution be close to Normal? Why should skewness be avoided? Reasoning holds the key.
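To make those "why" questions concrete, here is a minimal sketch that fills missing values with the median and then checks how a log transform changes skewness. The fare values are made up for illustration, the helper names are my own, and it uses only the Python standard library so it runs anywhere; in a real kernel you would more likely do the same steps with pandas.

```python
import math
import statistics


def fill_missing_with_median(values):
    """Replace None/NaN entries with the median of the observed values."""
    observed = [v for v in values if v is not None and not math.isnan(v)]
    median = statistics.median(observed)
    return [median if v is None or math.isnan(v) else v for v in values]


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Hypothetical fares: two missing entries and one very large value.
fares = [7.25, 8.05, None, 13.0, 26.0, float("nan"), 71.3, 512.3]

filled = fill_missing_with_median(fares)

# log1p compresses the long right tail, which pulls skewness toward zero.
log_fares = [math.log1p(v) for v in filled]

print("skew before log1p:", round(skewness(filled), 2))
print("skew after  log1p:", round(skewness(log_fares), 2))
```

The median is used here because a single huge value would drag the mean upwards. Whether median filling or a log transform is the right choice depends on the dataset, which is exactly the kind of reasoning worth doing yourself.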

  • If a question comes up, Kaggle Discussions can be a great place to find wonderful solutions you might not come across by googling. You might also earn a medal for posting a question or answering one (I have 23!).
  • Kaggle Learn can be treated as a summarized version of a Data Science book, but with implementations. Do give it a try. It covers Python, analysis, ML, basic neural networks and many more topics.
  • Apart from competitions, you can take up any Kaggle dataset (Kaggle has a huge pool of datasets, and you can also upload your own), do anything worthwhile with it and show your analytical skills. Kaggle also awards cash prizes ($2000) for the best kernels. Cherry on top!
  • Explore different genres like text data, time series, regression, classification and multi-class classification to learn the difficulties that come with each type of data, and don't stick to only similar sorts of problems.
  • Spend about 15–20 days on a single problem statement, trying out all the valid models you can think of and comparing their predictions. Sometimes following the rules doesn't give you the best predictions.
  • Kaggle kernels are very powerful: they offer a free GPU with most of the required libraries already installed, which can make them the best place to be for Data Science.
  • Follow the Kagglers whose solutions and discussion answers you find most relevant; you will get notified whenever they post some activity.
  • Team up with fellow Kagglers and take up a challenge, as more minds mean more ideas.
  • Do mention your Kaggle activities in your resume. It's always a plus point.
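The advice above about trying several models on one problem and comparing their predictions can be sketched roughly as follows. This is only an illustration under my own assumptions: the area and price numbers are made up, the three "models" are deliberately tiny, and the helper names are hypothetical. In a real kernel you would likely use a library such as scikit-learn instead of hand-written functions.

```python
import statistics


def mean_baseline(train_x, train_y, x):
    """Ignore the input and always predict the training mean."""
    return statistics.fmean(train_y)


def make_knn(k):
    """Build a k-nearest-neighbours regressor for one numeric feature."""
    def predict(train_x, train_y, x):
        pairs = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - x))
        return statistics.fmean(y for _, y in pairs[:k])
    return predict


def cross_val_mae(predict, xs, ys, folds=4):
    """Mean absolute error over simple interleaved folds."""
    errors = []
    rows = list(zip(xs, ys))
    for f in range(folds):
        train = [row for i, row in enumerate(rows) if i % folds != f]
        test = [row for i, row in enumerate(rows) if i % folds == f]
        train_x, train_y = zip(*train)
        errors += [abs(predict(train_x, train_y, x) - y) for x, y in test]
    return statistics.fmean(errors)


# Made-up data: house area vs. price, roughly linear.
areas = [50, 60, 70, 80, 90, 100, 110, 120, 130, 140, 150, 160]
prices = [152, 178, 214, 236, 275, 297, 335, 356, 392, 425, 447, 485]

models = {
    "mean baseline": mean_baseline,
    "knn k=1": make_knn(1),
    "knn k=3": make_knn(3),
}

# Score every model on the same folds so the comparison is fair.
scores = {name: cross_val_mae(fn, areas, prices) for name, fn in models.items()}

for name, mae in sorted(scores.items(), key=lambda kv: kv[1]):
    print(f"{name:14s} MAE = {mae:.1f}")
```

Keeping a trivial baseline in the comparison is a useful habit: if a fancy model cannot beat "always predict the mean", something is off in the features or the validation setup.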

Kaggle can play a handy role in your exploration of Data Science. The resources it provides are simply unmatched. However, most aspirants try to do things quickly and end up with no opportunities. Take your time; don't rush to learn everything, but try to learn efficiently.

If Data Science is a marathon, then Kaggle is the coach!!

Check out my other articles as well!!

Important Analytical steps for Data Science projects

Best free Data Science Resources online (pdf links available)

Time Series for beginners

Maze with Q Learning (codes available)
