Things My First Ever Kaggle Competition Has Taught Me
Is there something that you want to try out but you keep telling yourself you’ll get to later? “I’ll be ready after I do A and B”, “Only after I finish C” or maybe even “I’ll start next Monday”. For me, that thing (or one of those things) was a Kaggle machine learning competition.
This spring semester, I started the famous “Machine Learning” course by Andrew Ng on Coursera. After three weeks of grinding through the course with a pen, paper, and the replay button, Professor Ng informed the students with an enthusiastic smile that it was time for us to test out our newfound knowledge. I’m sure anyone who has learned anything can relate to the uneasiness of applying something completely new, and I didn’t feel ready at all.
Competitions felt like a whole other hurdle, as they meant going up against an entire world of data scientists who knew far more and were far more skilled than I was. It was terrifying.
Until then, I had only tried out the new concepts from the course in Octave for the homework, and my skills were shabby, to say the least. I had never used Python for this, or any of the popular data science libraries like Pandas and NumPy, or even attempted data visualization with Matplotlib.
I put off this task until the very last moment, when my friend invited me to join his team for a Kaggle competition. I decided to join because the universe would not forgive me for passing up a chance to start like this.
Enough backstory; time for me to bestow upon the world my fountain of knowledge from my grandiose “Hello World”. This is by no means a guide to Kaggle competitions, but more of my takeaway from the experience. A lot of the things I learned might be obvious, but someone out there might find them useful.
Competitions Put Learning in the Driver’s Seat
Being attention-starved as I am, the peer pressure was enough for me to get over my initial uneasiness and start learning about machine learning much faster than before. In the first week, I learned enough about Pandas and Matplotlib to contribute to our group kernels. Our competition also had tutorial kernels, as it was a beginner’s challenge, which played a huge role in speeding up our learning.
Within two weeks, I had finished four chapters of “Hands-On Machine Learning with Scikit-Learn and TensorFlow” by Aurélien Géron, which probably would have taken lazy me a month. I discovered new models like the Random Forest, starting from the decision trees it is based on and moving on to implementing and fine-tuning the model, which improved the base performance of our submission by 0.87%.
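To give a flavor of what “implementing” looked like, here is a minimal sketch of training a Random Forest with scikit-learn. The file name and target column are hypothetical placeholders, not from our actual competition kernel.

```python
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical training file and target column -- placeholders, not our real data.
data = pd.read_csv("train.csv")
X = data.drop(columns=["target"])
y = data["target"]

# Hold out a validation set so we can measure the base performance of the model.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# A Random Forest is an ensemble of decision trees trained on bootstrapped samples.
model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))
```

That validation score is what we tried to nudge upward with every change before submitting.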
Communicating Clearly With Others is Important
Make a change to a group kernel? Tell your teammates what you changed and why. Make an improvement? Tell them what improved and how. This may seem like common sense; however, when making my first change, I was so excited that I submitted it without telling my teammates. It was as awkward as you’d imagine. Either way, I found that conversation leads to a better workflow: you know what each team member is working on, so you don’t accidentally duplicate work or waste time on completely unrelated things. Otherwise, you just create a mixture of lost time and energy, not to mention frustration.
I would also suggest, if you can, meeting your teammates in real life. I think it helped our team better communicate tactics and evaluate what each of us could do individually. It helped us figure out what was best for our team based on everyone’s skill sets and preferences.
If you are alone, talk to yourself. Times are hard.
Knowing What You’re Looking for Speeds Up the Process
The competition expected us to implement a relatively beginner-friendly tree-based model, the Random Forest, which I knew nothing about. I had also never formatted or prepared data to fit a model before. However, time was too short to grind through books for a soon-ending competition. Fortunately, thanks to hard-working and talented data scientists on Medium, specific articles like “Hyperparameter Tuning the Random Forest in Python” were invaluable. Articles like these were straight to the point and had exactly the information I needed to use the model effectively, fast. I was able to understand and apply the model in a fraction of the time it would have taken me with a book.
That’s not to say that books aren’t helpful. I think they are essential for learning things in depth and understanding what’s under the hood.
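As a rough illustration of what “hyperparameter tuning” means in practice, here is a minimal sketch of a randomized search over a Random Forest with scikit-learn. The search space and synthetic data are placeholder assumptions, not the values or data we actually used in the competition.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in data so the example runs on its own.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)

# Placeholder search space -- not the values we actually tried.
param_distributions = {
    "n_estimators": [100, 200, 500],
    "max_depth": [None, 10, 20, 40],
    "min_samples_split": [2, 5, 10],
    "max_features": ["sqrt", "log2"],
}

# Randomly sample 20 combinations, scoring each with 3-fold cross-validation.
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42),
    param_distributions=param_distributions,
    n_iter=20,
    cv=3,
    random_state=42,
    n_jobs=-1,
)
search.fit(X, y)

print("Best parameters:", search.best_params_)
print("Best cross-validation accuracy:", search.best_score_)
```

The point is that a few lines like these can squeeze noticeably more performance out of the same model, which is exactly what those focused articles taught me to do quickly.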
Don’t Give Up While You Still Can Try
I made the last submission to our kernel 13 hours before the deadline, and I kept trying to improve it until the last hour despite having classes and a final exam fast approaching. That last submission was the most significant improvement our team made in the entire competition. It didn’t put us among the winners, but I’m glad I tried until the end. It tested my limits and, in the process, really boosted my confidence.
The Takeaway or the TL;DR
This was one of my most fun experiences of the year. Not only did I learn a lot, but I was also able to test myself and see some results. Coming out of my comfort zone ended up being both gratifying and unexpectedly fun. Even though our team didn’t win, seeing applied machine learning models in action motivated me to learn more and put in more effort.
I think everyone learning machine learning should attempt a Kaggle competition. It helps you get over the initial feelings of inadequacy and, in the process, actually makes you more capable at the craft. It also gives you a feel for what a machine learning project might entail.
If you don’t want to jump straight into competitions, the beginner challenges also help hone your skills. I’m currently working on “Titanic: Machine Learning from Disaster”, which is aimed at data science and machine learning newbies.
Please let me know if you have any questions or comments. I would love advice.