Originally published: 07.16.2019
Last week we crowned the world’s first-ever Triple Grandmaster, Abhishek Thakur. In a video interview with Kaggle data scientist Walter Reade, Abhishek answered our burning questions about who he is, what inspires him to compete, and what advice he would give to others. If you missed the video interview, take a listen.
This week, he’s answering your questions!
See below for Abhishek’s off-the-cuff responses to selected Twitter questions. Have something more you want to know? Leave a comment on this post, or tweet him @abhi1thakur.
Here’s what YOU wanted to know…
I used to read and implement quite a lot of papers during my master’s degree and then during my unfinished PhD. After that, I decided to join industry, so now I read papers relevant to the industry I am working in. Sometimes I also read papers I come across on Reddit, Twitter, and Kaggle. Recently, I have read the papers on XLNet and BERT.
As for my favorite tools, Python is my bread and butter.
I love scikit-learn, XGBoost, Keras, TensorFlow and PyTorch.
It’s very difficult to find the time when you are working. Here’s what I do: I wake up early and work 1–2 hours on a Kaggle problem before work each day. I try my best to get a model training before I leave, and I have written scripts that do K-Fold training automatically. I also have some scripts that automate submissions. By the time I’m back home from work, these models have finished and I can work on post-processing or new models.
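Abhishek doesn’t share his scripts here, so the following is only a minimal sketch, assuming scikit-learn, of what an automated K-Fold training loop can look like. The function name `run_kfold`, the use of `StratifiedKFold`, logistic regression as the model, and AUC as the metric are all illustrative assumptions. It trains one model per fold and collects out-of-fold predictions, which you can later reuse for post-processing or stacking.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import StratifiedKFold


def run_kfold(X, y, model_factory, n_splits=5, seed=42):
    """Hypothetical helper: train one model per fold, return OOF predictions and fold AUCs."""
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    oof = np.zeros(len(y), dtype=float)  # every sample is predicted exactly once
    fold_scores = []
    for fold, (train_idx, valid_idx) in enumerate(skf.split(X, y)):
        model = model_factory()  # fresh, untrained model for each fold
        model.fit(X[train_idx], y[train_idx])
        preds = model.predict_proba(X[valid_idx])[:, 1]
        oof[valid_idx] = preds
        score = roc_auc_score(y[valid_idx], preds)
        fold_scores.append(score)
        print(f"fold {fold}: AUC = {score:.4f}")
    return oof, fold_scores


if __name__ == "__main__":
    # Synthetic data stands in for a real competition dataset
    X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
    oof, scores = run_kfold(X, y, lambda: LogisticRegression(max_iter=1000))
    print(f"mean AUC = {np.mean(scores):.4f}, OOF AUC = {roc_auc_score(y, oof):.4f}")
```

Passing a `model_factory` instead of a model object is one way to make the same script reusable for any estimator you want to start in the morning.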
A few hours every day if you are a student. If you are working, maybe an hour or two a day. You can invest a few more hours over the weekends. More than the time you invest, though, what matters is understanding the problem statement. I suggest writing down a few different approaches to try.
It’s also a very good idea to read the discussion forums, as a lot of ideas are shared there. If you’re just starting with Kaggle, you might also want to look at past competitions and learn how the winners approached the problem. From there, you can try to implement their approaches on your own without looking at their code.
Every competition brings its own challenges, and there is something new to learn from each one. For example, you can start an image segmentation competition with approaches like U-Net or Mask R-CNN. On a given image segmentation problem, one approach might outperform the other. So you have to know which approach will work best in different scenarios, and that only comes from having worked on several image segmentation problems.
The same goes for tabular data competitions. You might get only numerical variables, or a mix of numerical and categorical variables. If you have experience with these, you will know right away which approaches work well and which models you can start with, without a lot of processing on the dataset.
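As an illustration of a quick starting point for a mixed numerical/categorical table, here is a minimal sketch assuming scikit-learn and pandas. The helper name `make_baseline`, the column names, the synthetic data, and the choice of scaling plus one-hot encoding feeding a logistic regression are all hypothetical choices for illustration, not Abhishek’s recipe.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def make_baseline(numeric_cols, categorical_cols):
    """Hypothetical helper: scale numeric columns, one-hot encode categorical ones, fit a linear model."""
    preprocess = ColumnTransformer(
        transformers=[
            ("num", StandardScaler(), numeric_cols),
            # handle_unknown="ignore" encodes unseen test-time categories as all zeros instead of raising
            ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
        ]
    )
    return Pipeline([("preprocess", preprocess), ("model", LogisticRegression(max_iter=1000))])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 500
    # Synthetic stand-in for a competition table
    df = pd.DataFrame({
        "age": rng.integers(18, 70, size=n),
        "income": rng.normal(50_000, 15_000, size=n),
        "city": rng.choice(["a", "b", "c"], size=n),
        "device": rng.choice(["mobile", "desktop"], size=n),
    })
    # Synthetic target that depends on one numeric and one categorical column
    y = ((df["age"] > 40) | (df["city"] == "a")).astype(int)
    pipe = make_baseline(["age", "income"], ["city", "device"])
    scores = cross_val_score(pipe, df, y, cv=5, scoring="roc_auc")
    print(f"CV AUC: {scores.mean():.3f}")
```

Keeping preprocessing inside the `Pipeline` means it is refit on each cross-validation fold, so the validation score isn’t contaminated by statistics computed on held-out rows.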
So, yes, the process becomes smoother with every competition you try. The more competitions you participate in, the more you learn. Once you have a lot of scripts and functions that you can reuse, you can automate everything (well, most things).
One of the most difficult challenges I worked on was the StumbleUpon Evergreen Classification Challenge. Now, if you look at that competition, you might not even find it challenging. At that time, though, I had no clue about NLP or the tools and libraries we have available today to process text data and clean HTML.
Another tough one for me was the Amazon Employee Access Challenge. Here, we were given categorical data, which was also very new to me. Any time there’s something in the data that you know little or nothing about, it can be challenging. The only way to overcome this is to learn the different approaches, and practice, practice, practice.
Check out Andrew Ng’s courses on Coursera. He explains everything in the simplest manner possible. I think you’ll need some basic mathematics background, which you might already have; if not, I suggest working a little on algebra, basic calculus, and probability. The only way to learn is to solve some problems. Once you have an idea of how the problems are being solved, dig deeper into the algorithms and see what happens in the background.
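As one example of digging into what happens in the background, here is a minimal sketch, assuming only NumPy, that fits linear regression with plain batch gradient descent on the mean squared error and compares the result against NumPy’s least-squares solver. The function name, learning rate, and iteration count are illustrative choices, not anything from the interview.

```python
import numpy as np


def fit_linear_regression_gd(X, y, lr=0.1, n_iters=2000):
    """Fit y ≈ X @ w + b by batch gradient descent on the mean squared error."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y                        # prediction error for every sample
        grad_w = (2.0 / n_samples) * (X.T @ residual)   # d(MSE)/dw
        grad_b = (2.0 / n_samples) * residual.sum()     # d(MSE)/db
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 3))
    true_w = np.array([2.0, -1.0, 0.5])
    y = X @ true_w + 3.0 + rng.normal(scale=0.1, size=200)

    w, b = fit_linear_regression_gd(X, y)

    # Closed-form least squares on [X, 1] should give (almost) the same coefficients
    X_aug = np.hstack([X, np.ones((200, 1))])
    coef, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
    print("gradient descent:", np.round(w, 3), round(b, 3))
    print("least squares:   ", np.round(coef[:-1], 3), round(coef[-1], 3))
```

Writing the update by hand makes the role of the learning rate visible: too large a step and the loop diverges, too small and it needs many more iterations to reach the least-squares answer.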
One of the best things I’ve learned is to never give up. When starting in any field, you will fail several times before you succeed, and if you give up after failing, you might not succeed at all. Another important thing I’ve learned is how to work on a team: how to manage time and divide tasks when working on the same problem. I also learned a lot about preprocessing and post-processing of data, different types of machine learning models, cross-validation techniques, and how to improve on a given metric without compromising training or inference time.