Interview with Twice Kaggle GrandMaster and Data Scientist at H2O.ai: Sudalai Rajkumar
Index and about the series: “Interviews with ML Heroes”
This is a special first in the interview series. Today I get to interview a great kaggler from my homeland (India).
I’m honored to be talking to the Kernels (Ranked #1, kaggle: @sudalairajkumar) and Competitions (Ranked #140) GrandMaster and Discussions Expert (Ranked #53): Sudalai Rajkumar.
Sudalai Rajkumar completed his executive course in Business Analytics and Intelligence at the Indian Institute of Management-Bangalore and holds a BE from PSG College of Technology.
He is currently working as a Data Scientist at H2O.ai. Before H2O.ai, he held key positions at several other companies: Lead Data Scientist at Freshworks and Tiger Analytics, and lead of R&D at Global Analytics.
About the Series:
I have very recently started making some progress with my Self-Taught Machine Learning Journey. But to be honest, it wouldn’t be possible at all without the amazing community online and the great people that have helped me.
In this Series of Blog Posts, I talk with People that have really inspired me and whom I look up to as my role-models.
The motivation behind doing this is, you might see some patterns and hopefully you’d be able to learn from the amazing people that I have had the chance of learning from.
Sanyam Bhutani: Hello Grandmaster, Thank you for taking the time to do this.
Sudalai Rajkumar: Hello Sanyam, the pleasure is mine too.
Sanyam Bhutani: Currently, you are crowned the king of Kaggle kernels with Rank #1, and you’re a Comp GrandMaster as well as a Discussions Expert.
Can you tell us how you got interested in Machine Learning and in kaggle?
Sudalai Rajkumar: I have had an interest in finding patterns right from my childhood, which eventually led me to take up a job in the analytics field over the core engineering field. So I started taking up MOOC courses to gain knowledge in machine learning. I was able to get a theoretical understanding from all these courses but I was not sure how to use all of them. So I was looking for an opportunity to try them out. That is when I got introduced to Kaggle to get some hands-on experience.
Sanyam Bhutani: You’re currently working as a Data Scientist at H2O.ai and have been working in the Data Science space during the past few years.
Where does kaggle come into the picture? Is it related to your other projects?
Sudalai Rajkumar: Yes, it started as a way to learn new concepts in the field. I started to work on Kaggle problems after my office hours.
Sanyam Bhutani: H2O.ai is working on many exciting projects, could you tell us more about your role at H2O.ai?
Sudalai Rajkumar: Yes, H2O is working on multiple exciting projects and there are several wonderful people in the company. Currently, I am working on the Natural Language Processing side of Driverless AI. Driverless AI is an automated machine learning platform and you can read more about it here.
Sanyam Bhutani: You’ve had many amazing finishes on competitions.
Could you tell us what your favorite challenge was?
Sudalai Rajkumar: It was the Rainfall Prediction Competition on Kaggle. I got an awesome chance to team up with Marios and we finished second on that one. It was my first gold medal on Kaggle and I learned a lot of new concepts working with him. It also gave me a lot of confidence that I can do well in the competitions.
Sanyam Bhutani: You’ve had great results, both in solo finishes and team finishes.
For a noob kaggler, what tips do you have on whether or not to form a team?
Sudalai Rajkumar: Teaming up in competitions is definitely a great way to exchange ideas and learn new concepts. My tip would be to not team up with someone who is far ahead in the competition leaderboard or someone who is far below. In the former case, that person would have already done most of the things and we won’t get to learn too much and in the latter case, it might be hard for the other person to catch up with us. So it is better to team up with someone in the same rank range in the leaderboard to have a better learning experience. Also, it is good to team up with a person who has ideas/models different from what we have.
Sanyam Bhutani: What kind of challenges do you look for today? How do you decide to enter a new competition?
Sudalai Rajkumar: Honestly, I am not doing many challenges these days. I am trying to do some image competitions of late to learn more about them. I do not have much experience in this field.
Sanyam Bhutani: What are your first steps and go-to techniques when starting out on a new competition?
Sudalai Rajkumar: The first step would be to do an exploratory data analysis and understand the data. Then I will try to create a good validation methodology. Then the next step would be to create a baseline model using the given features (LightGBM mostly for structured data and deep learning ones for unstructured data), make a submission, and make sure that the pipeline and the cross-validation are working fine.
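The validation-then-baseline step described above can be sketched roughly as follows. This is only an illustration, not Sudalai's actual code: to keep it dependency-free, a mean predictor stands in for the LightGBM baseline he mentions, the data is synthetic, and the helper names (`kfold_indices`, `fit_mean_baseline`, `cross_validate`) are hypothetical.

```python
import random
from statistics import mean


def kfold_indices(n_rows, n_splits=5, seed=42):
    """Shuffle row indices once and deal them into n_splits validation folds."""
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    return [indices[i::n_splits] for i in range(n_splits)]


def fit_mean_baseline(train_targets):
    """Stand-in baseline (assumption): always predict the training mean.
    In practice this is where a LightGBM or deep learning model would be fit."""
    train_mean = mean(train_targets)
    return lambda row: train_mean


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return (sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)) ** 0.5


def cross_validate(features, targets, n_splits=5, seed=42):
    """Fit the baseline on each training split and score it on the held-out fold."""
    scores = []
    for valid_idx in kfold_indices(len(targets), n_splits, seed):
        held_out = set(valid_idx)
        train_idx = [i for i in range(len(targets)) if i not in held_out]
        model = fit_mean_baseline([targets[i] for i in train_idx])
        preds = [model(features[i]) for i in valid_idx]
        scores.append(rmse([targets[i] for i in valid_idx], preds))
    return scores


# Synthetic data purely for illustration: one feature, noisy linear target.
rng = random.Random(0)
features = [[rng.random()] for _ in range(100)]
targets = [3.0 * row[0] + rng.gauss(0, 0.1) for row in features]

fold_scores = cross_validate(features, targets)
print("per-fold RMSE:", [round(s, 3) for s in fold_scores])
print("mean CV RMSE:", round(mean(fold_scores), 3))
```

The per-fold scores give a reference point: any later model or feature should beat this number in cross-validation before it is worth a submission, and a large gap between the cross-validation score and the leaderboard score suggests the validation scheme needs another look.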
Sanyam Bhutani: Currently, you’re the King of Kernels, being Ranked #1.
Can you give us an insight into what efforts go into your kernels? What’s your workflow like when writing kernels?
Sudalai Rajkumar: Most of the kernels I wrote are exploratory in nature. So now I have a code base for different types of plots which helps me write those kernels faster. Once the dataset is released, I generally try to look at the data and see if there are any interesting patterns in the data. So most of my efforts go into finding interesting signals in the data and looking for the best plots to represent the same. I also constantly look at other people’s kernels to learn new ideas to represent the data, new tools to plot the data and so on.
Sanyam Bhutani: What suggestions do you have for beginners who want to write great kernels?
Sudalai Rajkumar: Kindly read multiple good kernels and try to understand them in detail. Learn how they create insights from the data, the plots they have used to portray the data, the inferences that they have come up with. It is also a good idea to take up a new concept (like a new algo or a novel technique) and educate people about the same. I personally do not like the kernels which just blend the output of two or three other kernels and get a high score.
Sanyam Bhutani: For the readers and noobs like me who want to become better kagglers, what would be your best advice?
Sudalai Rajkumar: Some very valuable points are:
- Create a generic code base which will be helpful in the long term.
- Learn to look at the data and to do feature engineering.
- Look at the forums/discussion channel for more ideas and better understanding.
- Kaggle kernels are immensely helpful and so make use of the same whenever possible.
- Iterate ideas quickly — fail fast and learn fast.
- Only 1 out of 10 ideas work in general and so do not give up.
- Use a reasonable system — use cloud if necessary.
- Choose the right competition.
- Put your heart into it.
Sanyam Bhutani: The general opinion is that Machine Learning opportunities in India are currently very sparse for a fresher’s position.
What advice would you give to the junior data scientists who want to take up a job in the field?
Sudalai Rajkumar: Apart from theoretical knowledge, companies have also started looking at other related activities like GitHub projects, hackathon performances, open source contributions, blogs, meetups, internships and so on. So it is better to build a machine learning portfolio to showcase our potential and grab good opportunities. Also, this is a fast-changing field and so it is necessary to keep ourselves updated with the latest happenings.
Sanyam Bhutani: Given the explosive growth rate of ML, how do you stay updated with the recent developments?
Sudalai Rajkumar: Most of the ML research community people are active on Twitter and share any prominent developments in this field. I mostly follow them to know about the latest happenings in this field and keep myself updated. There are quite a few good people on LinkedIn as well who share such things.
Sanyam Bhutani: What developments in the field do you find to be the most exciting?
Sudalai Rajkumar: Since I am working on the NLP side of things currently, the developments in transfer learning models for natural language tasks over the last year are very exciting for me. So hopefully we will be able to accomplish more applications on the language side in the upcoming days.
Sanyam Bhutani: What are your thoughts about Machine Learning as a field? Do you think it’s overhyped?
Sudalai Rajkumar: There might be a bit of overhype about ML as a field due to its sudden surge and so on. But I think it is going to stay here for a long time and change the way things are getting done.
Sanyam Bhutani: Before we conclude, any tips for the beginners who aspire to be like you someday but feel completely overwhelmed to even start competing?
Sudalai Rajkumar: It is always good to get into the water to learn swimming ;) So do not worry about anything else and start getting your hands dirty with data. It is the best way to learn things. All the very best!
Sanyam Bhutani: Thank you so much for doing this interview.