From a novice to one of the youngest Kaggle Competitions Masters, and landing a job at a Fortune 500 company!

Published in Analytics Vidhya · 11 min read · Aug 1, 2019

There are plenty of articles out there about the skills required to enter the world of data science, or about interview experiences and opinions on a data science career. But I rarely find articles where people focus on the time when they were just starting the journey.

The overwhelming feeling of stepping into data science and the initial experiences are under-discussed, perhaps because the field still feels relatively new to everyone. After all, the terms ‘Machine Learning’ and ‘Data Science’ only became mainstream in the last few years. Before that, much of this work went by names such as ‘computational statistics’.

In this post, I would like to share all about my journey in Data Science. Let’s just begin 😺

When I first started, I wasn’t very good at it. And, to be honest, mathematics was never my favourite subject. Yet today I admire the research going on in this field, which is helping us unravel mysteries.

This article is a 10-minute read. If you are using a VPN or a private network, you might not see the animated GIF content. Enjoy the read 😃

About Me 🎈

My name is Shahebaz. I am a recent graduate in Electronics and Communication Engineering from JNTUH University, Jagtial. I have recently won 4 silver medals and 1 gold medal in competitions, along with about 31 gold medals in discussions, and to my surprise, at 21, I am now the youngest Kaggle 2x Master in my country.

Before data science, I spent most of my time on ethical hacking, reverse engineering packages, and software development. I was also a technical writer, an Android ROM developer, and a MEAN stack developer.

Today, I am blessed to work among innovators, AI researchers, data scientists, and one of the most creative data science teams at Societe Generale Global Solution Center, Bangalore, India.

Societe Generale Global Solution Center, Bangalore

The Road not taken 🚵

I started participating in competitions only recently and in the last 9 months, I have won a bunch of medals in various competitions. As odd as it might sound, I was too afraid to participate in Kaggle competitions because — They were HARD!

So, where did it all start?

Back in 2017, while I was a member and active contributor at Oppia.org, I started learning Python 2.7.

I thought I had learned the so-called “outdated” version, Python 2.7, and would have to learn Python 3 all over again 😅

I could barely make a list or join strings. That should make it clear what kind of novice programmer I was then.

It was during my open source contributions that I learned the art of programming, software versioning using git, and the importance of well structured and documented code.

One day, I was searching for what else Python 2.7 could be used for, and while browsing the internet I landed on Kaggle. The famous Kaggle problem statement was staring right at me, and after reading it I was staring back at the screen in total surprise!

“In this challenge, we ask you to complete the analysis of what sorts of people were likely to survive”

When I read this, I was starstruck. Predicting which people survived? What kind of sorcery is this? I immediately checked the authenticity of the Kaggle website 😜 and my second moment of wonder came when I realized:

Data science sounds like sci-fi tech, yet it is very real in the 21st century!

It was as if a sci-fi movie had come to life, and its concepts were completely alien to me. That was it. How could I look back?

I decided that I wanted in. The data-science-learning fever was all over me. (Honestly, it hasn’t worn off even today 🕺)

How did I learn? Recommended Resources? 📚

The paradox of choice — More is less

There are tons of courses out there today. So many that it is easy to land in the paradox of choice with a single Google search for “How to Become a Data Scientist”. The feeling is similar to choosing a starter from a plethora of options in a restaurant. For the scope of this article, I will list a few of my top reads.

Python Recommendation

If you code in Python and your skills are intermediate, then this book deserves to be on your shelf. Period.

Data Structures and Algorithms in Python by Michael T. Goodrich, Roberto Tamassia, Michael H. Goldwasser

The book dives deep into the concepts of OOP, data structures, and algorithms in Python. I love the exercises at the end of every chapter. I still carry it for day-to-day reference when working on garage projects that require optimization and well-formatted code.

Hands-On Machine Learning Basics

After wrapping up the Python basics, I bought several books on machine learning, out of which I recommend:

Introduction to Machine Learning with Python by Andreas C. Müller and Sarah Guido

It deals with the bread and butter of machine learning, was co-written by a core developer of scikit-learn, and is a go-to book for a complete beginner.

Reading this book got me so familiar with scikit-learn that I ended up pushing a few PRs in the official scikit-learn repository. Never say never.

However, if you are familiar with data science and have made it into the top 20% of any machine learning competition, you will find the above book completely boring. (You have been warned ⚠️) But there are certainly some handy tricks and methods in it that will help you get an edge in competitions if you read it thoroughly.

Another great resource that added more value for me than any MOOC is An Introduction to Statistical Learning with Applications in R (ISLR).

If I were an interviewer, I would look for someone who had understood ISLR in theory and practice.

The book uses the R language, but I still recommend it any day.

Quench For More Gyan? 🌎

For a research context, or for someone who wants to dig deeper and reach the core of statistics, I recommend Elements of Statistical Learning (ESL). Both ISLR and ESL are freely available, so you don’t have to break the bank. Instead, you get an enormous amount of machine learning knowledge for free.

People often ask me — Which MOOCs did you take to learn Machine Learning?

The answer is none. I don’t find online courses effective, at least not the ones I have enrolled in. A few of the reasons are:

It’s hard to find a conceptual reference in a video compared to a book
Books dive into enough details while courses follow an agenda

This doesn’t mean that I hate MOOCs. In fact, I am planning to complete some this year. But I recommend getting your fundamentals strong rather than focusing on the luxury of watching videos and settling for the content of a single course. Reading books and other resources expanded my knowledge of certain concepts so much that I can now explain them from different perspectives.

Life of a Novice Aspirant 🦄

I often get messages on LinkedIn from people who want to step into data science. But the bitter reality is that the learning curve can be steep, depending on what you have done previously. Let me walk you through my experiences.

Here are the questions I had then:

How do I get started in data science?
I am done with a bunch of algorithms. What do I do next?
How do I land a job and crack a data science interview?
Is doing Kaggle worth it? Will competitions make a difference?

I was so desperate to get answers to these questions too!

Now that I have done a decent amount of learning, I can give you a one-line answer: it's not all that simple. It's complicated.

Part of the reason is that although Data Analyst, Data Scientist, and ML Engineer are three different titles, the nature of each job differs from company to company.

Data Analysts sometimes only write SQL queries, and sometimes build models or do business analysis. Sometimes they also step into the shoes of a Program Manager.
Data Scientists sometimes build basic models, while other roles require PhD-level research work and still others require model tuning and deployment on large-scale systems.
ML Engineers combine software engineering expertise with data science knowledge.

The thing is …

There is no fixed job description of your dream data science job.

Many companies out there are still figuring out what exactly falls under data science and what kind of problems to focus on. In such a scenario, it becomes very important that your learning is continuous and not limited to the timeline of a MOOC.

Follow your passion and solve problems. Gather data from your Android phone, stream tweets and study the followers of your favourite actor, or try out weird machine learning use cases on Avengers: Infinity War.

Imagination is limitless, and so are the possibilities with ML. Let us now explore the skills that make for a better data scientist.

A bucket list for the data science aspirant ☑️

The list is generalized for a data scientist role; however, you might need more skills if you apply for a domain-specific role.

1. Master Statistics and Probability 😎: Easily the bread and butter of the data science realm. I highly recommend a breadth-first approach here, unless you are writing a research paper or working in academia.

In the real world, you will not use your stats knowledge as often as you write code. But a strong grasp of stats will not only make you a better data scientist, it will also help you make key decisions.

2. Participate In Data Science Competitions 🎯: Participate to learn, not to win. I understand that winning and prize money sound attractive, but start easy. It took me 100+ lost competitions to reach the top 50 of a leaderboard for the first time. No one starts as a winner from day one. Competitions will help you assess yourself against the hugely competitive world out there.

3. Your Projects Speak For You 🌟: Keep your GitHub profile alive and complete at least 2–3 projects with end-to-end implementations, including documentation. Although competitions are a way to apply your skills, there is also a lot of criticism of them.

Competitions are not synonymous with real-world data science tasks. They are like being served a ready-made platter in a 5-star⭐️ restaurant.

Contributing to a project, or building one from scratch, will give you experience with the real-world data science tasks that are considered the most important.

Data Science & Many Facets 😨

If you are looking for a role and the job description demands anything from a “rockstar” data scientist to 8–10 years of experience in “Spark”:

Just run! (Spark was first open sourced in 2010 and only became a top-level Apache project in 2014!)

There are a lot of companies that are trying to push data science just because of the hype.

Sometimes all you need is .groupby() and not Machine Learning
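
To make that concrete, here is a minimal sketch of the kind of question a plain aggregation can answer without any model. The `sales` table, its column names, and its numbers are all made up for illustration; only the pandas calls themselves are real.

```python
import pandas as pd

# Hypothetical toy data, invented purely for illustration.
sales = pd.DataFrame({
    "region": ["North", "South", "North", "East", "South", "North"],
    "amount": [120.0, 80.0, 200.0, 50.0, 70.0, 100.0],
})

# Total and average sale per region: a plain aggregation, no model needed.
summary = (
    sales.groupby("region")["amount"]
    .agg(total="sum", average="mean")
    .sort_values("total", ascending=False)
)

print(summary)
```

If a business question can be answered by a summary table like this one, it is probably worth trying that before reaching for a classifier.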

Going forward, the roles will diverge widely. Figure out the domain where you want to land as a data scientist, and target your projects and competitions at that domain.

Recruiters prefer to hire someone who has already worked on something similar to what the company is currently working on.

Suppose you are going out for Hyderabadi Biryani. Would you prefer a new cook who has just looked up the recipe on YouTube, or a restaurant that specializes in authentic Hyderabadi Biryani? The same goes for companies. They want someone experienced who has already cooked some biryani. Correction, authentic Hyderabadi Biryani!

For example: I often stick to the financial domain and NLP-driven competitions. This was my biggest advantage when I was being interviewed a few months ago. I could ask more domain-related questions, find out which projects the company was currently working on, and see where I would fit in.

After all, what’s more exciting than working on your favourite project every Monday morning? ☀️

We are in Endgame. Don’t snap yourselves 🔚

I came across this wonderful, or rather drastic, scenario: imagine TensorFlow were wiped clean and gone.

You realize you are no longer a Data Scientist or ML Engineer!

Funny enough, but legit. Several new tools and data science libraries may arrive in the future, and I bet they will. There was a time when XGBoost felt like a secret weapon known to a few competitors, and today it is a standard, fully open-source tool. Tomorrow some other tool will change the data game.

AutoML is already leading the next generation of data science solutions. I personally find Driverless AI from H2O.ai amazing!

There always comes a point in the industry when a repetitive task is automated. Keep yourself informed and learn core skills rather than the syntax of a few libraries. Explore areas such as:

GANs
Reinforcement Learning
Machine Learning Interpretability

Keep yourself fed with knowledge and trends. If you don’t .fit() and .transform() yourself to the advancements, you will be an underfit classifier going forward.

Credits and Mentions 👼

I want to use this section to thank a few of the people who have been a guiding light in my life. They have taught me things, corrected my mistakes, and continue to guide and inspire me to this day.

Sudalai Rajkumar, Shivam Bansal, Kunal Jain, Sunil Ray, Kazanova, Raghu Kalyan, Bharti Kukreja, Amit Yadav, Asif Mohammed, Ratul Gosh, Ladle Patel, Srikanth Verma Chekuri, Abdul Majeed Raja RS, and Mohsin Hassan have all spent significant time guiding me. I wish I could thank them more.

High five to my ML brothers Kanav Anand, Sanyam Bhutani, Rishi, and Aditya Soni for hanging around and bearing with me in many competitions!

That will be all. Do reach out to me on my social handles. I will be happy to answer your questions and make your acquaintance.

Let’s get connected over LinkedIn 📘 and Twitter ❤️. Share the post if you have found it useful and show your support with 👏