How I broke into data science by doing an online course

Rounak Banik
10 min read · Jul 2, 2019

Ever since I took up Computer Science in the 11th grade, I have been fascinated by it. Programming was something I thoroughly enjoyed doing. I spent an inordinate amount of time building toy software (such as a timetable generator) and solving competitive programming problems. This was also the time when movies like The Social Network were inspiring a generation of kids to take up computer science, start companies and change the world.

However, a series of unfortunate events ensured that I took up Electronics and Communication Engineering in college. To put it mildly, I disliked everything about my branch. It didn’t help that my competence in the subject was as low as my passion for it. I struggled. And in a desperate bid to find some solace, I started learning web development and software engineering online. I completed at least a dozen courses on platforms such as edX and Coursera. I even became popular on my campus for a week when I launched a social networking site with certain quirky features (such as revealing the person of the opposite gender who was most similar to you). Web development seemed to be my thing.

The Social Networking site I had created in college

In 2017, my college witnessed the start of a new group called DSG (Data Science Group). All of a sudden, ‘data science’ was the rage on campus. People of all branches were talking about ‘machine learning’ and ‘deep learning’ and how the job market for it was immense. I gave the admission test for the group. Unfortunately, I was rejected. I tried learning data science the same way I had done with web development. But I wasn’t as successful. The field seemed simply too vast and I wondered how the industry could expect one person to have all those skills simultaneously.

Nevertheless, in January 2017, I ended up interning remotely with a startup in New York, helping them develop the frontend of their MVP. In the summer that year, I took up two jobs: one as a Backend Development Instructor with Acadview, where I taught college students Python and Django, and the other as a Software Development Intern with an EdTech startup called Springboard. Little did I know that the latter was about to change my life.

One of the perks of my Springboard job was access to their flagship Data Science Career Track. Since I was working two jobs, I simply didn’t have the time to make use of this perk. Still, casually going through the course piqued my interest. I was also flabbergasted by the mentors that were available to students; these were experts working in very senior positions at some of the most well-known companies on the planet.

In the last few days of my internship, I had the opportunity to interact with Parul Gupta, Springboard’s co-founder. I asked her if I could pursue the Data Science Career Track after my internship, once college started. She very generously agreed and gave me backdoor access to the track for 4 months. Pursuing this track, valued at $7500, under the guidance of an industry expert was a once-in-a-lifetime opportunity for me and I was eagerly looking forward to it. I was not disappointed.

Phase 1: The Mentor, Kaggle, and TED

Kaggle is an invaluable resource for budding Data Scientists.

I had tried to get my hands dirty with Data Science before. But I always felt overwhelmed and lost due to the sheer number of skills, tools, and libraries you were expected to know. Should I learn Python or R? Do I begin with numpy or pandas? How is this different from AI courses?

I was extremely relieved when I looked at the curriculum. One of the track’s biggest strengths is its structure. Springboard streamlines the entire learning process for you and gives you suggested study plans to complete the curriculum in 4, 5 or 6 months. Additionally, the technical material is interspersed with career coaching advice and practice to help you get a job by the time you graduate.

I was also assigned my mentor: Baran Toppare, a Data Scientist from Turkey. If there was one biggest takeaway from this course, it was his mentorship.

Since DSC was primarily based on Python, a language I was familiar with, I didn’t have a very hard time starting with the material. The first parts of the curriculum covered Data Wrangling (using SQL and Pandas) and Data Storytelling. For job preparation, I had to update my LinkedIn Profile and set up a new GitHub repository for my projects.

Working on Yammer’s Search Functionality was one of my first technical projects and I immensely enjoyed working on it. However, the highlight of this phase was something far more significant and made me seriously consider data science as a career.

My TED dataset has been downloaded over 15,000 times.

For my data visualization project, I narrated the story of TED Talks using a 2012 dataset containing talk transcripts and metadata. Baran was extremely impressed with my project and he encouraged me to publish the dataset and my notebook on Kaggle.

Within a day of publication, I received a ‘bronze’ for my notebook and my dataset was featured and tweeted by Kaggle. As a beginner who was barely a month in, this was extremely exciting for me. However, the cherry on the cake was when Anthony Goldbloom, the CEO of Kaggle, commented on my notebook.

I was over the moon. So, in the following week, I scraped the TED.com website to extract metadata and transcripts for every talk that had ever been published on the platform. With my notebook and dataset updated, Anthony sent my work to the TED Team. I also received a reply from Bruno Giussani, Managing Director of TED Europe, who said he loved my work and he would definitely share it with his colleagues.

Phase 2: Statistics, SciPy, and AMEX

The module that I had the most difficulty with was Inferential Statistics. And to be honest, if it wasn’t for my mentor, I would have given up at this point. However, Baran really shone with this one.

His advice and suggestions on my Inferential Statistics projects were some of the very best. He also pointed me to a treasure trove of articles, books and videos about the subject. Going through this material got me so comfortable with the subject that I wrote a series of tutorial notebooks on it (with Baran’s guidance) and ended up giving a talk on Inferential Statistics at the SciPy India Conference at IIT Bombay!

Around this time, American Express was also organizing a national-level Data Science hackathon where participants had a week to solve a classification problem. When the competition started, I had absolutely no knowledge of machine learning and I didn’t intend to participate.

However, when I mentioned this to Baran in our next call, he strongly encouraged me to take part. He advised me to spend more time on wrangling and feature engineering. For the machine learning part, he asked me to learn a little bit of scikit-learn and read up on Random Forests and XGBoost, stating that the latter was used in most winning solutions.

I did exactly that. And although neither of us had expected it, I ended up being ranked 12th on the National Public Leaderboard and received a certificate from AMEX for Outstanding Performance. It was at this point that I really started believing that I could be good at this, and I had only Baran and Springboard to thank for it.

Phase 3: Airbnb and Informational Interviews

Apart from a myriad of technical assignments, Springboard also expects you to complete two Capstone Projects as part of the curriculum. These Capstone Projects have to be proposed by the student and are unique to every student. This way, the student gets to know about the amazing variety of problems that can be solved with data and also builds something from scratch, right from acquiring data to giving business recommendations.

My first capstone project was on Airbnb New User Bookings, which was based on an archived competition conducted by Airbnb on Kaggle. With my newfound confidence from the AMEX Competition, I was able to get through the Machine Learning module with relative ease and complete my Capstone Project within a week.

Working with the Airbnb project got me really interested in Airbnb itself and curious about the kind of challenges it faces. Coincidentally, this was also the time I was expected to conduct informational interviews with professional data scientists as part of my job prep. As luck would have it, Baran personally knew a Data Scientist from Airbnb and introduced me to him.

I had an hour-long conversation with Chirag Mahapatra and he gave me an invaluable sneak peek into the kind of problems that the Airbnb data team was solving. He told me about the kind of people Airbnb looks for and it really made me wish for an opportunity to work in such an environment in the future. We also ended up conversing a little about Bitcoin and blockchain.

My second interview was with Bahar Erar, a Research Scientist at Amazon (also introduced by Baran). Bahar came from a Ph.D. background and gave me a very different perspective of coming into Data Science (from an academic background). Like Chirag, Bahar gave me an insight into the kind of work Amazon does and also gave me advice on higher education and the prospect of getting jobs in the United States.

Phase 4: Movies, DataCamp and the Kaggle Kernel Award

The next phase of the program had me working on my second capstone project, advanced machine learning (time series, recommender systems, etc.) and Big Data Technologies.

Inspired by the success of my TED Project, I decided to do a similar project with movies. For this, I acquired movie metadata from TMDB for over 45,000 movies released between 1874 and 2017. I also uploaded the dataset to Kaggle and it trended at the top for three straight days. It is also the largest movie dataset currently available on Kaggle.

Since my capstone project was required to have a machine learning component, Baran suggested I also work on Movie Recommender Systems. I did exactly that and built three types of recommenders: Simple (an IMDB Top 250 clone), Content-Based and Collaborative Filters. A portion of my project was also accepted by DataCamp as a tutorial.
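To give a rough idea of what a "Simple" recommender in the spirit of an IMDB Top 250 chart can look like, here is a minimal sketch. It is an illustration only, not the project's actual code: it assumes the commonly cited IMDB-style weighted rating, WR = (v / (v + m)) * R + (m / (v + m)) * C, where R is a movie's mean rating, v its vote count, m a minimum-votes cutoff and C the mean rating across all movies. The `Movie` class, the function names, the cutoff of 100 votes and the sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Movie:
    title: str
    vote_average: float  # R: mean rating of this movie
    vote_count: int      # v: number of votes this movie received


def weighted_rating(movie, min_votes, global_mean):
    """IMDB-style weighted rating: pulls a movie's mean rating toward the
    global mean, more strongly when the movie has few votes."""
    v, r = movie.vote_count, movie.vote_average
    return (v / (v + min_votes)) * r + (min_votes / (v + min_votes)) * global_mean


def top_chart(movies, min_votes, n=250):
    """Return up to n movies with at least min_votes votes, best score first."""
    # C: mean of the per-movie average ratings over the whole catalogue
    global_mean = sum(m.vote_average for m in movies) / len(movies)
    eligible = [m for m in movies if m.vote_count >= min_votes]
    return sorted(
        eligible,
        key=lambda m: weighted_rating(m, min_votes, global_mean),
        reverse=True,
    )[:n]


# Made-up sample data, purely for illustration
movies = [
    Movie("A", 9.5, 12),     # high rating but too few votes to qualify
    Movie("B", 8.4, 5000),
    Movie("C", 8.9, 300),
    Movie("D", 6.0, 8000),
]

chart = top_chart(movies, min_votes=100)
for movie in chart:
    print(movie.title)
```

The cutoff keeps obscure titles with a handful of perfect votes out of the chart, while the weighting stops a movie with few votes from outranking a widely rated one on its raw average alone.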

My journey was approaching the end. As luck would have it, I was also independently working on a separate kernel on the state of Data Science and ML on Kaggle. The kernel ended up winning the very first $1000 Kaggle Kernels Award.

Final Phase: Project Walkthroughs, BTP and UpGrad

My experience with the program had been so good that, for my B.Tech Final Project, I proposed a Data Science problem: Fake News Detection. I began studying Natural Language Processing extensively at this time.

In my final phase of the program, I was expected to go through a few mock interviews. My Project Walkthrough interview was with Dipanjan Sarkar. Coincidentally, I had been reading his book to work on my Fake News Detection problem. It was truly a surreal experience to have the author of a book I was reading review my projects. And I think this encapsulates the true essence of Springboard’s program: an extremely powerful network of people to learn from and connect with.

A few days before graduating, I received an offer from UpGrad to write articles on the topics of Data Science, Software Engineering and Web Development. I ended up authoring an article on how to start with Data Science which was heavily influenced by my experience with this program.

Life after Springboard

Data Science continues to be an extremely important part of my professional life. After Springboard, I ended up signing a contract with Packt and my first book, Hands-on Recommendation Systems with Python, was released in July 2018.

I also collaborated with DataCamp to launch a course titled Feature Engineering for NLP in Python. The course is live on DataCamp and the first chapter is free!

The year following DSC, I was pursuing Liberal Arts at Ashoka University. My job hunting process, however, was made extremely simple by the skills I had developed as part of Springboard. I ended up landing a couple of job offers, through both on-campus and off-campus placements.

To sum up, pursuing Springboard’s Data Science Career Track was one of the most fulfilling and challenging experiences of my life. Although I had taken up this program purely out of intellectual curiosity, by the time I was done, I was certain that I would love to become a Data Scientist some day. I am going to fulfill this dream when I start my job this August. :)

I hope this article has given you the inspiration to take control of your learning and pursue fields which you may have thought were inaccessible. As you may have figured already, I can fully vouch for the potency of Springboard’s Career Tracks in helping you break into your field of interest. I am currently associated with Springboard as a Community Manager and TA, and I personally help students achieve their goals and complete the curriculum. If you have any questions regarding the track, feel free to drop me a mail at rounakbanik@gmail.com. Thank you for reading!
