Data Science Road Map

Published in Data Science Group, IITR · Aug 10, 2019

In the 21st century, advances in computer science, the development of intelligent machines and the generation of immense amounts of data have given rise to new fields of study that have also become buzzwords: Data Science and Machine Learning. From simple tasks like sales prediction to ambitious projects like self-driving cars, everything is becoming possible with the algorithms and techniques of Data Science.

Realising the potential of this field, a lot of students in and around universities are motivated and enthusiastic to pursue it. The internet has become a hub of innumerable resources to guide them, but that often leads to more confusion than clarity. With video lectures, online courses, languages and software packages, books, practice platforms, etc., the path is not well defined.

Hence we have tried to curate a well-structured and resourceful path for anyone who wants to try their hand at this field. Keep in mind that it will require dedicated effort and time, but in the end it's all worth it.

PROBABILITY AND STATISTICS:

Probability and statistics will help you understand the fundamentals behind Machine Learning algorithms, so a good understanding of them is important. You can follow Probability and Statistics for Data Science (Series) on Medium. This is a 6-blog series that will help you with the basics of probability and statistics.
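To get a feel for the kind of exercise that builds this intuition, here is a minimal sketch (our own illustration, not taken from the series above) that estimates a probability by simulation and compares it with the exact value. The function name `estimate_sum_probability` and its parameters are hypothetical, chosen only for this example.

```python
import random


def estimate_sum_probability(target=7, trials=100_000, seed=0):
    """Estimate P(two fair dice sum to `target`) by simulation.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        # Roll two independent fair six-sided dice.
        if rng.randint(1, 6) + rng.randint(1, 6) == target:
            hits += 1
    return hits / trials


estimate = estimate_sum_probability()
exact = 6 / 36  # six of the 36 equally likely outcomes sum to 7
print(f"simulated: {estimate:.4f}, exact: {exact:.4f}")
```

With enough trials the simulated value should land close to the exact one, which is the law of large numbers at work.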

LINEAR ALGEBRA:

To understand Deep Learning techniques, one must be comfortable with calculus and matrices. The 3Blue1Brown lecture series will help you develop a good understanding of these topics. You can check out Gilbert Strang's Linear Algebra course on MIT OpenCourseWare if you really want to explore the field.

MACHINE LEARNING:

Machine Learning is perhaps the most important aspect of Data Science. This is the step where most beginners quit. But if you persist, there’s nothing stopping you!

a) BOOKS

ISLR / Python Machine Learning (Sebastian Raschka)

Estimated Time: 25–30 days

These books will help you with the fundamentals of machine learning and cover all the major topics. Side by side, you can get started with implementing ML algorithms.
ISLR is a theory- and math-intensive book whose code is written in R, so you may want to refer to the book's Python conversion. Raschka, on the other hand, is more interactive in the sense that implementation and theory go hand in hand.
Some common Python libraries such as sklearn, numpy, pandas, matplotlib and scipy will come in handy while implementing ML algorithms. We strongly suggest that you not worry too much about them: instead of setting aside time to learn them separately, just start with the code from either of the two books and pick up the libraries along the way.
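To give an idea of what "starting with the code" can look like, here is a minimal sketch using sklearn (it assumes scikit-learn is installed). It is our own illustration rather than an example from either book; the choice of the built-in Iris dataset, logistic regression and the split settings are assumptions made just for this example.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load a small built-in dataset: 150 flowers, 4 features, 3 classes.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the rows so the model is scored on data it has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Fit a simple classifier; max_iter is raised so the solver converges.
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"test accuracy: {accuracy:.2f}")
```

Most sklearn models follow the same fit / predict pattern, so once this feels familiar you can swap in other algorithms from the books with very few changes.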

b) MOOCs

CS229 / CS109

Estimated Time: 10–15 days

For people who don’t like reading books, MOOCs are a good alternative.
You can follow either of the two courses: Andrew Ng's CS229 Machine Learning or Harvard's CS109 Data Science.

KAGGLE / ANALYTICS VIDHYA

Having completed the ML part, you are now ready to start participating in competitions where you can test your skills in competitive Data Science. However, don't get stuck on them: participating in competitions is more like a sport, one that brings learning and growth along with fun and excitement.

DEEP LEARNING

Fundamentally, deep learning is a part of machine learning. But given the popularity and the innumerable resources focused on this, it is apt to treat this as a separate domain.

1. Deep Learning Specialization (Coursera):

Estimated Time: 20–25 days

This is a 5-course specialization which will help you with the fundamentals of deep learning and the various techniques used, and it includes an introduction to Computer Vision and Natural Language Processing.

Don't forget to apply for financial aid well in advance so that you can take the assignments as well as the quizzes.

2. Dive Into Deep Learning PyTorch (d2l-pytorch)

Estimated Time: 20–25 Days

This is a great book for people who want to implement as well as learn Deep Learning theory from scratch in PyTorch.

It is free and open source, available as a GitHub repo named d2l-en. You can have a look at this if you want to develop some idea and understanding of Deep Learning concepts in less than 25 days without any prior knowledge.

FRAMEWORK TUTORIALS

You can go for either the PyTorch Tutorials or the TensorFlow Tutorials.

Pick one and get good at it, since it is the implementation that ultimately matters.

DIVING INTO DETAILS:

The above resources will help you with the fundamentals of different aspects of Data Science. But given the depth and breadth of the field, there’s always more to learn. So here are the resources you can refer to if you want to explore various aspects such as statistics, computer vision, natural language processing, etc.

Stats 110 (Harvard)
50 Challenging Problems in Probability
Deep Learning Book
NOC CS20-73: NPTEL Introduction to Machine Learning
CS 231n: Convolutional Neural Networks for Visual Recognition (Strongly Recommended)
CS 224n: Deep Learning for Natural Language Processing (Strongly Recommended)

Note: If you have any doubts, you can refer to blogs on Analytics Vidhya and Medium.

Remember, Google is your best friend! :D

This is certainly not the end of things. It is more likely the starting point for all the topics one can dive into and explore after completing the above path.

[Edit 1]: It is suggested to start ML before DL. But it is completely a personal choice and if you feel interested in Deep Learning, you may start that. Following the order of the blog is just a suggestion and not really necessary! :)

Thanks to Dhruv Kalsotra.