Data Scientist Learning Path 2019

Machine Learning Series!!!

Hi folks. Many people keep asking me the same basic question: “Can you please suggest the best path to become a data scientist?” When I was a beginner, many websites helped me learn data science. Special thanks to Andrew Ng’s course and Jason Brownlee, which gave me good insight into data science. So today I am sharing a learning path for becoming a better data scientist.

Highly recommended: the special Machine Learning course by Google.

Step 1 : Basic Python Learning

Choosing one language for machine learning is a must, and I suggest Python because it is the most popular language in data science. You can also go with R, but I suggest learning Python. Don’t worry, I am sharing learning resources for both languages here.

Python Resources to Learn

  1. Learn Python for Data Science — Online Course | DataCamp
  2. Basic Python by Corey Schafer
  3. Books (mandatory): Python for Data Analysis. This book covers various aspects of data science, from loading data to manipulating, processing, cleaning and visualizing it. A must-keep reference guide for pandas users.

R Resources to Learn

  1. R Programming by thenewboston
  2. R by
  3. R by DataCamp
  4. Books — R for Data Science — This is your one stop solution for referencing basic materials on R.

Step 2 : Develop Skills in Algebra, Statistics, and ML

A data scientist is better at statistics than any software engineer and better at software engineering than any statistician. The idea is to keep the right balance and avoid emphasizing either side too much or too little.

Descriptive Statistics — 1 week

Probability — 2 weeks

Inferential Statistics — 2 weeks

  • Course (mandatory) — Intro to Inferential Statistics from Udacity — Once you have gone through the descriptive statistics course, this course will take you through statistical modeling techniques and advanced statistics.
  • Books (optional) — Online Stats Book — This online book can be used for a quick reference for inference tasks.

Linear Algebra — 1 week

  • Course (mandatory)
  • Linear Algebra, Khan Academy: This concise and excellent course on Khan Academy will equip you with the skills necessary for Data Science and Machine Learning.

Books (optional)

Structured Thinking — 2 weeks

Competitions (mandatory): No amount of theory can beat practice. This is a strategic thinking problem which will test you on your thinking process. Also, keep an eye on business case studies as they help in structuring your thoughts tremendously.

Step 3 : Python Packages (pandas, NumPy, Matplotlib, scikit-learn, Bokeh)

This is where the fun begins! Here is a brief introduction to the various libraries. Let’s start practicing some common operations.

  • Practice the NumPy tutorial thoroughly, especially NumPy arrays. This will form a good foundation for things to come.
  • Next, look at the SciPy tutorials. Go through the introduction and the basics, and do the remaining ones based on your needs.
  • If you guessed Matplotlib tutorials next, you are wrong! They are too comprehensive for our need here. Instead look at this ipython notebook till Line 68 (i.e. till animations)
  • Finally, let us look at pandas. pandas provides DataFrame functionality (like R) for Python. This is also where you should spend a good amount of time practicing, since pandas will become your most effective tool for mid-size data analysis. Start with the short introduction, 10 minutes to pandas, then move on to a more detailed tutorial on pandas.
  • Check out DataCamp’s course on Pandas Foundations

You can also look at Exploratory Data Analysis with Pandas and Data munging with Pandas
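
To get started with the practice, here is a minimal sketch of a few common NumPy and pandas operations. The column names and values are toy data I made up for illustration; they do not come from any of the tutorials above.

```python
import numpy as np
import pandas as pd

# NumPy: vectorized arithmetic works on the whole array, no explicit loop.
heights_cm = np.array([150.0, 160.0, 170.0, 180.0])
heights_m = heights_cm / 100          # element-wise division
mean_height = heights_m.mean()

# pandas: a DataFrame is a table with labeled columns.
# (Hypothetical toy data, for illustration only.)
df = pd.DataFrame({
    "city": ["Pune", "Delhi", "Pune", "Delhi"],
    "sales": [10, 20, 30, 40],
})

# Filter rows with a boolean mask.
big_sales = df[df["sales"] > 15]

# Group by a column and aggregate each group.
sales_by_city = df.groupby("city")["sales"].sum()

print(mean_height)
print(big_sales)
print(sales_by_city)
```

Boolean masks and groupby come up in almost every analysis, so they are worth practicing until they feel natural.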

Additional Resources:

  • If you need a book on pandas and NumPy, try “Python for Data Analysis” by Wes McKinney.
  • The pandas documentation also includes a lot of tutorials that are worth a look.

Assignment: Solve this assignment from CS109 course from Harvard.

Step 4 : Exploration and Visualization

1. R Programming


  • Exploratory Data Analysis — This is an awesome course by Johns Hopkins University on Coursera. You will need no other course to perform visualization and exploratory work in R.


  • Comprehensive guide to Data Exploration in R: This is a one-stop article that I suggest you go through carefully, following every step, because the steps in the article are the same ones you will use when solving any data problem or hackathon problem.
  • Cheat sheet: Data Exploration in R. This cheat sheet contains all the steps in data exploration along with code. I suggest you print it out and paste it on your wall for quick reference.

2. Python

Course (optional)

  • Intro to Data Analysis — This is an excellent course by Udacity on Data Exploration using Numpy and Pandas.

Blogs/Articles (mandatory)

Books (optional) — Python for Data Analysis — A one stop solution for your Data Exploration and Visualization in Python.

Step 5 : Data Preprocessing

  1. Data Preprocessing Story
  2. Data Preprocessing Tutorial by Analytics Vidhya
  3. Data Preprocessing

Step 6 : Feature Selection / Engineering

Step 7 : Basic and Advanced Machine Learning Algorithms

Basic Machine Learning Algorithms.

  • Linear Regression
  • Logistic Regression
  • Decision Trees
  • KNN (K-Nearest Neighbors)
  • K-Means
  • Naïve Bayes
  • Dimensionality Reduction

Advanced algorithms

  • Random Forests
  • Dimensionality Reduction Techniques
  • Support Vector Machines
  • Gradient Boosting Machines

Linear Regression


  • Machine Learning by Andrew Ng — There is no better resource to learn Linear Regression than this course. It will give you a thorough understanding of linear regression and there is a reason why Andrew Ng is considered the rockstar of Machine Learning.


  • This lesson from the Penn State STAT 501 course outlines the main features of Linear Regression, from a simple definition to determining the goodness of fit of a regression line.
  • This is an excellent article with practical examples to explain Linear Regression with code.


  • The Elements of Statistical Learning — This book is sometimes considered the holy grail of Machine Learning and Data Science. It explains Machine Learning concepts mathematically from a Statistics perspective.
  • Machine Learning with R — This is a book I personally use to have a brief understanding of Machine Learning algorithms along with their implementation code.


  • Black Friday: As I already said, no amount of theory can beat practice. Here is a regression problem you can try your hand at for a deeper understanding.
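
To make the idea of a regression line and its goodness of fit concrete, here is a minimal from-scratch sketch of one-feature least squares with R². The function names `fit_line` and `r_squared` and the toy data are my own, chosen for illustration; in real projects you would normally rely on a library instead.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (the 1/n factors cancel out).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def r_squared(xs, ys, slope, intercept):
    """Goodness of fit: 1 - residual sum of squares / total sum of squares."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Toy data, made up for illustration: roughly y = 2x + 1 plus a little noise.
xs = [1, 2, 3, 4, 5]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(xs, ys)
fit_quality = r_squared(xs, ys, slope, intercept)
print(slope, intercept, fit_quality)
```

An R² close to 1 means the line explains most of the variation in `ys`; a value near 0 means it explains very little.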

Logistic Regression

Course (mandatory)

  • Machine Learning by Andrew Ng: Week 3 of this course will give you a deeper understanding of one of the most widely used classification algorithms.
  • Machine Learning: Classification. Weeks 1 and 2 of this practice-oriented Specialization course using Python will satisfy your thirst for knowledge about Logistic Regression.

Blogs/Articles (optional)

Books (optional)

  • Introduction to Statistical Learning: This is an excellent book with quality content on Logistic Regression’s underlying assumptions, statistical nature and mathematical linkage.

Practice (mandatory)

  • Loan Prediction: This is an excellent competition to practice and test your new Logistic Regression skills by predicting whether a person’s loan was approved or not.

Decision Trees

Course (mandatory)

Blogs/Articles (mandatory)

Books (mandatory)

  • Introduction to Statistical Learning — Section 8.1 and 8.3 explain the basics of decision trees through theory and practical examples.
  • Machine Learning with R: Chapter 5 of this book provides the best explanation of machine learning algorithms available on the market. Here, decision trees are explained in an extremely non-intimidating and easy style.

Practice (mandatory)

  • Loan Prediction: This is an excellent competition to practice and test your new Decision Tree skills by predicting whether a person’s loan was approved or not.
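
Before you jump into the competition, it helps to see what a decision tree does at each node: it looks for the split that makes the two child groups as pure as possible. Here is a minimal sketch using Gini impurity on a single numeric feature. The helper names and the tiny income/approval data set are hypothetical, made up only for illustration.

```python
def gini(labels):
    """Gini impurity: 0.0 for a pure group, higher for a mixed one."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))


def best_split(values, labels):
    """Try every observed value as a threshold; keep the purest split."""
    best_threshold, best_impurity = None, float("inf")
    for threshold in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= threshold]
        right = [lab for v, lab in zip(values, labels) if v > threshold]
        if not left or not right:
            continue  # a split must send rows to both sides
        # Impurity of the split = size-weighted average of both children.
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if weighted < best_impurity:
            best_threshold, best_impurity = threshold, weighted
    return best_threshold, best_impurity


# Hypothetical toy data: applicant income (in thousands) and loan approval.
incomes = [20, 25, 30, 60, 65, 70]
approved = ["N", "N", "N", "Y", "Y", "Y"]

threshold, impurity = best_split(incomes, approved)
print(threshold, impurity)
```

A full tree simply repeats this search on each child group, over all features, until the groups are pure enough or a depth limit is reached.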

KNN (K-Nearest Neighbors)

Course (mandatory)

  • Machine Learning — Clustering and Retrieval: Week 2 of this course progresses to k-nearest neighbors from 1-nearest neighbor and also describes the best ways to approximate the nearest neighbors. It explains all the concepts of KNN using python.
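
To see how going from 1-nearest neighbor to k-nearest neighbors works, here is a minimal from-scratch sketch. The function name `knn_predict` and the toy points are my own assumptions for illustration, and this brute-force version checks every training point, so it ignores the approximation tricks the course covers.

```python
import math
from collections import Counter


def knn_predict(train_points, train_labels, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training point.
    distances = [
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    ]
    # Sort on distance only, then keep the k closest neighbors.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration: two well-separated groups.
train_points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5),
                (6.0, 6.0), (6.5, 7.0), (7.0, 6.5)]
train_labels = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_points, train_labels, (2.0, 2.0), k=3))
print(knn_predict(train_points, train_labels, (5.0, 5.0), k=1))
```

With `k=1` the prediction copies the single closest point; a larger `k` smooths out noisy neighbors at the cost of blurring the class boundaries.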

Blogs/Articles (mandatory)




Naive Bayes


  • Intro to Machine Learning: Take this course to see Naive Bayes in action. In this course, Sebastian Thrun explains Naive Bayes in simple English.

Blog / Article

  • 6 Easy Steps to Learn Naive Bayes Algorithm (with code in Python) : This article will take you through the Naive Bayes algorithm in detail. In this guide, you will learn how the algorithm works, its applications, and much more. It will also give you hands-on knowledge of building a model using Naive Bayes.
  • Naive Bayes for Machine Learning : This is one of the most comprehensive articles I have come across. Go through this article to get a complete understanding of why the Naive Bayes algorithm is important for machine learning.
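
If you want to see the idea in code before reading the articles, here is a minimal sketch of Gaussian Naive Bayes written from scratch. This is my own illustration, not the code from either article; the function names and the toy measurements are assumptions.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels):
    """Estimate a prior plus per-feature mean and variance for each class."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)

    model = {}
    for label, class_rows in by_class.items():
        n = len(class_rows)
        columns = list(zip(*class_rows))
        means = [sum(col) / n for col in columns]
        # The small constant keeps the variance away from zero.
        variances = [
            sum((v - m) ** 2 for v in col) / n + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (n / len(rows), means, variances)
    return model


def predict_gaussian_nb(model, row):
    """Return the class with the highest log posterior (up to a constant)."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        # The "naive" assumption: features are independent given the class,
        # so their log-likelihoods simply add up.
        for value, mean, var in zip(row, means, variances):
            score += -0.5 * math.log(2 * math.pi * var)
            score -= (value - mean) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy data, made up for illustration: two features, two classes.
rows = [(1.0, 2.1), (1.2, 1.9), (0.8, 2.0),
        (3.0, 4.1), (3.2, 3.9), (2.8, 4.0)]
labels = ["low", "low", "low", "high", "high", "high"]

nb_model = fit_gaussian_nb(rows, labels)
print(predict_gaussian_nb(nb_model, (1.1, 2.0)))
print(predict_gaussian_nb(nb_model, (3.1, 4.0)))
```

Working in log space avoids multiplying many tiny probabilities together, which would otherwise underflow to zero on data with many features.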

Dimensionality Reduction


Blog / Article

Random Forests

Books (optional)

Blogs/Articles (mandatory)

Gradient Boosting Machines

Blogs/Articles (mandatory)

Presentation (mandatory): Here is an excellent presentation on GBM. It covers the prominent features of GBM and the advantages and disadvantages of using it to solve real-world problems. It is a must-see for anybody trying to understand GBM.


Blogs/Articles (mandatory)

  • Official Introduction XGBOOST: Read the documentation of this hackathon-winning algorithm. It is an improvement over GBM and is right now the most widely used algorithm for winning competitions.
  • Using XGBOOST in R — An excellent article on deploying XGBOOST in R using a practical problem at hand.
  • XGBOOST for applied Machine Learning — An article by Machine Learning Mastery to evaluate the performance of XGBOOST over other algorithms.

Support Vector Machines

Course (mandatory)

Books (mandatory)

Blogs/Articles (optional)

Step 8 : Profile Building on GitHub and Participation in Competitions

It is very important for a data scientist to have a GitHub profile that hosts the code for all the projects they have undertaken. Potential employers see not only what you have done, but also how you have coded and how frequently and for how long you have been practicing data science.

Also, code on GitHub opens up avenues to open source projects, which can greatly boost your learning. If you don’t know how to use Git, you can learn from Git and GitHub on Udacity. It is one of the best and easiest courses for learning to manage repositories through the terminal.

  1. Analytics Vidhya Datahack
  2. Kaggle competitions
  3. Crowd Analytix human layer
  4. DrivenData

Step 9 : Learn Some Advanced Machine Learning Algorithms

There are a few specific machine learning algorithms that come in handy for specific problems. For example, try solving online click prediction on large data sets without applying online learning algorithms and you will know what I am talking about. Here are a few advanced ML algorithms you should learn this month:

Online Machine Learning

Course: Online Methods In Machine Learning by MIT


Blogs : Langford’s

Vowpal Wabbit

FTRL- Algorithms

Exercise: Practice on one of the old Kaggle competitions or open click through rate data sets as provided by Criteo.
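
To give you a feel for what "online" means before you try the exercise, here is a minimal sketch of logistic regression that is updated one event at a time with plain stochastic gradient descent. It is not Vowpal Wabbit or FTRL, just the simplest online learner I could write from scratch; the class name, the `click_stream` generator and its synthetic events are all hypothetical.

```python
import math
import random


def sigmoid(z):
    # Clamp z so math.exp never overflows on extreme inputs.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


class OnlineLogisticRegression:
    """Logistic regression updated one example at a time with plain SGD."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, features):
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return sigmoid(z)

    def update(self, features, label):
        """One SGD step on the log loss for a single (features, label) pair."""
        error = self.predict_proba(features) - label
        self.weights = [
            w - self.learning_rate * error * x
            for w, x in zip(self.weights, features)
        ]
        self.bias -= self.learning_rate * error


def click_stream(n_events, seed=0):
    """Hypothetical stream of (features, clicked) events, generated lazily."""
    rng = random.Random(seed)
    for _ in range(n_events):
        x1, x2 = rng.uniform(-1, 1), rng.uniform(-1, 1)
        clicked = 1 if x1 + x2 > 0 else 0
        yield (x1, x2), clicked


model = OnlineLogisticRegression(n_features=2)
for features, clicked in click_stream(5000):
    model.update(features, clicked)  # the event can be discarded afterwards

print(model.predict_proba((0.8, 0.7)), model.predict_proba((-0.8, -0.7)))
```

The point of the design is that the model never holds the full data set in memory: each event is seen once, used for one update, and thrown away, which is what makes this style workable on very large click logs.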

Step 10 : Deep Learning Basics & Advanced

Deep Learning Basics (May 2017 — June 2017)

Course (mandatory)

  • Machine Learning by Andrew Ng — There is no better introductory material to Deep Learning and Neural Networks than Week 4 and Week 5 material of this course.
  • Deep learning by Google | Udacity: This is an excellent basic course on the transition from Machine Learning to Deep Learning, covering deep neural networks, Convolutional Neural Networks and Deep Learning for text.

Reading Material/Books

  • Deep learning Textbook: Written by Ian Goodfellow, Yoshua Bengio and Aaron Courville, this book is bound to become the de facto reference for people trying to learn Deep Learning.
  • Stanford Deep Learning tutorial — This is an all text and images resource provided by Stanford which starts from Linear Regression and goes to Convolutional Neural Networks with ease.

Practice — Identify the digits — An awesome contest to check the basics you have learned to identify handwritten digits.

Deep Learning advanced (June 2017 — August 2017)

Course (mandatory)

Specialization Material

Deep Learning for Natural Language Processing

Deep Learning for Speech/Audio

Step 11 : Reinforcement Learning

Topics to be covered: Reinforcement Learning (Theory)


Code: Reinforcement Learning Introductory Codes


References :