Data Science being the fad that it is, there are countless resources that promise to teach it and turn you into a Data Scientist. But some are unrealistic, i.e. they expect too much from a beginner, while others are too simple to prepare you for complex challenges.
Here, I propose a learning path for all the enthusiasts out there who want to make a name in this field and are interested in learning ML, not just using it.
Machine Learning is like teenage sex; everyone talks about it, nobody really knows how to do it, everyone thinks everyone else is doing it, so everyone claims they are doing it.
Before I go through the path in detail, be clear with yourself that this will demand time and effort on your part.
Choose a Language
If both languages are new to you, I'd suggest starting with Python. If you already know R, follow this path in R first; you can switch to Python later.
- Python: If you have no background in programming, start learning from https://learnpythonthehardway.org/. If you already know Java or C++ and want to pick up Python, first complete Codecademy's Python course. Next, you can go into depth with the New Boston tutorials.
- R: One of the best courses for starting Data Science with R is The Analytics Edge on edX. It teaches you how to work with R and its packages, as well as how to implement basic algorithms.
Spend 5–7 days mastering one of these two!
Study Machine Learning
This is the most important and toughest phase in becoming a Data Scientist. Many people give up midway. Just stay determined, and if you complete this, there's no stopping you!
- An Introduction to Statistical Learning by Gareth James et al.: the bible of statistics for Machine Learning. R users are strongly advised to work through this book religiously. It will teach you ML algorithms and their applications in depth.
- Python Machine Learning by Sebastian Raschka: As the name suggests, this one is for Python users; go through it to learn how to apply ML in practice.
These books are self-sufficient and take around 2–3 months to complete. Don't just read: implement as you go!
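As one example of implementing as you go, here is a minimal sketch in plain Python of simple linear regression fitted by ordinary least squares, one of the first algorithms An Introduction to Statistical Learning covers. The function names and the toy data are my own, chosen only for illustration; libraries do this for you, but writing it once yourself is the point.

```python
# Simple linear regression fitted by ordinary least squares, written from scratch.
# Function names and toy data are hypothetical, for illustration only.


def fit_simple_linear_regression(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared x-deviations.
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    # The fitted line always passes through the point (x_mean, y_mean).
    intercept = y_mean - slope * x_mean
    return intercept, slope


def predict(intercept, slope, xs):
    """Apply the fitted line to new inputs."""
    return [intercept + slope * x for x in xs]


def r_squared(ys, preds):
    """Fraction of the variance in ys explained by the predictions."""
    y_mean = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - y_mean) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy data lying exactly on y = 1 + 2x.
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1.0, 3.0, 5.0, 7.0, 9.0]
    b0, b1 = fit_simple_linear_regression(xs, ys)
    print(f"intercept={b0}, slope={b1}")  # intercept=1.0, slope=2.0
    print(f"R^2={r_squared(ys, predict(b0, b1, xs))}")  # R^2=1.0
```

Once this makes sense, try it on noisy data and compare your numbers against a library implementation such as Python's `statistics.linear_regression` (Python 3.10+) or scikit-learn's `LinearRegression`.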
- CS109 Data Science by Harvard: In my opinion, the best Data Science course available on the web. Trust me when I say this: if you complete this course, you will be equipped to tackle most DS problems. From beginner to advanced, it's a must for everyone.
- CS229 Machine Learning by Stanford: For those who want to go into more detail.
CS109 will also take up a good 2–3 months. Don't abandon it midway if you get frustrated. Take a break, then come back and finish it. But do complete it!
Start with either the books or the courses; that decision is yours.
“Without vision you don’t see, and without practicality the bills don’t get paid.”
It's important to start working on some standard Kaggle problems (attempt them in the given order):
- Titanic: Machine Learning from Disaster
- Forest Cover Type Prediction
- Bike Sharing Demand
- Telstra Network Disruptions
- Taxi Trajectory Prediction
Once you complete all of this, you are a Data Scientist! :D
Okay, wait. Don't get carried away that easily. This is just the beginning. It's a very vast field, and there is always more to explore. Here are some resources to keep you going:
- Data Science 101
- Basic application of Probability
- Machine Learning Cycle
- Sebastian Raschka Notebooks
- Practical Problem Solved: Blogpost by yhat
- Analytics Vidhya
- Machine Learning Mastery
Focus on learning and understanding concepts. That will always take you a long way!
If you have more links, don't hesitate to share them. :)
And hit ❤ if this was a good read. Enjoy!