Data Science From Scratch [curriculum with 20 free online courses]

Learn how to become a data scientist in 8 steps with these free resources

A reader recently asked me “How can I learn to become a data scientist?”
(Yes, I actually have a reader. Who woulda thunk it?)

Great question, Ian, especially considering the field is red-hot right now and shows no signs of slowing down!

In 2018, demand for data scientists grew by 29%, capping a 344% increase since 2013, according to reports by Indeed and Dice. The supply of qualified candidates, however, lags drastically behind.

Photo Credit: Anastasiia Kornilova

My first instinct was to refer Ian to the contemporary King of AI, Siraj Raval, and his free curriculum “Learn Data Science in 3 Months”. The video, however, was created six months ago (a lifetime in the ever-changing world of data science) and I saw some opportunities to update it.

  • Some courses were gone or had changed… so I updated/replaced them.
  • Some prefer a written guide… so I typed it out.
  • Some need to know why a topic is important to fully understand it… so I added brief explanations.
  • Finally (based on personal experience), some people get stuck on a single course, no matter how interesting it is. I always like to have an additional course available that may explain something a different way or fill in some knowledge gaps… so I added some alternatives/additions.

Hopefully, this complete curriculum to become a data scientist helps Ian and anyone else interested in the field!

1. Learn Python

Tools you’ll use?

Python

Why is it important?

Python is growing faster than any other major programming language, it’s extremely well documented, and most of the tools and resources are free. 
Can you be a data scientist without knowing Python? Sure. 
Just like you can drive a car without eyesight. You just won’t get very far.

How to learn it?

Massachusetts Institute of Technology (MIT) | Introduction to Computer Science and Programming in Python
Kaggle | Python
Siraj Raval | Learn Python for Data Science

2. Learn Statistics and Probability

Tools you’ll use?

Math

Why is it important?

As a data scientist, you’ll have to extract useful information from extremely imperfect data. You can’t completely eliminate uncertainty, but you can reduce it with a strong grasp of statistics and probability fundamentals.
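
To make “reduce uncertainty” concrete, here’s a minimal sketch using only Python’s standard library (3.8+ for statistics.fmean). It draws hypothetical noisy measurements (true mean 10, standard deviation 3; the numbers are made up for illustration) and computes a percentile bootstrap confidence interval for the mean. bootstrap_mean_ci is a hypothetical helper name, not a library function. The interval from 500 measurements should come out much narrower than the one from 20: more data, less uncertainty.

```python
import random
import statistics


def bootstrap_mean_ci(sample, n_resamples=2000, confidence=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of `sample`."""
    rng = random.Random(seed)
    n = len(sample)
    # Resample the data with replacement many times and record each resample's mean
    means = sorted(
        statistics.fmean(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    # Trim the same number of extreme means from each tail
    tail = int((1 - confidence) / 2 * n_resamples)
    return means[tail], means[n_resamples - 1 - tail]


# Hypothetical noisy measurements: true mean 10, standard deviation 3
rng = random.Random(42)
small = [rng.gauss(10, 3) for _ in range(20)]
large = [rng.gauss(10, 3) for _ in range(500)]

for label, data in (("n=20", small), ("n=500", large)):
    lo, hi = bootstrap_mean_ci(data)
    print(f"{label}: mean={statistics.fmean(data):.2f}, 95% CI=({lo:.2f}, {hi:.2f})")
```

The bootstrap is just one way to quantify uncertainty; it’s handy because it needs no formula for the sampling distribution, only the ability to resample your own data.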

How to learn it?

Khan Academy | Statistics and Probability
UC San Diego | Probability and Statistics in Data Science using Python

3. Learn Data Analysis

Tools you’ll use?

Pandas, R

Why is it important?

Data analysis lets you summarize the main characteristics of a data set. That deeper understanding points you to the best way to extract useful, actionable conclusions. 
In short… learn how to understand and clean data. It’s what you’ll spend 90% of your time doing.
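
Here’s a minimal pandas sketch of what “understand and clean” looks like in practice. The tiny data set is made up and clean_weather is a hypothetical helper name; the pandas calls themselves (str.strip, str.title, pd.to_numeric, drop_duplicates, dropna, groupby) are standard.

```python
import pandas as pd


def clean_weather(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical cleaning steps for a small city/temperature table."""
    df = raw.copy()
    # Normalize inconsistent labels: "austin " and "Austin" should be the same city
    df["city"] = df["city"].str.strip().str.title()
    # Convert text to numbers; anything unparseable (like "n/a") becomes NaN
    df["temp_f"] = pd.to_numeric(df["temp_f"], errors="coerce")
    # Remove exact duplicate rows, then rows with no usable temperature
    df = df.drop_duplicates().dropna(subset=["temp_f"])
    return df.reset_index(drop=True)


# Made-up raw data with typical problems: messy labels, a non-numeric value, a duplicate
raw = pd.DataFrame({
    "city": ["Austin", "austin ", "Denver", "Denver", "Boston", "Boston"],
    "temp_f": ["85", "88", "n/a", "79", "71", "71"],
})

df = clean_weather(raw)
print(df["temp_f"].describe())  # count, mean, std, min, quartiles, max
print(df.groupby("city")["temp_f"].agg(["count", "mean"]))
```

Order matters here: normalizing labels before drop_duplicates lets rows that differ only in spelling or whitespace be recognized as duplicates.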

How to learn it?

Georgia Tech | Computing for Data Analysis
Kaggle | Pandas

4. Learn Algorithms and Machine Learning

Tools you’ll use?

Pandas, scikit-learn

Why is it important?

This is likely why you got into data science in the first place! Use Skynet to draw conclusions from data that we mere humans never could.
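
Skynet it isn’t, but here’s the basic idea of learning from data as a from-scratch sketch: a k-nearest-neighbors classifier in plain Python (3.8+ for math.dist) that labels a new point by majority vote among the k closest training points. The toy data and the knn_predict name are hypothetical. In practice you’d reach for scikit-learn, whose KNeighborsClassifier (in sklearn.neighbors) implements the same technique behind fit and predict.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=3):
    """Label point x by majority vote among its k nearest training points."""
    # Euclidean distance from x to every training point, closest first
    neighbors = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy data: two well-separated clusters labeled "A" and "B"
train_X = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.5), (8.0, 8.0), (8.5, 9.0), (9.0, 8.5)]
train_y = ["A", "A", "A", "B", "B", "B"]

# Always evaluate on points the model did not see during training
test_X = [(1.2, 1.4), (8.8, 8.2)]
test_y = ["A", "B"]

predictions = [knn_predict(train_X, train_y, x) for x in test_X]
accuracy = sum(p == t for p, t in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

The habit worth copying is the held-out test set: a model is only as good as its accuracy on data it hasn’t seen.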

How to learn it?

Columbia | Machine Learning for Data Science and Analytics
Kaggle | Machine Learning

Deepfake videos created with Deep Learning

5. Learn Deep Learning

Tools you’ll use?

TensorFlow, Keras

Why is it important?

Because everyone’s talking about deep learning, so you have to use it for everything.
Alright… not exactly.
Good ol’ fashioned machine learning is still the best option for most data science endeavors. Deep learning, however, is making major breakthroughs in fields such as image recognition and automation.

6. Learn Relational Databases

Tools you’ll use?

SQL, DB-API, NoSQL

Why is it important?

As a data scientist, you’ll almost certainly need to access data, and that data will most likely live in a database. It’s a good idea to learn your way around them.
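
Python’s DB-API (PEP 249) is the standard interface most Python database drivers follow, and the built-in sqlite3 module implements it, so you can practice SQL without installing anything. This sketch builds a hypothetical orders table in memory and runs a parameterized aggregate query; connection details would differ for a production database.

```python
import sqlite3

# An in-memory SQLite database, so nothing touches disk
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)")
cur.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 30.0), ("bob", 12.5), ("alice", 20.0), ("carol", 8.0)],
)
conn.commit()

# Total spend per customer, counting only orders of at least min_amount
min_amount = 10
cur.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "WHERE amount >= ? GROUP BY customer ORDER BY total DESC",
    (min_amount,),
)
rows = cur.fetchall()
print(rows)  # [('alice', 50.0), ('bob', 12.5)]
conn.close()
```

Note the ? placeholders: passing values separately instead of pasting them into the SQL string is how DB-API drivers guard against SQL injection. The placeholder style varies by driver; sqlite3 uses ?.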

How to learn it?

Udacity | Intro to Relational Databases 
and
Microsoft | Intro to NoSQL
Kaggle | SQL

7. Learn Distributed Computing for Big Data

Tools you’ll use?

Hadoop, MapReduce, Spark

Why is it important?

2.5 quintillion bytes of data are created every day. Let me repeat that… 2,500,000,000,000,000,000 bytes. That’s 2,500 with 15 extra zeros. 
If every byte were a single penny and we laid them all flat, they’d cover the entire surface of the Earth… about one and a half times over. 
How does a data scientist actually process that kind of data? By splitting the work into map steps that filter and sort it and reduce steps that summarize it (MapReduce), then distributing that work across clusters of machines (Hadoop / Spark).
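
The MapReduce idea fits in a few lines of plain Python. This is a single-machine sketch of the classic word-count example, not Hadoop or Spark code: the map step emits (word, 1) pairs, the shuffle step groups pairs by word, and the reduce step sums each group. The function names are hypothetical. On a cluster, the framework runs many map and reduce tasks in parallel on different machines and handles the shuffle over the network.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: summarize one key's values, here by summing the counts."""
    return key, sum(values)


documents = ["the quick brown fox", "The lazy dog", "the fox"]

# Each map call is independent, which is what lets a cluster run them in parallel
mapped = chain.from_iterable(map_phase(doc) for doc in documents)
grouped = shuffle_phase(mapped)
counts = dict(reduce_phase(key, values) for key, values in grouped.items())
print(counts)  # {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

In PySpark, the same pipeline is roughly rdd.flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b), with Spark doing the grouping and distribution for you.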

How to learn it?

Udacity | Intro to Hadoop and MapReduce
Stanford | Intro to Apache Spark

8. Learn Data Presentation and Storytelling

Tools you’ll use?

Matplotlib, Seaborn, Folium, Excel, PowerPoint

Why is it important?

If a tree falls in the woods and no one’s there to hear it, does it make a sound? 
If useful insight is extracted from data but no one understands it well enough to act on it, does it serve a purpose?
Not really. Data science is useless if the results aren’t actionable. You have to be able to show not just what the data says but why it matters and what should be done about it.
An average data scientist with outstanding presentation skills will almost always produce more useful results than a brilliant data scientist who can’t explain them.

How to learn it?

IBM | Visualizing Data with Python
Microsoft | Analytics Storytelling for Impact
Kaggle | Data Visualization

About the Author:

Matthew Bardeleben is the founder of technology consulting and digital reputation management firm Learn. Disrupt. Profit. Repeat. He has earned more than 35 certifications in topics ranging from artificial intelligence, blockchain, and Python programming to digital marketing, growth hacking, and UX/UI design from organizations such as IBM, Google, and HubSpot.