Zero to Data Scientist: The Curriculum

On the 1st of November 2017, I set out to do something that I’ve always dreamed of doing: become a data scientist.

A brief history

Data science is something that I wanted to do even before I knew the term.

I grew up on Asimov’s stories of mechanical men and was intrigued by the idea of making intelligent things. As I got older, I taught myself the Python programming language, studied engineering and built intelligent physical systems.

It turns out that the thing that I was truly interested in was the software that ran the systems, the ghost in the machine.

I have since spent many months learning as much as I could about modern software engineering and now I want to embark on a new and exciting journey.

What is Data Science anyway?

The term “data science” means a hundred different things to a hundred different people, as shown by how much the data science role varies from company to company. In the wild, data science applications can also be found across various industries, from Facebook’s news feed to Tesla’s autonomous vehicles.

However, there are some common threads linking all these definitions, skill sets and job descriptions together, the most prominent of which is that they all involve learning from data. This holds whether the process is manual or automated, and whether it involves plotting a graph or running convolutional neural nets.

As I see it, all these topics and ideas simply fall on a data science spectrum, where the simpler, more manual processes sit to the left and the more complex, automated and resource-intensive processes sit further to the right.

My aim is to gradually move from left to right. Simple, right?
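To make the two ends of that spectrum a little more concrete, here is a minimal sketch in plain Python. The data (hours studied versus test score) is made up purely for illustration, and the helper names `mean`, `fit_line` and `predict` are my own. The first step is the "manual" end, a simple summary you could read off a table; the second is one small step to the right, where a straight line is fitted to the data by ordinary least squares and then used to predict.

```python
# Toy data, invented for illustration only: hours studied vs. test score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 55.0, 61.0, 64.0, 68.0]


def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Sum of co-deviations and sum of squared deviations in x.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Left of the spectrum: a manual summary of the data.
average_score = mean(scores)

# A step to the right: let a simple model learn the trend from the data.
slope, intercept = fit_line(hours, scores)


def predict(h):
    """Predicted score for h hours of study, using the fitted line."""
    return slope * h + intercept


print(f"average score: {average_score:.1f}")
print(f"fitted line: score = {slope:.2f} * hours + {intercept:.2f}")
print(f"predicted score for 6 hours: {predict(6.0):.1f}")
```

Both steps learn something from the same data; the difference is only in how much of the work is handed over to the machine.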

The Open Data Science Curriculum

This curriculum is based on many nights of googling, reading articles, listening to podcasts, and following the advice of friends that I trust, but this is by no means the final draft. I shall make changes to it as I go along.

If you have spent even just a few minutes trying to figure out how to start out in data science, you must have noticed that there are generally two camps. The first camp says you should go through a long list of prerequisites, usually math and stats, before writing a single line of code, while the other camp says you should wing it. I favor the latter as a huge part of getting good at anything in life is practice. The theory behind the software can be learned as you go along.

Without further ado:

Courses

  1. Learn To Code For Data Analysis by FutureLearn
    I think this is an excellent place to start your data science journey, especially if you have no prior coding experience. It is a four-week course that teaches you the mechanics of data analysis with Python, using real-life data sets. The best part about it is that it is free!
  2. Data Scientist with Python/R by DataCamp
    This takes the form of an interactive programming environment. I generally like the gamification of learning up to a certain point and DataCamp is excellent at toeing that line. You can follow a number of career or skill tracks in Python or R, and there are a few data sets for you to get some practice with. It has a few free courses but a subscription is required to get its real value.
  3. Machine Learning by Andrew Ng (Coursera)
    This is a highly rated course on machine learning delivered by a leading figure in the field. I particularly like this course because it delves into some of the math and the theory behind the software. The only downside is that it is delivered in Octave/MATLAB rather than Python, but I would still strongly suggest taking it.
  4. Machine Learning Specialization by University of Washington (Coursera)
    This specialization was highly recommended to me because it is a deep dive into topics such as regression, classification, clustering and retrieval. The best part for me is that it is delivered in Python. With four courses in total, each lasting about six weeks, the specialization is quite lengthy, so come prepared.
  5. Machine Learning A-Z™: Hands-On Python & R In Data Science (Udemy)
    I have only heard good things about this course and it seems like a great addition to the curriculum. I like this Udemy course because it is very hands-on, with the instructor working through a good number of concepts using numerous coding examples in both R and Python.
  6. Deep Learning Specialization by Andrew Ng (Coursera)
    This is an advanced course on the cutting-edge tech of deep learning by Andrew Ng. In this specialization you will learn about convolutional networks, RNNs, LSTMs and a lot more. There are five courses, each taking about three to four weeks to complete.
    Try it out when you are ready to up your game.
  7. Practical Deep Learning For Coders, Part 1 and Cutting Edge Deep Learning For Coders, Part 2 by fast.ai
    fast.ai has more applied courses where the emphasis is on implementation of concepts and learning just what you need to move on to the next step. I have seen a good number of their graduates and the projects that they worked on, and I am impressed. It seems like this would be a great way to wrap up the curriculum.

Books

These are really high quality books that could serve as references as you progress through the curriculum.

  1. Python for Data Analysis by Wes McKinney
  2. Data Wrangling with Python by Jacqueline Kazil and Katharine Jarmul
  3. Statistics in a Nutshell by Sarah Boslaugh and Paul Andrew Watters

Lectures

A little bit of theory isn’t a bad thing. Here are a few lectures to give you some perspective.

  1. MIT 18.06 Linear Algebra
  2. MIT 6.034 Artificial Intelligence
  3. MIT 6.041SC Probabilistic Systems Analysis and Applied Probability
  4. Introduction to Databases by Jennifer Widom (Stanford Online)

Wrapping it Up

Although this list might seem unreasonable, I believe that nothing great can ever be achieved by sitting in one’s comfort zone; it’s the struggle that defines us.

I intend to go through the vast majority (80%) of the resources on this list, if not all of them, by the end of 2018.

If you found this list intriguing and have similar goals, send me a message and we could learn together. As always, I’m open to suggestions on ways to improve the curriculum. I am @adinoyisadiq on Twitter.

Stay tuned for subsequent posts as I work through the courses. You won’t want to miss them. Until then.

Happy New Year!