How to get a Master’s in Data Science using Nanodegree Programs?

Rishabh Chopra
12 min read · Sep 5, 2020


Can you get an education equivalent to a Master’s in Data Science from Stanford, Harvard, Columbia, or Berkeley using Nanodegree Programs? Turns out, yes! Now you can.

2 years ago, I got inspired by David Venturi to curate my Master’s in Data Science using online courses. At the time, there weren’t enough Nanodegree Programs.

However, as I near completion of my curated path, there are enough programs to cover the core and elective courses of a typical Master’s curriculum. You can learn it all, at a fraction of the cost.

So, I thought I’d share how I would design mine if I were starting from scratch.

About Me

It’s disheartening. We are often told that our minds are the way they are, and there is no changing it. Some people are just good at math. It’s a gift. Some are just bad at math. That’s how it is, and always will be. However, my journey has taught me differently.

Hi, I’m Rishabh Chopra. I was a Commerce student pursuing Film Making. Yet, I developed an interest in Computer Science when I took Udacity’s Intro to Programming Nanodegree. My first line of HTML had me hooked. I jumped out of my seat when I saw how a few lines of my code had made a web page!

Since then, I have completed five Nanodegree Programs pursuing my curated Data Science Masters. Given the flexibility of online learning, I joined Udacity as a Learning Coach last year. I now work as a Program Experience Manager for the School of Data Science.

Motivation

I am writing this post for 3 reasons.

  1. I hope this post inspires you to start your learning journey. It has been an empowering experience for me.
  2. I want to share some of the mistakes I made.
  3. I want to push myself to finish the remaining courses. I figured posting about it publicly would add a bit more accountability.

The Process

I studied the curricula of Berkeley, Stanford, Columbia, and Harvard. I noted the concepts taught in each of their courses. I researched how those concepts would be useful for a Data Scientist. Then, I mapped each course to a Nanodegree Program. When there were concepts that went beyond the scope of the program, I used books.

This curriculum is divided into 10 sections containing the core, optional, and elective courses.

Now, a master’s degree is good for three things:

  1. A Structured Learning Path
  2. The Accountability
  3. The Projects

This post solves for the Structured Learning Path. So, what about the rest?

Enter, Nanodegree Programs

When I started off, my curriculum had a mix of online resources. However, I found myself looking for a Nanodegree for each subject I was learning. Maybe I was just used to them. The 2 biggest reasons were:

  1. The Accountability: With a combination of Mentorship, Project Reviews, and Less Passive Learning (I’m not a fan of long lectures), it was easier to discipline myself.
  2. The Projects: Each Nanodegree provided me a convenient way to showcase a set of skills in each subject.
  • Want to put your data wrangling skills to the test? There’s a project for that.
  • Want to show off your knowledge of Statistics? There’s an A/B Testing Project for that.
  • Want to create and deploy deep learning models on the web for everyone to use? You get the idea.

By the end of this curriculum, you will have built 41 projects, including 5 Capstone Projects.

How A Nanodegree Program Works

Why use books?

Books for Master’s In Data Science using Nanodegree Programs

Nanodegree Programs take a top-down approach. You get your hands dirty with code right away. This is exciting. It makes me wonder about all the applications I’ll be able to build.

But I have to admit. I’m a nerd. I like to understand the nitty-gritty details of how things work. For that, I use books.

A good reason for reading books is to add more hooks.

Let me explain. Your memory is like Velcro. Velcro has 2 sides: one is covered with hooks and the other with loops. When you press the 2 sides together, a large number of hooks catch inside the loops and the Velcro seals. If there aren’t enough hooks, the Velcro won’t seal.

Reading books gives you a different explanation of a concept you learned in the Nanodegree, adding more hooks.

Core Curriculum

If you are not interested in reading the rationale behind my choices, you can go to this sheet. It provides the list of courses, the skills taught, and their duration.

1. Introduction to Computer Science for Data Science

Skill Map — Intro to Computer Science for Data Science

First, master fundamentals. Then, play infinite games. — James Stuber

Before getting into Data Science, this curriculum focuses on a solid computer science foundation. The Introduction to Programming Nanodegree will introduce you to Python — the primary language of Data Science in this curriculum. You will also learn the front-end languages, i.e. HTML, CSS, and JavaScript. They’ll prove useful when scraping the web or deploying web apps.

While doing IPND, I used Python Crash Course by Eric Matthes. As a newbie, it was helpful to get some extra practice with Python syntax while building a 2D “Alien Invasion” game.

In the Programming for Data Science Nanodegree, you will learn SQL, Data Analysis Process in Python, Command-Line Essentials, and Version Control using Git.

The Missing Semester will teach you the much-neglected parts of a CS education, covering debugging, profiling, security, and cryptography.

2. Data Structures and Algorithms

Skill Map — Algorithms and Data Structures

Columbia teaches CSOR W426: Algorithms for Data Science. Half of its syllabus is covered in the Data Structures and Algorithms Nanodegree. The other half teaches Machine Learning algorithms, which are covered later in this curriculum.

But, why learn algorithms and data structures on a Data Science route?

It’s true. You won’t be able to appreciate their value right off the bat when working with small to medium-sized datasets. You will appreciate this knowledge when your data is LARGE.

The process of data analysis and machine learning involves gathering the data, assessing its quality and tidiness, cleaning it, analyzing it, and then using ML models to make predictions with it. You see?

That’s a lot of code!

So, it sure helps if you can spot inefficiencies in your code, or make better choices among data structures. You’ll also get to better understand what’s happening under the hood of many ML frameworks.
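To make that concrete, here’s a minimal sketch (my own toy example, not from the Nanodegree) of how a single data structure choice changes lookup cost:

```python
# A toy comparison: membership tests on a list are O(n), on a set they are O(1) on average.
import timeit

ids = list(range(100_000))
ids_list = ids
ids_set = set(ids)

# Look up a value near the end of the collection, many times.
list_time = timeit.timeit(lambda: 99_999 in ids_list, number=1_000)
set_time = timeit.timeit(lambda: 99_999 in ids_set, number=1_000)

print(f"list lookup: {list_time:.4f}s, set lookup: {set_time:.6f}s")
```

On a large dataset, swapping the wrong container for the right one can be the difference between a script that finishes in seconds and one that runs for hours.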

The program covers enough for your work as a Data Scientist. For other advanced algorithms like k-d trees (for geospatial data) or Run-Length Encoding (for decoding compressed datasets), I recommend the book Introduction to Algorithms.

3. Introduction to Data Science

Skill Map — Introduction to Data Science

The Data Analyst Nanodegree is in line with Berkeley’s Introduction to Data Science Programming or Harvard’s AC209A Introduction to Data Science. It’s a comprehensive introduction to data science. It touches each point of the Data Analysis Process, from gathering data to communicating findings.

The Practical Statistics course in DAND covers the same material as Columbia’s STAT GR5701 or Berkeley’s Statistics for Data Science. It’s my favorite program. I specifically enjoyed the Data Wrangling Project. Yes, I’m a dog person.

Best Pupper Rating! 1777/10
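If you’re curious what the assess-and-clean steps look like in code, here’s a minimal pandas sketch. The column names and values are made up for illustration; they are not the actual WeRateDogs data.

```python
# Assess and clean a tiny, made-up ratings table with pandas.
import pandas as pd

df = pd.DataFrame({
    "name": ["Doggo", "Pupper", None, "Floof"],
    "rating_numerator": [13, 1777, 12, 14],
    "rating_denominator": [10, 10, 10, 10],
})

# Assess: inspect structure, missing values, and suspicious outliers.
df.info()
print(df.describe())

# Clean: drop rows with missing names and set aside implausible ratings.
clean = df.dropna(subset=["name"]).copy()
clean = clean[clean["rating_numerator"] <= 20]
print(clean)
```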

In the Business Analyst Nanodegree, you’ll learn Excel and Tableau. When working with other stakeholders, Excel establishes a common language with people who don’t code.

Tableau adds an aspect of Data Visualization to the curriculum. Berkeley teaches a Data Visualization course which covers many tools. Want to add more tools to your visualization toolbox? I suggest the following free courses:

Finally, databases are a crucial component of every software system that maintains some amount of persistent data. Stanford University’s Introduction to Databases is a wonderful course for learning database theory and design. Jennifer Widom is precise in her teaching.
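To give a flavor of what you’ll practice, here’s a minimal sketch of querying a relational database from Python using the standard-library sqlite3 module (the table and rows are made up; the course itself goes much deeper into SQL and relational design):

```python
# Create an in-memory database, insert a few rows, and run a simple aggregation.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE jobs (title TEXT, skill TEXT)")
conn.executemany(
    "INSERT INTO jobs VALUES (?, ?)",
    [("Data Analyst", "SQL"), ("Data Analyst", "Python"), ("Data Scientist", "Python")],
)

# How often does each skill appear across postings?
for skill, count in conn.execute(
    "SELECT skill, COUNT(*) AS n FROM jobs GROUP BY skill ORDER BY n DESC"
):
    print(skill, count)
```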

On Capstone Projects

It’s very different when you try to solve problems that haven’t been solved yet. That’s where real learning happens. It works as a proxy for experience.

Berkeley and Harvard (AC 297r) each have one Capstone Project in their Data Science Master’s. In this curriculum, you will complete a total of 5 Capstone Projects across different programs.

The capstone projects are about integrating the knowledge accumulated throughout the Nanodegree Program. It’s about doing original analysis to solve real-world problems.

Doing projects with starter code is relatively easy. Building projects from scratch is where you’ll stand out and hone your skills. So, go all out. Show that you’re comfortable with all the skills taught in the program.

This is the reason Web Scraping with Python was added to my curriculum. I was fascinated by its power. I used it for my Data Analyst Capstone Project, where I scraped a job website for Data Analyst job postings to analyze the most in-demand skills for Data Analysts.

Spoiler Alert: Python, SQL, and MS Excel were the top 3 skills. You can view my project’s slide showing the skill distribution here.

It took me days to clean that data, but then I wrote a program to gather, assess, clean, and analyze job data automatically.
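For those curious, the gather step of such a scraper looks roughly like the sketch below. The URL and CSS selector are placeholders, not the actual site or code from my project:

```python
# A minimal web-scraping sketch with requests and BeautifulSoup.
import requests
from bs4 import BeautifulSoup

url = "https://example.com/jobs?q=data+analyst"  # placeholder URL
response = requests.get(url, timeout=10)
soup = BeautifulSoup(response.text, "html.parser")

# Grab the text of each posting; the selector depends on the site's markup.
postings = [card.get_text(strip=True) for card in soup.select("div.job-card")]

# Count how often a few skills of interest are mentioned.
skills = ["python", "sql", "excel"]
counts = {skill: sum(skill in post.lower() for post in postings) for skill in skills}
print(counts)
```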

How to find unsolved problems?

  • Kaggle (for ML problems)
  • Job Postings — Why are people hiring a Data Scientist?

Mistake #1: Share Your Work

I’m guilty. I went all out, spent days on my capstone projects, and then never published my findings. I was learning under a stone.

A master’s degree provides you with many networking opportunities. That’s why I suggest you publish a blog post for each section of this curriculum. You can talk about your capstone project. You can even teach what you have learned. There’s always someone who knows less than you.

But I understand the objection: writing and publishing blog posts takes time. However, it’s worth it. Writing online creates leverage. The more you show your work, the luckier you get. It helps you connect with others, and that’s something I’ve only recently learned. Hence, this blog post.

4. Mathematics of Machine Learning

Before moving on, I suggest getting a basic understanding of Linear Algebra, Calculus, Probability, and Statistics.

Coming from a non-tech background, I found mathematics a clear obstacle. I hated math. I promise.

However, this time learning math wasn’t just a weight-lifting session for the brain, like it was in school. This time, each concept I was learning had a purpose. I saw it being used in genuinely useful applications — for Customer Segmentation, Image Classification, or even a Self-Driving Car!

I hope you too will be able to appreciate Math’s beauty after learning Machine Learning’s applications. Here are the resources I used.

  1. Linear Algebra
  2. Calculus
  3. Statistics and Probability

Book: Math for Machine Learning
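As a small taste of where this math shows up in practice, here’s a sketch (my own toy example) of one of the simplest ML algorithms, gradient descent for linear regression, written with NumPy. The dot product is linear algebra; the gradients are calculus:

```python
# Fit y ≈ w*x + b with plain gradient descent on a tiny, made-up dataset.
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])      # feature
y = np.array([2.1, 4.0, 6.2, 7.9])      # targets, roughly y = 2x
w, b = 0.0, 0.0                          # parameters to learn
lr = 0.05                                # learning rate

for _ in range(500):
    y_hat = w * x + b                    # prediction (a dot product in the general case)
    error = y_hat - y
    grad_w = 2 * np.mean(error * x)      # partial derivative of MSE with respect to w
    grad_b = 2 * np.mean(error)          # partial derivative of MSE with respect to b
    w -= lr * grad_w
    b -= lr * grad_b

print(round(w, 2), round(b, 2))          # w should land near 2
```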

After completing the above curriculum, you’ll know enough to be dangerous as a Data Analyst. Now, let’s get into Machine Learning.

5. Machine Learning

Skill Map — Machine Learning

Stanford’s STATS 315A and 315B follow the book Elements of Statistical Learning. However, most people find it math-heavy, so I’ve recommended its little sibling, Introduction to Statistical Learning. It’s a more approachable read. It provides a good theoretical understanding to go with your Introduction to Machine Learning Nanodegree. This is where you’ll practice building projects using Supervised, Unsupervised, and Deep Learning. The program’s syllabus is closest to Berkeley’s Applied Machine Learning course.

I used Hands-On Machine Learning by Aurélien Géron extensively while completing my MLND. It combines just enough theory with a mini project for each algorithm. Just coding along will make you quite familiar with NumPy, Pandas, Matplotlib, scikit-learn, Keras, and TensorFlow. It proved really useful for my capstone project!
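If you haven’t used scikit-learn before, the core workflow is pleasantly small. Here’s a minimal sketch using the library’s built-in iris dataset (the dataset and model choice are just for illustration):

```python
# Train/test split, fit a model, and score it with scikit-learn.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)

print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))
```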

(Optional) Deep Dive into Mathematics

Completing the Intro to ML Nanodegree may give you enough motivation to dive deeper into the math.

Note: If you’re able to solve problems you care about, it’s completely okay to skip this part. This is only for the love of math.

I recommend the following resources now, as they are more challenging and time-intensive, but hugely rewarding.

  1. Linear Algebra
  2. Calculus
  3. Probability and Statistics

Mistake #2: Not Getting into MLOps

Don’t let your ML Models Die in Jupyter Notebooks

Some people (including past me) run after the shiny new framework and the latest cutting-edge deep learning model. We ignore software engineering. We forget about the fundamentals and just want to focus on building ML models that live and die in Jupyter Notebooks.

But if you’re anything like me, you’ll realize that you want to use ML algorithms in production to build applications that provide value to others. Cristiano Breuel defined MLOps well:

ML Ops is a set of practices that combines Machine Learning, DevOps, and Data Engineering, which aims to deploy and maintain ML systems in production reliably and efficiently.

The rest of the curriculum focuses on this part.

6. Software Engineering

Skill Map — Software Engineering

Why learn back-end development?

Combining front-end and back-end knowledge will enable you to build seamless web applications with ML at their core.

Specifically, Course 4 of Full Stack Web Developer Nanodegree will teach you Server Deployment and Containerization — the essential DevOps skills required for MLOps.
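As a rough idea of what “back-end for ML” means, here’s a minimal Flask sketch of an endpoint that could sit in front of a trained model. The model here is a stub function, and the route and payload fields are my own placeholders, not project code:

```python
# A tiny Flask API that wraps a (stubbed) sentiment model.
from flask import Flask, jsonify, request

app = Flask(__name__)

def predict(text: str) -> str:
    # Stand-in for a real model loaded at startup.
    return "positive" if "good" in text.lower() else "negative"

@app.route("/predict", methods=["POST"])
def predict_route():
    payload = request.get_json(force=True)
    return jsonify({"sentiment": predict(payload.get("text", ""))})

if __name__ == "__main__":
    app.run(port=5000)
```

Containerizing an app like this (for example, with Docker) is exactly the deployment skill that course covers.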

Why learn software engineering?

As a Data Scientist, you’ll likely share code across the organization. You might build models to be used in the product. Your insights can influence important business decisions. So, you can learn a few tricks from software engineers on how to write highly reusable, robust, and maintainable code.

The Software Testing and Software Debugging courses teach you some best practices. Other best practices are covered in the Machine Learning Engineer Nanodegree.
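To show the kind of habit those courses build, here’s a minimal unit test written for pytest. The function and test are hypothetical examples, not course material:

```python
# A small, testable helper and its pytest test.
def fill_missing_with_mean(values):
    """Replace None entries with the mean of the non-missing values."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def test_fill_missing_with_mean():
    assert fill_missing_with_mean([1.0, None, 3.0]) == [1.0, 2.0, 3.0]
```

Run it with `pytest` and you get fast, repeatable checks on the data-cleaning code you’d otherwise only eyeball in a notebook.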

7. Machine Learning Engineering

Ah! Finally! The best part of the curriculum. Here’s where you’ll combine your ML and software engineering skills to build cool stuff. You’ll deploy a sentiment analysis model to the web using AWS SageMaker. You’ll build Pipelines for ETL, Machine Learning, and NLP. You’ll also build a Recommendation Engine!
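The SageMaker deployment itself needs an AWS account, but the pipeline idea is easy to preview locally. Here’s a minimal scikit-learn sketch that chains preprocessing and a classifier into one object, using a tiny made-up sentiment dataset:

```python
# Chain TF-IDF features and a classifier into a single pipeline.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

texts = ["great movie", "loved it", "terrible plot", "waste of time"]
labels = [1, 1, 0, 0]  # 1 = positive, 0 = negative

sentiment = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("clf", LogisticRegression()),
])
sentiment.fit(texts, labels)

print(sentiment.predict(["what a great movie"]))
```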

8. Deep Learning

Skill Map — Deep Learning

The Deep Learning Nanodegree encompasses what is taught at Stanford’s CS230: Deep Learning or Columbia’s COMS 4995: Applied Deep Learning class.

Caution: I tried to learn Deep Learning without getting a good handle on Mathematics. As a result, it seemed like a darker black box than it is. The Deep Learning Book is a handy resource for the mathematics of Deep Learning.

Use it well.
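If you want a feel for the framework before starting, here’s a minimal PyTorch sketch: a tiny network trained on the XOR toy problem (my own example, not a Nanodegree project):

```python
# Define, train, and evaluate a small neural network in PyTorch.
import torch
import torch.nn as nn

X = torch.tensor([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = torch.tensor([[0.], [1.], [1.], [0.]])

model = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 1), nn.Sigmoid())
loss_fn = nn.BCELoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.05)

for _ in range(2000):
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()

print(model(X).detach().round())  # should approximate [0, 1, 1, 0]
```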

9. Data Engineering

Skill Map — Data Engineering

As a Data Scientist, you may work closely with a Data Engineer. The Data Engineering Nanodegree will provide you with the required background.

In this program, you’ll learn to build ETL Pipelines to extract data from S3 buckets, stage them, and load them into your databases. You’ll also learn how to implement Data Pipelines in Airflow. Berkeley also has a Fundamentals of Data Engineering course.
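For a picture of what an Airflow pipeline looks like, here’s a minimal sketch using Airflow 2-style imports. The task logic and schedule are placeholders, not the program’s projects:

```python
# A toy two-task DAG: extract, then load.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator

def extract():
    print("pull raw data, e.g. from an S3 bucket")

def load():
    print("load the transformed data into a warehouse table")

with DAG(
    dag_id="toy_etl",
    start_date=datetime(2020, 9, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)
    extract_task >> load_task
```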

Intro to Hadoop and MapReduce will give you some prerequisite knowledge for understanding Spark in the Data Engineering Nanodegree.

The book, Designing Data-Intensive Applications is a comprehensive deep dive into Data Engineering. It’s a good resource to build your capstone project.

10. Electives

You may choose any 2 from the 10 electives in the curriculum sheet. I suggest taking these up as you try to solve specific problems in a domain. I’ll be pursuing AI and Deep Reinforcement Learning.

Getting Ready for the Job Interviews

Data Science Interview Preparation

Here are some resources to help you prep for your interviews.

And that’s that!

I would like to thank David Venturi and Farhan Ahmad for inspiring me and helping me with my curriculum choices!

If you have any recommendations for the curriculum or would like to chat about your own educational goals, please don’t hesitate to contact me.


Rishabh Chopra

Program Experience Manager — Data Science @ Udacity.