AI Nanodegree Program Syllabus: Term 2 (Deep Learning), In Depth

Luis Serrano
Published in Udacity Inc
Apr 19, 2017 · 9 min read

Here at Udacity, we are tremendously excited to announce the kick-off of the second term of our Artificial Intelligence Nanodegree program.

Why are we excited? Because we are able to provide a depth of education that is commensurate with university education; because we are bridging the gap between universities and industry by providing you with hands-on projects and labs, and partnering with the top industries in the field; and last but certainly not least, because we are able to bring this education to many more people across the globe, at a cost that makes a top-notch AI education realistic for all aspiring learners.

During the first term, you’ve enjoyed learning about Game Playing Agents, Simulated Annealing, Constraint Satisfaction, Logic and Planning, and Probabilistic AI from some of the biggest names in the field: Sebastian Thrun, Peter Norvig, and Thad Starner.

Now, what’s next? Term 2 will be focused on one of the cutting-edge advancements of AI — Deep Learning. In this term, you will learn the foundations of neural networks, understand how to train them with techniques such as gradient descent and backpropagation, and explore the different architectures that make neural networks work for a variety of applications.

After this, you’ll choose a concentration in either Voice User Interfaces, Natural Language Processing, or Computer Vision. We are super excited to tell you that we’ve teamed up with some of the biggest names in industry—Amazon, IBM, and Affectiva—who will be happy to guide you through their innovations in these fields. Among other experts, your instructors will include Ashwin Ram, Senior Manager of the Alexa team at Amazon, Armen Pischdotchian, Academic Tech Mentor for IBM Watson Solutions, and Rana el Kaliouby, CEO and co-founder of Affectiva!

You can apply to be a part of our AI Nanodegree program here! Our AI Nanodegree program is a 6-month project-based program covering both classical and modern techniques in Artificial Intelligence.

So let’s get started! Here is the Term 2 curriculum in depth, including all the projects and labs that you’ll be building.

Core program

  • Deep Neural Networks
  • Convolutional Neural Networks
  • Recurrent Neural Networks

Concentrations

  • Voice User Interfaces — with Amazon Alexa
  • Natural Language Processing — with IBM Watson
  • Computer Vision — with Affectiva

Core Program

Introduction to Deep Neural Networks

Neural Networks are the fundamental building block of Deep Learning. In this class, you’ll learn their structure, and how to build and train them. We’ll start the class by solving linear classification problems using perceptrons and logistic regression. From there, we’ll extend to solving highly non-linear classification problems using deep neural networks. We’ll learn about their architecture, how to train them using techniques such as gradient descent and backpropagation, and how to optimize the training process using different error functions and regularization techniques.
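To give you a taste of the kind of thing you’ll build, here’s a toy sketch of logistic regression trained by gradient descent — the simplest case of the training loop described above. The dataset, learning rate, and epoch count are invented for illustration, and the course will cover all of this in far more depth:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(points, labels, lr=0.5, epochs=200):
    """Train weights w and bias b by stochastic gradient descent
    on the cross-entropy error."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for (x1, x2), y in zip(points, labels):
            p = sigmoid(w[0] * x1 + w[1] * x2 + b)
            err = p - y  # gradient of cross-entropy w.r.t. the logit
            w[0] -= lr * err * x1
            w[1] -= lr * err * x2
            b    -= lr * err
    return w, b

# Tiny linearly separable dataset: label is 1 only for the point (1, 1)
points = [(0, 0), (1, 0), (0, 1), (1, 1)]
labels = [0, 0, 0, 1]
w, b = train_logistic(points, labels)
preds = [int(sigmoid(w[0] * x1 + w[1] * x2 + b) > 0.5) for x1, x2 in points]
print(preds)  # expect [0, 0, 0, 1]
```

A deep neural network stacks many such units with non-linear activations between layers, which is what lets it handle the highly non-linear problems mentioned above.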

Lab: Building Neural Networks in Keras

You’ll finish this lesson with a coding lab where you’ll have the chance to apply all these concepts in Keras, one of the most popular Deep Learning packages in Python.

Convolutional Neural Networks

Convolutional Neural Networks are widely used for image classification, among many other exciting applications. In this section you’ll learn how they differ from ordinary neural networks. After introducing some new types of layers, we’ll explore several different network architectures. You’ll discover how to augment your data to improve performance, before learning techniques to visualize what your networks learn. You’ll learn to tap into the power of transfer learning, by benefiting from pre-trained networks such as VGGNet or ResNet.
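At the heart of these networks is the convolutional layer, which slides a small filter over an image. Here’s a minimal sketch of that operation (technically cross-correlation, which is what deep learning libraries compute) with a made-up image and a hand-crafted edge filter:

```python
def conv2d(image, kernel):
    """'Valid' 2-D convolution (no padding, stride 1) on a
    single-channel image, as computed by a convolutional layer."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            s = sum(image[i + u][j + v] * kernel[u][v]
                    for u in range(kh) for v in range(kw))
            row.append(s)
        out.append(row)
    return out

# A vertical-edge filter responds where intensity changes left-to-right
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
kernel = [
    [-1, 1],
    [-1, 1],
]
print(conv2d(image, kernel))  # strong response down the middle column
```

In a trained CNN, the filter values aren’t hand-crafted like this — they are learned by backpropagation, which is exactly what the visualization techniques in this section let you inspect.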

Project: Build a Dog Recognition App

You’ll apply what you’ve learned to build an end-to-end algorithm to process any user-supplied image. Given an image of a dog, your algorithm will estimate the canine’s breed. If supplied an image of a human, the code will identify the resembling dog breed. What kind of dog will your algorithm think you look like? :)

Recurrent Neural Networks

Recurrent Neural Networks are used for some very exciting applications, such as time series predictions and sequence generation. Their architecture is different from feedforward neural networks, as the information cycles through the network, allowing it to remember previous states.
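That “memory” comes from a hidden state that is fed back in at every step. Here’s a scalar toy version of a vanilla RNN step — the weights are arbitrary numbers chosen just to show the state carrying information forward:

```python
import math

def rnn_step(x, h_prev, Wx, Wh, b):
    """One step of a vanilla RNN: h_t = tanh(Wx*x_t + Wh*h_{t-1} + b),
    reduced to scalars for illustration."""
    return math.tanh(Wx * x + Wh * h_prev + b)

Wx, Wh, b = 0.5, 0.9, 0.0
h = 0.0
# Only the first input is non-zero, yet the hidden state stays non-zero
# afterwards: the recurrence remembers the earlier input.
for x in [1.0, 0.0, 0.0]:
    h = rnn_step(x, h, Wx, Wh, b)
print(h)
```

Real RNNs use weight matrices rather than scalars, and variants like LSTMs add gating to remember over much longer sequences.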

Project — Part 1: Predicting Apple Stock

You’ll apply what you’ve learned to build an algorithm that performs time series prediction, forecasting the stock price of Apple 7 days in advance.
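A key preprocessing step for any such forecaster is turning one long series into (input window, future target) training pairs. A sketch of that idea, using a stand-in series and invented window sizes:

```python
def make_windows(series, window, horizon):
    """Split a series into (input window, value `horizon` steps
    after the window) training pairs."""
    pairs = []
    for i in range(len(series) - window - horizon + 1):
        pairs.append((series[i:i + window], series[i + window + horizon - 1]))
    return pairs

prices = list(range(20))  # stand-in for daily closing prices
pairs = make_windows(prices, window=5, horizon=7)
print(pairs[0])  # ([0, 1, 2, 3, 4], 11): 5 days in, the price 7 days ahead out
```

The model then learns to map each window to its target, which is exactly the 7-days-ahead setup of this project.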

Project — Part 2: Generating Sherlock Holmes text

You’ll also implement a recurrent neural network to create an English language generator, which will build semi-coherent English sentences from scratch. You’ll be able to test it by feeding it the text of Sir Arthur Conan Doyle’s classic book, The Adventures of Sherlock Holmes, in order to obtain text that resembles a Sherlock Holmes novel! Elementary, my dear IBM Watson! :)
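The generator itself is an RNN, but the training data preparation is simple enough to sketch here: slice the text into fixed-length character windows, each paired with the character that follows it (window length invented for the example):

```python
def make_char_sequences(text, length):
    """Slice text into fixed-length input windows, each paired with
    the next character as the prediction target."""
    inputs, targets = [], []
    for i in range(len(text) - length):
        inputs.append(text[i:i + length])
        targets.append(text[i + length])
    return inputs, targets

inputs, targets = make_char_sequences("elementary", 4)
print(inputs[0], targets[0])  # 'elem' -> 'e'
```

Trained on enough such pairs, the network predicts one character at a time, and sampling from it repeatedly is what produces the semi-coherent “Sherlock” text.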

Concentrations

Concentration 1: Voice User Interfaces with Amazon Alexa

In this concentration, you’ll learn how computers can process speech, turn it into text, and vice versa. In the first part, you’ll get an overview of Voice User Interfaces (VUI), focus on Conversational AI, and learn how Alexa operates. Then, you’ll dive deeper into the exciting field of Speech Recognition, learning Signal Analysis and Phonetics, single word classification using Dynamic Time Warping, and sentence recognition using Hidden Markov Models. Finally, you’ll learn about the cutting edge in Automatic Speech Recognition, leveraging deep neural networks.
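Dynamic Time Warping is compact enough to preview here. It scores how similar two sequences are while allowing one to be stretched or compressed in time — exactly what’s needed when the same word is spoken faster or slower. A toy sketch on invented sequences:

```python
def dtw_distance(a, b):
    """Dynamic Time Warping distance between two 1-D sequences,
    via the standard dynamic-programming recurrence."""
    inf = float("inf")
    n, m = len(a), len(b)
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            # Best of: step in a, step in b, or step in both
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]

# The second sequence is a time-stretched copy of the first,
# so DTW considers them identical ...
print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
# ... while a genuinely different sequence scores higher
print(dtw_distance([1, 2, 3], [3, 2, 1]))  # 4.0
```

For single-word classification, you compare an utterance against stored templates and pick the word with the smallest warped distance.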

Amazon Alexa Lab — Building an Alexa Skill

In this lab, you’ll get your hands on the Alexa Skills Kit, where you’ll build and deploy your first Alexa Skill, which will carry on a small conversation with the user and provide a fact from a given year.

Capstone Project — Speech Recognition with Neural Networks

In this project, you will build a deep neural network that functions as part of an end-to-end automatic speech recognition (ASR) pipeline! Your completed pipeline will accept raw audio as input and return a predicted transcription of the spoken language. The full pipeline is summarized in the figure below.

First, you’ll preprocess the raw audio to convert it to one of two feature representations that are commonly used for ASR. Then, you will learn about the basic types of neural networks that are often used for acoustic modeling, and be able to build your own acoustic model! This model will be used as input into the decoder, which will return the predicted transcription.
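A first step in either feature representation is framing: splitting the raw signal into short, overlapping windows, each of which is then analyzed separately (for example, to build a spectrogram). A sketch with a stand-in signal and invented frame sizes:

```python
def frame_signal(samples, frame_len, hop):
    """Split a raw audio signal into overlapping fixed-length frames,
    the usual first step before computing spectral features."""
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames

signal = list(range(10))  # stand-in for raw audio samples
frames = frame_signal(signal, frame_len=4, hop=2)
print(frames)  # overlapping frames of 4 samples, advancing 2 at a time
```

In practice the frames are windowed and passed through a Fourier transform, but the slicing logic is exactly this.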

Concentration 2: Natural Language Processing with IBM Watson

Natural Language Processing (NLP) is a fascinating field in which we teach computers to understand and analyze text. In this concentration, you’ll learn to decompose a problem that involves analyzing natural language text into tasks, perform fundamental NLP operations, such as building an N-gram language model from a given corpus, and labelling words in a sentence with Part-of-Speech (POS) tags and as named entities. You will accomplish end-to-end NLP tasks such as document classification, machine translation, etc., using a combination of custom processing and cloud-based APIs.
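The N-gram language model mentioned above is worth a quick preview: it estimates the probability of the next word from counts in a corpus. A bigram (N=2) sketch on a made-up corpus:

```python
from collections import Counter, defaultdict

def bigram_model(tokens):
    """Estimate P(next word | current word) from bigram counts."""
    counts = defaultdict(Counter)
    for w1, w2 in zip(tokens, tokens[1:]):
        counts[w1][w2] += 1
    model = {}
    for w1, nexts in counts.items():
        total = sum(nexts.values())
        model[w1] = {w2: c / total for w2, c in nexts.items()}
    return model

tokens = "the dog saw the cat and the dog ran".split()
model = bigram_model(tokens)
# "the" is followed by "dog" twice and "cat" once in this corpus
print(model["the"])  # {'dog': 2/3, 'cat': 1/3}
```

Larger N captures more context at the cost of sparser counts — a trade-off you’ll explore in the concentration.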

IBM Watson Lab — Question-Answering Agent with Watson

In this lab, you will build a simple question-answering agent that is able to learn from any text data you provide, and answer queries posed in natural language. You will use IBM Watson’s cloud-based services to process the input text data and find relevant responses.

Capstone Project — Machine Translation

In this project, you will build a deep neural network that functions as part of an end-to-end machine translation pipeline. Your completed pipeline will accept English text as input and return the French translation.

First, you’ll preprocess the data to convert text to a sequence of integers. Then, you’ll create models which accept a sequence of integers as input, and return the probability distribution over possible translations. After learning about the basic types of neural networks that are often used for machine translations, you will be able to engage in your own investigations, to design your own machine translation model that you can run on English text!
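That text-to-integers step looks roughly like this: build a vocabulary mapping each word to an id, then encode and pad each sentence so the batch shares one length. The sentences and padding scheme here are invented for illustration:

```python
def fit_vocab(sentences):
    """Assign each distinct word an integer id (0 reserved for padding)."""
    vocab = {}
    for s in sentences:
        for w in s.split():
            if w not in vocab:
                vocab[w] = len(vocab) + 1
    return vocab

def to_ids(sentence, vocab, pad_to):
    """Encode a sentence as ids, padded with 0s to a fixed length."""
    ids = [vocab[w] for w in sentence.split()]
    return ids + [0] * (pad_to - len(ids))

sentences = ["new jersey is sometimes quiet", "paris is sometimes quiet"]
vocab = fit_vocab(sentences)
print(to_ids(sentences[1], vocab, pad_to=5))  # [6, 3, 4, 5, 0]
```

The network consumes these id sequences and, on the output side, the predicted ids are mapped back through the target-language vocabulary to produce the French text.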

Concentration 3: Computer Vision with Affectiva

Inspired by human vision, Computer Vision aims to give machines the ability to see and interpret the world by extracting information from images. In this concentration, you’ll learn the fundamentals of computer vision and the role it plays in artificial intelligence systems. Computer vision is used in many applications, from detecting skin cancer to emotion recognition (and even self-driving car navigation!). Throughout this term, you’ll develop practical skills and get hands-on coding experience with many of these real-world applications. You’ll learn to break down any problem that involves visual perception into computer vision tasks, such as: enhancing images, applying color and geometric transformations to change the appearance of an image, detecting object boundaries, computing gradients and filtering images, and extracting features like object edges and unique visual patterns. We’ll go over these foundational computer vision techniques in detail. Then, you’ll utilize what you’ve learned to create a complete AI system that uses computer vision to perform smart object detection and activity recognition.
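As a small preview of the image-filtering techniques above, here is a 3×3 box blur — one of the simplest de-noising filters — applied to a tiny made-up image with a single noisy pixel:

```python
def box_blur(image):
    """3x3 box blur: replace each interior pixel with the mean of
    its 3x3 neighborhood; border pixels are left unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            s = sum(image[u][v]
                    for u in (i - 1, i, i + 1)
                    for v in (j - 1, j, j + 1))
            out[i][j] = s / 9
    return out

noisy = [
    [10, 10, 10],
    [10, 90, 10],  # a single noisy spike
    [10, 10, 10],
]
print(box_blur(noisy)[1][1])  # the spike is pulled toward its neighbors
```

Swapping the averaging for a weighted kernel gives Gaussian blurs, edge detectors, and the gradient filters you’ll use throughout the concentration.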

Affectiva Emotion API Lab — Mimic Me!

In this lab, you’ll learn to track faces in a video and identify facial expressions (joy, sadness, surprise, etc.) using AffdexMe from Affectiva. As a fun visualization, it will be up to you to tag each identified face with its appropriate emoji! Then, you’ll turn this into a game where a player needs to mimic a random emoji displayed by the computer.

Capstone Project — Facial Keypoint Detection

In this project, you’ll combine your knowledge of computer vision techniques and deep learning to build an end-to-end facial keypoint recognition system! Facial keypoints include points around the eyes, nose, and mouth on a face and are used in many applications, from facial tracking to emotion recognition.

This project is broken into a few main parts. First, you’ll use your knowledge of computer vision and OpenCV to detect faces. You’ll also use image filtering techniques to de-noise images and transform the appearance of a face in an image. Second, you’ll define and train a Convolutional Neural Network (CNN) to detect facial keypoints. Finally, you’ll put these two parts together so that you can reliably identify facial keypoints on any image!

You’ll also be given optional exercises that guide you in extending this project to video (or a laptop camera feed); this gives you the ability to implement fun face filters and keypoint detection in real time!

Students in this program will master an in-depth curriculum that covers both the foundations and the cutting-edge advancements of Deep Learning. This kind of hands-on exposure to one of the most transformational technologies of our time makes this program a unique learning opportunity for anyone excited by the possibilities of AI. We invite you to join us as we prepare the next generation of AI experts who will power the amazing innovations ahead!

Feel free to reach out to us if you have any questions, and see you in the program!

Apply now to join the AI Nanodegree program!
