The Udacity NLP Nanodegree

Looking back on 2018

Joseph Alan Epstein
5 min read · Nov 1, 2020
I’ve heard of Bag of Words, but this is ridiculous — Photo from Burst

Looking Back in 2020

The Udacity NLP Nanodegree is the first one that I received. In hindsight, I can make that distinction, as I have since gone on to receive my second Nanodegree from Udacity. In the past 2 years, I have leveraged the skills I learned through this program in my job as a web developer. From web scraping to word embeddings (hey, the SEO/SEM guys were curious, and how could I not build a model?), the skills I learned in this Nanodegree have been consistently relevant. I wanted to build more skills in this area, so I also completed the Udacity Data Science Nanodegree, where I continued to see ideas that overlap with the NLP program: ETL, data preprocessing (tokenizing/lemmatizing), tuning model hyperparameters, and recommendation systems. As these courses have complemented each other, I'd like to discuss their distinctions a bit more.

Data Science Nanodegree

I have published two articles so far on Medium.com as a requirement for the Data Science Nanodegree. Presenting data well is incredibly important, and useful practice for anyone in Data Science. So, check out my articles on the first project, an exploratory data analysis, and on the final capstone project, where I predicted whether or not a user would churn (cancel their subscription to a music streaming service), using 12GB of data and PySpark.

NLP Nanodegree

One year after graduating from NYU as a Linguistics and Mathematics double major (don't worry, I also received a minor in Computer Science), I was working as a web developer. I kept thinking about my academic interests from school, and how to bring them to life. This Nanodegree program was the right fit to get me started on actually applying those ideas.

Here are some cool projects that started to sharpen my skills in extracting meaning from text data:

HMM Model

From Wiki

When I think of the Hidden Markov Model, I'm immediately reminded of the concrete examples that I learned in school. Consider a farm and a city, where a certain number of people live in each, and every year a certain percentage of people move from the city to the farm, or vice versa. With such numbers, we can eventually calculate an equilibrium, the long-run tendency of both the farm population and the city population. This example gave me a way to visualize transition probabilities between states, like the percentage of people moving from one place to another.
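To make that concrete, here is a minimal sketch of the equilibrium calculation in Python; the starting split and the transition percentages are invented for illustration:

```python
import numpy as np

# Hypothetical transition matrix for the farm/city example:
# each year, 5% of city dwellers move to the farm, and 10% of
# farm dwellers move to the city (numbers invented for illustration).
P = np.array([
    [0.95, 0.05],   # from city: stay in city, move to farm
    [0.10, 0.90],   # from farm: move to city, stay on farm
])

# Start with an arbitrary population split and apply the transition
# matrix repeatedly; the split converges to the equilibrium
# (stationary) distribution.
population = np.array([0.50, 0.50])
for _ in range(100):
    population = population @ P

print(population)  # ~[0.667, 0.333]: two thirds city, one third farm
```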

This concept was then expanded to Part of Speech (P.O.S.) tagging. The model reads in a word, along with its preceding and following words, then predicts the part of speech of the current word, given the transition probabilities of its neighbors. Think of the word “present”. Depending on how you read it, you may be thinking of the noun (put this present under the Christmas tree) or the verb (can you please stand in front of the class and present?). In Linguistics, we can talk about the underlying syntax trees being completely different; through Mathematics, we can talk about the likelihood of each part of speech, given the string of words.
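As a toy illustration of the idea (not the project's actual code), here is how a single tagging decision might combine a transition probability with an emission probability; every probability below is made up:

```python
# Toy HMM-style disambiguation of the word "present".
# transition[prev_tag][tag] approximates P(tag | previous tag);
# emission[tag][word] approximates P(word | tag).
transition = {
    "DET": {"NOUN": 0.8, "VERB": 0.2},   # after "this": noun is likely
    "TO":  {"NOUN": 0.1, "VERB": 0.9},   # after "to": verb is likely
}
emission = {
    "NOUN": {"present": 0.01},
    "VERB": {"present": 0.02},
}

def best_tag(prev_tag, word):
    """Pick the tag maximizing P(tag | prev_tag) * P(word | tag)."""
    return max(
        emission,
        key=lambda tag: transition[prev_tag][tag] * emission[tag].get(word, 0.0),
    )

print(best_tag("DET", "present"))  # NOUN, as in "this present"
print(best_tag("TO", "present"))   # VERB, as in "to present"
```

A full tagger would make this decision jointly over whole sentences (for example, with the Viterbi algorithm), but the core trade-off between neighboring context and the word itself is the same.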

Check out my model, along with a bit more discussion of how an HMM relates to a bigram approach for predicting P.O.S.

Machine Translation

It’s funny to think that the only time I ran into neural networks in college was during my Psychology classes on Cognition. Except there, they called it a Connectionist Model.

Connectionist model from Wiki

Although those classes did not explain the math going on in models like this, the Udacity course embraced it, even going as far as working out one iteration of gradient descent to update the weights feeding the hidden layer.
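Here is a sketch of what one such iteration looks like for a tiny network, assuming sigmoid activations and a squared-error loss; the weights and the training pair are invented:

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

x = np.array([0.5, 0.1, -0.2])      # one input example (invented)
target = 0.6                        # its label (invented)
lr = 0.5                            # learning rate

W_hidden = np.array([[ 0.1, -0.2],
                     [ 0.4,  0.3],
                     [-0.3,  0.2]]) # input -> hidden weights (3x2)
W_output = np.array([0.3, -0.1])    # hidden -> output weights (2,)

# Forward pass
hidden = sigmoid(x @ W_hidden)      # hidden layer activations
output = sigmoid(hidden @ W_output) # network prediction

# Backward pass: chain rule through the sigmoid at each layer
error = target - output
output_delta = error * output * (1 - output)
hidden_delta = output_delta * W_output * hidden * (1 - hidden)

# One gradient descent update
W_output += lr * output_delta * hidden
W_hidden += lr * np.outer(x, hidden_delta)
```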

The course continued further, fleshing out the details of activation functions, adding a bias vector, and later discussing recurrent networks. The introduction of memory was another step that I think my Psychology professors would have enjoyed, with techniques such as Long Short-Term Memory (LSTM) and Gated Recurrent Units (GRUs), where the notion of memory is simply another node with its own activation layers and gates (input, output, and forget gates). And what would these Psychologists think about attention models, which use a softmax activation layer to determine the importance of each input at various steps? I find myself wondering if the overlap is a coincidence, or if people in Mathematics just enjoy naming their techniques after real-world phenomena (a little of column A, a little of column B?).
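To illustrate that last point, here is a rough sketch of dot-product attention with a softmax over the scores; the shapes and values are placeholders, not the course's implementation:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Score each encoder step against the current decoder state, turn the
# scores into weights with a softmax, and take a weighted sum.
encoder_states = np.random.randn(5, 8)   # 5 input steps, hidden size 8
decoder_state = np.random.randn(8)       # current decoder hidden state

scores = encoder_states @ decoder_state  # dot-product score per input step
weights = softmax(scores)                # importance of each input step
context = weights @ encoder_states       # context vector fed to the decoder

print(weights)        # sums to 1; the highest weight marks the most "attended" step
print(context.shape)  # (8,)
```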

More importantly, the project on Machine Translation continues to build on these ideas with the implementation of an encoder-decoder (or sequence-to-sequence) model, which I used to achieve 96% accuracy translating a dataset of English sentences into their French counterparts (which you can see here).
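For a sense of the architecture (a sketch under my own assumptions, not my project's exact model), an encoder-decoder can be wired up in a few lines of Keras; the vocabulary sizes and sequence lengths below are placeholders:

```python
from tensorflow.keras.layers import (Input, Embedding, GRU, RepeatVector,
                                     TimeDistributed, Dense)
from tensorflow.keras.models import Model

en_vocab, fr_vocab = 200, 350  # placeholder vocabulary sizes
en_len, fr_len = 15, 21        # placeholder (padded) sequence lengths

inputs = Input(shape=(en_len,))
x = Embedding(en_vocab, 64)(inputs)        # embed English token ids
encoded = GRU(128)(x)                      # encoder: summarize into a final state
x = RepeatVector(fr_len)(encoded)          # hand that summary to every decoder step
x = GRU(128, return_sequences=True)(x)     # decoder: one hidden state per output step
outputs = TimeDistributed(Dense(fr_vocab, activation="softmax"))(x)

model = Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()
```

Training would then pair padded English token sequences with their French token sequences; the per-token accuracy metric is what a figure like 96% refers to.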

Takeaways

1. Two Nanodegrees is better than one

The Udacity NLP and Data Science Nanodegrees complement each other well; both have entire sections dedicated to recommendation systems.

2. Get hands on experience

If you don't put the things you learn into practice, the knowledge will sit and get stale. Grab a Python library and download a real dataset. The data exploration process is all about curiosity, and learning along the way.

3. Linguistics, Mathematics, Computer Science, …

The list goes on! A lot of NLP and Data Science is still in the growing stage as an area of study, so it's natural to pull on knowledge from other areas until a concrete skillset emerges. For now, this is part of the excitement of these fields; you never know what will inspire your next model's architecture.

Upcoming Article:

Next, I will write about the remaining two projects in the Udacity Data Science Nanodegree (a data pipeline with predictions, and a recommendation system!), and then recap the skills gained from the degree. Such articles help me reflect, and continue to look towards the future. Thanks for reading!

