NLP Resources

My cohort from General Assembly has now graduated. Woop Woop! As such, I've had a lot of time to start doing self-directed learning. This has been very enjoyable, but it has left me feeling like I'm now just learning to learn instead of learning to build. So as I follow along with these resources, I've been brainstorming ways to apply the concepts to a side project in addition to doing the related coursework.

I've generally been more interested in building NLP skills since the course ended, and I found two great resources for it, both from Stanford. The first covers the true foundations of NLP in a 'pre-AI-hype' world, AKA 2012. Saved from an archived Coursera course, it covers the fundamentals of spell checkers, minimum edit distance, language modeling, text classification, sentiment analysis, entity extraction, and more (I've only reviewed the first six chapters so far). If you want to get started in NLP, watching these seems like the place to start.
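To give a flavour of the material, here's a minimal sketch of minimum edit distance, in the classic dynamic-programming formulation (the lectures also walk through a variant where substitutions cost 2; I'm using unit costs here):

```python
def min_edit_distance(source: str, target: str) -> int:
    """Edit distance with insert/delete/substitute, each costing 1."""
    n, m = len(source), len(target)
    # dp[i][j] = cost of turning source[:i] into target[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i  # delete every remaining source character
    for j in range(1, m + 1):
        dp[0][j] = j  # insert every remaining target character
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub_cost = 0 if source[i - 1] == target[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,             # deletion
                dp[i][j - 1] + 1,             # insertion
                dp[i - 1][j - 1] + sub_cost,  # substitution (or match)
            )
    return dp[n][m]

print(min_edit_distance("intention", "execution"))  # 5, the lectures' favorite example
```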

Dan Jurafsky & Chris Manning are excellent teachers

The second is also offered by Stanford and is their current course focusing on modern neural-network methods in NLP. It may be a bit more mathy than the previous video set, but it's a great introduction to word embeddings, global vectors (GloVe), dependency parsing, and the fundamentals of neural networks and the associated architectures used for NLP. If you're looking to learn TensorFlow and want to focus on NLP, this is a perfect combination. In addition, the course syllabus is packed with relevant research papers and learning materials that explain the concepts in more detail.

Associated material can be found here: http://cs224d.stanford.edu/syllabus.html
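If you want to play with embeddings before tackling the TensorFlow assignments, pretrained vectors are an easy entry point. A quick sketch using gensim's downloader (my own library choice, not something the course prescribes; the model name refers to gensim's pretrained GloVe bundle):

```python
import gensim.downloader as api

# Pretrained 50-dimensional GloVe vectors (downloads on first run)
vectors = api.load("glove-wiki-gigaword-50")

# Nearest neighbours in embedding space are loosely "semantically similar" words
print(vectors.most_similar("frog", topn=5))

# The classic analogy test: king - man + woman ~= queen
print(vectors.most_similar(positive=["king", "woman"], negative=["man"], topn=1))
```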

Some of the ideas I've had, which are admittedly far beyond my skill level at this point:

  • Make a Twitter bot called the 'Comeback King' that can generate relevant comebacks to putdowns tweeted at it. A GAN in concert with seq2seq?
  • Use an RDF2vec model to learn/generate riddles. My naive idea is that the entity a riddle describes would exhibit similar node or edge embeddings.
  • Trumpify text. This has actually been done by a number of people, but none of them satisfy my idea of what the output should be. The question: is this a seq2seq-type translation? Or can word2vec be updated on a Trump corpus and then used to replace semantically similar words? And how do you build in a sentence-similarity metric, which the other models lack?
  • Entity extraction from a realtime Twitter feed, simply for practice (a sketch of the extraction half follows this list).
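
The extraction half of that last idea is the easy part with an off-the-shelf library. Here's a sketch using spaCy (again my choice, not from the courses), with hard-coded strings standing in for the live Twitter stream, which I've left out:

```python
import spacy

# Assumes: pip install spacy && python -m spacy download en_core_web_sm
nlp = spacy.load("en_core_web_sm")

# Stand-ins for tweets pulled off the streaming API (tweepy or similar)
tweets = [
    "Stanford released a new NLP course taught by Chris Manning.",
    "Apple is reportedly opening an office in Seattle next year.",
]

for tweet in tweets:
    doc = nlp(tweet)
    for ent in doc.ents:
        # ent.label_ is the entity type, e.g. PERSON, ORG, GPE, DATE
        print(f"{ent.text!r} -> {ent.label_}")
```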

None of these are useful in a real sense, but what I've found is that I need a project and concrete goals to work toward. Otherwise, I can spend a whole day learning and studiously taking notes only to end up feeling unproductive.

Within the week I aim to know what steps these projects involve and to start chipping away at them as I continue the above courses.