My Recommendations for Getting Started with NLP

Resources for getting started with natural language processing.

elvis
DAIR.AI
7 min read · Oct 29, 2020


I have been studying natural language processing (NLP) since 2013, back when manual feature engineering was very popular in the world of machine learning. We have come a long way since then. I specialized in information retrieval and machine learning techniques for my Ph.D., particularly how they apply to social computing and computational linguistics, while at the same time developing approaches for efficient information extraction from large-scale text-based data. I am fortunate to have experience with classical machine learning applied to NLP and to have witnessed firsthand the explosion of deep learning in the field.

Lots of students have been asking me to prepare a guide for how to get started with natural language processing. This blog post is a shot at helping out others based on research, exposure to the field, and personal experience. Although it is not a direct guide, the resources I share here can help you create your own NLP learning path based on your needs. This will be a combination of educational resources that I have come across over the years. I will share my experience in studying these resources and where they are applicable.

The list is not exhaustive by any means but it should provide options that serve as a great starting point for anyone interested in getting started with NLP. You don’t really need to consume all the content. Just choose the resources that fit your current needs. For instance, maybe you already have some theoretical foundation, and you only need to learn the best practices for developing NLP systems in production. In that case, you can jump straight to the recommendations for getting hands-on experience with NLP techniques. I am only covering content that I have studied personally, and I am sure there are other wonderful resources out there that I have missed. Feel free to comment if you have any recommendations.

📘 Speech and Language Processing

by Dan Jurafsky and James H. Martin

Studying the fundamentals is vital when learning any subject. I am a huge advocate for that, as it has worked for me. I have been following this book for a while and it’s now in its third edition. The book is exceptionally well written and offers a great theoretical foundation for NLP. It could be a great starting point for anyone wanting to get started with NLP. Even though I have read the book, I still check it often as it is regularly updated with the latest developments in the field. If you really like this book, you will also find these lectures useful as they cover a lot of the fundamental topics in the book.

📘 Linguistic Fundamentals for Natural Language Processing: 100 Essentials from Morphology and Syntax

by Emily M. Bender

Emily Bender is one of my favorite linguistics researchers. Her work has influenced my own research tremendously and has allowed me to adopt a more rigorous approach to NLP research. NLP is heavily influenced by linguistics and, in fact, Emily advocates for using teachings in linguistics to inform developments in NLP. Her book provides an exceptional introduction to concepts in linguistics used in NLP. A must-read book for any NLP student.

📘 Linguistic Structure Prediction

by Noah A. Smith

This book focuses on bridging natural language processing and machine learning, covering statistical and computational approaches to modeling linguistic structure. The book assumes that you have some exposure to machine learning already. You can check out the list of machine learning recommendations I made here if you are not too familiar with the topic. I advise taking at least an introductory machine learning course to make the most of this book.

📘 Introduction to Natural Language Processing

by Jacob Eisenstein

This is one of my favorite NLP books due to its focus on discussing linguistic concepts and applications. It covers methods like beam search, maximum likelihood estimation, and matrix factorization, among others. It then explains how these methods are used to address a wide range of tasks like classification, part-of-speech tagging, relation extraction, and language modeling. The book assumes knowledge of subjects like multivariate calculus and linear algebra. One recommendation that comes directly from the book is Mathematics for Machine Learning. Eisenstein’s book is more advanced than the other textbooks listed here, and it does require some understanding of machine learning and mathematical concepts.
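To give a flavor of one of the methods mentioned above, here is a minimal sketch of maximum likelihood estimation for a bigram language model. It is not taken from the book: the function name `bigram_mle`, the `<s>` and `</s>` boundary markers, and the toy sentences are assumptions made for illustration, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter, defaultdict


def bigram_mle(sentences):
    """Estimate P(next | prev) by relative frequency (the MLE for bigrams)."""
    pair_counts = defaultdict(Counter)
    for sentence in sentences:
        # Hypothetical boundary markers so sentence starts and ends are modeled.
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, nxt in zip(tokens, tokens[1:]):
            pair_counts[prev][nxt] += 1

    probs = {}
    for prev, nexts in pair_counts.items():
        total = sum(nexts.values())
        # MLE: count(prev, next) / count(prev, anything)
        probs[prev] = {word: count / total for word, count in nexts.items()}
    return probs


# Toy corpus, made up for this example.
model = bigram_mle(["the cat sat", "the dog sat", "the cat ran"])
print(model["the"])
```

Because these estimates are plain relative frequencies, any bigram that never appears in the corpus gets probability zero, which is why smoothing is usually applied on top of an estimate like this.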

📘 Neural Network Methods in Natural Language Processing (Synthesis Lectures on Human Language Technologies)

by Yoav Goldberg

If you are just starting your journey into NLP, you have probably been exposed to more modern methods for NLP like RNNs and other deep learning-based models. If you are looking for a comprehensive theoretical overview of neural networks and how they are used in NLP, this is the book for you. The references found in this book have been instrumental in my own research.

🌐 Modern Deep Learning Techniques Applied to Natural Language Processing

by Soujanya Poria and Elvis Saravia

On the topic of modern methods for NLP, I would also recommend this open resource I put together with Soujanya Poria. It walks you through some of the more recent developments in the field of NLP, ranging from word embeddings to attention mechanisms to reinforcement learning.

📺 CS224N: Natural Language Processing with Deep Learning | Winter 2019

by Christopher Manning and Abigail See

If you recently got started with NLP, you have probably come across this popular NLP course. All the lectures and slides are public and you can find them on the course website. This course is heavily focused on deep learning methods for NLP, so you will see that the first lecture starts directly with word vectors and then transitions into more advanced topics like convolutional networks and Transformers. If you are interested in classical NLP methods, you may have to check one of the books mentioned at the beginning. In fact, I would strongly recommend that you do so, as that knowledge is valuable in practice for building real-world NLP systems.

The theory is great but regardless of whether you are an NLP researcher or engineer, you still have to complement it with hands-on practice. These are some books I have found to be exceptionally useful for getting practice on topics like language modeling and text-based classification.

📘 Natural Language Processing with PyTorch: Build Intelligent Language Applications Using Deep Learning

by Delip Rao and Brian McMahan

Even though the book is based on PyTorch, it’s great to get hands-on practice with building language applications with deep learning. There is also content and code for the traditional concepts and methods like TF-IDF and semantics, to name a few. If you are a PyTorch developer, you will find this book easy to follow.
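For readers who have not met TF-IDF before, here is a minimal sketch of the idea in plain Python. It is not code from the book: the function name `tf_idf`, the whitespace tokenization, and the toy documents are assumptions made for illustration, and it uses the plain `tf * log(N / df)` weighting rather than the smoothed variants that libraries often apply.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: the number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            # Term frequency times inverse document frequency.
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Toy documents, made up for this example.
docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
print(scores[0])
```

A word like "the" that appears in every document gets a weight of zero, while a word that is specific to one document gets the highest weight in that document.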

📘 Natural Language Processing in Action

by Hobson Lane, Cole Howard, and Hannes Hapke

This is another exceptional book, and one of my favorites for getting hands-on practice with all things NLP. This book guides you from building your first vocabulary from a corpus all the way up to building a chatbot. There are a lot of code examples in this book, so if you are into coding, it could be a good fit for you.
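As a rough idea of what building a vocabulary from a corpus can look like, here is a minimal sketch. It is not the book's code: the helper names `build_vocab` and `encode`, the `<pad>` and `<unk>` special tokens, and the toy corpus are all assumptions made for illustration, and whitespace splitting stands in for a real tokenizer.

```python
from collections import Counter


def build_vocab(corpus, min_count=1, specials=("<pad>", "<unk>")):
    """Map each token to an integer id, most frequent tokens first."""
    counts = Counter(tok for sentence in corpus for tok in sentence.lower().split())
    # Reserve the lowest ids for the (hypothetical) special tokens.
    vocab = {tok: idx for idx, tok in enumerate(specials)}
    for tok, count in counts.most_common():
        if count >= min_count and tok not in vocab:
            vocab[tok] = len(vocab)
    return vocab


def encode(sentence, vocab):
    """Convert a sentence to ids, sending unseen tokens to <unk>."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in sentence.lower().split()]


# Toy corpus, made up for this example.
corpus = ["the cat sat", "the dog sat", "a bird sang"]
vocab = build_vocab(corpus)
print(encode("the fish sat", vocab))
```

Raising `min_count` drops rare tokens from the vocabulary, so they are encoded as `<unk>` instead of receiving their own ids.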

📘 Practical Natural Language Processing

by Sowmya Vajjala, Bodhisattwa Majumder, Anuj Gupta, and Harshit Surana

In terms of hands-on practice for NLP, I am thoroughly enjoying this book, which was published this year. It covers topics that range from all sorts of practical applications in NLP to best practices for deploying NLP systems. Even though I am just halfway through this book, I had to include it as there are many NLP engineers out there who want to learn how to build NLP systems more effectively and understand the techniques needed to do so.

⭐️ Bonus

Here are a few other resources and projects that could help you to stay informed with the field of NLP:

That’s it for my recommendations on how to get started with NLP. It’s important that you choose the content that best fits your needs. I have tried to offer some explanation for each item and hope that helps you create your own learning path. These are some of the best resources I have come across, and I have found them very useful for expanding my knowledge and even teaching these concepts, not to mention applying them to research ideas and building NLP systems that range from semantic search engines to emotion classifiers.
