How to get started in NLP

Dependency parse tree visualized by displaCy

Somewhere I read that if you ever have to answer the same question twice, it’s probably a good idea to turn it into a blog post. In keeping with this rule and to save my future self some time, here is my standard answer to the question: “My background is in * science, and I’m interested in learning NLP. Where do I start?”

Before you dive in, please note that the list below is only a general starting point and is likely incomplete. To help you navigate the flood of information, I’ve added short descriptions and difficulty estimates in brackets. Basic programming skills (e.g. in Python) are recommended.

Online courses

Libraries and open source

  • spaCy (website, blog) [Python; emerging open-source library with fantastic usage examples, API documentation, and demo applications]
  • Natural Language Toolkit (NLTK) (website, book) [Python; practical intro to programming for NLP, mainly used for teaching]
  • Stanford CoreNLP (website) [Java; high-quality analysis toolkit]
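
If you want a warm-up before installing any of these, here is a rough sketch of the most basic step they all build on, tokenization, followed by a word count. It uses only the Python standard library. The sample text and the function names `tokenize` and `top_words` are made up for illustration, and the regular expression is a deliberately naive assumption; it is not how spaCy, NLTK, or CoreNLP tokenize text.

```python
import re
from collections import Counter

# Hypothetical sample text, chosen only for illustration.
TEXT = "NLP is fun. Learning NLP takes practice, and practice takes time."


def tokenize(text):
    """Lowercase the text and return runs of letters, digits, and apostrophes."""
    # Naive assumption: anything else (spaces, punctuation) separates tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


def top_words(text, n=3):
    """Return the n most frequent tokens together with their counts."""
    return Counter(tokenize(text)).most_common(n)


if __name__ == "__main__":
    print(tokenize(TEXT))
    print(top_words(TEXT))
```

A sketch like this breaks down quickly on real text (abbreviations, hyphenated words, URLs, emoji), which is a good reason to move on to one of the libraries above.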

Active blogs



DIY projects and data sets


A thorough list of publicly available NLP data sets has already been created by Nicolas Iderhoff. Beyond these, here are some projects I can recommend to any NLP novice wanting to get their hands dirty:

NLP on social media

Reach out on Twitter @meltomene