7 Amazing Open Source NLP Tools to Try With Notebooks in 2019

Published in Microsoft Azure · 4 min read · Feb 14, 2019

As previously highlighted in my Beyond Word Embeddings Series, 2019 is going to be an exciting year for natural language processing. Here are my favorite NLP toolkits, which you can start experimenting with right away in Azure Notebooks.

The Azure Notebook Service offers free interactive computing and project management in the browser, and it can be linked to remote GPU DSVM (Data Science Virtual Machine) compute using an Azure subscription. I’ve included an open source notebook that contains installation instructions and a hello world example for each of these toolkits.

aribornstein/NLPToolkits2019Notebook

7 Amazing Open Source NLP Tools to Try With Notebooks in 2019 - aribornstein/NLPToolkits2019Notebook

github.com

1. NLTK

Tagline: NLTK — the Natural Language Toolkit — is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.

Favorite Features: Lexical Corpus Integration (WordNet, Stopwords, etc.), Tokenization, Sentiment Analysis

nltk/nltk

NLTK Source. Contribute to nltk/nltk development by creating an account on GitHub.

github.com
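As a rough illustration of the tokenization feature, here is a minimal sketch, assuming NLTK is installed (`pip install nltk`). It uses `wordpunct_tokenize`, which is regex-based and needs no corpus download. The tiny `STOPWORDS` set is a hypothetical stand-in for NLTK's own stopword corpus, which would require `nltk.download('stopwords')` first.

```python
from nltk.tokenize import wordpunct_tokenize

text = "NLTK makes tokenization easy. It's a good place to start!"

# Regex-based tokenizer: splits on word characters vs. punctuation,
# so it works without downloading any NLTK data packages.
tokens = wordpunct_tokenize(text)

# Hypothetical mini stopword list for illustration only; NLTK ships a
# fuller list in nltk.corpus.stopwords after nltk.download('stopwords').
STOPWORDS = {"a", "it", "s", "to", "is", "the"}

# Keep alphabetic tokens, lowercase them, and drop the stopwords.
content_words = [
    t.lower() for t in tokens
    if t.isalpha() and t.lower() not in STOPWORDS
]

print(tokens)
print(content_words)
```

The same pattern extends to the WordNet and sentiment features, but those depend on downloaded data packages, so they are left out of this sketch.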

2. spaCy

Tagline: spaCy is a library for advanced Natural Language Processing in Python and Cython. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 30+ languages.

Favorite Features: Syntactic Parser, Named Entity Recognition, Tokenization, Speed, Extensible Pipeline Interface, displaCy visualization

explosion/spaCy

💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython - explosion/spaCy

github.com
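To give a feel for the tokenizer, here is a minimal sketch, assuming spaCy is installed (`pip install spacy`). It uses `spacy.blank("en")`, which builds an English pipeline containing only the rule-based tokenizer, so no pre-trained model has to be downloaded. The parser and named entity recognizer listed above do need a pre-trained model and are not shown here.

```python
import spacy

# A blank English pipeline: tokenizer only, no statistical components.
nlp = spacy.blank("en")

doc = nlp("Let's go to N.Y.!")

# Each token keeps its text plus lexical flags such as is_punct.
tokens = [token.text for token in doc]
print(tokens)

for token in doc:
    print(token.text, token.is_punct)
```

Note how the tokenizer splits the contraction and the trailing exclamation mark while keeping the abbreviation "N.Y." together.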

3. AllenNLP

Tagline: An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.

Favorite Features: Question Answering, Semantic Role Labeling, Within-Document Coreference, Textual Entailment, Text to SQL

allenai/allennlp

An open-source NLP research library, built on PyTorch. - allenai/allennlp

github.com

4. Stanford NLP

Tagline: The Stanford NLP Group’s official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server.

Favorite Features: Extensive language support (including Hebrew, Arabic, Finnish, Basque, and more) for Tokenization, Parsing, and Named Entity Extraction.

stanfordnlp/stanfordnlp

Official Stanford NLP Python Library for Many Human Languages - stanfordnlp/stanfordnlp

github.com

5. Intel NLP Architect

Tagline: NLP Architect is an open-source Python library for exploring state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding.

Favorite Features: Intent Extraction, Term Set Expansion, Machine Reading Comprehension, and the only working Python-based, sieve-based Cross-Document Coreference system I know of.

NervanaSystems/nlp-architect

NLP Architect by Intel AI Lab: A Python library for exploring the state-of-the-art deep learning topologies and…

github.com

6. Flair

Tagline: Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), sense disambiguation and classification.

Favorite Features: Easy-to-use pretrained BERT and Flair embeddings

zalandoresearch/flair

A very simple framework for state-of-the-art Natural Language Processing (NLP) - zalandoresearch/flair

github.com

7. Gensim

Tagline: Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.

Favorite Features: Topic Modeling; it is my favorite LDA implementation

RaRe-Technologies/gensim

Topic Modelling for Humans. Contribute to RaRe-Technologies/gensim development by creating an account on GitHub.

github.com

There you go: this should be more than enough to get you started on your next big NLP project.

I hope this helps on your NLP journey. Feel free to comment below with your ideas.

Next Steps

If the field of NLP interests you and you would like to learn more about how these frameworks work behind the scenes, check out my Beyond Word Embeddings Series below.

Beyond Word Embeddings Part 1 — An Overview of Neural NLP Milestones

This series will review the pros and cons of word embeddings and demonstrate how to incorporate more complex semantic…

towardsdatascience.com

If you have any questions, comments, or topics you would like me to discuss, feel free to follow me on Twitter. If there is a tool you feel I missed, please let me know in the comments below.

About the Author

Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies, and computational medicine. As an Open Source Engineer on Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli hi-tech community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.

Written by Aaron (Ari) Bornstein