7 Amazing Open Source NLP Tools to Try With Notebooks in 2019
As previously highlighted in my Beyond Word Embeddings Series, 2019 is going to be an exciting year for natural language processing. Here are my favorite NLP toolkits. You can start experimenting with them right away using Azure Notebooks.
The Azure Notebook Service offers free interactive computing and project management in the browser, and it can be linked to remote GPU Data Science Virtual Machine (DSVM) compute using an Azure subscription. I’ve included an open source notebook that contains installation instructions and a hello world example for each of these toolkits.
Notebook: aribornstein/NLPToolkits2019Notebook (github.com)
1. NLTK
Tagline: NLTK, the Natural Language Toolkit, is a suite of open source Python modules, data sets, and tutorials supporting research and development in Natural Language Processing.
Favorite Features: Lexical Corpus Integration (WordNet, Stopwords, etc.), Tokenization, Sentiment Analysis
NLTK Source: nltk/nltk (github.com)
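As a quick taste of the tokenization feature, here is a minimal sketch. It assumes NLTK is already installed (for example with `pip install nltk`) and deliberately uses `wordpunct_tokenize`, a regex-based tokenizer that needs no extra corpus downloads. The sample sentence is my own and is not taken from the linked notebook.

```python
# Minimal NLTK tokenization sketch (assumes `pip install nltk`).
# wordpunct_tokenize is regex-based, so no corpus download is required.
from nltk.tokenize import wordpunct_tokenize

text = "Hello, world! NLP is fun."  # illustrative sample text

# Splits the text into runs of word characters and runs of punctuation.
tokens = wordpunct_tokenize(text)
print(tokens)
```

Tokenizers such as `word_tokenize`, and corpora such as WordNet or the stopword lists, need their data fetched first with `nltk.download(...)`.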
2. spaCy
Tagline: spaCy is a library for advanced Natural Language Processing in Python and Cython. spaCy comes with pre-trained statistical models and word vectors, and currently supports tokenization for 30+ languages.
Favorite Features: Syntactic Parser, Named Entity Recognition, Tokenization, Speed, Extensible Pipeline Interface, displaCy visualization
💫 Industrial-strength Natural Language Processing (NLP) with Python and Cython: explosion/spaCy (github.com)
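To try spaCy's tokenizer without downloading a pre-trained model, a sketch along these lines should work. It assumes spaCy is installed and uses `spacy.blank("en")`, which builds an English pipeline containing only the tokenizer. The sample sentence is my own illustration.

```python
# Minimal spaCy tokenization sketch (assumes `pip install spacy`).
import spacy

# A blank English pipeline contains only the rule-based tokenizer,
# so no pre-trained model has to be downloaded.
nlp = spacy.blank("en")

doc = nlp("spaCy splits text into tokens, fast.")  # illustrative sample text
tokens = [token.text for token in doc]
print(tokens)
```

The syntactic parser and named entity recognition listed above need a pre-trained model, loaded with `spacy.load(...)` after downloading it.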
3. AllenNLP
Tagline: An Apache 2.0 NLP research library, built on PyTorch, for developing state-of-the-art deep learning models on a wide variety of linguistic tasks.
Favorite Features: Question Answering, Semantic Role Labeling, Within-Document Coreference, Textual Entailment, Text to SQL
4. Stanford NLP
Tagline: The Stanford NLP Group’s official Python NLP library. It contains packages for running our latest fully neural pipeline from the CoNLL 2018 Shared Task and for accessing the Java Stanford CoreNLP server.
Official Stanford NLP Python Library for Many Human Languages: stanfordnlp/stanfordnlp (github.com)
5. Intel NLP Architect
Tagline: NLP Architect is an open-source Python library for exploring state-of-the-art deep learning topologies and techniques for natural language processing and natural language understanding.
Favorite Features: Intent Extraction, Term Set Expansion, Machine Reading Comprehension, and the only working Python-based, sieve-based cross-document coreference system.
NLP Architect by Intel AI Lab: A Python library for exploring the state-of-the-art deep learning topologies and… (github.com)
6. Flair
Tagline: Flair allows you to apply our state-of-the-art natural language processing (NLP) models to your text, such as named entity recognition (NER), part-of-speech tagging (PoS), sense disambiguation and classification.
Favorite Features: Easy-to-use pretrained BERT and Flair embeddings
A very simple framework for state-of-the-art Natural Language Processing (NLP): zalandoresearch/flair (github.com)
7. Gensim
Tagline: Gensim is a Python library for topic modelling, document indexing and similarity retrieval with large corpora. Target audience is the natural language processing (NLP) and information retrieval (IR) community.
Favorite Features: Topic Modeling, my favorite LDA implementation
Topic Modelling for Humans: RaRe-Technologies/gensim (github.com)
There you go: this should be more than enough to get you started on your next big NLP project.
I hope this helps you get started on your NLP journey. Feel free to comment below with your ideas.
If the field of NLP interests you and you would like to learn more about how these frameworks work behind the scenes, check out my Beyond Word Embeddings Series below.
This series will review the pros and cons of word embeddings and demonstrate how to incorporate more complex semantic… (towardsdatascience.com)
If you have any questions, comments, or topics you would like me to discuss, feel free to follow me on Twitter. If there is a tool you feel I missed, please let me know in the comments below.
About the Author
Aaron (Ari) Bornstein is an avid AI enthusiast with a passion for history, engaging with new technologies and computational medicine. As an Open Source Engineer at Microsoft’s Cloud Developer Advocacy team, he collaborates with the Israeli Hi-Tech Community to solve real-world problems with game-changing technologies that are then documented, open sourced, and shared with the rest of the world.