1 Line of Code, 1000+ NLP Models in 200+ Languages with John Snow Labs’ NLU in Python
100+ Embeddings, 50+ Classifiers, 200+ Languages, Infinite Power — The richest NLP tool on the Market!
John Snow Labs’ NLU is a Python library for applying state-of-the-art (SOTA) text mining, directly on any data frame, with a single line of code.
As a facade of the award-winning Spark NLP library, it comes with hundreds of pre-trained models in tens of languages — all production-grade, scalable, and trainable.
A picture says more than a thousand words, so here is a GIF! Lean back and enjoy.
NLP stands for natural language processing. It is the task of preprocessing text and extracting useful features such as emotions, grammatical labels, named entities, lower-dimensional embeddings, and much more.
NLU stands for natural language understanding. It helps data scientists understand text data written in human languages by reducing the NLP part to 1 line of code at most.
Instead of having to focus on data processing and feature engineering, with NLU a data scientist can focus on UNDERSTANDING the natural language data.
What does NLU 0.1 include?
NLU provides everything a data scientist might wish for in one line of code!
- 350+ pre-trained models
- 100+ of the latest NLP word embeddings (BERT, ELMO, ALBERT, XLNET, GLOVE, BIOBERT, ELECTRA, COVIDBERT) and different variations of them
- 50+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and different variations of them
- 50+ Classifiers
- Labeled and Unlabeled Dependency parsing
- Spell Checking
- Various text-preprocessing and cleaning methods
Choose the right tool for the right task!
Whether you analyze movie reviews or tweets, NLU has the right model for you!
What classifiers does NLU 0.1 include?
This is just a brief overview of classifiers that NLU has to offer.
- NER pre-trained on CONLL (18 class)
- Part of Speech
- 50 Class Questions Classifier
- Spam Classifier
- Fake News Classifier
- Emotion Classifier
- Cyberbullying Classifier
- Sarcasm Classifier
- Toxic Classifier
- E2E Classifier
- Sentiment Classifier pre-trained on IMDB movie reviews
- Sentiment Classifier pre-trained on Twitter
- Language Classifier for 20 languages
In addition, NLU defines a wide range of so-called NLU Components, each of which wraps one of many NLP algorithms, all of course in just 1 line.
How does it work?
Easy as pie! Just call nlu.load() with a string reference to the models you want. Some examples:
Let's get 5 of the latest embeddings in deep learning!
nlu.load('bert albert elmo electra xlnet').predict(yourData)
One line Named Entity Recognition (NER)
nlu.load('ner').predict('That was easy')
One line Part of Speech (POS)
nlu.load('pos').predict('The fastest way for SOTA POS results')
Want to classify binary sentiment?
nlu.load('sentiment').predict('I love nlu!')
Specialize sentiment for Twitter?
nlu.load('sentiment.twitter').predict('@CKL-IT NLU rocks #nlp !')
Or maybe for movies?
nlu.load('sentiment.imdb').predict('The Matrix was pretty cool')
That's all you need to know to achieve State of the Art NLP Results!
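All the calls above share the same shape: a space-separated string of model references goes into nlu.load(), and the data goes into predict(). Here is a minimal sketch that wraps this pattern. The helper names build_reference and run_nlu are made up for illustration and are not part of NLU, and the sketch assumes NLU and its Spark dependencies are already installed.

```python
from typing import Iterable


def build_reference(model_names: Iterable[str]) -> str:
    """Join model names into the space-separated string passed to nlu.load().

    Hypothetical helper, not part of the NLU library.
    """
    names = [name.strip() for name in model_names if name.strip()]
    if not names:
        raise ValueError("at least one model name is required")
    return " ".join(names)


def run_nlu(model_names: Iterable[str], data):
    """Load the requested models and run them on `data`.

    Hypothetical helper. The import is done lazily so that this module can be
    imported even where NLU is not installed.
    """
    import nlu  # assumption: NLU, PySpark and Java are set up on this machine

    pipeline = nlu.load(build_reference(model_names))
    return pipeline.predict(data)


# Example call, commented out because it downloads the pre-trained models:
# result = run_nlu(["bert", "albert", "elmo"], "That was easy")
# print(result)
```

Loading the pipeline once and reusing it for several predict() calls is usually the better choice when you have more than one piece of text, since loading the models is the expensive part.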
Your data could be:
- Pandas dataframe
- Modin dataframe
- Spark dataframe
- Python string
- List of Python Strings
- Numpy Array of Strings
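To make the list above concrete, here is a small sketch that builds the same texts as several of these in-memory types. The column name 'text' for the Pandas dataframe is an assumption about what NLU looks for, so check the NLU documentation before relying on it. The helper names text_column and build_inputs are hypothetical and only serve this example.

```python
from typing import Dict, List


def text_column(texts: List[str]) -> Dict[str, List[str]]:
    """Wrap raw strings in a single column.

    Assumption: NLU reads dataframe input from a column named 'text'.
    """
    return {"text": list(texts)}


def build_inputs(texts: List[str]) -> dict:
    """Return the same texts as several of the input types listed above.

    Hypothetical helper. NumPy and Pandas are imported lazily, so they are
    only required when this function is actually called.
    """
    import numpy as np
    import pandas as pd

    return {
        "string": texts[0],
        "list": list(texts),
        "numpy": np.array(texts),
        "pandas": pd.DataFrame(text_column(texts)),
    }


# Example usage, commented out because it needs NLU and downloads a model:
# import nlu
# pipeline = nlu.load("sentiment")
# for kind, data in build_inputs(["I love nlu!", "That was easy"]).items():
#     print(kind)
#     print(pipeline.predict(data))
```

Modin and Spark dataframes are left out of the sketch because they need their own runtime setup.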
With so many models at hand, you are only limited by your imagination and your RAM. If your RAM hits its limits, you can scale with Spark NLP, since every model in NLU is provided by Spark NLP. This means you can take your NLU pipeline and quickly scale it to hundreds of nodes in a Spark cluster.
With one line of NLU and a few lines for plotting you can make awesome plots like the following. Check out our other Medium article for a tutorial on how to generate these kinds of plots for any text dataset very easily.
More NLU Medium articles
- One line BERT Word Embeddings and t-SNE plotting with NLU
- BERT, ALBERT, ELECTRA, ELMO, XLNET, GLOVE Word Embeddings in one line and plotting with t-SNE
More about NLU
- NLU website
- NLU Github
- NLU Documentation
- Have questions or want to share an idea? Join us on Slack!
- Overview of all NLU example notebooks
- Named Entity Recognition (NER) 18 class notebook
- Part of Speech (POS) notebook
- BERT Word Embeddings and T-SNE plotting notebook
- ALBERT Word Embeddings and T-SNE plotting notebook
- ELMO Word Embeddings and T-SNE plotting notebook
- XLNET Word Embeddings and T-SNE plotting notebook
- Typed Dependency Parsing notebook