1 Line of Code, 1000+ NLP in 200+ languages with John Snow Labs’ NLU in Python

Christian Kasim Loan

Published in spark-nlp, Sep 21, 2020

100+ Embeddings, 50+ Classifiers, 200+ Languages, Infinite Power — The richest NLP tool on the Market!

John Snow Labs’ NLU is a Python library for applying state-of-the-art (SOTA) text mining, directly on any data frame, with a single line of code.

As a facade of the award-winning Spark NLP library, it comes with hundreds of pre-trained models in tens of languages — all production-grade, scalable, and trainable.

A picture is worth a thousand words, so here is a GIF! Lean back and enjoy.

NLP stands for natural language processing. It covers tasks such as preprocessing text, extracting useful features (emotions, grammatical labels, named entities), generating lower-dimensional embeddings, and much more.

NLU stands for natural language understanding. It helps data scientists understand text data written in human languages by reducing the NLP part to at most 1 line of code.

Instead of having to focus on data processing and feature engineering, with NLU a data scientist can focus on UNDERSTANDING the natural language data.

What does NLU 0.1 include?

NLU provides everything a data scientist might wish for, in one line of code!

350+ pre-trained models
100+ of the latest NLP word embeddings (BERT, ELMo, ALBERT, XLNet, GloVe, BioBERT, ELECTRA, COVIDBERT) and different variations of them
50+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and different variations of them
50+ Classifiers
Labeled and Unlabeled Dependency parsing
Spell Checking
Various text-preprocessing and cleaning methods

Choose the right tool for the right task!
Whether you analyze movies or Twitter, NLU has the right model for you!

What classifiers does NLU 0.1 include?

This is just a brief overview of classifiers that NLU has to offer.

NER pre-trained on OntoNotes (18 classes)
Part of Speech
50 Class Questions Classifier
Spam Classifier
Fake News Classifier
Emotion Classifier
Cyberbullying Classifier
Sarcasm Classifier
Toxic Classifier
E2E Classifier
Sentiment Classifier pre-trained on IMDB movie reviews
Sentiment Classifier pre-trained on Twitter
Language Classifier for 20 languages

In addition to that, NLU defines a wide range of so-called NLU Components, each wrapping one of many NLP algorithms, and all of them are, of course, available in just 1 line.

How does it work?

Easy as pie! You just call nlu.load() with a string reference to the model or models you want, then call predict() on your data. Some examples:

Let's get 5 of the latest embeddings in deep learning!

nlu.load('bert albert elmo electra xlnet').predict(your_data)

One line Named Entity Recognition (NER)

nlu.load('ner').predict('That was easy')

One line Part of Speech (POS)

nlu.load('pos').predict('The fastest way for SOTA POS results')

Want to classify binary sentiment?

nlu.load('sentiment').predict('I love nlu!')

Specialized sentiment for Twitter?

nlu.load('sentiment.twitter').predict('@CKL-IT NLU rocks #nlp !')

Or maybe for movies?

nlu.load('sentiment.imdb').predict('The Matrix was pretty cool')
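Each of these calls follows the same naming pattern, as the examples suggest: one string in which spaces separate models and a dot narrows a model down to a specific variant, as in 'sentiment.twitter'. The short Python sketch below only illustrates that pattern from the caller's side. It is not NLU's actual resolution logic, and the helper name split_model_refs and the (task, variant) pairs it returns are hypothetical.

```python
from typing import List, Optional, Tuple


def split_model_refs(refs: str) -> List[Tuple[str, Optional[str]]]:
    """Split a reference string such as 'bert albert' or 'sentiment.twitter'.

    Hypothetical helper for illustration only, not NLU's implementation:
    whitespace separates models, and the first dot separates a task name
    from an optional variant.
    """
    parsed = []
    for ref in refs.split():  # split() without arguments also drops empty tokens
        task, _, variant = ref.partition(".")
        parsed.append((task, variant or None))  # no dot means no variant
    return parsed


print(split_model_refs("bert albert elmo electra xlnet"))
# [('bert', None), ('albert', None), ('elmo', None), ('electra', None), ('xlnet', None)]
print(split_model_refs("sentiment.twitter"))
# [('sentiment', 'twitter')]
```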

That's all you need to know to achieve State of the Art NLP Results!

Your data can be any of the following:

Pandas dataframe
Modin dataframe
Spark dataframe
Python string
List of Python strings
NumPy array of strings
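To picture how a single predict() call can accept all of these types, here is a minimal sketch that normalizes them into one plain list of strings. The helper to_text_list is hypothetical and is not NLU's code. It assumes that a Pandas or Modin dataframe keeps its text in a column named 'text', and it relies on duck typing (the columns attribute and tolist()) so that it runs without Pandas or NumPy installed. Spark dataframes are deliberately left out, since they would stay inside Spark.

```python
from typing import Any, List


def to_text_list(data: Any) -> List[str]:
    """Normalize several input types into a list of strings.

    Hypothetical helper for illustration; assumes dataframes store
    their text in a column named 'text'.
    """
    if isinstance(data, str):
        # A single Python string becomes a one-element list.
        return [data]
    if isinstance(data, (list, tuple)):
        # A list (or tuple) of Python strings.
        return [str(item) for item in data]
    if hasattr(data, "columns") and "text" in data.columns:
        # Pandas or Modin dataframe: take the assumed 'text' column.
        return [str(item) for item in data["text"].tolist()]
    if hasattr(data, "tolist"):
        # NumPy array (or Pandas Series) of strings.
        return [str(item) for item in data.tolist()]
    raise TypeError(f"Unsupported input type: {type(data).__name__}")


print(to_text_list("I love nlu!"))
print(to_text_list(["The Matrix was pretty cool", "That was easy"]))
```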

With so many models at hand, you are limited only by your imagination and your RAM. If your RAM hits its limits, you can scale out with Spark NLP, since every model in NLU is provided by Spark NLP. This means you can take your NLU pipeline and scale it to hundreds of nodes in a Spark cluster, quickly and with little effort!
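Scaling out is possible because every document is processed independently, so the data can be cut into partitions that are handled in parallel. The sketch below is only a local, single-machine analogy of that idea using Python's concurrent.futures; it is not how Spark or Spark NLP distributes work. The helpers partition and predict_partitioned, and fake_predict (a keyword check standing in for a real model), are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def partition(texts: List[str], n_parts: int) -> List[List[str]]:
    """Split texts into at most n_parts contiguous chunks of similar size."""
    size = max(1, -(-len(texts) // n_parts))  # ceiling division, at least 1
    return [texts[i:i + size] for i in range(0, len(texts), size)]


def predict_partitioned(
    texts: List[str], predict: Callable[[str], str], n_parts: int = 4
) -> List[str]:
    """Apply predict to each partition in parallel, keeping the input order."""
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        # pool.map yields results in the order the partitions were submitted.
        per_partition = list(
            pool.map(lambda part: [predict(t) for t in part], partition(texts, n_parts))
        )
    # Flatten the per-partition results back into one list.
    return [label for part in per_partition for label in part]


def fake_predict(text: str) -> str:
    """Hypothetical stand-in for a real sentiment model."""
    return "positive" if "love" in text.lower() else "negative"


print(predict_partitioned(["I love nlu!", "Meh.", "LOVE it"], fake_predict, n_parts=2))
# ['positive', 'negative', 'positive']
```

Threads keep the sketch short, but for CPU-bound Python code the GIL limits the speed-up; real scaling comes from spreading partitions over processes or cluster nodes, which is the job Spark takes over.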

With one line of NLU and a few lines of plotting code, you can make awesome plots like the following. Check out our other Medium article for a tutorial on generating these kinds of plots for any text dataset.
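Most of those few plotting lines start by aggregating the predictions. Here is a minimal sketch, assuming you have already collected one predicted label per document into a Python list; the labels below are made-up placeholders, not real model output, and label_shares is a hypothetical helper. The resulting shares can be passed to any plotting library, for example as a bar chart.

```python
from collections import Counter
from typing import Dict, List


def label_shares(labels: List[str]) -> Dict[str, float]:
    """Return each label's share of all predictions, most frequent first."""
    counts = Counter(labels)
    total = len(labels)
    # most_common() orders labels by count; an empty list yields an empty dict.
    return {label: n / total for label, n in counts.most_common()}


# Made-up placeholder labels standing in for sentiment predictions.
predicted = ["positive", "negative", "positive", "neutral", "positive"]
for label, share in label_shares(predicted).items():
    print(f"{label}: {share:.0%}")
```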
