1 Line of Code, 1000+ NLP Models in 200+ Languages with John Snow Labs’ NLU in Python

Christian Kasim Loan
Published in spark-nlp
4 min read · Sep 21, 2020

100+ Embeddings, 50+ Classifiers, 200+ Languages, Infinite Power — The richest NLP tool on the Market!

Understanding text data with NLU

John Snow Labs’ NLU is a Python library for applying state-of-the-art (SOTA) text mining directly on any data frame with a single line of code.

As a facade of the award-winning Spark NLP library, it comes with hundreds of pre-trained models in tens of languages — all production-grade, scalable, and trainable.

A picture says more than 1000 words, so here is a GIF! Lean back and enjoy.

12 of the greatest and latest features in NLP in just 1 line of code

NLP stands for natural language processing. It is the task of preprocessing text and extracting useful features such as emotions, grammatical labels, and named entities, generating lower-dimensional embeddings, and much more.

NLU stands for natural language understanding. It helps data scientists understand text data written in human languages by reducing the NLP part to at most 1 line of code.

Instead of having to focus on data processing and feature engineering, with NLU a data scientist can focus on UNDERSTANDING the natural language data.

What does NLU 0.1 include?

NLU provides everything a data scientist might wish for in one line of code!

  • 350+ pre-trained models
  • 100+ of the latest NLP word embeddings (BERT, ELMO, ALBERT, XLNET, GLOVE, BIOBERT, ELECTRA, COVIDBERT) and different variations of them
  • 50+ of the latest NLP sentence embeddings (BERT, ELECTRA, USE) and different variations of them
  • 50+ Classifiers
  • Labeled and Unlabeled Dependency parsing
  • Spell Checking
  • Various text-preprocessing and cleaning methods

    Choose the right tool for the right task!
    Whether you analyze movies or Twitter, NLU has the right model for you!

What classifiers does NLU 0.1 include?

This is just a brief overview of the classifiers that NLU has to offer.

  • NER pre-trained on CONLL (18 class)
  • Part of Speech
  • 50 Class Questions Classifier
  • Spam Classifier
  • Fake News Classifier
  • Emotion Classifier
  • Cyberbullying Classifier
  • Sarcasm Classifier
  • Toxic Classifier
  • E2E Classifier
  • Sentiment Classifier pre-trained on IMDB movie reviews
  • Sentiment Classifier pre-trained on Twitter
  • Language Classifier for 20 languages

In addition, NLU defines a wide range of so-called NLU Components, each of which wraps one of many NLP algorithms, all of course available in just 1 line.

NLU Component types

How does it work?

Easy as pie! After import nlu, you just call nlu.load(model) and pass a string reference to the models you want. Some examples:

Let's get 5 of the latest embeddings in deep learning!

nlu.load('bert albert elmo electra xlnet').predict(yourData)
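
The space-separated reference in the call above can also be assembled from a list, which is handy when the set of embeddings changes between experiments. This is only a minimal sketch: `build_reference` and `embed` are hypothetical helper names, not part of NLU. The only NLU calls used are the `nlu.load(...).predict(...)` pair shown in this article, and the sketch assumes `nlu` and its Spark dependencies are installed before `embed` is called.

```python
def build_reference(components):
    """Join component names into the space-separated string passed to nlu.load()."""
    names = [c.strip() for c in components if c.strip()]
    if not names:
        raise ValueError("at least one component name is required")
    return " ".join(names)


def embed(components, data):
    """Run all requested components over `data` in a single call.

    nlu is imported lazily, so the helper above can be used and tested
    without the library (and its model downloads) being available.
    """
    import nlu  # assumption: nlu and its Spark dependencies are installed
    return nlu.load(build_reference(components)).predict(data)


reference = build_reference(["bert", "albert", "elmo", "electra", "xlnet"])
print(reference)  # bert albert elmo electra xlnet
```

Calling `embed(["bert", "albert"], yourData)` would then be equivalent to writing the reference string by hand.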

One line Named Entity Recognition (NER)

nlu.load('ner').predict('That was easy')

One line Part of Speech (POS)

nlu.load('pos').predict('The fastest way for SOTA POS results')

Want to classify binary sentiment?

nlu.load('sentiment').predict('I love nlu!')

Specialize sentiment for Twitter?

nlu.load('sentiment.twitter').predict('@CKL-IT NLU rocks #nlp !')

Or maybe for movies?

nlu.load('sentiment.imdb').predict('The Matrix was pretty cool')
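
The three sentiment references above differ only in the domain they were trained for, so picking one can be reduced to a lookup. The sketch below is an illustration, not part of NLU: the domain labels `general`, `twitter`, and `movies` and both function names are made up for this example, while the reference strings are the ones used in this article.

```python
# Hypothetical domain labels mapped to the NLU references shown above.
SENTIMENT_REFERENCES = {
    "general": "sentiment",
    "twitter": "sentiment.twitter",
    "movies": "sentiment.imdb",
}


def sentiment_reference(domain="general"):
    """Return the NLU reference string for a text domain."""
    try:
        return SENTIMENT_REFERENCES[domain]
    except KeyError:
        known = ", ".join(sorted(SENTIMENT_REFERENCES))
        raise ValueError(f"unknown domain {domain!r}, expected one of: {known}") from None


def classify_sentiment(text, domain="general"):
    """Load the matching sentiment model and classify `text`."""
    import nlu  # assumption: nlu and its Spark dependencies are installed
    return nlu.load(sentiment_reference(domain)).predict(text)


print(sentiment_reference("movies"))  # sentiment.imdb
```

With this in place, `classify_sentiment('The Matrix was pretty cool', domain='movies')` would load the IMDB model.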

That's all you need to know to achieve State of the Art NLP Results!

Your data can be:

  • Pandas dataframe
  • Modin dataframe
  • Spark dataframe
  • Python string
  • List of Python Strings
  • Numpy Array of Strings
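
To make the list above concrete, here is a hedged sketch that feeds several of these input types to one loaded pipeline. `predict_each` is a hypothetical helper, not an NLU function. Loading once and reusing the returned object relies only on the fact that `nlu.load(...)` returns something with a `predict` method, as the one-liners above show. The pandas column name `text` is an assumption about what NLU expects and should be checked against the NLU documentation.

```python
def predict_each(reference, inputs):
    """Load a pipeline once and run it on every entry of `inputs`.

    `inputs` maps a label to any of the supported data types.
    """
    import nlu  # assumption: nlu and its Spark dependencies are installed
    pipeline = nlu.load(reference)
    return {label: pipeline.predict(data) for label, data in inputs.items()}


sentences = ["NLU is easy to use", "It builds on Spark NLP"]

inputs = {
    "string": sentences[0],  # a single Python string
    "list": sentences,       # a list of Python strings
}

try:
    import pandas as pd
    # Assumption: the text lives in a column named "text".
    inputs["pandas"] = pd.DataFrame({"text": sentences})
except ImportError:
    pass  # pandas is optional for this sketch

print(sorted(inputs))
```

A call such as `predict_each('sentiment', inputs)` would then return one result per input type.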

With so many models at hand, you are limited only by your imagination and your RAM. If your RAM hits its limits, you can scale out with Spark NLP, since every model in NLU is provided by Spark NLP. This means you can take your NLU pipeline and scale it to hundreds of nodes in a Spark cluster.

With one line of NLU and a few lines for plotting you can make awesome plots like the following. Check out our other Medium article for a tutorial on how to generate these kinds of plots for any text dataset very easily.

Insightful Embedding plots with NLU and T-SNE, comparing BERT, ALBERT, ELMO, ELECTRA, XLNET and GLOVE word embeddings for Part Of Speech
