NLP with Hugging Face Transformers


If you’ve ever worked through a Natural Language Processing project or tutorial, you have probably come across these concepts:

  1. Classifier
  2. Encoder-Decoder
  3. LSTM
  4. GloVe or Word2Vec
  5. Embedding Layer

But is that it? Most of us have processed text with RNNs, yet we rarely notice how poorly they exploit GPUs during training. This is because most of these NLP models process text sequentially, one token at a time. Can we improve on that?

The past few years have seen a surge in NLP models. From the attention mechanism in 2015 to the Transformer in 2017, and now GPT-3, I believe computers will soon process human language better than humans themselves.

From supervised sequence learning with LSTMs to semi-supervised sequence learning with BERT and its 340 million parameters, the demand for and applications of NLP have skyrocketed. For perspective, GPT-3 has 175 billion parameters! And all I learned as a beginner was LSTMs.

To keep pace with this boom in research, you need to stay updated.

Hugging Face’s Transformers library has simplified working with these enormous models. And this doesn’t come at the cost of learning a new deep learning library: it is well integrated with both TensorFlow and PyTorch.

Let’s first look into some direct applications of this package. All the code is also officially available here.

1. Sentiment Analysis

Sentiment analysis is the interpretation and classification of emotions (positive, negative and neutral) in text. For this task, most people build a bag-of-words representation (optionally with dimensionality reduction) and feed it to logistic regression. But why train anything when the pipeline API lets us apply a ready-made model directly?

from transformers import pipeline

nlp = pipeline('sentiment-analysis')
print(nlp('I love my country.'))

2. Named Entity Recognition

NER is also popular in packages like NLTK. It classifies words into categories such as organisation name, person name, and so on.

nlp_token_class = pipeline('ner')
nlp_token_class('FAANG stands for Facebook, Apple, Amazon, Netflix and Google.')

3. Text Summarisation

Text summarization is of two types: Extractive and Abstractive.

Extractive summarisation selects the most informative sentences directly from the text, whereas abstractive summarisation is trained to generate a new summary in its own words.

TEXT = """100+ Days of ML Code is a commitment to better your understanding of this powerful tool by dedicating at least 1 hour of your time every day to studying and/or coding machine learning for at-least 100 days.Everyone, beginners as well as professionals are welcome to take up the challenge, join, contribute and collaborate.At the end of this 100 Days Of ML journey, all the members will be able to showcase a rich portfolio of code, analysis and narrative, treating all the above topics and models plus all the additional content, that as a member you will invariably experience and explore yourself, throughout this learning journey."""summarizer = pipeline('summarization')print(summarizer(TEXT))

4. Question Answering

qa = pipeline('question-answering')
print(qa(context=TEXT, question='How many days is the challenge for?'))
print(qa(context=TEXT, question='What will I get at the end?'))

5. Translation

Translation, as you may know, is one of the most popular examples of sequence-to-sequence learning. But with Hugging Face you don’t need to train your own model (at least for the most popular languages; with transfer learning you can certainly build a custom model as well).

translator = pipeline('translation_en_to_fr')
translator("100 Days of ML Code is a commitment to better your understanding of this powerful tool by dedicating at least 1 hour of your time every day to studying and/or coding machine learning for at-least 100 days.")

These were just some of the major applications, but there are more, such as text generation and filling in a missing word. But what about models like BERT and GPT that we discussed earlier?

Well, Hugging Face also gives us the flexibility to use those pre-trained models directly, with either PyTorch or TensorFlow.
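As a minimal sketch of that flexibility, the same checkpoint can be loaded through either framework (this assumes both torch and tensorflow are installed; bert-base-uncased is used purely as an example checkpoint):

from transformers import AutoModel, TFAutoModel

# Load the same pre-trained checkpoint as a PyTorch model...
pt_model = AutoModel.from_pretrained('bert-base-uncased')
# ...or as a TensorFlow (Keras) model.
tf_model = TFAutoModel.from_pretrained('bert-base-uncased')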

A few keywords to keep in mind if you’re new to this domain:

  1. Tokenizing: Any ML/neural model is just a bunch of formulas and cannot take raw strings as input, so each word or character must be converted to a unique number.
  2. Padding and Truncating: Models expect inputs of a fixed size (or a fixed batch shape), so short sentences are padded with a special token and long sentences are truncated at some maximum length.
  3. Model: Just a reference to different models we’ll try out.
  4. Decode: Since a model works only with numbers, its output is also numeric; decoding maps those IDs back to readable text (see the sketch after this list).
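To make these terms concrete, here is a minimal sketch of the tokenize, pad/truncate, model, decode workflow. It assumes PyTorch is installed and uses bert-base-uncased purely as an example checkpoint:

from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
model = AutoModel.from_pretrained('bert-base-uncased')

# Tokenizing: every word/sub-word becomes a unique numeric ID.
# Padding and truncation give both sentences the same fixed length.
batch = tokenizer(['I love my country.',
                   'FAANG stands for Facebook, Apple, Amazon, Netflix and Google.'],
                  padding=True, truncation=True, max_length=32, return_tensors='pt')
print(batch['input_ids'])

# Model: the pre-trained network turns the IDs into contextual representations.
outputs = model(**batch)
print(outputs[0].shape)  # (batch_size, sequence_length, hidden_size)

# Decode: map the numeric IDs back to readable text.
print(tokenizer.decode(batch['input_ids'][0]))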

Now let’s see the same kinds of applications, but with these larger models. A list of all integrated models is available here.

6. Masked Language Modelling

This model fills in blank spaces in a sentence. Please note that you’ll have to insert the model’s special mask token into the sentence before prediction, as below.
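A minimal sketch using the fill-mask pipeline is below; the exact mask token depends on the underlying checkpoint (e.g. [MASK] for BERT-style models), so it is safest to read it off the pipeline’s tokenizer:

from transformers import pipeline

fill_mask = pipeline('fill-mask')
mask = fill_mask.tokenizer.mask_token  # e.g. '<mask>' or '[MASK]' depending on the model
print(fill_mask(f'Machine learning is a subset of {mask} intelligence.'))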

7. Text Generation

Text generation is a classic example we learned with LSTMs. But a shortcoming of LSTMs (along with many other models) is that after a few predictions the output starts repeating itself, so something extra is needed to stop this. Special tokens like <sep>, <end> etc. are commonly used for this purpose; a short sketch follows.
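Here is a minimal sketch using the text-generation pipeline (the default checkpoint is a GPT-2 model; the sampling options shown here are assumptions chosen to keep the output from repeating):

from transformers import pipeline

generator = pipeline('text-generation')  # defaults to a GPT-2 checkpoint
print(generator('Machine learning will change', max_length=40,
                do_sample=True, num_return_sequences=2))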

Several more models can be viewed in the official docs and are available in both Pipeline and Pretrained modules.

Now that you are familiar with the applications and flexibility of these transformers, I hope you’ll be comfortable with NLP soon.

Kudos on learning something new. Do leave some claps and comments if you’d like.
