Hugging Face: State-of-the-Art Natural Language Processing in ten lines of TensorFlow 2.0

Lysandre Debut
Oct 30, 2019 · 5 min read

Hugging Face is the leading NLP startup, with more than a thousand companies using its library in production, including Bing, Apple, and Monzo. All examples used in this tutorial are available on Colab, and the links are given in the corresponding sections.

Hugging Face is an NLP-focused startup with a large open-source community, in particular around the Transformers library. 🤗/Transformers is a Python-based library that exposes an API to use many well-known transformer architectures, such as BERT, RoBERTa, GPT-2 or DistilBERT, that obtain state-of-the-art results on a variety of NLP tasks like text classification, information extraction, question answering, and text generation. Those architectures come pre-trained with several sets of weights. Getting started with Transformers only requires installing the pip package:

pip install transformers

The library has grown very quickly in PyTorch and has recently been ported to TensorFlow 2.0, offering an API that now works with Keras’ fit API, TensorFlow Extended, and TPUs 👏. This blog post is dedicated to using the Transformers library with TensorFlow: using the Keras API as well as the TensorFlow TPUStrategy to fine-tune a state-of-the-art Transformer model.

Library & Philosophy

Transformers is based around the concept of pre-trained transformer models. These transformer models come in different shapes, sizes, and architectures and have their own ways of accepting input data: via tokenization.

The library builds on three main classes: a configuration class, a tokenizer class, and a model class.
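To make the configuration class concrete, here is a minimal sketch of what such a configuration looks like once serialized to JSON. The values are the ones commonly published for the bert-base-cased checkpoint and should be treated as illustrative; the helper head_size is a hypothetical name introduced here, not part of the library.

```python
import json

# Illustrative hyperparameters, assumed to match the published
# bert-base-cased configuration file.
BERT_BASE_CASED_CONFIG = {
    "attention_probs_dropout_prob": 0.1,
    "hidden_act": "gelu",
    "hidden_dropout_prob": 0.1,
    "hidden_size": 768,
    "initializer_range": 0.02,
    "intermediate_size": 3072,
    "max_position_embeddings": 512,
    "num_attention_heads": 12,
    "num_hidden_layers": 12,
    "type_vocab_size": 2,
    "vocab_size": 28996,
}


def head_size(config):
    """Width of one attention head: the hidden size split evenly across heads."""
    hidden, heads = config["hidden_size"], config["num_attention_heads"]
    if hidden % heads != 0:
        raise ValueError("hidden_size must be divisible by num_attention_heads")
    return hidden // heads


# Serialize the configuration the way a config file stores it.
config_json = json.dumps(BERT_BASE_CASED_CONFIG, indent=2, sort_keys=True)
print(config_json)
```

In the library itself, BertConfig.from_pretrained("bert-base-cased") should give the equivalent configuration object without writing the dictionary by hand.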

bert-base-cased Configuration file as a JSON

Joy in simplicity

The advantage of using Transformers lies in the straightforward, model-agnostic API. Loading a pre-trained model, along with its tokenizer, can be done in a few lines of code. Here is an example of loading the BERT and GPT-2 TensorFlow models as well as their tokenizers:
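A minimal sketch of that loading step, assuming the TensorFlow model classes TFBertModel and TFGPT2Model and the checkpoint names bert-base-cased and gpt2. The wrapper function load_bert_and_gpt2 is a hypothetical name used only to keep the heavy imports and downloads out of module import time.

```python
BERT_CHECKPOINT = "bert-base-cased"  # assumed checkpoint name
GPT2_CHECKPOINT = "gpt2"             # assumed checkpoint name


def load_bert_and_gpt2():
    """Return {name: (tokenizer, model)} for BERT and GPT-2 with TensorFlow weights."""
    # Imported lazily: this needs transformers and TensorFlow 2 installed,
    # and the first call downloads the weights.
    from transformers import BertTokenizer, GPT2Tokenizer, TFBertModel, TFGPT2Model

    return {
        "bert": (
            BertTokenizer.from_pretrained(BERT_CHECKPOINT),
            TFBertModel.from_pretrained(BERT_CHECKPOINT),
        ),
        "gpt2": (
            GPT2Tokenizer.from_pretrained(GPT2_CHECKPOINT),
            TFGPT2Model.from_pretrained(GPT2_CHECKPOINT),
        ),
    }
```

Both pairs are created with the same from_pretrained call, which is what makes the API model-agnostic: only the class names and the checkpoint string change.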

Loading architectures is model-agnostic

The weights are downloaded from Hugging Face’s S3 bucket and cached locally on your machine. The models are ready to be used for inference or fine-tuned if need be. Let’s see that in action.

Fine-tuning a Transformer model

Fine-tuning a model is made easy thanks to some methods available in the Transformers library. The next parts cover building an input pipeline, training with Keras’ fit method, and training with a distribution strategy.

Building an input pipeline

We have made an accompanying Colab notebook to get you on track quickly with all the code. We’ll leverage the tensorflow_datasets package for data loading. tensorflow_datasets provides us with a tf.data.Dataset, which can be fed into our glue_convert_examples_to_features method.

This method will make use of the tokenizer to tokenize the input and add special tokens at the beginning and the end of sequences (like [SEP], [CLS], </s> or <s> for instance) if such additional tokens are required by the model. This method returns a tf.data.Dataset holding the featurized inputs.
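To illustrate what adding special tokens means, here is a toy sketch of the BERT-style layout for a sentence pair. It is not the library's implementation: real tokenizers also split words into subword pieces and map them to integer ids, and the function names add_special_tokens and pad_to are hypothetical.

```python
def add_special_tokens(tokens_a, tokens_b=None, cls="[CLS]", sep="[SEP]"):
    """Lay out tokens as [CLS] A [SEP] (B [SEP]) and mark each segment."""
    tokens = [cls] + list(tokens_a) + [sep]
    segment_ids = [0] * len(tokens)          # first sentence is segment 0
    if tokens_b is not None:
        tokens += list(tokens_b) + [sep]
        segment_ids += [1] * (len(tokens_b) + 1)  # second sentence is segment 1
    return tokens, segment_ids


def pad_to(tokens, max_length, pad="[PAD]"):
    """Pad to a fixed length and return the attention mask (1 = real token)."""
    if len(tokens) > max_length:
        raise ValueError("sequence is longer than max_length")
    n_pad = max_length - len(tokens)
    return tokens + [pad] * n_pad, [1] * len(tokens) + [0] * n_pad


tokens, segments = add_special_tokens(["the", "cat"], ["a", "dog"])
padded, mask = pad_to(tokens, 9)
print(padded)
print(segments, mask)
```

Models such as RoBERTa follow the same idea with <s> and </s> in place of [CLS] and [SEP].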

We can then shuffle this dataset and split it into batches of 32 examples using the standard tf.data.Dataset methods.
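The steps above can be sketched as follows, assuming the GLUE MRPC task, a maximum sequence length of 128 and a shuffle buffer of 100 (none of which are fixed by the text). The function name build_train_dataset is hypothetical, and glue_convert_examples_to_features is the helper as shipped at the time of writing; later releases may deprecate it.

```python
MAX_LENGTH = 128      # assumed maximum sequence length
BATCH_SIZE = 32       # batch size used in the text
SHUFFLE_BUFFER = 100  # assumed shuffle buffer size


def build_train_dataset(tokenizer, task="mrpc"):
    """Load a GLUE task, featurize it with the tokenizer, then shuffle and batch."""
    # Imported lazily: requires tensorflow_datasets and transformers.
    import tensorflow_datasets as tfds
    from transformers import glue_convert_examples_to_features

    data = tfds.load("glue/" + task)  # dict of tf.data.Dataset splits
    train = glue_convert_examples_to_features(
        data["train"], tokenizer, max_length=MAX_LENGTH, task=task
    )
    return train.shuffle(SHUFFLE_BUFFER).batch(BATCH_SIZE)
```

The tokenizer argument would be any pre-trained tokenizer, for example BertTokenizer.from_pretrained("bert-base-cased").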

Building an input pipeline for our model

Training with Keras’ fit method

Training a model using Keras’ fit method has never been simpler. Now that we have the input pipeline set up, we can define the hyperparameters and call Keras’ fit method with our dataset.
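A minimal sketch of this step, assuming an Adam optimizer with a learning rate of 3e-5 and a sparse categorical cross-entropy loss on the model's logits. These hyperparameters and the helper names steps_per_epoch and compile_and_fit are assumptions for illustration, not values fixed by the text.

```python
import math


def steps_per_epoch(num_examples, batch_size):
    """Number of batches needed to see every example once."""
    return math.ceil(num_examples / batch_size)


def compile_and_fit(model, train_dataset, valid_dataset,
                    train_steps, valid_steps, epochs=2):
    """Compile a Keras model with assumed hyperparameters and train it."""
    import tensorflow as tf  # imported lazily: requires TensorFlow 2

    optimizer = tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0)
    # The classification heads return raw logits, hence from_logits=True.
    loss = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
    metric = tf.keras.metrics.SparseCategoricalAccuracy("accuracy")
    model.compile(optimizer=optimizer, loss=loss, metrics=[metric])

    # repeat() keeps the dataset from running out before all epochs finish.
    return model.fit(
        train_dataset.repeat(),
        epochs=epochs,
        steps_per_epoch=train_steps,
        validation_data=valid_dataset,
        validation_steps=valid_steps,
    )
```

For MRPC, assuming 3,668 training and 408 validation examples, steps_per_epoch(3668, 32) gives 115 training steps and steps_per_epoch(408, 64) gives 7 validation steps.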

Training with Keras’ fit method

Training with Strategy

Training with a strategy gives you better control over what happens during the training. By switching between strategies, the user can select the distributed fashion in which the model is trained: from multi-GPUs to TPUs.

As of the time of writing, TPUStrategy is the only surefire way to train a model on a TPU using TensorFlow 2. Building a custom loop using a strategy makes even more sense in that regard, as strategies may easily be switched around and training on multi-GPU would require practically no code change.
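To show why switching strategies needs practically no code change, here is a hedged sketch in which only the strategy object differs between multi-GPU and TPU training. The function names make_strategy, build_under_strategy and global_batch_size are hypothetical, and the TPU setup calls follow the TensorFlow 2 API as I understand it; check them against the version you run.

```python
def make_strategy(kind="mirrored", tpu_address=None):
    """Return a tf.distribute strategy for multi-GPU ('mirrored') or TPU ('tpu')."""
    if kind not in ("mirrored", "tpu"):
        raise ValueError("unknown strategy kind: %r" % (kind,))
    import tensorflow as tf  # imported lazily: requires TensorFlow 2

    if kind == "mirrored":
        return tf.distribute.MirroredStrategy()

    resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu=tpu_address)
    tf.config.experimental_connect_to_cluster(resolver)
    tf.tpu.experimental.initialize_tpu_system(resolver)
    # Early TF 2 releases expose TPUStrategy under tf.distribute.experimental.
    tpu_strategy_cls = getattr(tf.distribute, "TPUStrategy", None)
    if tpu_strategy_cls is None:
        tpu_strategy_cls = tf.distribute.experimental.TPUStrategy
    return tpu_strategy_cls(resolver)


def build_under_strategy(strategy, build_model):
    """Create the model inside the strategy scope so its variables are distributed."""
    with strategy.scope():
        return build_model()


def global_batch_size(per_replica_batch_size, num_replicas):
    """Batch size seen by the whole job when each replica gets its own batch."""
    return per_replica_batch_size * num_replicas
```

A typical call would pass a function that loads and compiles the model, for example build_under_strategy(make_strategy("mirrored"), build_model), with strategy.num_replicas_in_sync supplying the replica count for global_batch_size.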

Building a custom loop requires a bit of work to set up, so the reader is advised to open the following Colab notebook to get a better grasp of the subject. It does not go into the details of tokenization as the first Colab does, but it shows how to build an input pipeline that will be used by the TPUStrategy.

This makes use of a Google Cloud Platform bucket to host the data, as TPUs are complicated to handle when using local filesystems. The Colab notebook is available here.

Transformers now has access to TensorFlow APIs - So what?

The main selling point of the Transformers library is its simple, model-agnostic API. Because the library acts as a front end to models that obtain state-of-the-art results in NLP, switching between models according to the task at hand is extremely easy.

As an example, here’s the complete script to fine-tune BERT on a sentence-pair classification task (MRPC):
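A sketch of such a script, wrapped in a hypothetical function fine_tune_bert_on_mrpc so that nothing is downloaded until it is called. The hyperparameters (learning rate 3e-5, sequence length 128, batch sizes 32 and 64, 115 training steps and 7 validation steps per epoch) are assumptions chosen for MRPC, not values stated in the text.

```python
CHECKPOINT = "bert-base-cased"  # assumed checkpoint name
TASK = "mrpc"


def fine_tune_bert_on_mrpc(epochs=2):
    """Fine-tune BERT on GLUE/MRPC and return the Keras training history."""
    # Imported lazily: requires tensorflow, tensorflow_datasets and transformers.
    import tensorflow as tf
    import tensorflow_datasets as tfds
    from transformers import (BertTokenizer, TFBertForSequenceClassification,
                              glue_convert_examples_to_features)

    # These two lines select the architecture and its pre-trained weights.
    tokenizer = BertTokenizer.from_pretrained(CHECKPOINT)
    model = TFBertForSequenceClassification.from_pretrained(CHECKPOINT)

    # Input pipeline: load, featurize, shuffle and batch.
    data = tfds.load("glue/" + TASK)
    train = glue_convert_examples_to_features(
        data["train"], tokenizer, max_length=128, task=TASK)
    valid = glue_convert_examples_to_features(
        data["validation"], tokenizer, max_length=128, task=TASK)
    train = train.shuffle(100).batch(32).repeat(epochs)
    valid = valid.batch(64)

    # Training: compile with the assumed hyperparameters, then fit.
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=3e-5, epsilon=1e-08, clipnorm=1.0),
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        metrics=[tf.keras.metrics.SparseCategoricalAccuracy("accuracy")],
    )
    return model.fit(train, epochs=epochs, steps_per_epoch=115,
                     validation_data=valid, validation_steps=7)
```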

Fine-tuning BERT on MRPC

However, in a production environment, memory is scarce, and you may want to use a smaller model instead, for example DistilBERT. To switch, simply replace the two lines that load the tokenizer and the model with these two:
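A minimal sketch of the swap, assuming the distilbert-base-uncased checkpoint and the classes DistilBertTokenizer and TFDistilBertForSequenceClassification. The wrapper load_distilbert_classifier is a hypothetical name that only defers the import and the download.

```python
DISTILBERT_CHECKPOINT = "distilbert-base-uncased"  # assumed checkpoint name


def load_distilbert_classifier():
    """Return (tokenizer, model) for a DistilBERT sequence classifier."""
    # Imported lazily: requires transformers and TensorFlow 2.
    from transformers import DistilBertTokenizer, TFDistilBertForSequenceClassification

    # Same two calls as for BERT; only the class names and checkpoint change.
    tokenizer = DistilBertTokenizer.from_pretrained(DISTILBERT_CHECKPOINT)
    model = TFDistilBertForSequenceClassification.from_pretrained(DISTILBERT_CHECKPOINT)
    return tokenizer, model
```

The rest of the fine-tuning code is meant to stay unchanged, since the input pipeline and the Keras training calls do not depend on the architecture.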

Loading DistilBERT with its tokenizer

As a platform hosting 10+ Transformer architectures, 🤗/Transformers makes it very easy to use, fine-tune and compare the models that have transformed the field of deep learning for NLP. It serves as a backend for many downstream apps that leverage transformer models and is used in production by many different companies. We welcome any questions or issues you might have on our GitHub repository.

