By Omair Khan

Friends and users of our open-source tools are often surprised how fast 🚀 we reimplement the latest SOTA pre-trained TensorFlow models to make them accessible for everyone in our libraries like PyTorch-Transformers 👾 or PyTorch-pretrained-BigGAN 🦋

In this post, you’ll learn the main recipe to convert a pretrained TensorFlow model in a pretrained PyTorch model, in just a few hours.

We’ll take the example of a simple architecture like OpenAI GPT-2 🦄

Doing such a conversion assumes a good familiarity with both TensorFlow and PyTorch but it’s also one of the best ways to get to know better both frameworks!

Looking at the scope structure 🔎

By Mahir Uysal

A few years ago, creating a chatbot -as limited as they were back then- could take months 🗓, from designing the rules to actually writing thousands of answers to cover some of the conversation topics.

With the recent progress in deep-learning for NLP, we can now get rid of this petty work and build much more powerful conversational AI 🌟 in just a matter of hours 🍃 as you will see in this tutorial.

We’ve set up a demo running the pretrained model we’ll build together in this tutorial at Be sure to check it out! 🎮

By David Marcu

I’ve spent most of 2018 training neural networks that tackle the limits of my GPUs. Whether it was a 150 millions parameters language model like OpenAI’s huge Generative Pre-trained Transformer (or the recent and similar BERT model) or a meta-learning neural net fed with 30 million element inputs like the one of our ICLR ‘18 paper, I could barely fit more than a few training samples on a GPU.

But most of the time stochastic gradient descent algorithms require larger batches than just a handful of examples to get decent results.

How can you train your model on large batches…

Krabi, Thailand by Mo Baghdadi

In which Twitter talked about meaning, semantics, language models, learning Thai and Java, entailment, co-reference — all in one fascinating thread.

Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread.

Twitter is a great medium for having such a discussion, replying to any comment allows to revive the debate from the most promising point when it’s stuck in a dead-end.

Unfortunately Twitter also makes the discussion very hard to read afterwards so I made three entry points to explore this fascinating mega-thread:

  1. a summary of the discussion that you will find below,
  2. an interactive view to explore the trees of tweets, and
  3. a commented map to get an overview…

SpaceX Falcon Heavy Launch — Credit SpaceX

How to take advantage of spaCy & a bit of Cython for blazing fast NLP

I also published a Jupyter notebook with the examples I describe in this post.

When we published our Python coreference resolution package✨ last year, we got an amazing feedback from the community and people started to use it for many applications 📚, some very different from our original dialog use-case 👥.

And we discovered that, while the speed was totally fine for dialog messages, it could be really slow 🐌 on larger news articles.

I decided to investigate this in details and the result is NeuralCoref v3.0 which is about 100 times faster 🚀 than the previous version (several thousands…

John Christian Fjellestad –Distant road

A Chinese version of this article can be found here, thanks to Jakukyo.

Word and sentence embeddings have become an essential part of any Deep-Learning-based natural language processing systems.

They encode words and sentences 📜 in fixed-length dense vectors 📐 to drastically improve the processing of textual data.

A huge trend is the quest for Universal Embeddings: embeddings that are pre-trained on a large corpus and can be plugged in a variety of downstream task models (sentimental analysis, classification, translation…) to automatically improve their performance by incorporating some general word/sentence representations learned on the larger dataset.

It’s a form of…

Meta-learning is an exciting trend of research in the machine-learning community which tackles the problem of learning to learn.

The traditional paradigm in machine learning research is to get a huge dataset on a specific task, and train a model from scratch using this dataset. Obviously that’s very far from how humans leverage past experiences to learn very quickly a new task from only a handset of examples.

That’s because humans learn to learn [1].

Over the last months, I have been playing and experimenting quite a lot with meta-learning models for Natural Language Processing and will be presenting some…

Links: Online demo Github repo: and our previous Medium post.

The last months have been quite intense at HuggingFace 🤗 with crazy usage growth 🚀 and everybody hard at work to keep up with it 🏇, but we finally managed to free some time and update our open-source library ✨Neuralcoref while publishing the training code at the same time.

Since we launched v1 last summer, more than ten million 💯 coreferences have been resolved on Hugging Face. Also, we are stoked that our library is now used in production by a few other companies and some really smart researchers…

Introducing torchMoji, a PyTorch implementation of DeepMoji

Detecting emotions, sentiments & sarcasm is a critical element of our natural language understanding pipeline at HuggingFace 🤗. Recently, we have switched to an integrated system based on a NLP model from the MIT Media Lab.

Update: We’ve open sourced it! Repo on GitHub

The model was initially designed in TensorFlow/Theano/Keras, and we ported it to pyTorch. Compared to Keras, pyTorch gives us more freedom to develop and test custom neural network modules and uses an easy to read numpy-style code. In this post, I will detail several interesting points that arose during the reimplementation:

  • how to make a custom…

TL;DR, Links: Online demo at, Github repo for Neuralcoref:

At Hugging Face 🤗 we work on the most amazing and challenging subset of natural language: millennial language. Full of uncertainties 🤔, implicit references 👾, emojis 😭, jokes 😂 and constantly creating novel expressions…

To navigate these stormy waters 🚤, we have developed a number of specific NLP tools based on the latest research in the field. One of these tools is a coreference system we use to keep track of short term references already at the front end of our AI 🤗 brains.

We couldn’t find a tool…

Thomas Wolf

Natural Language Processing, Deep learning and Computational Linguistics – Science Lead @Huggingface |

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store