Friends and users of our open-source tools are often surprised by how fast 🚀 we reimplement the latest SOTA pre-trained TensorFlow models to make them accessible for everyone in our libraries like PyTorch-Transformers 👾 or PyTorch-pretrained-BigGAN 🦋
In this post, you’ll learn the main recipe to convert a pretrained TensorFlow model into a pretrained PyTorch model, in just a few hours.
We’ll take the example of a simple architecture like OpenAI GPT-2 🦄
Doing such a conversion assumes a good familiarity with both TensorFlow and PyTorch, but it’s also one of the best ways to get to know both frameworks better!
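The core of the recipe is smaller than you might expect: list the variables stored in the TensorFlow checkpoint, load each one as a NumPy array, and copy it into the PyTorch parameter with the matching name (transposing linear-layer kernels along the way). Here is a minimal sketch of that idea; the name-mapping rule and the `load_tf_weights` helper are illustrative assumptions, not the exact code used in our libraries:

```python
import tensorflow as tf
import torch


def load_tf_weights(pt_model, tf_checkpoint_path):
    """Copy every TF checkpoint variable into the matching PyTorch parameter.
    Assumes (hypothetically) that PyTorch parameter names mirror TF variable names."""
    state_dict = pt_model.state_dict()
    for name, _shape in tf.train.list_variables(tf_checkpoint_path):
        array = tf.train.load_variable(tf_checkpoint_path, name)
        pt_name = name.replace("/", ".")          # e.g. "model/wte" -> "model.wte"
        if pt_name not in state_dict:
            continue                              # skip optimizer slots, global_step, ...
        tensor = torch.from_numpy(array)
        # TF Dense kernels are stored as (in, out); PyTorch nn.Linear expects (out, in)
        if tensor.dim() == 2 and state_dict[pt_name].shape == tensor.t().shape:
            tensor = tensor.t()
        state_dict[pt_name].copy_(tensor)         # write into the model in place
    return pt_model
```

A useful sanity check afterwards is to feed the same test input to both models and compare activations layer by layer.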
A few years ago, creating a chatbot (as limited as they were back then) could take months 🗓, from designing the rules to actually writing thousands of answers to cover some of the conversation topics.
With the recent progress in deep learning for NLP, we can now get rid of this tedious work and build much more powerful conversational AI 🌟 in just a matter of hours 🍃, as you will see in this tutorial.
We’ve set up a demo running the pretrained model we’ll build together in this tutorial at convai.huggingface.co. Be sure to check it out! 🎮
I’ve spent most of 2018 training neural networks that push the limits of my GPUs. Whether it was a 150-million-parameter language model like OpenAI’s huge Generative Pre-trained Transformer (or the recent and similar BERT model) or a meta-learning neural net fed with 30-million-element inputs like the one in our ICLR ’18 paper, I could barely fit more than a few training samples on a GPU.
But most of the time, stochastic gradient descent algorithms require larger batches than just a handful of examples to get decent results.
How can you train your model on large batches…
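One common way to get there is gradient accumulation: run several small forward/backward passes, let the gradients add up in `.grad`, and only call the optimizer step once per effective batch. A self-contained toy sketch in PyTorch (the tiny linear model and random tensors are stand-ins for a real model and DataLoader):

```python
import torch
from torch import nn

# Placeholder model/optimizer/loss; any real setup works the same way.
model = nn.Linear(512, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.CrossEntropyLoss()
accumulation_steps = 8                       # effective batch = 8 * 4 = 32 examples

optimizer.zero_grad()
for step in range(64):                       # stand-in for iterating over a DataLoader
    inputs = torch.randn(4, 512)             # tiny per-step batch that fits in memory
    labels = torch.randint(0, 2, (4,))
    loss = loss_fn(model(inputs), labels) / accumulation_steps  # keep gradient scale
    loss.backward()                          # gradients accumulate across iterations
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                     # one weight update per effective batch
        optimizer.zero_grad()
```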
Last week a tweet by Jacob Andreas triggered a huge discussion on Twitter that many people have called the meaning/semantics mega-thread.
Twitter is a great medium for having such a discussion: replying to any comment lets you revive the debate from the most promising point whenever it gets stuck in a dead end.
Unfortunately, Twitter also makes the discussion very hard to read afterwards, so I made three entry points to explore this fascinating mega-thread:
I also published a Jupyter notebook with the examples I describe in this post.
When we published our Python coreference resolution package✨ last year, we got amazing feedback from the community and people started to use it for many applications 📚, some very different from our original dialog use case 👥.
And we discovered that, while the speed was totally fine for dialog messages, it could be really slow 🐌 on larger news articles.
I decided to investigate this in detail, and the result is NeuralCoref v3.0, which is about 100 times faster 🚀 than the previous version (several thousands…
Word and sentence embeddings have become an essential part of any deep-learning-based natural language processing system.
They encode words and sentences 📜 in fixed-length dense vectors 📐 to drastically improve the processing of textual data.
A huge trend is the quest for Universal Embeddings: embeddings that are pre-trained on a large corpus and can be plugged into a variety of downstream task models (sentiment analysis, classification, translation…) to automatically improve their performance by incorporating some general word/sentence representations learned on the larger dataset.
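In practice, plugging such pre-trained vectors into a downstream model often boils down to loading them into an embedding layer and freezing it. A tiny sketch in PyTorch; the random matrix and five-word vocabulary are stand-ins for real GloVe/fastText/sentence-encoder vectors:

```python
import torch
from torch import nn

# Hypothetical stand-in for pre-trained word vectors
# (in practice, load GloVe / fastText / etc. from disk).
vocab = {"<pad>": 0, "the": 1, "movie": 2, "was": 3, "great": 4}
pretrained = torch.randn(len(vocab), 300)         # one 300-d vector per word

# Plug the vectors into a downstream classifier and keep them frozen,
# so the general-purpose representations are reused as-is.
embedding = nn.Embedding.from_pretrained(pretrained, freeze=True, padding_idx=0)
classifier = nn.Linear(300, 2)                    # e.g. sentiment: negative / positive

tokens = torch.tensor([[1, 2, 3, 4]])             # "the movie was great"
sentence_vec = embedding(tokens).mean(dim=1)      # crude fixed-length sentence vector
logits = classifier(sentence_vec)                 # downstream prediction
```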
Meta-learning is an exciting research trend in the machine-learning community that tackles the problem of learning to learn.
The traditional paradigm in machine learning research is to get a huge dataset for a specific task and train a model from scratch on this dataset. Obviously, that’s very far from how humans leverage past experiences to learn a new task very quickly from only a handful of examples.
That’s because humans learn to learn.
Over the last few months, I have been playing and experimenting quite a lot with meta-learning models for Natural Language Processing and will be presenting some…
The last few months have been quite intense at HuggingFace 🤗, with crazy usage growth 🚀 and everybody hard at work to keep up with it 🏇, but we finally managed to free some time and update our open-source library ✨NeuralCoref while publishing the training code at the same time.
Since we launched v1 last summer, more than ten million 💯 coreferences have been resolved on Hugging Face. Also, we are stoked that our library is now used in production by a few other companies and some really smart researchers…
Detecting emotions, sentiments & sarcasm is a critical element of our natural language understanding pipeline at HuggingFace 🤗. Recently, we have switched to an integrated system based on an NLP model from the MIT Media Lab.
Update: We’ve open-sourced it! Repo on GitHub
The model was initially designed in TensorFlow/Theano/Keras, and we ported it to PyTorch. Compared to Keras, PyTorch gives us more freedom to develop and test custom neural network modules and uses easy-to-read NumPy-style code. In this post, I will detail several interesting points that arose during the reimplementation:
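To give a flavour of that freedom, here is a generic attention-pooling module written and smoke-tested in a few lines of eager, NumPy-style PyTorch. It is purely illustrative (not the actual layers of the MIT Media Lab model):

```python
import torch
from torch import nn


class SimpleAttention(nn.Module):
    """Attention pooling over a sequence of hidden states (illustrative only)."""

    def __init__(self, hidden_dim):
        super().__init__()
        self.scorer = nn.Linear(hidden_dim, 1, bias=False)

    def forward(self, hidden_states):                    # (batch, seq_len, hidden_dim)
        scores = self.scorer(hidden_states).squeeze(-1)  # (batch, seq_len)
        weights = torch.softmax(scores, dim=-1)          # attention weights sum to 1
        return (weights.unsqueeze(-1) * hidden_states).sum(dim=1)  # (batch, hidden_dim)


# Eager execution makes ad-hoc testing trivial:
layer = SimpleAttention(hidden_dim=64)
print(layer(torch.randn(2, 10, 64)).shape)               # torch.Size([2, 64])
```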
At Hugging Face 🤗 we work on the most amazing and challenging subset of natural language: millennial language. Full of uncertainties 🤔, implicit references 👾, emojis 😭, jokes 😂 and constantly creating novel expressions…
To navigate these stormy waters 🚤, we have developed a number of specific NLP tools based on the latest research in the field. One of these tools is a coreference system we use to keep track of short-term references already at the front end of our AI 🤗 brains.