Thilina Rajapakse
AI researcher, avid reader, fantasy and Sci-Fi geek, and fan of the Oxford comma. On the hunt for an interesting PhD! www.linkedin.com/in/t-rajapakse/

Paraphrasing is the act of expressing something using different words while retaining the original meaning. Let’s see how we can do it with BART, a Sequence-to-Sequence Transformer Model.

Photo by Alexandra on Unsplash

Introduction

BART is a denoising autoencoder for pretraining sequence-to-sequence models. BART is trained by (1) corrupting text with an arbitrary noising function, and (2) learning a model to reconstruct the original text.

- BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension -

Don’t worry if that sounds a little complicated; we are going to break it down and see what it all means. To add a little bit of background before we dive into BART, it’s time for the now-customary ode to Transfer Learning with self-supervised models. …
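As a rough sketch of where this is headed, fine-tuning BART for paraphrasing with the Simple Transformers Seq2SeqModel looks something like this (the paraphrase pairs below are made up, and argument names may vary between library versions):

```python
import pandas as pd
from simpletransformers.seq2seq import Seq2SeqModel

# Toy paraphrase pairs: the model learns to map input_text -> target_text
train_df = pd.DataFrame(
    [
        ["The weather today is lovely.", "It is a beautiful day."],
        ["He quickly finished his homework.", "He got his homework done in no time."],
    ],
    columns=["input_text", "target_text"],
)

# BART is loaded as a single encoder-decoder (sequence-to-sequence) model
model = Seq2SeqModel(
    encoder_decoder_type="bart",
    encoder_decoder_name="facebook/bart-large",
    use_cuda=False,
)

model.train_model(train_df)
print(model.predict(["The weather today is lovely."]))
```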


How to tune your hyperparameters with Simple Transformers for better Natural Language Processing.

Photo by Glenn Hansen on Unsplash

The goal of any Deep Learning model is to take in an input and generate the correct output. The nature of these inputs and outputs, which can vary wildly from application to application, depends on the specific job that the model should perform. …


The T5 Transformer can perform any NLP task. It can even perform multiple tasks at the same time, with the same model. Here’s how!

Photo by Matt Bero on Unsplash

The T5 (Text-To-Text Transfer Transformer) model was the product of a large-scale study (paper) conducted to explore the limits of transfer learning. It builds upon popular architectures like GPT, BERT, and RoBERTa (to name only a few), which utilized Transfer Learning with incredible success. While BERT-like models can be fine-tuned to perform a variety of tasks, the constraints of the architecture mean that each model can perform only one task.

Typically, this is done by adding a task-specific layer on top of the Transformer model. For example, a BERT Transformer can be adapted for binary classification by adding a fully-connected layer with two output neurons (corresponding to each class). The T5 model departs from this tradition by reframing all NLP tasks as text-to-text tasks. This results in a shared framework for any NLP task, as the input to the model and the output from the model are always strings. …
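To make the text-to-text idea concrete, here’s a tiny illustration using the Hugging Face Transformers library directly, with one of the task prefixes T5 was pre-trained on (this shows the underlying idea, not the Simple Transformers workflow):

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Every task is expressed as a plain string with a task prefix,
# and the answer comes back as a plain string too.
inputs = tokenizer("translate English to German: The house is wonderful.", return_tensors="pt")
outputs = model.generate(inputs.input_ids, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```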


The T5 Transformer frames any NLP task as a text-to-text task, enabling pre-trained models to easily learn new tasks. Let’s teach the old dog a new trick!

Image by Katrin B. from Pixabay

I’ve been itching to try the T5 (Text-To-Text Transfer Transformer) ever since it came out way, way back in October 2019 (it’s been a long couple of months). I messed around with open-sourced code from Google a couple of times, but I never managed to get it to work properly. Some of it went a little over my head (Tensorflow 😫 ) so I figured I’d wait for Hugging Face to ride to the rescue! As always, the Transformers implementation is much easier to work with and I adapted it for use with Simple Transformers.

Before we get to the good stuff, a quick word on what the T5 model is and why it’s so exciting. According to the article on T5 in the Google AI Blog, the model is the result of a large-scale study (paper link) of transfer learning techniques to see which work best. The T5 model was pre-trained on C4 (Colossal Clean Crawled Corpus), a new, absolutely massive dataset, released along with the model. …


ELECTRA is the new kid on the block. Let’s take a look at how it stacks up against the old guard!

Image by 272447 from Pixabay

One of the “secrets” behind the success of Transformer models is the technique of Transfer Learning. In Transfer Learning, a model (in our case, a Transformer model) is pre-trained on a gigantic dataset using an unsupervised pre-training objective. This same model is then fine-tuned (typically supervised training) on the actual task at hand. …


A guide on language generation and fine-tuning language generation Transformer models with Simple Transformers. It’s easier than you think!

Photo by Art Lasovsky on Unsplash

Transformer models are now state-of-the-art in most, if not all, Natural Language Processing tasks. Personally, I find language generation to be one of the most intriguing out of the myriad NLP tasks. There’s almost something human in being able to generate text that is not only grammatically correct but also cohesive and meaningful.

Transformers have risen admirably to the challenge of language generation, with many models capable of generating impressive sequences of text. Out of these, the GPT-2 model, released over a year ago by OpenAI, remains one of the best at language generation.

GPT-2 is a large transformer-based language model trained using the simple task of predicting the next word in 40GB of high-quality text from the internet. This simple objective proves sufficient to train the model to learn a variety of tasks due to the diversity of the dataset. In addition to its incredible language generation capabilities, it is also capable of performing tasks like question answering, reading comprehension, summarization, and translation. While GPT-2 does not beat the state-of-the-art in these tasks, its performance is impressive nonetheless considering that the model learns these tasks from raw text only. …
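For a quick taste of GPT-2 generation, here’s a minimal sketch using the Hugging Face pipeline API (not the Simple Transformers fine-tuning workflow the guide walks through):

```python
from transformers import pipeline

# Load the pre-trained GPT-2 model and generate a continuation for a prompt
generator = pipeline("text-generation", model="gpt2")
result = generator("Transformer models have", max_length=40, num_return_sequences=1)
print(result[0]["generated_text"])
```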


How does a Transformer Model learn a language? What’s new in ELECTRA? How do you train your own language model on a single GPU? Let’s find out!

https://pixabay.com/photos/woman-studying-learning-books-1852907/#

Contents

  • Introduction
  • What Is Pre-Training?
  • Masked Language Modelling (MLM)
  • ELECTRA Pre-Training Approach
  • Efficiency Gains of the ELECTRA Approach
  • Training Your Own ELECTRA Model
  • Installation
  • Data preparation
  • Language Modeling Model
  • Training the model
  • Throwing out the generator and getting the discriminator
  • Fine-tuning the pre-trained model
  • Wrap-up

Introduction

The process of training a Transformer model for use in a particular Natural Language Processing task is fairly simple, although it might not be easy. Start with a randomly initialized Transformer model, put together a huge (and I do mean huge) dataset containing text in the language or languages you are interested in, pre-train the Transformer on the huge dataset, and fine-tune the pre-trained Transformer on your particular task, using your task-specific dataset (which may be comparatively tiny). …
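A rough sketch of that from-scratch workflow with the Simple Transformers language modeling support, assuming plain-text files named train.txt and eval.txt (file names are placeholders, and the exact argument names may differ between library versions):

```python
from simpletransformers.language_modeling import LanguageModelingModel

train_args = {
    "vocab_size": 30000,   # vocabulary size for the new tokenizer trained from the text files
    "num_train_epochs": 1,
}

# model_name=None means we start from a randomly initialized ELECTRA model
# and train a tokenizer from the given text files.
model = LanguageModelingModel(
    "electra",
    None,
    args=train_args,
    train_files="train.txt",   # plain-text file, one sequence per line
    use_cuda=False,
)

model.train_model("train.txt", eval_file="eval.txt")
```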


Supercharging pre-trained Transformer models to deal with datasets containing domain-specific language.

Photo by Josh Riemer on Unsplash

Transfer Learning

Broadly speaking, Transfer Learning is the idea of taking the knowledge gained from performing some task and applying it towards performing a different (but related) task. Transformer models, currently the undisputed masters of Natural Language Processing (NLP), rely on this technique to achieve their lofty state-of-the-art benchmarks.

Transformer models are first trained on huge (and I mean huge) amounts of text in a step called “pre-training”. During this step, the models are expected to learn the words, grammar, structure, and other linguistic features of the language. The text is represented by tokens, each of which has its own unique ID. The collection of all such tokens is referred to as the vocabulary of the model. All the actual words in the text are iteratively split into pieces until the entire text consists only of tokens that are present in the vocabulary. …
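You can see this splitting in action by running a pre-trained tokenizer over a sentence containing a rare, domain-specific word (the exact pieces depend on the model’s vocabulary):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Common words stay whole; a rare medical term gets broken into subword pieces
print(tokenizer.tokenize("The patient presented with hyperkalemia"))
# something like: ['the', 'patient', 'presented', 'with', 'hyper', '##kal', '##emia']
```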


A quick look at Conversational AI, why it’s useful, and how things stand.

Photo by Andre Hunter on Unsplash

Why Chatbots?

That call really could have been a message. I’m sure I’m not the only person who’s thought this after receiving (or having to make) a phone call. …


Creating an amazing Conversational AI doesn’t have to be hard, and it certainly doesn’t need to take months! Transfer Learning is the way.

Photo by Bewakoof.com Official on Unsplash

Introduction

Chatbots and virtual assistants, once found mostly in Sci-Fi, are becoming increasingly common. The Google Assistants and Siris of today still have a long, long way to go to reach Iron Man’s J.A.R.V.I.S. and the like, but the journey has begun. While the current crop of Conversational AIs is far from perfect, they are also a far cry from their humble beginnings in simple programs like ELIZA.

Moving away from the typical rule-based chatbots, Hugging Face came up with a Transformer-based way to build chatbots that lets us leverage the state-of-the-art language modelling capabilities of models like BERT and OpenAI GPT. Using this method, we can quickly build powerful and impressive Conversational AIs that can outperform most rule-based chatbots. …
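For context, the Simple Transformers wrapper around this approach looks roughly like the sketch below; treat the model name and arguments as assumptions rather than the exact setup:

```python
from simpletransformers.conv_ai import ConvAIModel

# "gpt_personachat_cache" is a placeholder for a downloaded, fine-tuned
# conversational checkpoint; a personality is just a few short sentences.
model = ConvAIModel("gpt", "gpt_personachat_cache", use_cuda=False)

model.interact(
    personality=[
        "I like to read fantasy novels.",
        "I am a fan of the Oxford comma.",
    ]
)
```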


Transformers are the undisputed kings of Natural Language Processing. But with so many different models around, it can be tough to choose just one. Hopefully, this will help!

Photo by Victoriano Izquierdo on Unsplash

This is becoming a bit of a cliche, but Transformer models have transformed Natural Language Processing. …


Let’s face it: processing documents is tedious and paperwork is boring. Computer vision can help us do less of it!

Photo by Viktor Talashuk on Unsplash

Confession: I have terrible handwriting. I attribute part of the blame to my childhood, where I learned to write cursive in England and then had to switch back to non-cursive in Sri Lanka, where cursive is uncommon. It doesn’t help that my ADHD tends to make my writing hurried and careless as well. I like to joke that my brain moves too fast for my hand to keep up with whenever someone complains. 😉

The field of computer vision has seen tremendous development in the recent past, leading to a host of practical applications in a wide variety of industries and use cases. Optical Character Recognition (OCR) is one such application of computer vision, with the potential to automate many tedious but necessary tasks. OCR technology can be used to process digital documents (PDFs, scanned documents, images of documents, and the like) far more efficiently than humans can. In a nutshell, OCR can “read” a document and convert images of text into actual text. Current state-of-the-art algorithms are capable of near-flawless recognition of printed text, with handwriting recognition not too far behind (as long as the handwriting isn’t somewhere between a child’s scrawl and a doctor’s note like mine). …
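As an aside, the basic “read an image, get text back” step can be sketched with the open-source Tesseract engine via pytesseract; this is an illustrative assumption, not necessarily the tooling discussed here:

```python
from PIL import Image
import pytesseract

# Convert an image of text (e.g. a scanned page) into a plain string
image = Image.open("scanned_document.png")   # placeholder file name
text = pytesseract.image_to_string(image)
print(text)
```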


Want to use Transformers for Sentence Pair NLP tasks? Simple Transformers has you covered!

Photo by David Marcu on Unsplash

Preface

The Simple Transformers library is built on top of the excellent Transformers library by Hugging Face with the goal of making Transformer models quick and easy to use.

Introduction

Sentence pair tasks, as the name suggests, are Natural Language Processing (NLP) tasks where the input features consist of two pieces of text (not necessarily grammatical sentences). Textual entailment and semantic similarity are a couple of examples of such tasks.

Considering the unprecedented success of BERT and other Transformer models in many NLP tasks, it should come as no surprise that they excel at sentence pair tasks as well. …
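With Simple Transformers, a sentence-pair dataset is just a DataFrame with two text columns and a label; a minimal sketch with made-up data (the model choice is illustrative):

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Sentence-pair data: text_a, text_b, and a label (1 = entailment/similar, 0 = not)
train_df = pd.DataFrame(
    {
        "text_a": ["A man is playing a guitar.", "A dog runs through the snow."],
        "text_b": ["Someone is playing music.", "A cat sleeps on a couch."],
        "labels": [1, 0],
    }
)

model = ClassificationModel("roberta", "roberta-base", use_cuda=False)
model.train_model(train_df)
```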


Machine Learning models are getting smarter and we need to get smarter at training them. This is how we do it!

Photo by William Iven on Unsplash

This is going to sound paradoxical, but I feel that Artificial Intelligence is both overhyped and underhyped. …


Enabling connection through the power of Conversational AI.

Photo by Perry Grone on Unsplash

Stories are what make us human. Every culture in every era has had its share of stories. From the campfires of our ancestors, taking the first tentative steps in the long journey of humanity, to the giant cities built by modern humans as the undisputed masters of our world, stories have been with us, connecting us, every single step of the way.

Telling and listening to stories is something that all of us have experienced. While we have developed many forms of communication, from art to writing to movies, talking remains the most natural and intimate form. …


Question: How to use Transformers for Question Answering? Answer: Simple Transformers, duh! (See what I did there?)

Photo by Camylla Battani on Unsplash

Question Answering in NLP

Context: Question answering (QA) is a computer science discipline within the fields of information retrieval and natural language processing (NLP), which is concerned with building systems that automatically answer questions posed by humans in a natural language.

Human: What is a Question Answering system?
System: systems that automatically answer questions posed by humans in a natural language

QA has applications in a vast array of tasks including information retrieval, entity extraction, chatbots, and dialogue systems to name but a few. While question answering can be done in various ways, perhaps the most common flavour of QA is selecting the answer from a given context. …
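Here’s roughly what extracting an answer from a given context looks like with the Simple Transformers QuestionAnsweringModel, using the definition quoted above as the context (the base model here is purely for illustration):

```python
from simpletransformers.question_answering import QuestionAnsweringModel

model = QuestionAnsweringModel("bert", "bert-base-cased", use_cuda=False)

# Prediction data: a context passage and the questions to ask against it
to_predict = [
    {
        "context": (
            "Question answering (QA) is a computer science discipline within the fields of "
            "information retrieval and natural language processing (NLP), which is concerned "
            "with building systems that automatically answer questions posed by humans in a "
            "natural language."
        ),
        "qas": [{"id": "0", "question": "What is a Question Answering system?"}],
    }
]

print(model.predict(to_predict))
```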


Learn how to use Transformer Models to perform Multi-Label Classification in just 3 lines of code with Simple Transformers.

Photo by russn_fckr on Unsplash

Preface

The Simple Transformers library is built on top of the excellent Transformers library by Hugging Face. You guys are incredible!

Simple Transformers now supports:

There’s plenty more in the pipeline.

Introduction

Transformer models and Transfer Learning methods continue to propel the field of Natural Language Processing forward at a tremendous pace. However, state-of-the-art performance too often comes at the price of tons of (complex) code.

Simple Transformers avoids all the complexity and lets you get down to what matters: training and using Transformer models. Bypass all the complicated setup, boilerplate, and other general unpleasantness to initialize a model in one line, train it in the next, and evaluate it with the third. …
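Those three lines look roughly like this for multi-label classification (the toy data and model choice below are assumptions):

```python
import pandas as pd
from simpletransformers.classification import MultiLabelClassificationModel

# Each example carries one 0/1 flag per class (three classes in this toy data)
train_df = pd.DataFrame(
    [["This is about sports and politics", [1, 1, 0]], ["This is about science", [0, 0, 1]]],
    columns=["text", "labels"],
)

model = MultiLabelClassificationModel("roberta", "roberta-base", num_labels=3, use_cuda=False)  # initialize
model.train_model(train_df)                                                                     # train
result, model_outputs, wrong_predictions = model.eval_model(train_df)                           # evaluate
```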


Simple Transformers is the “it just works” Transformer library. Use Transformer models for Named Entity Recognition with just 3 lines of code. Yes, really.

Photo by Brandi Redd on Unsplash

Preface

The Simple Transformers library was conceived to make Transformer models easy to use. Transformers are incredibly powerful (not to mention huge) deep learning models which have been hugely successful at tackling a wide variety of Natural Language Processing tasks. Simple Transformers enabled the application of Transformer models to Sequence Classification tasks (binary classification initially, but with multiclass classification added soon after) with only three lines of code.

I am delighted to announce that Simple Transformers now supports Named Entity Recognition, another common NLP task, alongside Sequence Classification.
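A minimal sketch of what that looks like for NER (the model choice and example sentence are illustrative):

```python
from simpletransformers.ner import NERModel

# Load a pre-trained model and tag the entities in a sentence
model = NERModel("bert", "bert-base-cased", use_cuda=False)
predictions, raw_outputs = model.predict(["Simple Transformers was created by Thilina Rajapakse"])
print(predictions)
```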

Links to other capabilities:


Simple Transformers is the “it just works” Transformer library. If you are looking to use Transformers for real applications (in 3 lines of code), without worrying about the technical details, this is for you.


Preface

The Simple Transformers library is built on top of the excellent Transformers library by Hugging Face. The Hugging Face Transformers library is the library for researchers and other people who need extensive control over how things are done. It is also the best choice when you need to stray off the beaten path, do things differently, or do new things altogether. Simple Transformers is, well, a lot simpler.

Introduction

Simple Transformers is designed for when you need to get something done and you want it done now. No mucking about with source code, no hours of hair-pulling while trying to figure out how to even set the damn thing up.


Want to use Transformer models for NLP? Pages of code got you down? Not anymore because Simple Transformers is on the job. Start, train, and evaluate Transformers with just 3 lines of code!


Preface

The Simple Transformers library is built as a wrapper around the excellent Transformers library by Hugging Face. I am eternally grateful for the hard work done by the folks at Hugging Face to enable the public to easily access and use Transformer models. I don’t know what I’d have done without you guys!

Introduction

I believe it’s fair to say that the success of Transformer models has been nothing short of phenomenal in advancing the field of Natural Language Processing. Not only have they shown staggering leaps in performance on many of the NLP tasks they were designed to solve; pre-trained Transformers are also almost uncannily good at Transfer Learning. This means that anyone can take advantage of the long hours and the mind-boggling computational power that have gone into training these models to perform a countless variety of NLP tasks. …
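For the impatient, the three lines in question look roughly like this for a binary classification task (toy data and an illustrative model choice):

```python
import pandas as pd
from simpletransformers.classification import ClassificationModel

# Toy binary classification data: a text column and a 0/1 label column
train_df = pd.DataFrame(
    [["I loved this movie", 1], ["This film was a waste of time", 0]],
    columns=["text", "labels"],
)

model = ClassificationModel("roberta", "roberta-base", use_cuda=False)   # 1. initialize
model.train_model(train_df)                                              # 2. train
result, model_outputs, wrong_predictions = model.eval_model(train_df)    # 3. evaluate
```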


A step-by-step tutorial on using Transformer Models for Text Classification tasks. Learn how to load, fine-tune, and evaluate text classification models with the Pytorch-Transformers library. Includes ready-to-use code for BERT, XLNet, XLM, and RoBERTa models.

Photo by Arseny Togulev on Unsplash

Update Notice

Please consider using the Simple Transformers library, as it is easy to use, feature-packed, and regularly updated. This article still stands as a reference for BERT models and is likely to be helpful in understanding how BERT works. However, Simple Transformers offers a lot more features and much more straightforward tuning options, all while being quick and easy to use! The links below should help you get started quickly.

  1. Binary Classification
  2. Multi-Class Classification
  3. Multi-Label Classification
  4. Named Entity Recognition (Part-of-Speech Tagging)
  5. Question Answering
  6. Sentence-Pair Tasks and Regression
  7. Conversational AI
  8. Language Model Fine-Tuning
  9. ELECTRA and Language Model Training from Scratch
  10. Visualising Model…



This guide aims to explain why multi-threading and multi-processing are needed in Python, when to use one over the other, and how to use them in your programs. As an AI researcher, I use them extensively when preparing data for my models!

Image by Parker_West from Pixabay

A long time ago in a galaxy far, far away….

A wise and powerful wizard lives in a small village in the middle of nowhere. …
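Since the guide is about choosing between threads and processes, here’s a minimal sketch of the distinction it draws, using the standard-library concurrent.futures API (the workloads below are made up):

```python
import time
from concurrent.futures import ThreadPoolExecutor, ProcessPoolExecutor

def fetch(url):
    # I/O-bound work (e.g. downloading data) mostly waits, so threads help
    time.sleep(1)
    return f"fetched {url}"

def crunch(n):
    # CPU-bound work keeps a core busy, so separate processes (which sidestep the GIL) help
    return sum(i * i for i in range(n))

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(fetch, ["a.csv", "b.csv", "c.csv", "d.csv"])))

    with ProcessPoolExecutor(max_workers=4) as pool:
        print(list(pool.map(crunch, [2_000_000] * 4)))
```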


The A-to-Z guide on how you can use Google’s BERT for binary text classification tasks. I’ll be aiming to explain, as simply and straightforwardly as possible, how to fine-tune a BERT model (with PyTorch) and use it for a binary text classification task.

Photo by Andy Kelly on Unsplash

Update Notice II

Please consider using the Simple Transformers library, as it is easy to use, feature-packed, and regularly updated. This article still stands as a reference for BERT models and is likely to be helpful in understanding how BERT works. However, Simple Transformers offers a lot more features and much more straightforward tuning options, all while being quick and easy to use! The links below should help you get started quickly.

  1. Binary Classification
  2. Multi-Class Classification
  3. Multi-Label Classification
  4. Named Entity Recognition (Part-of-Speech Tagging)
  5. Question Answering
  6. Sentence-Pair Tasks and Regression
  7. Conversational AI
  8. Language Model Fine-Tuning
  9. ELECTRA and Language Model Training from Scratch
  10. Visualising Model…
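For reference, the core of such a fine-tuning loop can be sketched with the current Hugging Face Transformers API rather than the older Pytorch-Transformers package named above (toy data, no batching or evaluation):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Toy data: two texts with binary labels
texts = ["I loved this movie", "This film was a waste of time"]
labels = torch.tensor([1, 0])
encodings = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few toy training steps
    outputs = model(**encodings, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```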
