Transformers — An Intuition

Sharath S Hebbar
Sep 25, 2023

Transformers came into the picture with the paper titled ‘Attention Is All You Need.’

Transformers feature an encoder-decoder architecture that is used for Neural Machine Translation, which involves translating text from one language to another.

The Transformer encoder-decoder architecture, from ‘Attention Is All You Need’

Researchers have since used this encoder-decoder architecture to build language models capable of performing a wide range of tasks.

Some models utilize only the encoder, some only the decoder, and others both the encoder and decoder parts.

Encoder Only Models

Encoder-only models, also known as Autoencoding Models, utilize only an encoder to perform tasks like Sentiment Analysis and Named Entity Recognition.

These models are capable of masking words within a sentence and then reconstructing the text while considering the context of the words to the left and right.

In this technique, certain words are masked and then generated using information from their adjacent words.

This approach provides a bidirectional context as it considers both the preceding and following words in the sentence.

Bi-directional Context

It is also referred to as Masked Language Modeling.

Examples of Masked Language Modeling or Encoder-Only Models include BERT, RoBERTa, and DistilBERT.

Code

Fill-Mask

from transformers import pipeline

# Fill-mask: predict the token hidden behind [MASK] using both the left and right context.
task = "fill-mask"
model_name = "bert-base-uncased"
input_text = "What are [MASK] doing?"

fill_mask_pipeline = pipeline(task, model=model_name)

fill_mask_pipeline(input_text)
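To see the bidirectional context more directly, here is a minimal sketch (using the same bert-base-uncased checkpoint) that reads the most likely tokens for the [MASK] position straight from the model’s logits; the exact predictions may vary slightly across library versions:

import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

model_name = "bert-base-uncased"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForMaskedLM.from_pretrained(model_name)

inputs = tokenizer("What are [MASK] doing?", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Find the position of [MASK] and read off its 5 most likely replacements.
mask_position = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
top_ids = logits[0, mask_position].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))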

Sentiment-Analysis

from transformers import pipeline

# Sentiment analysis needs a trained classification head, so a checkpoint fine-tuned
# for the task is used here; plain bert-base-uncased ships without a sentiment head.
task = "sentiment-analysis"
model_name = "distilbert-base-uncased-finetuned-sst-2-english"
input_text = "The food in that hotel was not so good"

sentiment_analysis_pipeline = pipeline(task, model=model_name)

sentiment_analysis_pipeline(input_text)
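With a sentiment fine-tuned checkpoint such as the one above, the pipeline returns a predicted label (for this checkpoint, POSITIVE or NEGATIVE) together with a confidence score for the input sentence.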

Decoder Only Models

Decoder-only models, also known as Auto-Regressive Models, rely solely on a decoder to perform tasks like Text Generation.

Their primary objective is to predict the next token, considering the context of the words to the left within a sentence.

In this technique, the following word in the sequence is generated using information from its preceding words.

These models have a unidirectional context, as they take the previous word in the sentence into account while predicting the next word.

Uni-directional Context

It is also referred to as Causal Language Modeling.

Examples of Causal Language Modeling or Decoder-Only models include GPT, GPT-2, GPT-3, BLOOM, and PaLM.

Code

from transformers import pipeline

# Text generation: the model extends the prompt by repeatedly predicting the next token.
task = "text-generation"
model_name = "gpt2"
max_output_length = 30
num_of_return_sequences = 2
input_text = "Hello, I am"

text_generator = pipeline(task, model=model_name)

text_generator(
    input_text,
    max_length=max_output_length,
    num_return_sequences=num_of_return_sequences)
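To make the next-token objective concrete, here is a minimal sketch (again with the gpt2 checkpoint) that reads the distribution over possible next tokens directly from the model’s logits; the exact top tokens may vary:

import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

inputs = tokenizer("Hello, I am", return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits

# Only the words to the left are visible, so the logits at the last position
# give the model's guesses for the very next token.
top_ids = logits[0, -1].topk(5).indices
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))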

Encoder-Decoder Models

Encoder-decoder models, also known as Sequence-to-Sequence Models, employ both an encoder and a decoder to perform tasks such as Machine Translation and Summarization.

These models utilize an auto-encoding method to mask words within a sentence and an auto-regressive method to reconstruct the masked words.

They combine the capabilities of both Auto-Encoding and Auto-Regressive methods to achieve their tasks, making use of both left-to-right and right-to-left contexts for predicting words through span corruption.

Span corruption involves deliberately corrupting a span of text in the input sequence and challenging the model to reconstruct the original sequence.

Span Corruption
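As an illustration, T5-style span corruption replaces each dropped span in the input with a sentinel token and asks the model to emit only the dropped spans; the sentence below is the classic example from the T5 paper, shown here as a minimal sketch:

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-base")

# Original sentence: "Thank you for inviting me to your party last week."
# The input has spans replaced by sentinel tokens; the target reproduces only those spans.
corrupted_input = "Thank you <extra_id_0> me to your party <extra_id_1> week."
target = "<extra_id_0> for inviting <extra_id_1> last <extra_id_2>"

# Sentinel tokens are part of the T5 vocabulary, so they stay intact after tokenization.
print(tokenizer.tokenize(corrupted_input))
print(tokenizer.tokenize(target))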

They are also commonly referred to as Seq2Seq models.

Examples of Sequence-to-Sequence Models or Encoder-Decoder models include T5 and BART.

Code

from transformers import pipeline

# Text-to-text generation: the encoder reads the whole input, the decoder generates the output.
task = "text2text-generation"
model_name = "google/flan-t5-base"
input_text = "Can you convert this text to French language: Hello, How are you"

text2text_generator = pipeline(task, model=model_name)

text2text_generator(input_text)
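The pipeline returns its output as a list of dictionaries with a generated_text field; FLAN-T5 also tends to follow instruction-style prompts such as ‘Translate English to French: Hello, how are you?’ more reliably than conversational phrasing.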

Reference:

https://github.com/SharathHebbar/Transformers
