Simple Chatbot using BERT and Pytorch: Part 2
This article has been divided into three parts.
Part(1/3): Brief introduction and Installation
Part(2/3): Data Preparation
Part(3/3): Fine-tuning of the model
In the last article, we briefly introduced the concepts behind Transformers and PyTorch and installed all the necessary libraries. Now let's dive into the next part: data preparation.
In this example, we tried training with the bert-base-uncased, roberta-base and distilbert-base-uncased models.
For our training data, the distilbert-base-uncased model gave the best results of the three.
BERT Model
We can import the Bert model as below.
from transformers import AutoModel, BertTokenizerFast

# Load the BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Import BERT-base pretrained model
bert = AutoModel.from_pretrained('bert-base-uncased')
Roberta Model
We can import the Roberta model as below.
from transformers import RobertaTokenizer, RobertaModel

# Load the Roberta tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Import Roberta pretrained model
bert = RobertaModel.from_pretrained('roberta-base')
DistilBert Model
We can import the DistilBert model as below.
from transformers import DistilBertTokenizer, DistilBertModel

# Load the DistilBert tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

# Import the DistilBert pretrained model
bert = DistilBertModel.from_pretrained('distilbert-base-uncased')
We will be using the DistilBert model for this example.
Sample data for distilbert-base-uncased tokenizer
text = ["this is a distil bert model.", "data is oil"]

# Encode the text
encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
print(encoded_input)

In input_ids:
101 - Indicates the beginning of the sentence (the [CLS] token)
102 - Indicates the end of the sentence (the [SEP] token)

In attention_mask:
1 - Actual token
0 - Padded token
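To make the role of the attention mask concrete, here is a minimal pure-Python sketch of what padding a batch to its longest sequence looks like. The helper name `pad_batch` and every token id other than 101, 102 and the padding id 0 are made up for illustration; this is not how the tokenizer is implemented, only a picture of the shape of its output.

```python
def pad_batch(sequences, pad_id=0):
    """Pad every sequence to the length of the longest one and build the mask."""
    longest = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        n_pad = longest - len(seq)
        # real tokens keep their ids, padding positions get pad_id
        input_ids.append(seq + [pad_id] * n_pad)
        # 1 marks a real token, 0 marks a padded position
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Hypothetical ids: 101 and 102 wrap each sentence, the ids in between are invented.
batch = pad_batch([[101, 11, 12, 13, 14, 102], [101, 21, 22, 102]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sentence is padded with two zeros, and its mask ends in two zeros so the model can ignore those positions.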
import pandas as pd

# Get the length (in words) of all the messages in the train set
seq_len = [len(i.split()) for i in train_text]
pd.Series(seq_len).hist(bins=10)

# Based on the histogram, we are selecting the max length as 8
max_seq_len = 8
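Reading the cut-off from the histogram works well for a small dataset. As a rough alternative, the same choice can be sketched numerically: take a high percentile of the word counts and leave room for the two special tokens. The sample sentences, the helper name `suggest_max_len` and the 95% coverage value are assumptions made for illustration. Word counts only approximate token counts, because the tokenizer may split one word into several pieces.

```python
import math

def suggest_max_len(texts, coverage=0.95, special_tokens=2):
    """Return a length that covers `coverage` of the texts, plus special tokens."""
    lengths = sorted(len(t.split()) for t in texts)
    # index of the smallest word count that covers the requested share of texts
    idx = max(0, math.ceil(coverage * len(lengths)) - 1)
    return lengths[idx] + special_tokens

# Made-up messages standing in for the training text
sample_texts = [
    "hi",
    "hello there",
    "how are you",
    "what is your name",
    "can you help me with this",
]
print(suggest_max_len(sample_texts))
```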
# Tokenize and encode sequences in the training set
tokens_train = tokenizer(
    train_text.tolist(),
    max_length=max_seq_len,
    padding='max_length',  # replaces the deprecated pad_to_max_length=True
    truncation=True,
    return_token_type_ids=False
)
Next, we will convert the integer sequences to tensors.
import torch

# For the train set
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())
Now we will create dataloaders for the training set. These dataloaders will pass batches of train data as input to the model during the training phase.
from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# Define a batch size
batch_size = 16

# Wrap tensors
train_data = TensorDataset(train_seq, train_mask, train_y)

# Sampler for sampling the data during training
train_sampler = RandomSampler(train_data)

# DataLoader for the train set
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
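Conceptually, a random sampler combined with a batch size does two things in each epoch: it shuffles the example indices and then cuts them into chunks. The sketch below mimics that in plain Python. `random_batches` is a hypothetical helper written for illustration, not part of PyTorch, and the dataset size of 50 is made up.

```python
import random

def random_batches(n_examples, batch_size, seed=None):
    """Yield lists of shuffled example indices, one list per batch."""
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_examples, batch_size):
        yield indices[start:start + batch_size]

batches = list(random_batches(n_examples=50, batch_size=16, seed=0))
print(len(batches))               # number of batches in one epoch
print([len(b) for b in batches])  # size of each batch
```

With 50 examples and a batch size of 16, this gives three full batches and a final batch of 2. Every example appears exactly once per epoch, only the order changes.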
Define Model Architecture
import torch.nn as nn

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert

        # dropout layer
        self.dropout = nn.Dropout(0.2)

        # relu activation function
        self.relu = nn.ReLU()

        # dense layers
        self.fc1 = nn.Linear(768, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 5)

        # log-softmax activation function
        self.softmax = nn.LogSoftmax(dim=1)

    # define the forward pass
    def forward(self, sent_id, mask):
        # pass the inputs to the model and keep the hidden state of the first token
        cls_hs = self.bert(sent_id, attention_mask=mask)[0][:, 0]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)

        # output layer
        x = self.fc3(x)

        # apply log-softmax activation
        x = self.softmax(x)
        return x
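Note that the last layer returns log-probabilities rather than probabilities. A minimal plain-Python sketch of log-softmax for a single example shows the relationship. The five scores are made up, and the function is an illustration, not PyTorch's implementation.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

logits = [2.0, 0.5, -1.0, 0.0, 1.0]  # made-up scores for 5 classes
log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]

print(log_probs)                        # all values are <= 0
print(sum(probs))                       # approximately 1.0
print(log_probs.index(max(log_probs)))  # index of the predicted class
```

Because the outputs are log-probabilities, a head like this is commonly paired with a negative log-likelihood loss such as `nn.NLLLoss`.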
# Freeze all the parameters. This will prevent updating of the pretrained model weights during fine-tuning.
for param in bert.parameters():
    param.requires_grad = False

model = BERT_Arch(bert)

# Push the model to GPU (device is assumed to have been defined already)
model = model.to(device)

from torchinfo import summary
summary(model)
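With the pretrained encoder frozen, only the three dense layers are trained. Their parameter count can be checked by hand, since a linear layer with `n_in` inputs and `n_out` outputs has `n_in * n_out` weights plus `n_out` biases. The helper `linear_params` below is a small illustrative calculation, not part of any library.

```python
def linear_params(n_in, n_out):
    """Weights plus biases of a fully connected layer."""
    return n_in * n_out + n_out

# (inputs, outputs) of fc1, fc2 and fc3 from BERT_Arch
head = [(768, 512), (512, 256), (256, 5)]
trainable = sum(linear_params(n_in, n_out) for n_in, n_out in head)
print(trainable)
```

This total should match the trainable parameter count reported by `summary(model)`, because the dropout, ReLU and log-softmax layers have no parameters of their own.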