Simple Chatbot using BERT and Pytorch: Part 2

AI Brewery

Published in

Geek Culture

3 min read · Jun 27, 2021

This article has been divided into three parts.

Part(1/3): Brief introduction and Installation

Part(2/3): Data Preparation

Part(3/3): Fine-tuning of the model

In the last article, we briefly introduced Transformers and PyTorch and installed all the necessary libraries. Now let's move on to the next part: data preparation.

In this example, we tried training with the bert-base-uncased, roberta-base and distilbert-base-uncased models.

For our training data, the distilbert-base-uncased model gave the best results of the three.

BERT Model

We can import the BERT model as below.

from transformers import AutoModel, BertTokenizerFast

# Load the BERT tokenizer
tokenizer = BertTokenizerFast.from_pretrained('bert-base-uncased')

# Import the pretrained BERT-base model
bert = AutoModel.from_pretrained('bert-base-uncased')

Roberta Model

We can import the RoBERTa model as below.

from transformers import RobertaTokenizer, RobertaModel

# Load the RoBERTa tokenizer
tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Import the pretrained RoBERTa model
bert = RobertaModel.from_pretrained('roberta-base')

DistilBert Model

We can import the DistilBERT model as below.

from transformers import DistilBertTokenizer, DistilBertModel

# Load the DistilBERT tokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-uncased')

# Import the pretrained DistilBERT model
bert = DistilBertModel.from_pretrained('distilbert-base-uncased')

We will be using the DistilBERT model for this example.

Sample data for distilbert-base-uncased tokenizer

text = ["this is a distil bert model.", "data is oil"]

# Encode the text
encoded_input = tokenizer(text, padding=True, truncation=True, return_tensors='pt')
print(encoded_input)

In input_ids:
101 - Indicates the beginning of the sentence (the [CLS] token)
102 - Indicates the end of the sentence (the [SEP] token)

In attention_mask:
1 - Actual token
0 - Padded token
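
To make these two outputs concrete, here is a minimal pure-Python sketch of what padding a batch to its longest sequence looks like. It is only an illustration, not the tokenizer's implementation: `pad_batch` is a hypothetical helper, and every id other than 101, 102 and 0 is a made-up placeholder rather than a real vocabulary id.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD] in the bert-base-uncased vocabulary

def pad_batch(token_id_lists):
    """Wrap each sequence in [CLS]/[SEP], then pad to the longest one."""
    wrapped = [[CLS_ID] + ids + [SEP_ID] for ids in token_id_lists]
    longest = max(len(seq) for seq in wrapped)
    input_ids, attention_mask = [], []
    for seq in wrapped:
        n_pad = longest - len(seq)
        input_ids.append(seq + [PAD_ID] * n_pad)
        attention_mask.append([1] * len(seq) + [0] * n_pad)
    return {"input_ids": input_ids, "attention_mask": attention_mask}

# Placeholder ids standing in for two tokenized sentences of different length.
batch = pad_batch([[2001, 2002, 2003, 2004], [2005, 2006]])
print(batch["input_ids"])
print(batch["attention_mask"])
```

The shorter sentence ends with two 0 ids, and its attention mask has 0 in exactly those positions, so the model ignores them.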

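import pandas as pd

# Get the length (in words) of all the messages in the train set.
# train_text is assumed to be a pandas Series of the training messages.
seq_len = [len(i.split()) for i in train_text]
pd.Series(seq_len).hist(bins=10)

# Based on the histogram, we select 8 as the maximum length
max_seq_len = 8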
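
Reading the cutoff off a histogram can also be done numerically. The sketch below is one possible alternative, not the method used in this article: it picks the smallest word count that covers a chosen fraction of the messages. The sample sentences, the coverage values, and the helper name `length_covering` are assumptions for illustration. Keep in mind that whitespace word counts are only a proxy, since the tokenizer may split a word into several sub-tokens and adds two special tokens.

```python
import math

def length_covering(texts, coverage=0.9):
    """Return the smallest word count that covers `coverage` of the texts."""
    lengths = sorted(len(t.split()) for t in texts)
    # Index of the last message that must still fit without truncation.
    idx = max(0, math.ceil(coverage * len(lengths)) - 1)
    return lengths[idx]

# Made-up messages with 1, 2, 4, 5 and 11 words.
sample_texts = [
    "hi",
    "hello there",
    "what is your name",
    "how are you doing today",
    "can you tell me what time the store opens tomorrow morning",
]
print(length_covering(sample_texts, coverage=0.8))  # 5
print(length_covering(sample_texts, coverage=1.0))  # 11
```

A lower coverage keeps sequences short at the cost of truncating the longest messages.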

# Tokenize and encode the sequences in the training set
tokens_train = tokenizer(
    train_text.tolist(),
    max_length=max_seq_len,
    padding='max_length',  # replaces the deprecated pad_to_max_length=True
    truncation=True,
    return_token_type_ids=False
)
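
The combined effect of a fixed maximum length, padding, and truncation can be sketched in plain Python. This is a simplified illustration rather than the library's code: `encode_fixed` is a hypothetical helper, and the ids other than 101, 102 and 0 are placeholders.

```python
CLS_ID, SEP_ID, PAD_ID = 101, 102, 0  # [CLS], [SEP], [PAD]

def encode_fixed(token_ids, max_length):
    """Truncate or pad one sequence so the result has exactly max_length ids."""
    # Reserve two slots for [CLS] and [SEP], then cut whatever does not fit.
    body = token_ids[: max_length - 2]
    ids = [CLS_ID] + body + [SEP_ID]
    mask = [1] * len(ids)
    n_pad = max_length - len(ids)
    return ids + [PAD_ID] * n_pad, mask + [0] * n_pad

short_ids, short_mask = encode_fixed([2001, 2002], max_length=8)
long_ids, long_mask = encode_fixed(list(range(2001, 2011)), max_length=8)
print(short_ids, short_mask)
print(long_ids, long_mask)
```

With a maximum length of 8, at most six ordinary tokens survive per message, because two positions are taken by the special tokens.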

Next, we will convert the integer sequences to tensors.

import torch

# For the train set
train_seq = torch.tensor(tokens_train['input_ids'])
train_mask = torch.tensor(tokens_train['attention_mask'])
train_y = torch.tensor(train_labels.tolist())

Now we will create a dataloader for the training set. It will pass batches of training data as input to the model during the training phase.

from torch.utils.data import TensorDataset, DataLoader, RandomSampler, SequentialSampler

# Define a batch size
batch_size = 16

# Wrap the tensors
train_data = TensorDataset(train_seq, train_mask, train_y)

# Sampler for sampling the data during training
train_sampler = RandomSampler(train_data)

# DataLoader for the train set
train_dataloader = DataLoader(train_data, sampler=train_sampler, batch_size=batch_size)
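
Conceptually, a random sampler combined with a batch size shuffles the sample indices and cuts them into chunks. The pure-Python sketch below mimics that idea without PyTorch; `random_batches`, the seed, and the dataset size of 50 are assumptions for illustration, not values from this article.

```python
import math
import random

def random_batches(n_samples, batch_size, seed=0):
    """Yield lists of indices: one shuffle, cut into batch_size chunks."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    for start in range(0, n_samples, batch_size):
        yield indices[start : start + batch_size]

batches = list(random_batches(n_samples=50, batch_size=16))
print(len(batches), math.ceil(50 / 16))   # number of batches per epoch
print([len(b) for b in batches])          # the last batch holds the remainder
```

Every sample appears exactly once per epoch, and the final batch is smaller whenever the dataset size is not a multiple of the batch size.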

Define Model Architecture

import torch.nn as nn

class BERT_Arch(nn.Module):

    def __init__(self, bert):
        super(BERT_Arch, self).__init__()
        self.bert = bert

        # Dropout layer
        self.dropout = nn.Dropout(0.2)

        # ReLU activation function
        self.relu = nn.ReLU()

        # Dense layers
        self.fc1 = nn.Linear(768, 512)
        self.fc2 = nn.Linear(512, 256)
        self.fc3 = nn.Linear(256, 5)

        # Log-softmax activation function
        self.softmax = nn.LogSoftmax(dim=1)

    # Define the forward pass
    def forward(self, sent_id, mask):
        # Pass the inputs to the model and keep the hidden state of the first ([CLS]) token
        cls_hs = self.bert(sent_id, attention_mask=mask)[0][:, 0]

        x = self.fc1(cls_hs)
        x = self.relu(x)
        x = self.dropout(x)

        x = self.fc2(x)
        x = self.relu(x)
        x = self.dropout(x)

        # Output layer
        x = self.fc3(x)

        # Apply log-softmax activation
        x = self.softmax(x)
        return x
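
Note that the final layer returns log-probabilities rather than probabilities. The pure-Python sketch below computes log-softmax for a single row of five scores to show what that means; the scores are made up, and the function is a hand-written stand-in for illustration, not PyTorch's implementation.

```python
import math

def log_softmax(scores):
    """Numerically stable log-softmax over a list of raw scores."""
    m = max(scores)
    log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_sum for s in scores]

logits = [2.0, 0.5, -1.0, 0.0, 1.0]  # made-up scores for the 5 classes
log_probs = log_softmax(logits)
probs = [math.exp(lp) for lp in log_probs]
predicted = max(range(len(log_probs)), key=log_probs.__getitem__)
print(round(sum(probs), 6), predicted)
```

Exponentiating the outputs gives values that sum to 1, and the class with the largest raw score remains the predicted class. Log-probabilities of this form are what PyTorch's nn.NLLLoss expects as input.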

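# Freeze all the parameters. This prevents the pretrained model weights from being updated during fine-tuning.
for param in bert.parameters():
    param.requires_grad = False

model = BERT_Arch(bert)

# Push the model to the GPU if one is available, otherwise keep it on the CPU
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model = model.to(device)

from torchinfo import summary
summary(model)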
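
With the encoder frozen, only the three dense layers remain trainable. As a quick sanity check, their parameter count can be worked out by hand from the layer sizes in BERT_Arch, counting a weight matrix plus a bias vector for each nn.Linear. The helper name `linear_params` is only for illustration.

```python
def linear_params(in_features, out_features):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return in_features * out_features + out_features

layer_sizes = [(768, 512), (512, 256), (256, 5)]  # fc1, fc2, fc3 from BERT_Arch
per_layer = [linear_params(i, o) for i, o in layer_sizes]
trainable = sum(per_layer)
print(per_layer)   # [393728, 131328, 1285]
print(trainable)   # 526341
```

Assuming every encoder parameter is frozen, this total should agree with the trainable-parameter figure in the summary output.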

Next part: Part(3/3): Fine-tuning of the model

Written by AI Brewery