Simple Chatbot using BERT and PyTorch: Part 3

AI Brewery
Published in Geek Culture
4 min read · Jun 27, 2021

This article has been divided into three parts.

Part(1/3): Brief introduction and Installation

Part(2/3): Data Preparation

Part(3/3): Fine-tuning of the model

In the previous articles, we briefly introduced the concepts behind Transformers and PyTorch, installed the necessary libraries, and prepared the data for model training. Now let's fine-tune the model and look at the results.

Optimizer

The optimizer updates the model's parameters using the gradients computed during backpropagation, so that the loss goes down step by step.

# note: newer transformers releases deprecate this class in favor of torch.optim.AdamW
from transformers import AdamW

# define the optimizer
optimizer = AdamW(model.parameters(), lr=1e-3)
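To make the optimizer step less of a black box, here is a minimal pure-Python sketch of one AdamW-style update for a single scalar parameter. The function name `adamw_step`, the `state` dictionary, and the toy objective f(x) = x² are assumptions made for illustration; the library implementation works on tensors and may differ in details such as the order in which weight decay is applied.

```python
import math

def adamw_step(param, grad, state, lr=1e-3, betas=(0.9, 0.999),
               eps=1e-8, weight_decay=0.0):
    """One AdamW-style update for a single scalar parameter (illustrative only)."""
    beta1, beta2 = betas
    state["t"] += 1
    # exponential moving averages of the gradient and of its square
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # bias correction for the zero-initialised averages
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    # decoupled weight decay, followed by the Adam step
    param = param * (1 - lr * weight_decay)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)

# toy problem: minimise f(x) = x**2, whose gradient is 2 * x
state = {"t": 0, "m": 0.0, "v": 0.0}
first = adamw_step(1.0, 2 * 1.0, state)   # value after the very first step
x = first
for _ in range(5000):
    x = adamw_step(x, 2 * x, state)
```

On the first step the update is roughly `lr` in the direction opposite to the gradient, regardless of how large the gradient is, which is one reason the learning rate matters so much when fine-tuning.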

Find Class Weights

from sklearn.utils.class_weight import compute_class_weight

# compute the class weights (recent scikit-learn versions require keyword arguments here)
class_wts = compute_class_weight(class_weight='balanced', classes=np.unique(train_labels), y=train_labels)
print(class_wts)
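The 'balanced' option gives each class the weight n_samples / (n_classes * class_count), so rare intents count more. A small pure-Python sketch of that formula follows; `balanced_class_weights` and `toy_labels` are made-up names for illustration, not part of scikit-learn.

```python
from collections import Counter

def balanced_class_weights(labels):
    """weight[c] = n_samples / (n_classes * count[c]), the 'balanced' heuristic."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {c: n_samples / (n_classes * counts[c]) for c in sorted(counts)}

# hypothetical encoded labels: class 0 is three times as frequent as class 1
toy_labels = [0, 0, 0, 0, 0, 0, 1, 1]
toy_weights = balanced_class_weights(toy_labels)
print(toy_weights)
```

The frequent class ends up with a weight below 1 and the rare class with a weight above 1.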

Balancing the weights while calculating the error

# convert class weights to tensor
weights = torch.tensor(class_wts, dtype=torch.float)
weights = weights.to(device)

# loss function: NLLLoss expects log-probabilities as input (e.g. from a final LogSoftmax layer)
cross_entropy = nn.NLLLoss(weight=weights)
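To see what the class weights do inside the loss, here is a rough pure-Python sketch of a weighted negative log-likelihood with mean reduction: each sample's loss is scaled by the weight of its true class, and the sum is divided by the total weight used. The function `weighted_nll` and the toy numbers are assumptions for illustration only.

```python
import math

def weighted_nll(log_probs, targets, class_weights):
    """Weighted mean of -log p(target), normalised by the sum of the weights used."""
    num = sum(-class_weights[t] * row[t] for row, t in zip(log_probs, targets))
    den = sum(class_weights[t] for t in targets)
    return num / den

# hypothetical log-probabilities for 2 samples over 2 classes
log_probs = [[math.log(0.9), math.log(0.1)],
             [math.log(0.4), math.log(0.6)]]
targets = [0, 1]

unweighted = weighted_nll(log_probs, targets, [1.0, 1.0])
weighted = weighted_nll(log_probs, targets, [0.5, 2.0])
print(unweighted, weighted)
```

Because the second sample belongs to the up-weighted class and is predicted less confidently, the weighted loss comes out higher than the unweighted one.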

Setting up the epochs

from torch.optim import lr_scheduler

# empty list to store the training loss of each epoch
train_losses = []
# number of training epochs
epochs = 200
# a learning rate scheduler can also be used to achieve better results
lr_sch = lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
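With step_size=100 and gamma=0.1, a step schedule multiplies the learning rate by 0.1 after every 100 scheduler steps. The sketch below reproduces that arithmetic in plain Python, assuming the scheduler is stepped once per epoch; `step_lr` is a hypothetical helper, not the PyTorch class.

```python
def step_lr(base_lr, n_steps, step_size, gamma):
    """Learning rate after n_steps scheduler steps: decays by gamma every step_size steps."""
    return base_lr * gamma ** (n_steps // step_size)

# learning rate seen in each of the 200 epochs, if the scheduler is stepped once per epoch
schedule = [step_lr(1e-3, e, step_size=100, gamma=0.1) for e in range(200)]
print(schedule[0], schedule[99], schedule[100], schedule[199])
```

Note that the scheduler counts calls to its `step()` method, so if it were called once per batch instead of once per epoch, the decay would happen after 100 batches.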

Fine-Tune the model

# function to train the model
def train():

    model.train()
    total_loss = 0

    # empty list to save model predictions
    total_preds = []

    # iterate over batches
    for step, batch in enumerate(train_dataloader):

        # progress update after every 50 batches
        if step % 50 == 0 and step != 0:
            print(' Batch {:>5,} of {:>5,}.'.format(step, len(train_dataloader)))

        # push the batch to the GPU
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch

        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradients to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # clear calculated gradients
        optimizer.zero_grad()

        # We are not using the learning rate scheduler as of now
        # lr_sch.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)

    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)

    # predictions are in the form of (no. of batches, size of batch, no. of classes).
    # reshape the predictions in form of (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
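Gradient clipping by norm is easy to picture on a tiny example: if the overall L2 norm of all gradients exceeds the limit, every gradient is scaled down by the same factor; otherwise nothing changes. The following is a simplified pure-Python sketch of that idea on a flat list of numbers; `clip_by_global_norm` is a hypothetical helper, not the PyTorch function.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Return grads rescaled so that their overall L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    # scale factor is capped at 1.0, so small gradients are left untouched
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

clipped, norm_before = clip_by_global_norm([3.0, 4.0])   # norm 5.0, gets scaled down
small, _ = clip_by_global_norm([0.3, 0.4])               # norm 0.5, left unchanged
print(norm_before, clipped, small)
```

The direction of the gradient is preserved; only its length is limited.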

Start Model Training

# make runs more reproducible, similar to setting a random seed everywhere one is needed
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

for epoch in range(epochs):

    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))

    # train model
    train_loss, _ = train()

    # append the training loss
    train_losses.append(train_loss)
    print(f'\nTraining Loss: {train_loss:.3f}')

Figure: The training loss curve

Get Predictions for Test Data

import random
import re

def get_prediction(text):
    # keep only letters and spaces
    text = re.sub(r'[^a-zA-Z ]+', '', text)
    test_text = [text]
    model.eval()

    tokens_test_data = tokenizer(
        test_text,
        max_length=max_seq_len,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        truncation=True,
        return_token_type_ids=False
    )
    test_seq = torch.tensor(tokens_test_data['input_ids'])
    test_mask = torch.tensor(tokens_test_data['attention_mask'])

    preds = None
    with torch.no_grad():
        preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()
    preds = np.argmax(preds, axis=1)
    intent = le.inverse_transform(preds)[0]
    print("Intent Identified: ", intent)
    return intent

def get_response(message):
    intent = get_prediction(message)
    # stays None if no intent tag matches the prediction
    result = None
    for i in data['intents']:
        if i["tag"] == intent:
            result = random.choice(i["responses"])
            break
    print(f"Response : {result}")
    return f"Intent: {intent}\nResponse: {result}"
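The two non-model pieces of this pipeline, cleaning the input and picking a response for the predicted intent, can be tried on their own without BERT. Here is a small sketch under stated assumptions: `toy_data` is a hypothetical stand-in for the `data` dictionary loaded in Part 2, and `clean` and `pick_response` (including its fallback message) are names invented for this example.

```python
import random
import re

# hypothetical stand-in for the `data` dictionary loaded in Part 2
toy_data = {"intents": [
    {"tag": "greeting", "responses": ["Hello!", "Hi there!"]},
    {"tag": "name", "responses": ["I am a chatbot."]},
]}

def clean(text):
    """Same regex as get_prediction: drop everything except letters and spaces."""
    return re.sub(r'[^a-zA-Z ]+', '', text)

def pick_response(intent, data, fallback="Sorry, I did not understand that."):
    """Return a random response for the intent, or the fallback if no tag matches."""
    for item in data["intents"]:
        if item["tag"] == intent:
            return random.choice(item["responses"])
    return fallback

cleaned = clean("Why don't you introduce yourself?")
reply = pick_response("name", toy_data)
unknown = pick_response("weather", toy_data)
print(cleaned, reply, unknown, sep="\n")
```

Returning a fallback keeps the chatbot from failing when a predicted tag has no matching entry.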

Let's test the model now:

get_response("why dont you introduce yourself")

For testing purposes, we deployed the model using Gradio.
Here are the results.

To achieve better results:
1. Experiment with different transformer models
2. Tune parameters such as max_seq_len, batch_size
3. Use a learning rate scheduler
