Simple Chatbot using BERT and PyTorch: Part 3
This article has been divided into three parts.
Part(1/3): Brief introduction and Installation
Part(2/3): Data Preparation
Part(3/3): Fine-tuning of the model
In the previous parts, we briefly introduced the Transformer architecture and PyTorch, installed the necessary libraries, and prepared the data for training. Now let's fine-tune the model and look at the results.
Optimizer
The optimizer updates the model's parameters using the gradients computed during backpropagation, so that the loss decreases as training progresses.
from torch.optim import AdamW  # transformers.AdamW is deprecated in favor of this class

# define the optimizer
optimizer = AdamW(model.parameters(), lr=1e-3)
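To make the optimizer's role more concrete, here is a minimal plain-Python sketch of an AdamW-style update applied to a single parameter. The toy loss (w - 3)^2, the function names, and the hyperparameter values are illustrative assumptions and not part of the chatbot code; the real optimizer applies the same kind of update to every tensor in model.parameters().

```python
import math

def adamw_minimize(w, lr=0.1, betas=(0.9, 0.999), eps=1e-8,
                   weight_decay=0.01, steps=200):
    """Minimize the toy loss (w - 3)^2 with a hand-written AdamW-style update."""
    beta1, beta2 = betas
    m, v = 0.0, 0.0  # running averages of the gradient and the squared gradient
    for t in range(1, steps + 1):
        grad = 2.0 * (w - 3.0)            # derivative of (w - 3)^2
        w -= lr * weight_decay * w        # decoupled weight decay
        m = beta1 * m + (1 - beta1) * grad
        v = beta2 * v + (1 - beta2) * grad * grad
        m_hat = m / (1 - beta1 ** t)      # bias correction for the early steps
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

def toy_loss(w):
    return (w - 3.0) ** 2

w_start = 0.0
w_end = adamw_minimize(w_start)
print(f"loss before: {toy_loss(w_start):.4f}, loss after: {toy_loss(w_end):.4f}")
```

The loss printed after the updates should be well below the starting loss, which is the behavior we rely on when calling optimizer.step() in the training loop.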
Find Class Weights
from sklearn.utils.class_weight import compute_class_weight
# compute the class weights (recent scikit-learn versions require keyword arguments)
class_wts = compute_class_weight(class_weight='balanced', classes=np.unique(train_labels), y=train_labels)
print(class_wts)
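As a rough illustration of what the 'balanced' option computes, the sketch below reproduces the formula n_samples / (n_classes * class_count) in plain Python. The helper name `balanced_class_weights` and the toy labels are hypothetical and only serve the example.

```python
from collections import Counter

def balanced_class_weights(labels):
    """Return {class: n_samples / (n_classes * class_count)} for each class."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * counts[cls]) for cls in sorted(counts)}

# hypothetical labels: class 0 appears three times, class 1 only once
toy_labels = [0, 0, 0, 1]
toy_weights = balanced_class_weights(toy_labels)
print(toy_weights)
```

The rare class receives the larger weight, so mistakes on it contribute more to the loss.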
Balancing the weights while calculating the error
# convert class weights to tensor
weights = torch.tensor(class_wts, dtype=torch.float)
weights = weights.to(device)
# loss function; NLLLoss expects log-probabilities, so the model must end in a log-softmax layer
cross_entropy = nn.NLLLoss(weight=weights)
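The effect of the class weights on the loss can be sketched without PyTorch. The function below follows the weighted-mean form that nn.NLLLoss uses with its default mean reduction: each sample's negative log-probability is multiplied by the weight of its true class, and the sum is divided by the sum of those weights. The name `weighted_nll` and all numbers are illustrative assumptions.

```python
import math

def weighted_nll(log_probs, targets, class_weights):
    """Weighted mean of -log p(target), normalized by the sum of the weights used."""
    weighted_sum = 0.0
    weight_total = 0.0
    for row, target in zip(log_probs, targets):
        w = class_weights[target]
        weighted_sum += -w * row[target]
        weight_total += w
    return weighted_sum / weight_total

# hypothetical log-probabilities for two samples over two classes
toy_log_probs = [
    [math.log(0.9), math.log(0.1)],  # confident and correct (label 0)
    [math.log(0.6), math.log(0.4)],  # leans toward class 0, but the label is 1
]
toy_targets = [0, 1]

unweighted = weighted_nll(toy_log_probs, toy_targets, [1.0, 1.0])
weighted = weighted_nll(toy_log_probs, toy_targets, [2.0 / 3.0, 2.0])
print(f"unweighted: {unweighted:.4f}, weighted: {weighted:.4f}")
```

With the larger weight on class 1, the misclassified rare-class sample dominates the result, so the weighted loss comes out higher than the unweighted one.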
Setting up the epochs
from torch.optim import lr_scheduler

# empty list to store the training loss of each epoch
train_losses = []
# number of training epochs
epochs = 200
# We can also use a learning rate scheduler to achieve better results
lr_sch = lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.1)
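To see what a step schedule with step_size=100 and gamma=0.1 would do over 200 epochs, the arithmetic can be sketched in plain Python. This assumes the scheduler is stepped once per epoch; the helper `step_lr` is hypothetical and only mirrors the decay rule base_lr * gamma ** (epoch // step_size).

```python
def step_lr(base_lr, epoch, step_size=100, gamma=0.1):
    """Learning rate after `epoch` scheduler steps under a StepLR-style decay."""
    return base_lr * gamma ** (epoch // step_size)

# learning rate at the start, just before the drop, at the drop, and at the end
for epoch in (0, 99, 100, 199):
    print(epoch, step_lr(1e-3, epoch))
```

Under these assumptions the first 100 epochs train at the base rate and the remaining 100 at one tenth of it.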
Fine-Tune the model
# function to train the model
def train():
    model.train()
    total_loss = 0
    # empty list to save model predictions
    total_preds = []
    # iterate over batches
    for step, batch in enumerate(train_dataloader):
        # progress update after every 50 batches
        if step % 50 == 0 and step != 0:
            print('  Batch {:>5,}  of  {:>5,}.'.format(step, len(train_dataloader)))
        # push the batch to the GPU
        batch = [r.to(device) for r in batch]
        sent_id, mask, labels = batch
        # get model predictions for the current batch
        preds = model(sent_id, mask)
        # compute the loss between actual and predicted values
        loss = cross_entropy(preds, labels)
        # add on to the total loss
        total_loss = total_loss + loss.item()
        # backward pass to calculate the gradients
        loss.backward()
        # clip the gradient norm to 1.0; this helps prevent the exploding gradient problem
        torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
        # update parameters
        optimizer.step()
        # clear calculated gradients
        optimizer.zero_grad()
        # We are not using the learning rate scheduler as of now
        # lr_sch.step()
        # model predictions are stored on the GPU, so push them to the CPU
        preds = preds.detach().cpu().numpy()
        # append the model predictions
        total_preds.append(preds)
    # compute the training loss of the epoch
    avg_loss = total_loss / len(train_dataloader)
    # predictions are in the form of (no. of batches, size of batch, no. of classes)
    # reshape the predictions to (number of samples, no. of classes)
    total_preds = np.concatenate(total_preds, axis=0)
    # return the loss and predictions
    return avg_loss, total_preds
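The gradient clipping step in the function above can be illustrated with a small plain-Python sketch. It scales all gradients by a single factor so that their combined L2 norm does not exceed the limit, which is the idea behind clip_grad_norm_. The helper `clip_by_global_norm` and the gradient values are hypothetical.

```python
import math

def clip_by_global_norm(grads, max_norm=1.0, eps=1e-6):
    """Scale all gradients by one factor so their combined L2 norm is at most max_norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    scale = min(1.0, max_norm / (total_norm + eps))
    return [g * scale for g in grads], total_norm

# hypothetical gradient values with an L2 norm of 5.0
clipped, norm_before = clip_by_global_norm([3.0, 4.0])
norm_after = math.sqrt(sum(g * g for g in clipped))
print(norm_before, norm_after)

# gradients that are already inside the limit are left untouched
small, _ = clip_by_global_norm([0.3, 0.4])
print(small)
```

Because every gradient is scaled by the same factor, the update direction is preserved and only its length is capped.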
Start Model Training
# these cuDNN settings help make the experiment reproducible, similar to setting a random seed everywhere one is needed
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False

for epoch in range(epochs):
    print('\n Epoch {:} / {:}'.format(epoch + 1, epochs))
    # train model
    train_loss, _ = train()
    # append training loss
    train_losses.append(train_loss)
    print(f'\nTraining Loss: {train_loss:.3f}')
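The cuDNN flags only cover part of reproducibility; the random number generators also need fixed seeds. The following is a minimal sketch of a seeding helper, assuming it is called once before training. The name `set_seed` and the seed value are hypothetical, and NumPy and PyTorch are seeded only if they are installed.

```python
import random

def set_seed(seed):
    """Seed every random number generator that is available in this environment."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy is not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch is not installed; nothing to seed

# the same seed should give the same sequence of random numbers
set_seed(42)
first = [random.random() for _ in range(3)]
set_seed(42)
second = [random.random() for _ in range(3)]
print(first == second)
```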
The training loss curve
Get Predictions for Test Data
import random
import re

def get_prediction(text):
    # keep only letters and spaces
    text = re.sub(r'[^a-zA-Z ]+', '', text)
    test_text = [text]
    model.eval()
    tokens_test_data = tokenizer(
        test_text,
        max_length=max_seq_len,
        padding='max_length',  # replaces the deprecated pad_to_max_length=True
        truncation=True,
        return_token_type_ids=False
    )
    test_seq = torch.tensor(tokens_test_data['input_ids'])
    test_mask = torch.tensor(tokens_test_data['attention_mask'])
    # no gradients are needed for inference
    with torch.no_grad():
        preds = model(test_seq.to(device), test_mask.to(device))
    preds = preds.detach().cpu().numpy()
    # pick the class with the highest score and map it back to its intent tag
    preds = np.argmax(preds, axis=1)
    intent = le.inverse_transform(preds)[0]
    print("Intent Identified: ", intent)
    return intent

def get_response(message):
    intent = get_prediction(message)
    # look up the predicted intent and pick one of its responses at random
    for i in data['intents']:
        if i["tag"] == intent:
            result = random.choice(i["responses"])
            break
    print(f"Response : {result}")
    return "Intent: " + intent + '\n' + "Response: " + result
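The steps around the model call (cleaning the text, taking the argmax, and looking up a response) can be tried on their own without a trained model. The sketch below uses hypothetical stand-ins: `toy_tags` plays the role of the label encoder's classes, `toy_data` the role of the intents file, and the scores are made-up model outputs.

```python
import random
import re

# hypothetical stand-ins for the label encoder classes and the intents file
toy_tags = ["goodbye", "greeting", "name"]
toy_data = {"intents": [
    {"tag": "greeting", "responses": ["Hello!", "Hi there!"]},
    {"tag": "name", "responses": ["I am a chatbot."]},
    {"tag": "goodbye", "responses": ["See you later."]},
]}

def clean(text):
    """Drop every character that is not an ASCII letter or a space."""
    return re.sub(r"[^a-zA-Z ]+", "", text)

def pick_response(scores, tags, data):
    """Map the highest-scoring class to its tag and pick one of its responses."""
    best = max(range(len(scores)), key=lambda i: scores[i])  # argmax
    intent = tags[best]
    for item in data["intents"]:
        if item["tag"] == intent:
            return intent, random.choice(item["responses"])
    return intent, None  # no matching tag in the data

cleaned = clean("Why don't you introduce yourself?")
intent, response = pick_response([-2.3, -1.9, -0.4], toy_tags, toy_data)
print(cleaned)
print(intent, response)
```

Returning None when no tag matches is a design choice of this sketch; it avoids referring to an undefined variable if the predicted intent is missing from the data.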
Let's test the model now:
get_response("why dont you introduce yourself")
For testing purposes, we deployed the model using Gradio.
Here are the results.
To achieve better results:
1. Experiment with different transformer models
2. Tune parameters such as max_seq_len and batch_size
3. Use a learning rate scheduler
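As one possible starting point for tuning max_seq_len, a common heuristic is to pick a length that covers most of the training sentences rather than the single longest one. The sketch below is an assumption-laden illustration, not a method from this series: the helper `suggest_max_seq_len`, the 95% coverage target, and the token counts are all hypothetical.

```python
import math

def suggest_max_seq_len(lengths, coverage_pct=95):
    """Smallest length that covers at least coverage_pct percent of the sequences."""
    ordered = sorted(lengths)
    index = math.ceil(coverage_pct * len(ordered) / 100) - 1
    return ordered[max(index, 0)]

# hypothetical token counts for ten training sentences, with one outlier
toy_lengths = [3, 4, 4, 5, 5, 6, 6, 7, 8, 21]
full_cover = suggest_max_seq_len(toy_lengths)
ninety_cover = suggest_max_seq_len(toy_lengths, coverage_pct=90)
print(full_cover, ninety_cover)
```

With so few sentences, a 95% target still has to include the outlier, while a 90% target allows a much shorter length; longer sequences would then be truncated by the tokenizer.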