Simple Chatbot using BERT and PyTorch: Part 1

AI Brewery · Published in Geek Culture · Jun 27, 2021 · 4 min read

[Cover image source: https://chatbotsmagazine.com/]

Artificial Intelligence is rapidly making its way into the workflows of businesses across many industries. Thanks to advances in Natural Language Processing (NLP), Natural Language Understanding (NLU), and Deep Learning (DL), we can now build technologies capable of imitating human-like interactions, understanding both speech and text.

In this article, we are going to build a chatbot using BERT, a Transformer-based model, and PyTorch.

I have divided the article into three parts.

Part(1/3): Brief introduction and Installation

Part(2/3): Data Preparation

Part(3/3): Fine-tuning of the model

Transformer

Google introduced the Transformer architecture in the paper “Attention Is All You Need”. The Transformer uses a self-attention mechanism, which makes it well suited to language understanding.

Consider the sentence: “I went to the Himalayas this summer. I really enjoyed my time out there.” The last word, “there”, refers to the Himalayas, but resolving that reference requires remembering the earlier part of the text. The attention mechanism achieves this by deciding, at each step of processing the input sequence, which other parts of the sequence are important.

The Transformer has an encoder-decoder architecture; both parts are composed of modules that contain feed-forward and attention layers.

BERT (Bidirectional Encoder Representations from Transformers)

BERT is a Transformer-based machine learning technique for natural language processing pre-training, created and published in 2018 by Jacob Devlin and his colleagues at Google.

BERT uses bidirectional training, i.e., it reads a sentence in both directions (left-to-right and right-to-left) to understand its context.

Note that BERT uses only the Transformer encoder; it does not have a decoder.

[Figure: Parameter counts of several recently released pre-trained language models. Source: Internet]

PyTorch

PyTorch is a Python-based scientific computing package that harnesses the power of graphics processing units (GPUs). Since its initial release in 2016, it has been adopted by an increasing number of researchers and has quickly become a go-to library because of the ease with which it lets you build even very complex neural networks. It gives TensorFlow tough competition, especially for research work.

Some of the key highlights of PyTorch include:

Simple Interface: It offers an easy-to-use API.

Pythonic in nature: This library, being Pythonic, smoothly integrates with the Python data science stack.

Tensors: A tensor is essentially the same as a NumPy array. To run operations on the GPU, just cast the tensor to a CUDA datatype.

Computational graphs: PyTorch provides an excellent platform that offers dynamic computational graphs.

Autograd (Automatic Differentiation): This class is an engine for calculating derivatives. A short sketch of tensors and autograd follows below.
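As a quick illustration of the last two points, here is a minimal, hypothetical sketch (not from the original article) showing a tensor cast to a CUDA datatype and a derivative computed with autograd:

import torch

# a tensor is essentially a NumPy array; cast it to a CUDA datatype to run on the GPU
a = torch.ones(3)
if torch.cuda.is_available():
    a = a.cuda()

# autograd: build a small computation and let PyTorch compute the derivative
x = torch.tensor([2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()
y.backward()          # dy/dx = 2x
print(x.grad)         # tensor([4., 6.])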

The Data

As a first step, we need to set up an intents JSON file that defines the intentions of the chatbot user.
For example:
A user may wish to know the name of our chatbot; therefore, we have created an intent called name.
A user may wish to know the age of our chatbot; therefore, we have created an intent called age.

In this chatbot, we have used 5 intents: name, age, date, greeting, and goodbye. The training set contains utterances belonging to each of these intents, and whenever the user enters an input, the bot recognizes its intent.

Within this intents JSON file, each intent's tag also has a set of responses. Once an intent is recognized, our chatbot selects its reply at random from the static set of responses associated with that intent (a short sketch of this selection follows the code below).

# used a dictionary to represent an intents JSON file
data = {"intents": [
    {"tag": "greeting",
     "responses": ["Howdy Partner!", "Hello", "How are you doing?", "Greetings!", "How do you do?"]},
    {"tag": "age",
     "responses": ["I am 25 years old", "I was born in 1998", "My birthday is July 3rd and I was born in 1998", "03/07/1998"]},
    {"tag": "date",
     "responses": ["I am available all week", "I don't have any plans", "I am not busy"]},
    {"tag": "name",
     "responses": ["My name is James", "I'm James", "James"]},
    {"tag": "goodbye",
     "responses": ["It was nice speaking to you", "See you later", "Speak soon!"]}
]}
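As a rough sketch of how that random selection might look (the intent prediction itself comes in Part 3; get_response below is a hypothetical helper, not part of the original code):

import random

def get_response(predicted_tag, data):
    # hypothetical helper: pick a random response for the predicted intent tag
    for intent in data["intents"]:
        if intent["tag"] == predicted_tag:
            return random.choice(intent["responses"])
    return "Sorry, I didn't understand that."

# example usage with the dictionary defined above
print(get_response("greeting", data))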

Installation of Packages

Transformers: This library brings together over 40 state-of-the-art pre-trained NLP models (BERT, GPT-2, RoBERTa, etc.).

Torchinfo: To print the model architecture.

# Install Transformers
!pip install transformers==3
# To get model summary
!pip install torchinfo
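To confirm the installation works, a quick, hypothetical sanity check (not part of the original article, and assuming the 'bert-base-uncased' checkpoint) might load the pre-trained BERT model and print its summary with torchinfo:

from transformers import AutoModel, AutoTokenizer
from torchinfo import summary

# load the pre-trained BERT tokenizer and model
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

# print the module hierarchy and parameter counts
summary(bert)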

Import Libraries

Importing the libraries that are required to perform operations on the dataset.

import numpy as np
import pandas as pd
import re
import torch
import random
import torch.nn as nn
import transformers
import matplotlib.pyplot as plt
# specify GPU
device = torch.device("cuda")
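If you are running this without a GPU, a common alternative (my assumption, not in the original) is to fall back to the CPU:

# fall back to CPU when CUDA is not available
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")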

Load Dataset

Here we load the training dataset: a chitchat Excel file in which each row contains an utterance (text) and its intent (label).

# We have prepared a chitchat dataset with 5 labels
df = pd.read_excel("/content/drive/MyDrive/Datasets/chitchat.xlsx")
df.head()
df['label'].value_counts()
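The spreadsheet itself isn't included with this article, but based on the columns used here it is assumed to have a 'text' column with user utterances and a 'label' column with the intent tag. A hypothetical stand-in, if you want to follow along without the file, could look like this:

# hypothetical stand-in data with the same structure ('text' and 'label' columns)
df = pd.DataFrame({
    "text": ["hi there", "what is your name", "how old are you", "are you free today", "bye for now"],
    "label": ["greeting", "name", "age", "date", "goodbye"]
})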

To convert these categorical labels into numerical encodings, we use scikit-learn's LabelEncoder.

# Converting the labels into encodings
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
df['label'] = le.fit_transform(df['label'])
# check class distribution
df['label'].value_counts(normalize = True)
# In this example we have used all the utterances for training purposes
train_text, train_labels = df['text'], df['label']
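If you later need to map the numeric predictions back to intent names, the fitted encoder keeps the mapping (a small hedged example, not in the original code):

# the encoder remembers the original label names, in encoded order
print(le.classes_)

# convert numeric label ids back to intent tags
print(le.inverse_transform([0, 1]))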

Click here to go to the next part: Part(2/3): Data Preparation
