Build a Custom Chatbot in Python (1)

BridgeRiver AI
7 min read · Sep 13, 2023


This article is part 1 of our custom chatbot series.

Quick Overview

In this article, we’ll show you how to build the intent function found in Botpress and Voiceflow from scratch, by training a machine learning model in Python with PyTorch. If you know Python, machine learning models, and the PyTorch library, you can easily build your own chatbot instead of paying Botpress per message.

Github repo: https://github.com/michelle-w-br/chatbot_ml

Note: this blog assumes a basic understanding of Python, machine learning, and PyTorch.

What is Intent in Botpress and Voiceflow

Intent is a common feature in today’s chatbot design tools, such as Botpress and Voiceflow. There are many tutorials on how to use Botpress and Voiceflow to design chatbots with an “intent” directing the dialog. However, with these tools you are also constrained by their pattern-matching algorithm for intent, and it is hard to debug when the predicted intent is wrong. For example, Botpress uses a hashing algorithm for intent prediction. Below is the snippet for the intent function (using the hashing training algorithm) from the Botpress V12 GitHub source code:

public startTraining = async (language: string): Promise<ModelEntry> => {
  const trainSet = await this._getTrainSet(language)
  const modelId = await this._nluClient.startTraining(this._botId, trainSet)
  const definitionHash = this._hashTrainSet(trainSet)
  const entry: ModelEntry = { botId: this._botId, language, modelId, definitionHash }
  await this._trainings.set(entry)
  return entry
}

The intent function, in simple words, takes a user input and predicts whether it matches one of the intent labels predefined in the intent group. A Botpress intent example is shown below. In “Intents”, we predefine some intent labels, e.g. “go_gym”, “go_swim”, “greeting”, and “speak_to_human”, so that intent can be used to direct the workflow, as shown in Figure 1. For each intent, the developer can define its own sentence patterns; for example, the patterns for “speak_to_human” are shown in Figure 2.

Figure 1: workflow with predefined intent labels: go_gym, go_swim, greeting, speak_to_human
Figure 2: intent “speak_to_human” and its input patterns

Intent by Machine Learning Models

We can build the intent function (as in Botpress and Voiceflow) from scratch using machine learning models from the PyTorch library. In our GitHub repo, we demonstrate two different machine learning models for intent, v1.0 and v2.0. Before diving into each model in detail, we’ll first explain the overall intent classification pipeline in machine learning.

Intent Classification Pipeline

In machine learning, this is a classic classification problem. The problem can be described as: given the training data below, how do we classify a new input, e.g. “I wanna swim”, into a particular intent label?

Figure 3: intent training data
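
For concreteness, the snippet below sketches one plausible layout for such training data as a Python dict (the same structure a JSON file would hold). The field names here are assumptions for illustration and may differ from the data files in the repo.

# Illustrative only: one plausible layout for the intent training data
# (field names are assumptions, not taken from the repo).
training_data = {
    "intents": [
        {"tag": "go_swim",        "patterns": ["want to go swim", "I wanna swim"]},
        {"tag": "go_gym",         "patterns": ["gym", "go to the gym"]},
        {"tag": "speak_to_human", "patterns": ["want to talk to staff", "can I talk to a person"]},
        {"tag": "greeting",       "patterns": ["hello", "hi there"]},
    ]
}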

The overall machine learning pipeline can be described in the following steps:

  1. load the text data
  2. use a tokenization model to convert the text data into a vector representation
  3. prepare the PyTorch dataset
  4. load the dataset using the PyTorch DataLoader
  5. define the classification model
  6. train the classification model
  7. test/run inference on new input

Out of these steps, step 2 and step 5 largely determine the machine learning model’s performance, and they are also where v1.0 and v2.0 mainly differ. Simply speaking, v1.0 uses a Bag of Words model for step 2 and a small neural network for step 5, while v2.0 uses the Torchtext tokenizer for step 2 and an embedding model for step 5.

Next, we’ll mainly focus on the models used in steps 2 and 5 for v1.0 and v2.0.

v1.0: Bag of words + neural network

Bag of Words (BoW) can be used to convert any text sequence into a vector representation. Given training text data, you first build a vocabulary from all the words in the training data. A new sentence can then be represented as a vector of 0s and 1s, depending on whether each word in the vocabulary is present in the sentence. An example is shown below.

training data: ["want to go swim", "gym", "want to talk to staff", "hello"]
vocab: ["want", "to", "go", "swim","gym","talk","staff", "hello"]
"go gym": [ 0, 0, 1, 0, 1, 0, 0, 0]

After converting text into vectors, the actual classification model is the neural network defined in model.py. The architecture is fairly straightforward, as shown below: the input layer size corresponds to the input vector size, which is 8 in the above example; there are two hidden layers, each of size 8; and the output layer size is the number of intents to classify, which is 4 in this example. A ReLU activation is applied after each hidden layer.

In v1.0, the JSON training data is relatively small compared to most datasets used to train machine learning models, so the neural network architecture is also kept small to match the data size.

Figure 4: classification model for chatbot intent
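
A minimal PyTorch sketch of this architecture is shown below, using the layer sizes described above; the class and variable names are illustrative and may differ from model.py in the repo.

import torch.nn as nn

class IntentClassifier(nn.Module):
    """Feed-forward classifier with two hidden layers and ReLU, as described above."""
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.l1 = nn.Linear(input_size, hidden_size)   # e.g. 8 -> 8
        self.l2 = nn.Linear(hidden_size, hidden_size)  # e.g. 8 -> 8
        self.l3 = nn.Linear(hidden_size, num_classes)  # e.g. 8 -> 4
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.l1(x))
        out = self.relu(self.l2(out))
        return self.l3(out)  # raw logits; CrossEntropyLoss applies softmax during training

model = IntentClassifier(input_size=8, hidden_size=8, num_classes=4)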

v2.0: Torchtext + embedding model

In v2.0, we use a standard intent classification dataset named “snips”, which is often used in the academic community to measure model performance on the intent classification task. dataset/generate_snip_json.py converts the “snips” dataset into the same JSON data format as v1.0. Since this dataset is much bigger than the data in v1.0 (the train/valid/test splits have 13084/700/700 sentences respectively), we use a more common tokenizer and a classification model with more parameters.

The tokenizer is the get_tokenizer function from the torchtext library. The vocabulary is then built with torchtext’s build_vocab_from_iterator function, which iterates over each element of the dataset via the yield_tokens function; the ChatDataset class mainly defines how the dataset is iterated. The tokenizer code snippet is shown below; please refer to the GitHub repo for the full source code.

from torch.utils.data import Dataset
from torchtext.data.utils import get_tokenizer
from torchtext.vocab import build_vocab_from_iterator


class ChatDataset(Dataset):
    """
    A single training/test example for simple intent classification.
    Args:
        X: pattern sentence
        Y: intent label
    """
    def __init__(self, X, Y):
        self.n_samples = len(X)
        self.x_data = X
        self.y_data = Y

    # support indexing such that dataset[i] can be used to get the i-th sample
    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    # we can call len(dataset) to return the size
    def __len__(self):
        return self.n_samples


def chat_vocab_tokenizer(dataset):
    tokenizer = get_tokenizer("basic_english")

    def yield_tokens(data_iter):
        for text, _ in data_iter:
            yield tokenizer(text)

    vocab = build_vocab_from_iterator(yield_tokens(dataset), specials=["<unk>"])
    vocab.set_default_index(vocab["<unk>"])

    return vocab, tokenizer
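
As a quick usage sketch (the sentences and labels here are made up, and the resulting indices depend on the actual training data), the vocabulary and tokenizer can be used like this:

dataset = ChatDataset(["want to go swim", "gym", "want to talk to staff", "hello"],
                      [0, 1, 2, 3])
vocab, tokenizer = chat_vocab_tokenizer(dataset)
print(vocab(tokenizer("go gym")))  # e.g. [3, 5] -- indices assigned by build_vocab_from_iterator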

With the tokenizer and vocabulary ready, each text sentence can now be mapped to a vector representation, where each element is the token’s index in the vocabulary. For example, with the vocabulary below, “go gym” is represented as [2, 4], because “go” and “gym” map to indices 2 and 4 in the vocabulary. The code snippet below shows how the DataLoader converts text input into this vector representation.

vocab:         ["want", "to", "go", "swim","gym","talk","staff", "hello"]
"go gym": [2, 4]
import torch
from torch.utils.data import DataLoader
from torchtext.data.functional import to_map_style_dataset

# vocab and tokenizer come from chat_vocab_tokenizer above; device, BATCH_SIZE and
# the train/valid/test iterators are defined elsewhere in the repo
text_pipeline = lambda x: vocab(tokenizer(x))


def collate_batch(batch):
    label_list, text_list, offsets = [], [], [0]
    for _text, _label in batch:
        label_list.append(_label)
        processed_text = torch.tensor(text_pipeline(_text), dtype=torch.int64)
        text_list.append(processed_text)
        offsets.append(processed_text.size(0))
    label_list = torch.tensor(label_list, dtype=torch.int64)
    offsets = torch.tensor(offsets[:-1]).cumsum(dim=0)
    text_list = torch.cat(text_list)
    return text_list.to(device), label_list.to(device), offsets.to(device)


train_dataset = to_map_style_dataset(train_iter)
valid_dataset = to_map_style_dataset(valid_iter)
test_dataset = to_map_style_dataset(test_iter)
train_dataloader = DataLoader(
    train_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
valid_dataloader = DataLoader(
    valid_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)
test_dataloader = DataLoader(
    test_dataset, batch_size=BATCH_SIZE, shuffle=True, collate_fn=collate_batch
)

With the input token vectors ready, the classification model architecture is an embedding layer followed by a classification layer. The embedding layer is simply a matrix, where the number of rows is the vocabulary size and the number of columns is the embedding size: given a token index, it returns the corresponding row vector. The embedding layer therefore converts a vector of k token indices into a (k x m) matrix, where m is the embedding size. Since the model below uses nn.EmbeddingBag, the k embedding vectors of each sentence are then averaged into a single m-dimensional vector, with the offsets from collate_batch marking where each sentence starts in the concatenated batch. The last layer of the model is a fully connected layer with input size m and output size equal to the number of intent classes.

Figure 5: embedding model

from torch import nn


class TextClassificationModel(nn.Module):
    def __init__(self, vocab_size, embed_dim, num_class):
        super(TextClassificationModel, self).__init__()
        self.embedding = nn.EmbeddingBag(vocab_size, embed_dim, sparse=False)
        self.fc = nn.Linear(embed_dim, num_class)
        self.init_weights()

    def init_weights(self):
        initrange = 0.5
        self.embedding.weight.data.uniform_(-initrange, initrange)
        self.fc.weight.data.uniform_(-initrange, initrange)
        self.fc.bias.data.zero_()

    def forward(self, text, offsets):
        embedded = self.embedding(text, offsets)
        return self.fc(embedded)
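
To close the loop on steps 6 and 7 of the pipeline, below is a minimal training and inference sketch. It assumes the vocab, train_dataloader and text_pipeline defined above; the hyperparameters, optimizer choice and variable names are illustrative and may differ from the training script in the repo.

import torch
from torch import nn

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
num_intents = 7  # number of intent labels in the dataset (snips has 7 intent classes)
model = TextClassificationModel(len(vocab), embed_dim=64, num_class=num_intents).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=5.0)

# step 6: train the classification model
for epoch in range(10):
    model.train()
    for text, label, offsets in train_dataloader:
        optimizer.zero_grad()
        loss = criterion(model(text, offsets), label)
        loss.backward()
        optimizer.step()

# step 7: inference on a new input
model.eval()
with torch.no_grad():
    tokens = torch.tensor(text_pipeline("I wanna swim"), dtype=torch.int64).to(device)
    offsets = torch.tensor([0]).to(device)
    predicted_label = model(tokens, offsets).argmax(dim=1).item()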

To Summarize

In this blog, we covered how to build a chatbot intent feature from scratch, using machine learning and the PyTorch library. However, the best way to learn is always to get hands-on. The full code is available on GitHub. If you have any questions, please feel free to leave a comment here.
