Retrieval-based Chatbots — Using NLTK & Keras

Sol M. Lozano🦁
10 min read · Nov 9, 2020

Chatbots are artificial-intelligence applications that simulate a conversation with a person, giving automatic answers to common questions and doubts about a range of topics, while reducing the cost of staff and technological tools.

For these reasons, companies commonly use them as an alternative to personalized customer-service management.

They also have a direct relationship with Big Data: all the information obtained from conversations with customers generates extra value, allowing you to analyze customer behavior more effectively.


The effectiveness of chatbots aligns with the 3 V's of Big Data:

  • Volume: Conversational assistants can handle a large number of requests on the spot.
  • Velocity: They can manage and classify information in less time than a human.
  • Variety: They can compile a wide variety of information.

This project focuses on developing a retrieval-based chatbot that uses deep learning, predefined input and response patterns, and heuristic approaches to select the appropriate response.

The main characteristics of this project are:

  • In my case, I used a conda environment (you can use pip instead);
  • Use of Python (data science and GUI libraries), NLTK, and Keras;
  • Training with a dataset that contains categories (intents), patterns, and responses;
  • Use of a neural network (built with Keras) to classify the category to which the user's message belongs, so the bot can give a random response from that category's group.

The project contains the following Python scripts and supporting files:

Project structure.
  • pickles/words.pkl and pickles/classes.pkl — Pickle files that store the words and classes lists.
  • chat_model.py — A Python file that builds and trains the chatbot model.
  • chatbot_gui.py — A Python script with the GUI implementation for our chatbot.
  • chatapp.py — A Python file that loads the trained model and produces responses on top of chat_model.py.
  • intents.json — The data file that has the predefined patterns and responses.
  • chatbot_model.h5 — The trained model, containing its architecture and the weights of the neurons.
  • utils.py — A Python file with utility methods (to open and save pickle files).

These resources are in my GitHub repository.
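
For reference, the loading code in the next steps expects intents.json to have roughly the following shape (a made-up example; the actual tags, patterns, and responses live in the repository):

{
  "intents": [
    {
      "tag": "greeting",
      "patterns": ["Hi", "Hello", "Good day"],
      "responses": ["Hello!", "Hi there, how can I help you?"]
    },
    {
      "tag": "goodbye",
      "patterns": ["Bye", "See you later"],
      "responses": ["Goodbye!", "Talk to you soon."]
    }
  ]
}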

Project steps:

1. Import necessary packages

In our console, we run the following commands to install all the necessary packages (conda environment):

conda install tensorflow
conda install keras
conda install pickle5
conda install nltk
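
If you prefer pip (as mentioned in the project characteristics), a rough equivalent is the line below; note that pickle ships with the Python standard library, so the pickle5 backport is only needed on Python versions older than 3.8:

pip install tensorflow keras nltk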

Now, in our chat_model.py script, we import the packages we'll need.

Note: On the first run, it is necessary to uncomment the two nltk.download lines shown commented in the following snippet.

# Import all packages we'll need.
import json
import numpy as np
import random
import nltk
import utils as u
# nltk.download('punkt')
# nltk.download('wordnet')
from keras.models import Sequential
from keras.layers import Dense, Activation, Dropout
from keras.optimizers import SGD

In addition, we'll create a class called ChatModel that groups the methods we'll need, following this template (OOP structure):

class ChatModel:
    def __init__(self):
        # Call tokenizing procedure
        w, words, documents, classes, self._intents = self.tokenizing('intents.json')
        # Call lemmatizing procedure
        w, words, documents, classes, lemmatizer = self.lemmatizing(w, words, documents, classes)
        # Call training_data procedure
        self._train_x, self._train_y = self.training_data(w, words, documents, classes, lemmatizer)
        # Call training procedure
        self._model = self.training(self._train_x, self._train_y)

    def get_train_x(self):
        return self._train_x

    def get_train_y(self):
        return self._train_y

    def get_model(self):
        return self._model

    def get_intents(self):
        return self._intents

To finish this step, we'll create a utils.py module that contains the methods for manipulating our pickle files.

import pickle

# Creation of pickle files to store the Python objects we'll use in the prediction process
def create_pickle(obj, pkl_url):
    return pickle.dump(obj, open(pkl_url, 'wb'))

def load_pickle(pkl_url):
    return pickle.load(open(pkl_url, 'rb'))

2. Load and preprocess data file

Our chat_model.py file contains two methods for this step, called tokenizing() and lemmatizing():

  • Tokenizing method: In the first part of this method, we parse our JSON file called "intents.json" into Python. Then we tokenize the text data, breaking the whole text into small parts such as words: we iterate through the patterns, tokenize each sentence with the nltk.word_tokenize() function, and append each word to the words list. Finally, we create a list of classes for our tags (a small illustration of the resulting lists follows after the two methods below).
def tokenizing(self, url):
    words = []
    classes = []
    documents = []
    intents = json.loads(open(url).read())
    for intent in intents['intents']:
        for pattern in intent['patterns']:
            # tokenize each word
            w = nltk.word_tokenize(pattern)
            words.extend(w)
            # add documents in the corpus
            documents.append((w, intent['tag']))
            # add to our classes list
            if intent['tag'] not in classes:
                classes.append(intent['tag'])
    return w, words, documents, classes, intents
  • Lemmatizing method: Lemmatizing is the process of converting a word into its lemma (base) form. The second part consists of lemmatizing each word and removing duplicates from the list, and then creating pickle files to store the Python objects we'll use while predicting.
def lemmatizing(self, w, words, documents, classes):
    ignore_words = ['?', '!']
    lemmatizer = nltk.stem.WordNetLemmatizer()
    # lemmatize, lower each word and remove duplicates
    words = [lemmatizer.lemmatize(w.lower()) for w in words if w not in ignore_words]
    # sort classes and words
    classes = sorted(list(set(classes)))
    words = sorted(list(set(words)))
    # documents = combination between patterns and intents
    print(len(documents), "documents")
    # classes = intents
    print(len(classes), "classes", classes)
    # words = all words, vocabulary
    print(len(words), "unique lemmatized words", words)
    u.create_pickle(words, 'pickles/words.pkl')
    u.create_pickle(classes, 'pickles/classes.pkl')
    return w, words, documents, classes, lemmatizer
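
As promised above, here is a small illustration of what these two methods produce, using the hypothetical intents.json from earlier (not the real data from the repository):

# After tokenizing('intents.json') with the hypothetical intents shown earlier:
#   documents -> [(['Hi'], 'greeting'), (['Hello'], 'greeting'), (['Good', 'day'], 'greeting'),
#                 (['Bye'], 'goodbye'), (['See', 'you', 'later'], 'goodbye')]
#   classes   -> ['greeting', 'goodbye']
# After lemmatizing(), words is lowercased, deduplicated and sorted:
#   words     -> ['bye', 'day', 'good', 'hello', 'hi', 'later', 'see', 'you']
#   classes   -> ['goodbye', 'greeting']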

3. Create training and testing data

Inside our chat_model.py file and our ChatModel class, we'll add the method that builds the training set, called training_data(), responsible for providing the input and output data for our future model. The input will be the pattern, and the output will be the class that the input pattern belongs to. The last part of this step is to convert all the text into numbers.

def training_data(self, w, words, documents, classes, lemmatizer):
    # create our training data
    training = []
    train_x = []
    train_y = []
    # create an empty array for our output
    output_empty = [0] * len(classes)
    # training set, bag of words for each sentence
    for doc in documents:
        # initialize our bag of words
        bag = []
        # list of tokenized words for the pattern
        pattern_words = doc[0]
        # lemmatize each word - create base word, in an attempt to represent related words
        pattern_words = [lemmatizer.lemmatize(word.lower()) for word in pattern_words]
        # create our bag of words array with 1 if a word match is found in the current pattern
        for word in words:
            bag.append(1 if word in pattern_words else 0)
        # output is a '0' for each tag and '1' for the current tag (for each pattern)
        output_row = list(output_empty)
        output_row[classes.index(doc[1])] = 1
        training.append([bag, output_row])
    # shuffle our features and turn into np.array
    random.shuffle(training)
    training = np.array(training, dtype=object)
    # create train and test lists. X - patterns, Y - intents
    train_x = list(training[:, 0])
    train_y = list(training[:, 1])
    print("Training data created")
    return train_x, train_y
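
Continuing the hypothetical example from above, here is what one training row would look like (illustrative values only):

# With words   = ['bye', 'day', 'good', 'hello', 'hi', 'later', 'see', 'you']
# and  classes = ['goodbye', 'greeting'],
# the pattern "Good day" (tag 'greeting') is encoded as:
#   bag        = [0, 1, 1, 0, 0, 0, 0, 0]   # 1 where the vocabulary word appears in the pattern
#   output_row = [0, 1]                     # 1 at the index of the 'greeting' class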

4. Build the model

Inside our chat_model.py file and our ChatModel class, we'll add the method that trains our model, called training(). It builds a deep neural network with 3 layers (128 neurons, 64 neurons, and an output layer the same size as our number of intents), with a dropout rate of 0.5 applied after each hidden layer during training.

Then, using the Keras Sequential API, we'll compile the model with stochastic gradient descent, train it for 200 epochs, and save it as "chatbot_model.h5".

def training(self, train_x, train_y):
    # Create model - 3 layers. First layer 128 neurons, second layer 64 neurons and the 3rd output layer
    # contains a number of neurons equal to the number of intents, to predict the output intent with softmax
    model = Sequential()
    model.add(Dense(128, input_shape=(len(train_x[0]),), activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(64, activation='relu'))
    model.add(Dropout(0.5))
    model.add(Dense(len(train_y[0]), activation='softmax'))
    # Compile model. Stochastic gradient descent with Nesterov accelerated gradient gives good results for this model
    sgd = SGD(lr=0.01, decay=1e-6, momentum=0.9, nesterov=True)
    model.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
    # fitting and saving the model
    hist = model.fit(np.array(train_x), np.array(train_y), epochs=200, batch_size=5, verbose=1)
    model.save('chatbot_model.h5')
    print("model created")
    return model

5. Predict responses

First, we'll create a class called ChatApp whose __init__ method loads the trained model and the pickle files we created earlier:

from chat_model import ChatModel as chatModel
import nltk
import pickle
import numpy as np
from keras.models import load_model
import json
import random
import utils as u

class ChatApp:
    def __init__(self):
        self.cM = chatModel()
        self._lemmatizer = nltk.stem.WordNetLemmatizer()
        self._model = load_model('chatbot_model.h5')
        self._intents = self.cM.get_intents()
        self._words = u.load_pickle('pickles/words.pkl')
        self._classes = u.load_pickle('pickles/classes.pkl')

We'll need to provide the input data in the same format we used while training, so we'll use the following methods for text preprocessing and then for predicting the class:

def clean_up_sentence(self, sentence):
    # tokenize the pattern - split words into array
    sentence_words = nltk.word_tokenize(sentence)
    # lemmatize each word - create the base form of the word
    sentence_words = [self._lemmatizer.lemmatize(word.lower()) for word in sentence_words]
    return sentence_words

# return bag of words array: 0 or 1 for each word in the bag that exists in the sentence
def bow(self, sentence, words, show_details=True):
    # tokenize the pattern
    sentence_words = self.clean_up_sentence(sentence)
    # bag of words - matrix of N words, vocabulary matrix
    bag = [0] * len(words)
    for s in sentence_words:
        for i, w in enumerate(words):
            if w == s:
                # assign 1 if current word is in the vocabulary position
                bag[i] = 1
                if show_details:
                    print("found in bag: %s" % w)
    return np.array(bag)

def predict_class(self, sentence, model):
    ERROR_THRESHOLD = 0.25
    # filter out predictions below a threshold
    p = self.bow(sentence, self._words, show_details=False)
    res = self._model.predict(np.array([p]))[0]
    results = [[i, r] for i, r in enumerate(res) if r > ERROR_THRESHOLD]
    # sort by strength of probability
    results.sort(key=lambda x: x[1], reverse=True)
    return_list = []
    for r in results:
        return_list.append({"intent": self._classes[r[0]], "probability": str(r[1])})
    return return_list

After this, we should get a random response from the list of intents.

def getResponse(self, ints, intents_json):
    tag = ints[0]['intent']
    list_of_intents = intents_json['intents']
    for i in list_of_intents:
        if i['tag'] == tag:
            result = random.choice(i['responses'])
            break
    return result

def chatbot_response(self, text):
    ints = self.predict_class(text, self._model)
    res = self.getResponse(ints, self._intents)
    return res
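
Outside the GUI, these methods can be tried directly from a Python shell, for example (this will also retrain the model, since ChatApp instantiates ChatModel in its constructor):

from chatapp import ChatApp

app = ChatApp()                       # trains the model and loads the pickles
print(app.chatbot_response("Hello"))  # prints a random response from the predicted intent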

6. Create GUI (Graphical User Interface)

For the GUI, we'll use the Tkinter library: we'll take the input message from the user and create a send() method that gets the response from the bot and displays it.

# Creating GUI with tkinter
import tkinter
from tkinter import *
from chatapp import ChatApp as cA

def send():
    msg = EntryBox.get("1.0", 'end-1c').strip()
    EntryBox.delete("0.0", END)
    if msg != '':
        ChatLog.config(state=NORMAL)
        ChatLog.insert(END, "You: " + msg + '\n\n')
        ChatLog.config(foreground="#442265", font=("Verdana", 12))
        res = cA().chatbot_response(msg)
        ChatLog.insert(END, "Bot: " + res + '\n\n')
        ChatLog.config(state=DISABLED)
        ChatLog.yview(END)

base = Tk()
base.title("ChatBot - SL")
base.geometry("400x500")
base.resizable(width=FALSE, height=FALSE)

# Create Chat window
ChatLog = Text(base, bd=0, bg="white", height="8", width="50", font="Arial")
ChatLog.config(state=DISABLED)

# Bind scrollbar to Chat window
scrollbar = Scrollbar(base, command=ChatLog.yview, cursor="heart")
ChatLog['yscrollcommand'] = scrollbar.set

# Create Button to send message
SendButton = Button(base, font=("Verdana", 12, 'bold'), text="Send", width="12", height=5,
                    bd=0, bg="#32de97", activebackground="#3c9d9b", fg='#ffffff',
                    command=send)

# Create the box to enter message
EntryBox = Text(base, bd=0, bg="white", width="29", height="5", font="Arial")
# EntryBox.bind("<Return>", send)

# Place all components on the screen
scrollbar.place(x=376, y=6, height=386)
ChatLog.place(x=6, y=6, height=386, width=370)
EntryBox.place(x=128, y=401, height=90, width=265)
SendButton.place(x=6, y=401, height=90)

base.mainloop()

7. Run the chatbot

For this, we have two files: chatapp.py and chatbot_gui.py.

First, we'll train our model by running the first file from the terminal:

python chatapp.py
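
The snippets above only define the ChatApp class, so for python chatapp.py to actually trigger training, the script needs an entry point at the end of chatapp.py, for example the minimal sketch below (the repository may already include something similar):

if __name__ == "__main__":
    ChatApp()  # instantiating ChatApp builds ChatModel, trains it, and saves chatbot_model.h5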

If we don't see any errors during training, the model was created successfully. The last step is to run the second file:

python chatbot_gui.py

And now we can open our chatbot and do some tests:

Notice that, as written, the model is retrained every time we press the Send button, so the response time is slow. For a better response set, we should experiment until we get a good model, or swap in an intents file with different patterns and responses.
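
A simple tweak to avoid that repeated training (not part of the original code) is to create a single ChatApp instance when the GUI starts and reuse it inside send():

app = cA()  # create one ChatApp when the GUI starts; training happens only once here

def send():
    msg = EntryBox.get("1.0", 'end-1c').strip()
    EntryBox.delete("0.0", END)
    if msg != '':
        ChatLog.config(state=NORMAL)
        ChatLog.insert(END, "You: " + msg + '\n\n')
        ChatLog.config(foreground="#442265", font=("Verdana", 12))
        res = app.chatbot_response(msg)  # reuse the already-trained instance
        ChatLog.insert(END, "Bot: " + res + '\n\n')
        ChatLog.config(state=DISABLED)
        ChatLog.yview(END)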

