Building a Rasa Chatbot in Bengali using Supervised Word vectors from scratch

Coming from India, one of our major social networking habits is writing our languages (for me, Bengali) in the English alphabet to express ourselves on the internet. This has changed over time as more social media companies have added support for Indian languages. However, since our workforce works mostly in English, and a Latin keyboard doesn’t really help with writing my mother tongue, we usually stick to writing Bengali in English letters.

Please note: I am aware that many of us prefer to write in our own script now that major social media sites support it, but personally I chat with friends and family who usually write their language (be it Hindi, Bengali, Oriya or others) in the English alphabet.

You can see this all over the internet in India, especially on public forums.

For example:

Tumi kemn acho? — How are you?
Ki korchis? — What are you doing?
Barite sobai kemn ache? — How is everyone at home?

No chatbot on the market that I know of can figure out what this actually means, so I started a fun little experiment based on a blog post from Rasa.

How did I start?

To begin, I added a lot of training examples to feed Rasa NLU (I have shared the GitHub link below).
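
As a rough illustration of what such training data can look like, here is a minimal sketch that writes a few romanized Bengali sentences into Rasa NLU's JSON training layout (`rasa_nlu_data` with `common_examples`). The intent names are the ones used later in this post; the example sentences and the helper name `build_training_data` are assumptions for illustration, not the actual training set from the repository.

```python
import json

# Romanized Bengali utterances grouped by intent. The intent names come from
# this post; the sentences are illustrative assumptions, not the real data.
EXAMPLES = {
    "kemn_acho": ["tumi kemn acho", "tumi kemon acho", "kemn acho"],
    "ki_korcho": ["ki korchis", "tumi ki korcho"],
}


def build_training_data(examples):
    """Return a dict in Rasa NLU's JSON training-data layout."""
    common_examples = [
        {"text": text, "intent": intent, "entities": []}
        for intent, texts in examples.items()
        for text in texts
    ]
    return {"rasa_nlu_data": {"common_examples": common_examples}}


training_data = build_training_data(EXAMPLES)
print(json.dumps(training_data, indent=2, ensure_ascii=False))
```

Spelling variants such as "kemn" and "kemon" are worth including on purpose, since people romanize the same word in different ways.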

Training examples for NLU
language: "en"
pipeline: "tensorflow_embedding"

If you are familiar with Rasa NLU: I am using version 0.12, which includes the tensorflow_embedding pipeline. The snippet above is my config file. Since I am only interested in text classification, I decided to skip NER (Named Entity Recognition).
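
To give a feel for why this works without any pre-trained Bengali vectors, here is a toy sketch: the classifier only ever sees words from your own training sentences. It is a deliberately simple stand-in (word counts plus cosine similarity against per-intent counts), not the neural embedding model that the tensorflow_embedding pipeline trains. The sentences and function names are assumptions for illustration.

```python
import math
from collections import Counter

# Tiny illustrative training set (assumed sentences, intents from this post).
TRAIN = [
    ("tumi kemn acho", "kemn_acho"),
    ("kemon acho", "kemn_acho"),
    ("ki korchis", "ki_korcho"),
    ("tumi ki korcho", "ki_korcho"),
]


def featurize(text):
    """Bag-of-words counts over lowercased, whitespace-split tokens."""
    return Counter(text.lower().replace("?", "").split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def classify(text):
    """Return the intent whose summed word counts are most similar to text."""
    per_intent = {}
    for sentence, intent in TRAIN:
        per_intent.setdefault(intent, Counter()).update(featurize(sentence))
    query = featurize(text)
    return max(per_intent, key=lambda intent: cosine(query, per_intent[intent]))


print(classify("Tumi kemn acho?"))
```

The real pipeline learns dense vectors for words and intents instead of comparing raw counts, but the input is the same: nothing beyond the sentences you provide.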

The tensorflow_embedding pipeline needed a few extra dependencies. Feel free to consult the Rasa documentation for details.

# Minimum Install Requirements
-r requirements_bare.txt
rasa_nlu==0.12.0
spacy==2.0.0
scikit-learn==0.18.1
scipy==0.19.0
sklearn-crfsuite==0.3.5
duckling==1.7.3
tensorflow==1.6.0

That covers the intent classification part. Now let’s move on to Rasa Core.

Rasa Core, as amazing as it is, relies on Rasa NLU to catch the right intent. It is trained on examples of different chat scenarios and builds a TensorFlow model that predicts the next action.

Two files are needed to start with: a domain file and a stories file.

Let’s start with the domain file (the universe of the bot).

# My 5 intents are about simple chats (how are you / what are you doing / who are you / tell me a story / what is happening in Kolkata)
intents:
  - kemn_acho
  - ki_korcho
  - tumi_key
  - golpo_sonao
  - kolkata_khobor

templates:
  utter_bhalo:
    - text: "Aami bhalo achi"
  utter_golpo:
    - text: "Amar kache kono golpo nei"
  utter_kaaj:
    - text: "Aami kono kaaj korte parina apatoto"
  utter_key:
    - text: "Aami janina"
  utter_kolkata:
    - text: "Kolkatae aajke khub gorom"

actions:
  - utter_bhalo
  - utter_golpo
  - utter_kaaj
  - utter_key
  - utter_kolkata
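
Since every `utter_` action needs a matching template, a quick consistency check can catch typos before training. The sketch below mirrors the domain as a plain Python dict so it needs no YAML parser; `find_domain_problems` is a hypothetical helper, not part of Rasa.

```python
# The domain from this post as a plain dict (no YAML library required).
DOMAIN = {
    "intents": [
        "kemn_acho", "ki_korcho", "tumi_key", "golpo_sonao", "kolkata_khobor",
    ],
    "templates": {
        "utter_bhalo": [{"text": "Aami bhalo achi"}],
        "utter_golpo": [{"text": "Amar kache kono golpo nei"}],
        "utter_kaaj": [{"text": "Aami kono kaaj korte parina apatoto"}],
        "utter_key": [{"text": "Aami janina"}],
        "utter_kolkata": [{"text": "Kolkatae aajke khub gorom"}],
    },
    "actions": [
        "utter_bhalo", "utter_golpo", "utter_kaaj", "utter_key", "utter_kolkata",
    ],
}


def find_domain_problems(domain):
    """Hypothetical helper: list mismatches between actions and templates."""
    problems = []
    templates = domain.get("templates", {})
    actions = domain.get("actions", [])
    for action in actions:
        # Only utter_* actions are expected to have a response template.
        if action.startswith("utter_") and action not in templates:
            problems.append(f"action {action!r} has no template")
    for name in templates:
        if name not in actions:
            problems.append(f"template {name!r} is not listed under actions")
    return problems


print(find_domain_problems(DOMAIN) or "domain looks consistent")
```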

Next up are the stories (these are basically examples of how a user will interact with the bot).

## path khobor
* kemn_acho
  - utter_bhalo

## path kaaj
* ki_korcho
  - utter_kaaj

## path key
* tumi_key
  - utter_key

## path golpo
* golpo_sonao
  - utter_golpo

## path kolkata
* kolkata_khobor
  - utter_kolkata

## path khobor 1
* kemn_acho
  - utter_bhalo
* ki_korcho
  - utter_kaaj
* golpo_sonao
  - utter_golpo
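
To make the story format concrete: each `##` line names a story, each `*` line is a user intent, and each `-` line is a bot action that follows it. Here is a minimal sketch of reading that structure with plain Python; `parse_stories` is a hypothetical helper for illustration, not Rasa Core's own parser.

```python
# Two of the stories from this post, in the same Markdown format.
STORIES = """\
## path khobor
* kemn_acho
  - utter_bhalo

## path khobor 1
* kemn_acho
  - utter_bhalo
* ki_korcho
  - utter_kaaj
"""


def parse_stories(text):
    """Return a list of (story_name, [(intent, [actions, ...]), ...])."""
    stories = []
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("##"):
            # A new story begins.
            stories.append((line[2:].strip(), []))
        elif line.startswith("*") and stories:
            # A user turn, identified by its intent.
            stories[-1][1].append((line[1:].strip(), []))
        elif line.startswith("-") and stories and stories[-1][1]:
            # A bot action attached to the most recent user turn.
            stories[-1][1][-1][1].append(line[1:].strip())
    return stories


for name, turns in parse_stories(STORIES):
    print(name, turns)
```

Returning a list rather than a dict keeps stories that accidentally share a name visible instead of silently merging them.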

Training time

For simplicity, I use docker-compose to spin up two containers:

Rasa NLU server running on port 5000

Rasa Core server running on port 5005

I open a bash shell in each container to train the models.

I also added the code to connect the bot to Facebook, but this involves running these images on Heroku or some other cloud instance. Your choice.

Results

My goal here is to see whether the bot can detect what I am saying in Bengali written in the English alphabet and reply to my questions accordingly.

Input — NLU Parsing

GET /parse?project=bot1&q="tumi kemn acho?" HTTP/1.1
Host: localhost:5000
Cache-Control: no-cache
Postman-Token: d4784078-2c4a-dbb3-592c-2d60e56e1e68
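
The same request can be sent from a script instead of Postman. The sketch below uses only the standard library and assumes the NLU server from this post is reachable at `http://localhost:5000` with a project named `bot1`; the function names are hypothetical. Note that the request above wraps the query in literal double quotes, which is why they show up in the `text` field of the response. The sketch sends the text without them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_parse_url(text, project="bot1", base_url="http://localhost:5000"):
    """Build the /parse URL, URL-encoding the message text."""
    query = urlencode({"project": project, "q": text})
    return f"{base_url}/parse?{query}"


def parse_message(text, **kwargs):
    """Send the message to the NLU server and return the decoded JSON reply.

    Requires a running Rasa NLU server; raises URLError otherwise.
    """
    with urlopen(build_parse_url(text, **kwargs)) as response:
        return json.loads(response.read().decode("utf-8"))


print(build_parse_url("tumi kemn acho?"))
```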

Output — NLU

{
  "entities": [],
  "intent_ranking": [
    {
      "confidence": 0.9517878293991089,
      "name": "kemn_acho"
    },
    {
      "confidence": 0.18100501596927643,
      "name": "kolkata_khobor"
    },
    {
      "confidence": -0.10150938481092453,
      "name": "tumi_key"
    },
    {
      "confidence": -0.13094939291477203,
      "name": "ki_korcho"
    },
    {
      "confidence": -0.5834070444107056,
      "name": "golpo_sonao"
    }
  ],
  "intent": {
    "confidence": 0.9517878293991089,
    "name": "kemn_acho"
  },
  "text": "\"tumi kemn acho?\""
}
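
When consuming this response in code, it can help to accept the top intent only above some cut-off. The negative values in the ranking suggest these scores behave like similarities rather than probabilities, so they need not sum to 1. The sketch below works on a trimmed copy of the response above; the `threshold` value and the `fallback` name are assumptions chosen for illustration, not Rasa defaults.

```python
import json

# Trimmed copy of the NLU response shown above.
RESPONSE = json.loads("""
{
  "entities": [],
  "intent_ranking": [
    {"confidence": 0.9517878293991089, "name": "kemn_acho"},
    {"confidence": 0.18100501596927643, "name": "kolkata_khobor"},
    {"confidence": -0.10150938481092453, "name": "tumi_key"}
  ],
  "intent": {"confidence": 0.9517878293991089, "name": "kemn_acho"}
}
""")


def pick_intent(response, threshold=0.5, fallback="fallback"):
    """Return the top intent name, or `fallback` if its score is too low."""
    intent = response.get("intent") or {}
    if intent.get("confidence", 0.0) >= threshold:
        return intent["name"]
    return fallback


print(pick_intent(RESPONSE))
```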

The results show that the NLU understands the input really well:

Tumi kemn acho? (How are you?) is classified as intent kemn_acho: the user is asking how the bot is doing.

Now let’s see whether the chat works. I tried to add a small video as a GIF but couldn’t get it to work, so here is a screenshot instead.

Bot is replying in console

As you can see above, it understands what I am trying to say quite well. This can open doors for e-commerce and retail businesses that want a dedicated line of communication, and for internal channels such as HR or Legal that could use automated communication to make life easier for their clients.

Here is the link to my GitHub. Feel free to clone the project, make it your own, and try it out in other languages that don’t natively use the Latin script but get written in it anyway!

https://github.com/souvikg10/rasa_bot_example.git

Way to go Rasa, disrupting the bot industry!!