Building a Rasa Chatbot in Bengali using Supervised Word vectors from scratch
Coming from India, one of the major social networking habits we have is writing our language (for me, Bengali) in the English alphabet to express ourselves over the internet. This has changed over time as more and more social media companies have added support for Indian languages. However, since most of our workforce works in English, and a Latin keyboard doesn’t really help in writing my mother tongue, we usually stick to writing Bengali using the English alphabet.
Please note: I am aware that many of us prefer to write in our own script now that major social media sites support it, but personally I chat with friends and family who usually write their language (be it Hindi, Bengali, Oriya or others) in the English alphabet.
You can see this all over the internet in India, especially on public forums:
Tumi kemn acho? — How are you?
Ki korchis? — What are you doing?
Barite sobai kemn ache? — How is everyone at home?
No chatbot on the market that I know of can figure out what these sentences actually mean, so I started a fun little experiment based on this blog post from Rasa:
We’ve released a new pipeline which is totally different from the standard Rasa NLU approach. It uses very little…medium.com
How did I start?
To begin, I added a lot of training examples to feed Rasa NLU (I have shared the GitHub link below).
If you are familiar with Rasa NLU, I am using version 0.12, which contains the tensorflow_embedding pipeline. Here is my config file. Since I am only interested in text classification, I decided to skip NER (Named Entity Recognition).
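To give a feel for why this pipeline suits Bengali written in the English alphabet: as I understand it, it does not start from pretrained word vectors but from plain word counts over the training vocabulary, and learns its embeddings on top of those. Below is a small standard-library Python sketch of that counting step only. It is not Rasa’s code; the function names are made up for illustration, and the tokenization (lowercase, split on word characters) is a simplifying assumption.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split on word characters (a simplification)."""
    return re.findall(r"\w+", text.lower())


def build_vocabulary(examples):
    """Map every token seen in the training examples to a fixed position."""
    tokens = sorted({tok for text in examples for tok in tokenize(text)})
    return {tok: index for index, tok in enumerate(tokens)}


def count_vector(text, vocabulary):
    """Bag-of-words counts; tokens never seen in training are ignored."""
    counts = Counter(tokenize(text))
    # Dicts keep insertion order, so positions follow the sorted vocabulary.
    return [counts.get(tok, 0) for tok in vocabulary]


# The three example sentences from the introduction
examples = ["Tumi kemn acho?", "Ki korchis?", "Barite sobai kemn ache?"]
vocab = build_vocabulary(examples)
vector = count_vector("tumi kemn acho?", vocab)

print(list(vocab))
print(vector)
```

Because the vocabulary comes entirely from the training examples, romanized spellings like "kemn" are just tokens like any other, and no Bengali language model is needed.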
However, I had to add some dependencies to do so. Feel free to consult their documentation here:
This is the documentation for version 0.12.2 of Rasa NLU. Make sure you select the appropriate version of the…nlu.rasa.com
# Minimum Install Requirements
So that was the intent classification part. Now let’s move on to Rasa Core.
Rasa Core, amazing as it is, relies on Rasa NLU to catch the right intent. It is trained on examples of different chat scenarios, which produces a TensorFlow model that predicts the next action.
Two files are needed to start with: the domain file and the stories.
Let’s start with the domain file (the universe of the bot).
intents: ## My 5 intents are about simple chats ( How are you/what are you doing/who are you/tell me a story/what is happening in kolkata)
- text: "Aami bhalo achi"
- text: "Amar kache kono golpo nei"
- text: "Aami kono kaaj korte parina apatoto"
- text: "Aami janina"
- text: "Kolkatae aajke khub gorom"
Next up are the stories (these are basically examples of how a user will interact with the bot).
## path khobor
## path kaaj
## path key
## path golpo
## path golpo
## path khobor 1
For simplicity, I use docker-compose to spin up two containers:
Rasa NLU server running on port 5000
Rasa Core server running on port 5005
I then bash into each container to train the models.
Here I also added the code to connect it to Facebook, but this involves running these images on Heroku or some other cloud instance. Your choice.
My goal here is to see whether the bot can detect what I am saying in Bengali written in the English alphabet and reply to my questions accordingly.
Input — NLU Parsing
GET /parse?project=bot1&q="tumi kemn acho?" HTTP/1.1
Output — NLU
"text": "\"tumi kemn acho?\""
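Notice that the literal quotes around the query ended up inside the parsed text. If you build the request from code, a rough sketch like the one below avoids that by URL-encoding the sentence instead of quoting it. It uses only the Python standard library and does not actually call the server. The host name localhost is my assumption for a local docker-compose setup, the helper names are made up for illustration, and the sample response is a hand-written placeholder (including the confidence value), not real output from my bot.

```python
import json
from urllib.parse import urlencode

# Assumption: the NLU container is reachable locally on port 5000
NLU_URL = "http://localhost:5000/parse"


def build_parse_url(text, project="bot1"):
    """Build the GET URL; urlencode escapes spaces and '?', so no quotes are needed."""
    return NLU_URL + "?" + urlencode({"project": project, "q": text})


def top_intent(response_body):
    """Return (intent name, confidence) from an NLU parse response body."""
    parsed = json.loads(response_body)
    intent = parsed["intent"]
    return intent["name"], intent["confidence"]


url = build_parse_url("tumi kemn acho?")

# Hand-written placeholder response, NOT real output; the confidence is invented
sample_response = json.dumps({
    "intent": {"name": "Kemn_acho", "confidence": 0.9},
    "entities": [],
    "text": "tumi kemn acho?",
})
name, confidence = top_intent(sample_response)

print(url)
print(name, confidence)
```

To query a running server, the URL could be passed to urllib.request.urlopen and the response body handed to top_intent.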
Given the results, you can see that the NLU understands the input really well:
Tumi kemn acho? (How are you?) is classified as intent Kemn_acho: the user is asking how you are.
Now let’s see whether the chat works. I tried to add a short video as a GIF but couldn’t get it to work; nevertheless, here is a screenshot of what I did.
As you can see above, it understands what I am trying to say quite well. This can open doors for e-commerce and retail businesses that want a dedicated line of communication, and for internal channels such as HR or Legal that could use automated communication to make life easier for their clients.
Here is the link to my GitHub. Feel free to clone it, make this project your own, and try it out in other languages that don’t natively use the Latin script but are written in it anyway!!
Way to go Rasa, disrupting the bot industry!!