Building a Rasa Chatbot in Bengali using Supervised Word vectors from scratch

Coming from India, one of our major social networking habits is writing our own language (Bengali, in my case) in the English alphabet to express ourselves on the internet. This has changed a lot over time as more and more social media companies have added support for Indian languages. Still, most of our workforce works in English, and a Latin keyboard doesn’t really help with writing my mother tongue, so we usually stick to writing Bengali in the English alphabet.

Please note: I am aware that many of us prefer to write in our own script now that major social media sites support it, but I personally chat with friends and family who usually write their language (be it Hindi, Bengali, Oriya or others) in the English alphabet.

You can see this all over the Indian internet, especially on public forums.

For example:

Tumi kemn acho? — How are you?
Ki korchis? — What are you doing?
Barite sobai kemn ache? — How is everyone at home?

As far as I know, no chatbot on the market can figure out what these sentences actually mean, so I started a fun little experiment following the blog post from Rasa.

How did I start?

To begin, I added a lot of training examples to feed Rasa NLU (I have shared the GitHub link below).

Training examples for NLU
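
For readers who have not seen Rasa NLU training data before, here is a minimal sketch of how romanized Bengali sentences could be grouped by intent and rendered in Rasa NLU's markdown training format. The intent name Kemn_acho is the one that shows up in the results further down; the other intent names, the helper function and the way sentences are assigned to intents are assumptions for illustration only, not a copy of the data in the repository.

```python
# Romanized Bengali examples grouped by intent.
# "Kemn_acho" is used later in this post; the other intent names are hypothetical.
EXAMPLES = {
    "Kemn_acho": ["Tumi kemn acho?"],
    "Ki_korchis": ["Ki korchis?"],
    "Sobai_kemn_ache": ["Barite sobai kemn ache?"],
}


def to_rasa_markdown(examples):
    """Render {intent: [sentences]} as markdown training data:
    a '## intent:<name>' header followed by one '- <sentence>' line per example."""
    blocks = []
    for intent, sentences in examples.items():
        lines = ["## intent:" + intent]
        lines += ["- " + sentence for sentence in sentences]
        blocks.append("\n".join(lines))
    return "\n\n".join(blocks) + "\n"


print(to_rasa_markdown(EXAMPLES))
```

In practice each intent needs many more example sentences than this, ideally covering the different spellings people use for the same word.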

If you are familiar with Rasa NLU: I am using version 0.12, which includes the tensorflow_embedding pipeline. Here is my config file. Since I am only interested in text classification, I decided to skip NER (Named Entity Recognition).

However, I had to add some dependencies to do so. Feel free to consult the Rasa documentation for details.
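
To give a feel for what the intent classifier works with, here is a rough sketch of bag-of-words count features, which, as I understand it, is the kind of input the tensorflow_embedding pipeline builds before learning its embeddings. This is plain Python for illustration and not Rasa's implementation; the function names and the simple tokenization are my own assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on non-word characters (a simplification)."""
    return [token for token in re.split(r"\W+", text.lower()) if token]


def build_vocabulary(sentences):
    """Map every distinct token to a column index, in sorted order."""
    tokens = sorted({t for sentence in sentences for t in tokenize(sentence)})
    return {token: index for index, token in enumerate(tokens)}


def count_vector(text, vocabulary):
    """Count how often each vocabulary token occurs in `text`.
    Tokens that are not in the vocabulary are ignored."""
    counts = Counter(tokenize(text))
    vector = [0] * len(vocabulary)
    for token, n in counts.items():
        if token in vocabulary:
            vector[vocabulary[token]] = n
    return vector


sentences = ["Tumi kemn acho?", "Ki korchis?", "Barite sobai kemn ache?"]
vocab = build_vocabulary(sentences)
print(vocab)
print(count_vector("Tumi kemn acho?", vocab))
```

Nothing in this featurization depends on a pretrained language model, which is why an approach like this can work for Bengali written in the English alphabet.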

So that was the intent classification part. Now let’s move on to Rasa Core.

Rasa Core, amazing as it is, relies on Rasa NLU to catch the right intent. It is trained on examples of different chat scenarios, which produces a TensorFlow model that predicts the bot's next action.

Two files are needed to start with.

Let’s start with the domain file (the universe of the bot).

Next up are the stories (these are basically examples of how a user will interact with the bot).
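
To make the relationship between the two files concrete, here is a small sketch that models a domain and a story as Python data, checks that the story only uses intents and actions the domain declares, and renders the story in Rasa Core's markdown story format. Kemn_acho is the intent from this post; the second intent, the action names and the story name are hypothetical, and the real domain file is YAML rather than a Python dict.

```python
# A domain declares what the bot can recognise (intents) and do (actions).
# Every name except "Kemn_acho" is a hypothetical placeholder.
DOMAIN = {
    "intents": ["Kemn_acho", "Ki_korchis"],
    "actions": ["utter_kemn_acho_reply", "utter_ki_korchis_reply"],
}

# A story is a name plus an ordered list of (user intent, [bot actions]) turns.
STORY = ("casual chat", [
    ("Kemn_acho", ["utter_kemn_acho_reply"]),
    ("Ki_korchis", ["utter_ki_korchis_reply"]),
])


def undeclared_names(domain, story):
    """Return intents and actions used in the story but missing from the domain."""
    _, turns = story
    missing = []
    for intent, actions in turns:
        if intent not in domain["intents"]:
            missing.append(intent)
        missing += [a for a in actions if a not in domain["actions"]]
    return missing


def story_to_markdown(story):
    """Render a story as '## name', then '* intent' and '  - action' lines."""
    name, turns = story
    lines = ["## " + name]
    for intent, actions in turns:
        lines.append("* " + intent)
        lines += ["  - " + action for action in actions]
    return "\n".join(lines) + "\n"


print(undeclared_names(DOMAIN, STORY))
print(story_to_markdown(STORY))
```

A check like this is handy because a typo in an intent or action name inside a story is an easy mistake to make when the names are romanized words.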

Training time

For simplicity, I use docker-compose to spin up two containers:

Rasa NLU server running on port 5000

Rasa Core server running on port 5005

I open a bash shell in each container to train the models.

Here I also added the code to connect the bot to Facebook, but this involves running these images on Heroku or some other cloud instance. Your choice.

Results

My goal here is to see whether the bot can detect what I am saying in Bengali written in the English alphabet and reply to my questions accordingly.

Input — NLU Parsing

Output — NLU

From the results you can see that the NLU understands the input really well. For example:

Tumi kemn acho? — How are you? — Intent Kemn_acho( User is asking about how are you?)
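
For anyone who wants to repeat this check from a script, here is a hedged sketch of sending a sentence to the NLU server and reading the top intent from the reply. It assumes the NLU container is reachable at localhost:5000 and exposes Rasa NLU's /parse endpoint, which takes a JSON body with the text under "q". The helper names are my own, and depending on your setup you may also need to send project or model fields.

```python
import json
import urllib.request

# Assumption: the NLU server listens on port 5000, as in the docker-compose setup.
NLU_URL = "http://localhost:5000/parse"


def build_parse_request(text, url=NLU_URL):
    """Build (but do not send) a POST request asking the NLU server to parse `text`."""
    body = json.dumps({"q": text}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def top_intent(response):
    """Extract (name, confidence) from a parse response shaped like
    {"intent": {"name": ..., "confidence": ...}, ...}."""
    intent = response.get("intent") or {}
    return intent.get("name"), intent.get("confidence")


def parse(text, url=NLU_URL, timeout=10):
    """Send `text` to the NLU server and return the decoded JSON reply."""
    request = build_parse_request(text, url)
    with urllib.request.urlopen(request, timeout=timeout) as reply:
        return json.loads(reply.read().decode("utf-8"))


# Usage, with the NLU server running:
#   name, confidence = top_intent(parse("Tumi kemn acho?"))
```

Keeping the request building separate from the network call makes it easy to inspect exactly what is sent before pointing the script at a live server.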

Now let’s see whether the chat works. I tried to add a short video as a GIF but couldn’t get it to work; nevertheless, here is a screenshot of what I did.

Bot is replying in console

As you can see above, the bot understands what I am trying to say quite well. This can open doors for e-commerce and retail businesses that want a specific line of communication, and for custom internal channels such as HR or Legal that could use automated communication to make life easier for their clients.

Here is the link to my GitHub. Feel free to clone the project, make it your own, and try it out with other languages that don’t natively use the Latin script but that we write in it anyway!

Way to go Rasa, disrupting the bot industry!!

Written by

AI enthusiast||Conversational AI||ML Engineer
