BERT or FastText? Language models for healthcare chatbots.

Helmi Boussetta
Zana AI
May 25, 2021

Healthcare chatbots can simulate and engage in a conversation to provide real-time assistance to patients and medical professionals.
The ability to understand natural language and contextualize conversations, while delivering better information and better user experiences, has made chatbot development a trendy area of practice in healthcare.

Zana is one of the front-runners in this domain, offering a conversational AI platform for designing, building, and managing GDPR-compliant voice and chatbot applications in healthcare.

In this article, we illustrate the work our team at Zana has done to recognize intents in health-related user phrases (queries) processed by a healthcare chatbot (Tidda AI).

Thanks to Natural Language Processing (NLP), chatbots can understand user requests. The conversation engine, however, is the critical component that makes a chatbot contextual and gives users a customized conversation experience. The main feature of this conversation agent is intent classification. As simple as it may sound, it is actually quite a complex process. Text input is recognized by a software component known as a “classifier,” which associates the user’s input with a particular “intent,” producing a concise, machine-readable description of the words.

A classifier is a tool for categorizing data (in this case, a sentence) into multiple categories. A chatbot breaks a sentence down and classifies each part to understand the intention behind the input it has received, similar to how humans sort items into sets: a violin is an instrument, a shirt is a type of clothing, and happy is an emotion.

In today’s world, being able to recognize intent from text, known as intent recognition (IR), is extremely useful. Typically, we are given a short text (a sentence or two) and must categorize it into one (or more) of a set of classes.

1. Problem statement

For this project, we chose to use the pre-trained BERT model for word embedding and to compare it to FastText in the context of intent detection.

The goal was to fine-tune the models and then evaluate which language model performs better on the specific problem of intent classification for the medical domain.

Three different approaches were implemented:

  • BERT fine-tuned to CNN
  • BERT fine-tuned to RNN
  • FastText and RNN

Which one performed better?

2. Our Methodology

Dataset:

The dataset used for this project contains various medical user queries. It contains 109 entries categorized into five intents (a sample of the data format is shown after the list):

  • greeting (e.g. hi there)
  • introduction
  • symptoms (e.g. what are the symptoms of cold)
  • causes (e.g. What kind of things can cause tuberculosis?)
  • treatment (e.g. Can I treat breast cancer?)
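To make the setup concrete, here is a hypothetical slice of what the labelled data looks like, using only the example queries listed above (the real dataset has 109 such entries):

```python
# Hypothetical data format; the query/label pairs are taken from the
# examples above, not from the actual 109-entry dataset.
dataset = [
    {"text": "hi there",                                    "intent": "greeting"},
    {"text": "what are the symptoms of cold",               "intent": "symptoms"},
    {"text": "What kind of things can cause tuberculosis?", "intent": "causes"},
    {"text": "Can I treat breast cancer?",                  "intent": "treatment"},
]
```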

Pre-processing:

Before building the machine learning algorithms and neural networks, we need numeric representations of text that can be fed into our AI models. This need is met by sentence vectors. They are based on the principle of vector space models, which provide a way to convert sentences typed by a user into mathematical vectors. These vectors can then be used for intent classification.

We performed the following steps (a minimal sketch follows the list):

  • Tokenize the text
  • Convert the sequence of tokens into numbers
  • Pad the sequences so each one has the same length
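As a rough illustration, the Hugging Face transformers tokenizer covers all three steps in one call; the maximum length of 32 here is an assumed value for illustration, not necessarily the one we used:

```python
# A minimal pre-processing sketch with the Hugging Face "transformers"
# tokenizer; MAX_LEN = 32 is an assumption, not from the article.
from transformers import BertTokenizer

MAX_LEN = 32
tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")

queries = ["hi there", "what are the symptoms of cold"]
encoded = tokenizer(
    queries,
    padding="max_length",  # pad every sequence to the same length
    truncation=True,
    max_length=MAX_LEN,
)
print(encoded["input_ids"])  # tokens converted to numeric IDs, padded
```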

1. BERT Word Embeddings:

For the word embeddings, we use the pre-trained BERT base model:

  • Number of transformer blocks (L): 12
  • Hidden layer size (H): 768
  • Attention heads (A): 12

The BERT authors compared word-embedding strategies by feeding various vector combinations as input features to a BiLSTM on a named-entity recognition task and observing the resulting F1 scores.

Image by Jay Alammar

In our case, we are using the output of the last hidden state as our word vectorizer.
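In code, this can look roughly as follows, using PyTorch and the Hugging Face transformers library (a sketch of the idea, not our exact pipeline):

```python
# Extracting the last hidden state of BERT base as word vectors;
# a sketch of the approach, not our exact training pipeline.
import torch
from transformers import BertModel, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")
model.eval()

inputs = tokenizer("what are the symptoms of cold", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# One 768-dimensional vector per token (H = 768, as listed above).
word_vectors = outputs.last_hidden_state
print(word_vectors.shape)  # torch.Size([1, seq_len, 768])
```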

For more details about the BERT embeddings, please refer to this link: http://jalammar.github.io/illustrated-bert/

2. FastText Word Embeddings:

FastText (based on Word2Vec) is word-fragment based: because it builds vectors from character n-grams, it can usually handle unseen words, although it still generates one vector per word.

FastText also ships pre-trained models in dozens of languages; BERT, by comparison, has published a general multilingual model and a Chinese pre-trained model.

For more details about the FastText Embeddings please refer to this link: https://fasttext.cc/
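As a quick illustration of the subword behaviour, the official fasttext Python package can be used like this (cc.en.300.bin is the public English model from fasttext.cc; our actual model choice may differ):

```python
# A minimal FastText sketch using the public English vectors from
# fasttext.cc; our production model choice may differ.
import fasttext
import fasttext.util

fasttext.util.download_model("en", if_exists="ignore")  # cc.en.300.bin
ft = fasttext.load_model("cc.en.300.bin")

vec = ft.get_word_vector("tuberculosis")
print(vec.shape)  # (300,) -- one 300-dimensional vector per word

# Subword (character n-gram) information yields a vector even for
# words never seen during training:
oov_vec = ft.get_word_vector("tuberculosisy")
```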

Models used: background theory

· CNN (Convolutional Neural Networks):

Convolutional Neural Networks (CNNs) are a variant of neural networks used heavily in the field of Computer Vision. They derive their name from the type of hidden layers they consist of. The hidden layers of a CNN typically consist of convolutional layers, pooling layers, fully connected layers, and normalization layers.

Image by: Ensiwiki
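Applied to text, the convolution runs over the token dimension. Here is a hedged Keras sketch of a 1-D CNN intent classifier over BERT vectors; all layer sizes are illustrative assumptions, not our exact architecture:

```python
# Illustrative 1-D CNN intent classifier over pre-computed BERT vectors;
# layer sizes are assumptions, not the article's exact architecture.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 5   # the five intents
EMBED_DIM = 768   # BERT base hidden size
MAX_LEN = 32      # assumed padded sequence length

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN, EMBED_DIM)),
    layers.Conv1D(128, kernel_size=3, activation="relu"),  # convolution over tokens
    layers.GlobalMaxPooling1D(),                           # pooling layer
    layers.Dense(64, activation="relu"),                   # fully connected layer
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```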

· RNN (Recurrent Neural Networks):

Recurrent Neural Networks, or RNNs for short, are a very important variant of neural networks, heavily used in Natural Language Processing. In a general neural network, an input is processed through a number of layers and an output is produced, under the assumption that two successive inputs are independent of each other.

RNNs are called recurrent because they perform the same task for every element of a sequence, with the output depending on the previous computations. Another way to think about RNNs is that they have a “memory” which captures information about what has been calculated so far. In theory, RNNs can make use of information in arbitrarily long sequences, but in practice, they are limited to looking back only a few steps.

Here is the architecture of an RNN:

Image by: Analytics Vidhya
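An equally hedged Keras sketch of the recurrent counterpart, here paired with 300-dimensional FastText vectors (again, layer sizes are illustrative assumptions):

```python
# Illustrative LSTM intent classifier over FastText word vectors;
# layer sizes are assumptions, not the article's exact architecture.
from tensorflow import keras
from tensorflow.keras import layers

NUM_CLASSES = 5
EMBED_DIM = 300   # FastText vector dimensionality
MAX_LEN = 32      # assumed padded sequence length

model = keras.Sequential([
    keras.Input(shape=(MAX_LEN, EMBED_DIM)),
    layers.LSTM(64),  # the recurrent "memory" over the token sequence
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```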

Models architecture:

  • BERT fine-tuned to CNN
  • BERT fine-tuned to RNN
  • FastText + RNN

Our Results

Cross Validation

We used the cross-validation approach with K = 10; a minimal sketch of the protocol is shown below.
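For illustration, here is how the protocol looks with scikit-learn; the LogisticRegression classifier and the synthetic arrays are stand-ins for our neural models and the real 109-query dataset:

```python
# 10-fold cross-validation sketch; LogisticRegression and the random
# arrays stand in for our actual models and the 109-query dataset.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(109, 300))   # e.g. one averaged word vector per query
y = rng.integers(0, 5, size=109)  # the five intent labels

kfold = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)
print(f"mean accuracy: {scores.mean():.3f} +/- {scores.std():.3f}")
```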
The results were:

Conclusion:

FastText and BERT are both good when it comes to word-embedding tasks. However, for our use case, the intent classification of medical user queries, the results make it clear that FastText was the best technique to use.


👋 My name is Helmi Boussetta. I am a Machine Learning engineer at Zana Technologies GmbH, a company specialized in introducing AI into healthcare applications.