Understanding the Rasa TensorFlow intent classifier

Tatiana Parshina
3 min read · Apr 29, 2019


This post describes how the Rasa AI chatbot framework uses the StarSpace idea from the Facebook AI Research team for intent classification with supervised embeddings.

StarSpace overview

StarSpace is a general-purpose neural model for efficiently learning entity embeddings to solve a wide variety of problems; one such problem is intent classification for an AI chatbot.

StarSpace embeds entities of different types into one vectorial embedding space, hence the “star” (“*”, meaning all types) and “space” in the name, and compares them against each other in that common space.

For a chatbot, the embedding intent classifier embeds user inputs and intent labels into the same space. Embeddings are vector representations of words or documents, and user inputs can be described as bags of words.
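As an illustration, here is a minimal sketch of turning messages into bag-of-words vectors with scikit-learn’s CountVectorizer (which is also what Rasa’s count vectors featurizer builds on); the example utterances are made up:

    from sklearn.feature_extraction.text import CountVectorizer

    # Toy training utterances (made-up examples).
    messages = [
        "show me a mexican restaurant",
        "show me a chinese restaurant",
        "hello there",
    ]

    vectorizer = CountVectorizer()
    bow = vectorizer.fit_transform(messages)  # sparse bag-of-words matrix
    print(bow.toarray())  # each row is the bag-of-words vector of one message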

During training, user inputs are compared against intent labels, and the following loss function, as given in the StarSpace paper, is minimized:
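$$\sum_{(a,b) \in E^+,\; b^- \in E^-} L^{batch}\big(sim(a,b),\; sim(a,b_1^-),\, \dots,\, sim(a,b_k^-)\big)$$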

  • a are documents (bags of words), i.e. user inputs
  • b are labels (intents) from the training set
  • negative entities b⁻ are sampled from the set of possible labels
  • positive pairs (a, b) come directly from the training set (nlu_data) of labeled data
  • sim(·, ·) is the similarity function; by default Rasa uses cosine similarity, the other possible value being “inner” (dot product)
  • L^batch is the loss function that compares the positive pair (a, b) with the negative pairs

Create a neural network

First of all, a neural network (NN) is created. Its input is the vector representation of a user request. The NN has the following layers (a code sketch follows the layer details below):

  1. A hidden densely-connected layer that produces output of dimension 256
  2. A dropout layer with rate 0.2, which drops 20% of the input units during training
  3. A hidden densely-connected layer that produces output of dimension 128
  4. A dropout layer with rate 0.2, which drops 20% of the input units during training
  5. The output (embedding) layer of dimension 20

The hidden densely-connected layers use:

  • the Rectified Linear Unit (ReLU) activation function, which computes max(features, 0)
  • an L2 kernel_regularizer for the weight matrix, with regularization scale C2 = 0.002
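Here is a minimal sketch of this network in the TF1 graph style of the 2019 Rasa code. The input dimension (300) and the placeholder names are my own assumptions for illustration; the layer sizes (256, 128), dropout rate (0.2) and embedding dimension (20) are the values listed above:

    import tensorflow as tf

    # Sketch of the feed-forward embedding network (TF1 graph style).
    is_training = tf.placeholder_with_default(False, shape=())
    a_in = tf.placeholder(tf.float32, shape=[None, 300])  # bag-of-words input

    reg = tf.contrib.layers.l2_regularizer(scale=0.002)  # C2 = 0.002

    x = tf.layers.dense(a_in, 256, activation=tf.nn.relu, kernel_regularizer=reg)
    x = tf.layers.dropout(x, rate=0.2, training=is_training)  # drops 20% of units
    x = tf.layers.dense(x, 128, activation=tf.nn.relu, kernel_regularizer=reg)
    x = tf.layers.dropout(x, rate=0.2, training=is_training)
    emb_a = tf.layers.dense(x, 20, kernel_regularizer=reg)  # embedding, no activation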

Cosine similarity

Cosine similarity is a measure of similarity between two non-zero vectors: the cosine of the angle between them:
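$$sim(a, b) = \frac{a \cdot b}{\lVert a \rVert \, \lVert b \rVert}$$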

Cosine similarity is defined between the embedded user input and the embedded intent labels.
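A minimal TensorFlow sketch of this similarity; emb_a and emb_b stand for the embedded user input and intent label (the names are mine, not Rasa’s):

    import tensorflow as tf

    def cosine_similarity(emb_a, emb_b):
        # L2-normalize both embeddings so that their dot product
        # equals the cosine of the angle between them.
        emb_a = tf.nn.l2_normalize(emb_a, axis=-1)
        emb_b = tf.nn.l2_normalize(emb_b, axis=-1)
        return tf.reduce_sum(emb_a * emb_b, axis=-1)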

Loss function

To optimize the model, you need to define the loss. The classifier uses a margin-style loss: the similarity with the correct intent label is pushed above mu_pos, while the similarity with incorrect labels is pushed down toward mu_neg; a sketch follows the default values below.

Default values:

  • mu_pos: 0.8 (should satisfy 0.0 < mu_pos < 1.0 for “cosine”), how similar the algorithm should try to make embedding vectors for correct intent labels
  • mu_neg: -0.4 (should satisfy -1.0 < mu_neg < 1.0 for “cosine”), the maximum negative similarity allowed for incorrect intent labels
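A simplified sketch of such a margin loss, following the semantics of these defaults (the actual Rasa implementation differs in details, e.g. it can also penalize similarity between different intent embeddings); sim_pos and sim_neg are assumed tensor names:

    import tensorflow as tf

    mu_pos, mu_neg = 0.8, -0.4  # Rasa defaults quoted above

    def embedding_loss(sim_pos, sim_neg):
        # sim_pos: similarity to the correct intent, shape [batch]
        # sim_neg: similarities to sampled incorrect intents, shape [batch, k]
        # Push the similarity to the correct intent above mu_pos ...
        loss = tf.maximum(0.0, mu_pos - sim_pos)
        # ... and push the highest similarity to an incorrect intent below mu_neg.
        max_sim_neg = tf.reduce_max(sim_neg, axis=-1)
        loss += tf.maximum(0.0, max_sim_neg - mu_neg)
        return tf.reduce_mean(loss)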

Train

Optimizers incrementally change each variable in order to minimize the loss.

AdamOptimizer is a TensorFlow optimizer that implements the Adam algorithm.

The code below builds the graph components necessary for the optimization with AdamOptimizer and runs the TensorFlow training operation:
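This is a self-contained TF1-style sketch in which a toy quadratic loss stands in for the embedding loss defined above; 300 epochs is the classifier’s default:

    import numpy as np
    import tensorflow as tf

    # Toy loss: fit weights so that x_in @ w approaches 1.0.
    x_in = tf.placeholder(tf.float32, shape=[None, 2])
    w = tf.Variable(tf.zeros([2, 1]))
    loss = tf.reduce_mean(tf.square(tf.matmul(x_in, w) - 1.0))

    # Build the graph components for the optimization with AdamOptimizer.
    train_op = tf.train.AdamOptimizer(learning_rate=0.001).minimize(loss)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for epoch in range(300):  # Rasa's default number of epochs
            _, current_loss = sess.run(
                [train_op, loss],
                feed_dict={x_in: np.random.rand(32, 2)})
        print("final loss:", current_loss)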

Useful resources:

  • StarSpace: Embed All The Things! (Wu et al., Facebook AI Research): https://arxiv.org/abs/1709.03856
  • Rasa documentation: https://rasa.com/docs/
