Under the hood of Alex

Vasista Bommakanti · My Ally · May 3, 2017

Our AI-powered scheduling assistant Alex handles a large volume of scheduling requests from customers and their guests. Alex has been trained to comprehend messages written in natural language and process the requests they contain.

On average, about eight to ten messages are exchanged in a scheduling thread before the people involved finalize a mutually convenient time and place. The information about the meeting under discussion is usually spread across different messages in the thread, and the inherent complexity of natural language makes it difficult to extract.

Bag of Words Model and its Limitations:

As an early approach to identifying important sentences in an email, we used a traditional bag-of-words model with unigrams, bigrams and trigrams as features. The bag-of-words model cannot leverage phrase-level and sentence-level syntactic cues. For example, consider the sentences ‘Schedule meeting for this week’ and ‘Meeting schedule for this week:’. Although the two sentences contain the same words, the contexts in which those words are used are different. The bag-of-words model, however, treats the two sentences as identical because they have the same vector representation.
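
A quick way to see this limitation: with unigram features, both sentences collapse to the same vector. Here is a minimal sketch using scikit-learn's CountVectorizer (shown for illustration, not necessarily the exact tooling we used):

```python
# Two sentences with the same words but different meaning produce
# identical unigram bag-of-words vectors, so word order is lost.
from sklearn.feature_extraction.text import CountVectorizer

sentences = [
    "Schedule meeting for this week",
    "Meeting schedule for this week",
]

vectorizer = CountVectorizer()                      # unigram counts
vectors = vectorizer.fit_transform(sentences).toarray()

print(vectorizer.get_feature_names_out())           # ['for' 'meeting' 'schedule' 'this' 'week']
print(vectors[0], vectors[1])                       # both [1 1 1 1 1]
print((vectors[0] == vectors[1]).all())             # True: the model cannot tell them apart
```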

In order to fully process a request to schedule a meeting, the model must take the context of the conversation into account. Understanding context requires a good memory of what has happened in the past; we humans have the ability to selectively memorize information that is important for later use. Sequence-based machine learning algorithms such as HMMs and CRFs can model this context, and with recent advancements in the field, state-of-the-art deep learning algorithms such as Recurrent Neural Networks (RNNs) have proven to outperform other sequential learning models in both academia and industry.

Enabling Context with Sequence to Sequence Model:

The simplest neural network is the feed-forward type: data flows straight from the input layer to the output layer, and it only accepts fixed-size data such as an image or a number. Unfortunately, a conversation is not of fixed size; it is a sequence of words, which calls for a special kind of neural network that accepts sequential data. In an RNN, the network's output is fed back in as it processes the sequence, step by step in a recurring loop: RNNs ingest their own outputs from the previous time step in addition to the input at the current time step. The decision a recurrent net reaches at time step t-1 affects the decision it reaches a moment later at time step t. Recurrent networks therefore have two sources of input, the present and the recent past, which combine to determine how they respond to new data, much as we do in life.
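
That recurrence can be written in a few lines. The sketch below (plain NumPy, with random placeholder weights rather than a trained model) shows how the hidden state from step t-1 is combined with the input at step t:

```python
# A minimal sketch of the recurrence at the heart of an RNN: the hidden
# state from step t-1 is combined with the input at step t, so each
# decision depends on both the present input and the recent past.
# Weights here are random placeholders, not a trained model.
import numpy as np

input_dim, hidden_dim = 4, 8
W_xh = np.random.randn(hidden_dim, input_dim) * 0.1   # input  -> hidden
W_hh = np.random.randn(hidden_dim, hidden_dim) * 0.1  # hidden -> hidden (the loop)
b_h = np.zeros(hidden_dim)

def rnn_forward(inputs):
    h = np.zeros(hidden_dim)                # no "past" before the first word
    for x_t in inputs:                      # one word vector per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h                                # a summary of the whole sequence

sequence = [np.random.randn(input_dim) for _ in range(6)]  # a 6-word "sentence"
print(rnn_forward(sequence))
```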

A sequence-to-sequence model consists of two recurrent neural networks. One recurrent net is an encoder: its job is to build an internal representation of the sentence it is given, called a ‘context vector’, a numerical vector that summarizes the sentence. The other recurrent net is a decoder: given a context vector, its job is to output the associated words.
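
As an illustration, here is a minimal encoder-decoder layout in Keras (one possible framework; the vocabulary size and dimensions are illustrative, not our production settings). The encoder's final states serve as the context vector that initializes the decoder:

```python
# A minimal sketch of the two-network layout: an encoder LSTM compresses the
# input sentence into a context vector, and a decoder LSTM is initialised
# with that vector to produce the output sequence.
from tensorflow.keras import layers, Model

vocab_size, embed_dim, hidden_dim = 10000, 128, 256   # illustrative sizes

# --- encoder: reads the conversation and returns its final states ---
enc_inputs = layers.Input(shape=(None,), name="encoder_tokens")
enc_embed = layers.Embedding(vocab_size, embed_dim)(enc_inputs)
_, state_h, state_c = layers.LSTM(hidden_dim, return_state=True)(enc_embed)
context = [state_h, state_c]            # the "context vector" for the sentence

# --- decoder: generates output tokens conditioned on the context ---
dec_inputs = layers.Input(shape=(None,), name="decoder_tokens")
dec_embed = layers.Embedding(vocab_size, embed_dim)(dec_inputs)
dec_outputs, _, _ = layers.LSTM(hidden_dim, return_sequences=True,
                                return_state=True)(dec_embed, initial_state=context)
predictions = layers.Dense(vocab_size, activation="softmax")(dec_outputs)

model = Model([enc_inputs, dec_inputs], predictions)
model.summary()
```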

The type of recurrent network we use is a Long Short-Term Memory network (LSTM), which can remember words from far back in the sequence. Since we are dealing with long sequences, an attention mechanism helps the decoder selectively look at the parts of the sequence that are most relevant. This helps the model build context vectors for the entities in the conversation and their relationships, enabling it to associate a particular type of entity with a particular relationship.
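
A sketch of what attention computes (plain NumPy; dot-product scoring is shown for simplicity, while other learned scoring functions are also common):

```python
# The decoder scores every encoder state against its current state and
# builds a weighted context, so it can "look back" at the most relevant
# parts of a long conversation instead of relying on one fixed vector.
import numpy as np

def attention(decoder_state, encoder_states):
    # decoder_state:  (hidden_dim,)
    # encoder_states: (seq_len, hidden_dim), one vector per input word
    scores = encoder_states @ decoder_state          # relevance of each position
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over the sequence
    context = weights @ encoder_states               # weighted summary
    return context, weights

enc = np.random.randn(12, 256)      # 12 encoder steps (placeholder values)
dec = np.random.randn(256)          # current decoder state
context, weights = attention(dec, enc)
print(weights.round(3))             # which input positions the decoder attends to
```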

Data Preparation and Modelling:

We have a labelled dataset of around 50,000 sentences that are important for scheduling, but deep learning algorithms need larger datasets. We therefore automatically labelled unseen data using semi-supervised learning: a meta-algorithm called expectation maximization, in which we minimized the sum of the negative log prediction probabilities on the unlabeled data. The bootstrapped data annotated with the highest-probability labels, along with the original labelled data, is then used as the training data, which is given as input to the LSTM.
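
Schematically, the bootstrapping loop looks like the sketch below. The names `build_classifier`, `labelled` and `unlabelled`, and the confidence threshold, are hypothetical placeholders, not our actual code:

```python
# A schematic sketch of bootstrapping extra labels: train on the labelled
# sentences, predict labels for the unlabelled ones, and keep only the
# confident predictions as additional training data.
import numpy as np

CONFIDENCE_THRESHOLD = 0.95   # illustrative cut-off, not the value we used

def bootstrap_labels(build_classifier, labelled, unlabelled, rounds=3):
    # labelled:   (X_train, y_train) numpy arrays
    # unlabelled: feature matrix for the unlabelled sentences
    X_train, y_train = labelled
    for _ in range(rounds):
        model = build_classifier()
        model.fit(X_train, y_train)
        probs = model.predict_proba(unlabelled)          # per-class probabilities
        confidence = probs.max(axis=1)
        keep = confidence >= CONFIDENCE_THRESHOLD        # only confident predictions
        X_train = np.concatenate([X_train, unlabelled[keep]])
        y_train = np.concatenate([y_train, probs[keep].argmax(axis=1)])
        unlabelled = unlabelled[~keep]
    return X_train, y_train                              # training data for the LSTM
```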

The words are encoded as real-valued vectors in a high-dimensional space, where similarity of meaning translates to closeness in the vector space, so each sentence becomes a sequence of real-valued vectors. We used word embeddings for the words that together cover 95% of the occurrences in our data; the less frequent words were replaced with a special token. The length of a processed sentence is capped at 30 words: longer sentences are truncated and shorter ones are padded with null vectors. This data is fed to a bidirectional LSTM, and the hyperparameters are tuned automatically using a validation dataset. We found that this approach performs much better than the rule-based approach or any bag-of-words model.
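
Putting those pieces together, a minimal version of the setup might look like this in Keras (sizes and hyperparameters are illustrative, not the values we used):

```python
# Sentences become sequences of word indices (rare words mapped to a
# special out-of-vocabulary token), padded or truncated to 30 tokens,
# and fed to a bidirectional LSTM classifier.
import tensorflow as tf
from tensorflow.keras import layers, Sequential

MAX_LEN = 30            # cap on processed sentence length
VOCAB_SIZE = 20000      # illustrative: covers ~95% of word occurrences, rest -> <UNK>
EMBED_DIM = 128

# example: sentences already converted to lists of word indices
sequences = [[12, 45, 7, 302], [5, 88, 19, 4, 4001, 23, 6]]
padded = tf.keras.utils.pad_sequences(sequences, maxlen=MAX_LEN,
                                      padding="post", truncating="post")

model = Sequential([
    layers.Embedding(VOCAB_SIZE, EMBED_DIM, mask_zero=True),  # ignore padding
    layers.Bidirectional(layers.LSTM(64)),
    layers.Dense(1, activation="sigmoid"),   # "important sentence" vs not
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.summary()
```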

NLP Pipeline

As the first step of the NLP pipeline, every incoming request to Alex is processed to identify the important sentences in the email. The predicted important sentences are then fed to different DNN classification models, such as meeting type and phone-number mapping. For example, if the type of meeting is predicted to be a phone call, we use the same input to predict the mapping of each phone number to a person. The next step in the pipeline is to check whether all the information required to process the meeting request is available, either in the conversation or in the client's preferences; if not, Alex emails the client asking for the missing information. Once all the necessary information is available, Alex sends a calendar invite for the meeting.
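
Schematically, the pipeline reads like this (all model and helper names are hypothetical placeholders, not our internal APIs):

```python
# A schematic sketch of the pipeline described above. The classifiers
# (important_sentence_model, meeting_type_model, phone_mapping_model) and
# helpers (extract_meeting_details, send_calendar_invite, ...) are
# hypothetical placeholders.

def handle_scheduling_request(email, client_preferences):
    # 1. keep only the sentences that matter for scheduling
    important = [s for s in email.sentences
                 if important_sentence_model.predict(s)]

    # 2. run the downstream DNN classifiers on those sentences
    meeting_type = meeting_type_model.predict(important)        # e.g. "phone call"
    if meeting_type == "phone call":
        phone_owners = phone_mapping_model.predict(important)   # phone number -> person

    # 3. do we already know everything needed to schedule?
    details = extract_meeting_details(important, client_preferences)
    if details.is_complete():
        send_calendar_invite(details)            # schedule the meeting
    else:
        ask_for_missing_information(details)     # email the client back
```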

In this way, Alex uses state-of-the-art deep learning techniques at multiple stages to process and respond to scheduling requests with high precision and accuracy.

Did you enjoy the article? Subscribe to our newsletter to get more articles on artificial intelligence, natural language processing, and more.

Data Scientist at My Ally. Taking moonshots. Making machines think human-like. You may reach me at vasista.b@myally.ai