Deep Models for Text and Sequences

“Similar words occur in similar contexts” is the assumption we make. We map words to small vectors called embeddings, which are close together when the words share meaning and farther apart when they don’t.

Word2Vec:

“Machine Learning is awesome”
For each word in this text, we map it to an embedding, initially a random one. Using this embedding, we try to predict the context of the word, i.e. the words around it.
For each training example, we pick a random word from the window of words around it as the target.
The model we use to predict the nearby words is a logistic classifier.
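To make these steps concrete, here is a minimal numpy sketch of the skip-gram setup on the example sentence. It assumes a window of one word on each side, 8-dimensional embeddings drawn uniformly from [-1, 1], and a multinomial logistic (softmax) classifier on top; these sizes and the helper names (`skipgram_pairs`, `predict_context`) are illustrative choices, not Word2Vec's actual defaults, and no training loop is shown.

```python
import numpy as np

def skipgram_pairs(tokens, window=1):
    """Return (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

tokens = "Machine Learning is awesome".split()
vocab = {w: i for i, w in enumerate(tokens)}

# Each word starts with a random embedding (size 8 is an arbitrary assumption).
rng = np.random.default_rng(0)
embedding_size = 8
embeddings = rng.uniform(-1.0, 1.0, size=(len(vocab), embedding_size))

# Logistic (softmax) classifier: maps an embedding to a score for every vocabulary word.
weights = rng.normal(0.0, 0.1, size=(embedding_size, len(vocab)))
biases = np.zeros(len(vocab))

def predict_context(word):
    """Probability of each vocabulary word being in the context of `word`."""
    scores = embeddings[vocab[word]] @ weights + biases
    probs = np.exp(scores - scores.max())  # subtract max for numerical stability
    return probs / probs.sum()

pairs = skipgram_pairs(tokens, window=1)
print(pairs)

# One training input: a random (center, context) pair from the windows.
center, context = pairs[rng.integers(len(pairs))]
probs = predict_context(center)
```

With a window of 1 this yields six pairs, such as ("Machine", "Learning") and ("Learning", "is"). Training would then adjust `embeddings`, `weights` and `biases` so that `probs[vocab[context]]` goes up.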


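t-SNE is used to reduce the embedding space to 2D so we can visualize it. If we used PCA instead, we would lose too much information. t-SNE preserves the neighbourhood structure (words that are close in the embedding space stay close in 2D), which is why we use it over PCA.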

Comparing Embeddings:

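We compare embeddings with cosine distance rather than L2 distance, because what matters is the direction of an embedding vector, not its length. It is also common to normalize all embedding vectors to unit length.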
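A small numpy sketch shows the difference: scaling a vector changes its L2 distance to the original but leaves the cosine distance at zero. It also checks a standard identity: once vectors are normalized to unit length, squared L2 distance equals twice the cosine distance, so the two measures rank neighbours identically. The vectors here are made-up examples.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between a and b."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def l2_distance(a, b):
    return np.linalg.norm(a - b)

def normalize(v):
    """Scale v to unit length."""
    return v / np.linalg.norm(v)

a = np.array([1.0, 2.0, 3.0])
b = 10.0 * a  # same direction, ten times longer

print(l2_distance(a, b))      # large: 9 * sqrt(14), about 33.67
print(cosine_distance(a, b))  # ~0.0: the vectors point the same way

# For unit-length vectors, ||u - v||^2 == 2 * (1 - cos(u, v)).
rng = np.random.default_rng(0)
c, d = rng.normal(size=5), rng.normal(size=5)
print(l2_distance(normalize(c), normalize(d)) ** 2, 2 * cosine_distance(c, d))
```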

The figure above shows this prediction step as a softmax classifier.

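Computing the softmax over every word in the vocabulary is expensive, because the vocabulary is large. Instead, we keep the true target and sample only a handful of the negative targets (words that are not the context word). This is often called sampled softmax.
It makes training much faster at no cost in performance.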
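Below is a minimal numpy sketch contrasting the full softmax loss with a sampled one. As a simplifying assumption, negatives are drawn uniformly without replacement and the true target is excluded; with a uniform sampler the usual log-probability correction is the same for every candidate and cancels inside the softmax, so it is omitted. Library implementations such as TensorFlow's `tf.nn.sampled_softmax_loss` use a non-uniform sampler and do apply that correction. All sizes and function names here are illustrative.

```python
import numpy as np

def full_softmax_loss(hidden, weights, biases, target):
    """Cross-entropy over the whole vocabulary: cost grows with vocabulary size."""
    logits = hidden @ weights + biases        # shape (vocab_size,)
    logits = logits - logits.max()            # numerical stability
    return -(logits[target] - np.log(np.exp(logits).sum()))

def sampled_softmax_loss(hidden, weights, biases, target, num_sampled, rng):
    """Cross-entropy over the true target plus `num_sampled` uniformly sampled negatives."""
    vocab_size = weights.shape[1]
    candidates = np.delete(np.arange(vocab_size), target)  # every class except the target
    negatives = rng.choice(candidates, size=num_sampled, replace=False)
    classes = np.concatenate(([target], negatives))         # true class sits at index 0
    # Only num_sampled + 1 columns of the output layer are touched.
    logits = hidden @ weights[:, classes] + biases[classes]
    logits = logits - logits.max()
    return -(logits[0] - np.log(np.exp(logits).sum()))

rng = np.random.default_rng(0)
vocab_size, dim = 1000, 16
hidden = rng.normal(size=dim)
weights = rng.normal(scale=0.1, size=(dim, vocab_size))
biases = np.zeros(vocab_size)

print(full_softmax_loss(hidden, weights, biases, target=42))
print(sampled_softmax_loss(hidden, weights, biases, target=42, num_sampled=64, rng=rng))
```

The sampled loss only approximates the full one; if every other class is sampled (`num_sampled=vocab_size - 1`), the two become identical.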

What have we achieved so far?

We have similar words close to each other!

Recurrent Neural Network

A recurrent neural network (RNN) is a class of artificial neural network where connections between units form a directed cycle. This creates an internal state of the network which allows it to exhibit dynamic temporal behavior. Unlike feedforward neural networks, RNNs can use their internal memory to process arbitrary sequences of inputs. This makes them applicable to tasks such as unsegmented connected handwriting recognition or speech recognition.
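The "internal state" and "arbitrary sequences" in the paragraph above can be sketched in a few lines of numpy. This assumes the common vanilla (Elman-style) update h_t = tanh(W_xh x_t + W_hh h_{t-1} + b_h); the weight names and sizes are illustrative, and the code only runs the forward pass.

```python
import numpy as np

def rnn_forward(inputs, W_xh, W_hh, b_h):
    """Run a vanilla RNN over a sequence of any length (at least one step).

    inputs: array of shape (seq_len, input_dim)
    Returns the hidden state at every time step, shape (seq_len, hidden_dim).
    """
    h = np.zeros(W_hh.shape[0])  # internal state: the network's "memory"
    states = []
    for x_t in inputs:
        # The same weights are reused at every step, and h feeds back into
        # itself through W_hh: this is the directed cycle.
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 8
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

# The same network handles sequences of different lengths.
short_states = rnn_forward(rng.normal(size=(5, input_dim)), W_xh, W_hh, b_h)
long_states = rnn_forward(rng.normal(size=(12, input_dim)), W_xh, W_hh, b_h)
print(short_states.shape, long_states.shape)  # (5, 8) (12, 8)
```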

LSTM (Long Short-Term Memory)

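An LSTM cell adds gates that control what gets written to its memory, what is kept, and what is read out. These gates help the model keep its memory longer when it needs to.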

Like most RNNs, an LSTM network is universal: given enough network units, it can compute anything a conventional computer can compute, provided it has the proper weight matrix, which may be viewed as its program. Unlike traditional RNNs, an LSTM network is well suited to learning from experience to classify, process and predict time series when there are very long time lags of unknown size between important events. This is one of the main reasons LSTMs outperform alternative RNNs, hidden Markov models and other sequence learning methods in numerous applications.

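Each gate multiplies its input by a value between 0 and 1 instead of a hard 0 or 1. Because that value comes from a smooth function (a sigmoid) rather than a hard switch, it is differentiable, and we can backpropagate through it!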
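As a sketch of how the gates work, here is one step of a standard LSTM cell in numpy, assuming the common formulation with input, forget and output gates and a tanh candidate; the stacked weight layout and sizes are illustrative choices, not taken from the original notes. Each gate is a sigmoid, so it multiplies by a value strictly between 0 and 1, and its derivative s * (1 - s) is never zero, which is what lets gradients flow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One step of a standard LSTM cell.

    W has shape (4 * hidden, input_dim + hidden); b has shape (4 * hidden,).
    Returns the new hidden state, the new memory cell, and the three gates.
    """
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([x_t, h_prev]) + b
    i = sigmoid(z[0 * hidden:1 * hidden])  # input gate: how much new info to write
    f = sigmoid(z[1 * hidden:2 * hidden])  # forget gate: how much old memory to keep
    o = sigmoid(z[2 * hidden:3 * hidden])  # output gate: how much memory to read out
    g = np.tanh(z[3 * hidden:4 * hidden])  # candidate values to write
    # Soft multiplications by values in (0, 1) instead of hard 0/1 switches,
    # so every operation here is differentiable.
    c = f * c_prev + i * g
    h = o * np.tanh(c)
    return h, c, (i, f, o)

rng = np.random.default_rng(0)
input_dim, hidden = 3, 4
W = rng.normal(scale=0.1, size=(4 * hidden, input_dim + hidden))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(6, input_dim)):
    h, c, gates = lstm_step(x_t, h, c, W, b)
```

A forget gate close to 1 keeps `c_prev` almost unchanged from step to step, which is how the cell holds on to its memory longer when it needs to.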


We can regularize LSTMs with L2 regularization and dropout, as long as dropout is applied to the inputs or the outputs, not to the recurrent connections.
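Here is a minimal numpy sketch of where the regularization goes. For brevity it uses a vanilla tanh recurrent cell rather than a full LSTM (an assumption; the placement is the same for an LSTM), inverted dropout applied at training time only, and an L2 penalty to be added to the loss. Dropout masks the input at each step and the output that leaves the cell, while the hidden state passed to the next step through `W_hh` is left untouched. The names `rate` and `beta` are illustrative hyperparameters.

```python
import numpy as np

def dropout(x, rate, rng):
    """Inverted dropout (training only): zero units with probability `rate`, rescale the rest."""
    keep = rng.random(x.shape) >= rate
    return x * keep / (1.0 - rate)

def rnn_forward_with_dropout(inputs, W_xh, W_hh, b_h, rate, rng):
    h = np.zeros(W_hh.shape[0])
    outputs = []
    for x_t in inputs:
        x_t = dropout(x_t, rate, rng)             # dropout on the input
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)  # recurrent path: no dropout
        outputs.append(dropout(h, rate, rng))     # dropout on the output only
    return np.stack(outputs)

def l2_penalty(weights, beta):
    """L2 regularization term to add to the training loss."""
    return beta * sum(np.sum(w ** 2) for w in weights)

rng = np.random.default_rng(0)
input_dim, hidden_dim = 3, 8
W_xh = rng.normal(scale=0.1, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.1, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

outputs = rnn_forward_with_dropout(rng.normal(size=(5, input_dim)), W_xh, W_hh, b_h,
                                   rate=0.5, rng=rng)
penalty = l2_penalty([W_xh, W_hh], beta=1e-3)
```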

Alright, that’s it for now! Thanks for your time. Cheers!
