Deep Learning — NLP (Part V-d)

Dejan Jovanovic
Published in aihive · 6 min read · Jul 3, 2019


Working with Text

Text is one of the most prevalent forms of sequence data. For deep learning models to understand natural language, one must prepare a statistical representation of the language: deep learning for natural language processing is pattern recognition applied to words, sentences, and paragraphs. Raw text must be vectorized before it can be used, and this can be done in several ways (a word-level sketch follows the list):

  1. Segment text into words and transform each word into a vector.
  2. Segment text into characters and transform each character into a vector.
  3. Extract n-grams of words or characters and then transform each n-gram into a vector.
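
As a minimal sketch of the first approach, word-level segmentation with the Keras Tokenizer might look like the following; the two sample sentences are invented purely for illustration:

```python
from keras.preprocessing.text import Tokenizer

# Two toy movie-review snippets, invented for illustration.
samples = ['The movie was a delight', 'The plot was dull and predictable']

# Build a vocabulary from the 1,000 most frequent words.
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(samples)

# Each sample becomes a list of integer word indices.
sequences = tokenizer.texts_to_sequences(samples)
print(sequences)             # e.g. [[1, 3, 2, 4, 5], [1, 6, 2, 7, 8, 9]]
print(tokenizer.word_index)  # word -> integer index mapping
```

Each word index can then be turned into a vector, either by one-hot encoding it or by looking it up in an embedding, as discussed next.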

In the previous example of text vectorization I used one-hot encoding of tokens. In this second attempt at building a sentiment analysis engine for movie reviews, I am going to use word embeddings of tokens. A word embedding is a dense word vector. While the vectors obtained through one-hot encoding are binary, sparse, and very high dimensional, word embeddings are low-dimensional floating-point vectors. Simply put, a word embedding is a way of representing text where each word in the vocabulary is mapped to a real-valued vector in a dense, relatively low-dimensional space.
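
To make the contrast concrete, here is a small NumPy sketch; the vocabulary size, embedding dimension, and word index are arbitrary choices for illustration, and the random matrix merely stands in for an embedding that would normally be learned during training:

```python
import numpy as np

vocabulary_size = 10000  # assumed vocabulary size, for illustration
embedding_dim = 8        # assumed embedding size, for illustration
word_index = 42          # arbitrary word index

# One-hot: binary, sparse, one dimension per vocabulary word.
one_hot = np.zeros(vocabulary_size)
one_hot[word_index] = 1.0

# Embedding: dense, low-dimensional floats; random here as a
# stand-in for values learned by the model.
embedding_matrix = np.random.rand(vocabulary_size, embedding_dim)
dense_vector = embedding_matrix[word_index]

print(one_hot.shape)       # (10000,)
print(dense_vector.shape)  # (8,)
```

The same word that needs 10,000 dimensions as a one-hot vector is captured by just 8 learned floats in the embedding.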

Implementing word embeddings in Keras is straightforward: the Embedding layer learns a dense vector for each word index as the model trains.
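
A minimal sketch of what that can look like for the movie-review task, in the spirit of the standard Keras IMDB example; the hyperparameters here (vocabulary size, review length, embedding dimension, epochs) are illustrative choices rather than the exact values used in this series:

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

max_features = 10000  # keep the 10,000 most frequent words
maxlen = 20           # cut reviews after 20 words

# Load the IMDB reviews as lists of word indices and pad them
# to a uniform length so they can be batched.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

model = Sequential()
# Learn an 8-dimensional embedding for each of the 10,000 words;
# output shape per review is (maxlen, 8).
model.add(Embedding(max_features, 8, input_length=maxlen))
# Flatten to (maxlen * 8,) and classify with a single sigmoid unit.
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
model.fit(x_train, y_train,
          epochs=10,
          batch_size=32,
          validation_split=0.2)
```

The Embedding layer is effectively a lookup table from integer word indices to dense vectors, trained jointly with the rest of the network.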
