Deep Learning — NLP (Part V-d)

Dejan Jovanovic
Published in aihive · 6 min read · Jul 3, 2019


Working with Text

Text is one of the most prevalent forms of sequence data. For deep learning models to understand natural language, one must prepare a statistical representation of the language: deep learning for natural language processing is pattern recognition applied to words, sentences, and paragraphs. Raw text must be vectorized before it can be used, and this can be done in several ways (a word-level sketch follows the list):

  1. Segment text into words and transform each word into a vector.
  2. Segment text into characters and transform each character into a vector.
  3. Extract n-grams of words or characters and then transform each n-gram into a vector.
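
As a minimal sketch of the first approach, word-level segmentation with the Keras Tokenizer might look like the following; the two sample sentences are invented purely for illustration:

```python
from keras.preprocessing.text import Tokenizer

# Two toy movie-review snippets, invented for illustration.
samples = ['The movie was a delight', 'The plot was dull and predictable']

# Build a vocabulary from the 1,000 most frequent words.
tokenizer = Tokenizer(num_words=1000)
tokenizer.fit_on_texts(samples)

# Each sample becomes a list of integer word indices.
sequences = tokenizer.texts_to_sequences(samples)
print(sequences)             # e.g. [[1, 3, 2, 4, 5], [1, 6, 2, 7, 8, 9]]
print(tokenizer.word_index)  # word -> integer index mapping
```

Each word index can then be turned into a vector, either by one-hot encoding it or by looking it up in an embedding, as discussed next.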

In the previous example of text vectorization I used one-hot encoding of tokens. In this second attempt at building a sentiment analysis engine for movie reviews, I am going to use word embeddings of tokens. A word embedding is a dense word vector. While the vectors obtained through one-hot encoding are binary, sparse, and very high dimensional, word embeddings are low-dimensional floating-point vectors. Simply put, a word embedding is a way of representing text where each word in the vocabulary is mapped to a real-valued vector in a dense, relatively low-dimensional space.
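
To make the contrast concrete, here is a small NumPy sketch; the vocabulary size, embedding dimension, and word index are arbitrary choices for illustration, and the random matrix merely stands in for an embedding that would normally be learned during training:

```python
import numpy as np

vocabulary_size = 10000  # assumed vocabulary size, for illustration
embedding_dim = 8        # assumed embedding size, for illustration
word_index = 42          # arbitrary word index

# One-hot: binary, sparse, one dimension per vocabulary word.
one_hot = np.zeros(vocabulary_size)
one_hot[word_index] = 1.0

# Embedding: dense, low-dimensional floats; random here as a
# stand-in for values learned by the model.
embedding_matrix = np.random.rand(vocabulary_size, embedding_dim)
dense_vector = embedding_matrix[word_index]

print(one_hot.shape)       # (10000,)
print(dense_vector.shape)  # (8,)
```

The same word that needs 10,000 dimensions as a one-hot vector is captured by just 8 learned floats in the embedding.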

Implementing word embeddings in Keras is straightforward: the Embedding layer learns a dense vector for each word index as the model trains.
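
A minimal sketch of what that can look like for the movie-review task, in the spirit of the standard Keras IMDB example; the hyperparameters here (vocabulary size, review length, embedding dimension, epochs) are illustrative choices rather than the exact values used in this series:

```python
from keras.datasets import imdb
from keras.preprocessing.sequence import pad_sequences
from keras.models import Sequential
from keras.layers import Embedding, Flatten, Dense

max_features = 10000  # keep the 10,000 most frequent words
maxlen = 20           # cut reviews after 20 words

# Load the IMDB reviews as lists of word indices and pad them
# to a uniform length so they can be batched.
(x_train, y_train), (x_test, y_test) = imdb.load_data(num_words=max_features)
x_train = pad_sequences(x_train, maxlen=maxlen)
x_test = pad_sequences(x_test, maxlen=maxlen)

model = Sequential()
# Learn an 8-dimensional embedding for each of the 10,000 words;
# output shape per review is (maxlen, 8).
model.add(Embedding(max_features, 8, input_length=maxlen))
# Flatten to (maxlen * 8,) and classify with a single sigmoid unit.
model.add(Flatten())
model.add(Dense(1, activation='sigmoid'))

model.compile(optimizer='rmsprop',
              loss='binary_crossentropy',
              metrics=['acc'])
model.fit(x_train, y_train,
          epochs=10,
          batch_size=32,
          validation_split=0.2)
```

The Embedding layer is effectively a lookup table from integer word indices to dense vectors, trained jointly with the rest of the network.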
