Gautam Karmakar
Jul 30, 2017 · 3 min read

Word embedding and Convolutional Neural Network for Sentence Classification

Sentence classification using Semantic Meaning:

Deep learning architectures can be used to classify sentences based on their semantic meaning. Deep learning has been reaching new heights in computer vision and speech recognition for a while now, and lately there have been major successes of deep learning techniques in natural language processing tasks such as sentiment analysis, machine translation, and text summarization. Much of the deep learning work on natural language text is based on vector representations of words learned through a language model. Pioneering researchers, including Yoshua Bengio in 2003, Mikolov et al. in 2013, and Collobert et al. in 2011, showed how to learn vector representations of words and build classifiers on top of features learned from those word vectors.

Word embedding maps each word from a 1-of-V (one-hot) representation, where V is the vocabulary size, into a dense vector; hidden layers of a neural network then learn a semantic representation from these word vectors, and a classifier is applied at the end of the architecture. The hidden layers essentially act as a feature extractor, projecting word vectors into a lower-dimensional vector space that encodes the semantic meaning of the words.
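Conceptually, the embedding lookup described above is just selecting a row of a weight matrix, which is equivalent to multiplying a 1-of-V one-hot vector by that matrix. A minimal NumPy sketch (the vocabulary size and embedding dimension are toy values for illustration):

```python
import numpy as np

vocab_size, embedding_size = 5, 3  # toy sizes for illustration
rng = np.random.default_rng(0)

# Embedding matrix: one row per vocabulary word.
W = rng.uniform(-1.0, 1.0, (vocab_size, embedding_size))

word_id = 2                      # index of a word in the vocabulary
one_hot = np.zeros(vocab_size)
one_hot[word_id] = 1.0

# Multiplying the one-hot vector by W selects row `word_id` of W,
# which is exactly what an embedding lookup does.
assert np.allclose(one_hot @ W, W[word_id])
```

This is why frameworks implement the lookup as an indexing operation rather than an actual matrix multiply: the result is identical but far cheaper.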

The Convolutional Neural Network (Yann LeCun, 1998) has been very successful in computer vision tasks and has also shown excellent results in natural language processing tasks such as semantic parsing (Yih et al., 2014), search query retrieval (Shen et al., 2014), sentence modeling, and other traditional NLP tasks.

In this sentence classification task we will use a CNN model to classify sentences based on their meaning. Each sentence could be a rule associated with fund disclosures. First, these sentences and the words in them will be converted to vector representations using a pre-trained GloVe or word2vec model. These models are trained on billions of words from Google News (word2vec) or millions of Wikipedia pages (GloVe).
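As a sketch of how such pre-trained vectors are loaded, GloVe ships as a plain-text file with one word per line followed by its vector components. A minimal parser (the two sample entries below are made-up 4-dimensional toy vectors, not real GloVe values):

```python
import numpy as np

def load_glove(lines):
    """Parse GloVe-format lines ("word v1 v2 ...") into a dict of word vectors."""
    vectors = {}
    for line in lines:
        parts = line.rstrip().split(" ")
        vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

# Two toy entries in GloVe's text format (illustrative values only).
sample = ["fund 0.1 0.2 0.3 0.4", "disclosure 0.5 0.6 0.7 0.8"]
glove = load_glove(sample)
vec = glove["fund"]  # a 4-dimensional float32 vector
```

In practice one would iterate over the real downloaded file (e.g. `open("glove.6B.100d.txt")`) and copy the vectors for in-vocabulary words into the embedding matrix used below.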

We will keep these vectors static and show that they act as a universal feature extractor that can produce very good results. As a next stage we can fine-tune the word vectors for task-specific training. This will show that pre-trained vectors can be utilized for tasks different from the one they were originally trained on.

In our architecture we will use a simple convolutional layer on top of the learned vector embeddings of words. First, words/sentences are passed to the embedding layer to generate vectors. We then add a convolutional layer on top of it with filters of multiple sizes, which extracts features and encodes the semantic meaning of words and sentences. Finally, we add a softmax layer to classify the sentences.

Here are some snippets of the code that will be used to build our model architecture:

Embedding Layer:

with tf.name_scope("embedding"):
    # Embedding matrix, initialized uniformly in [-1, 1] and learned (or held static).
    W = tf.Variable(tf.random_uniform([vocab_size, embedding_size], -1.0, 1.0), name="weight")
    self.embedded_chars = tf.nn.embedding_lookup(W, input_x)

Convolutional & MaxPooling Layer:

# conv2d expects a 4-D input, so add a singleton channel dimension.
embedded_expanded = tf.expand_dims(self.embedded_chars, -1)

model_output = []
for i, filter_size in enumerate(filter_sizes):
    with tf.name_scope("conv-maxpool-%s" % filter_size):
        filter_shape = [filter_size, embedding_size, 1, num_filters]
        W = tf.Variable(tf.truncated_normal(filter_shape, stddev=0.1), name="weight")
        b = tf.Variable(tf.constant(0.1, shape=[num_filters]), name="bias")
        conv = tf.nn.conv2d(embedded_expanded,
                            W,
                            strides=[1, 1, 1, 1],
                            padding="VALID",
                            name="conv")
        h = tf.nn.relu(tf.nn.bias_add(conv, b), name="activation")
        # Max-over-time pooling: keep the strongest response per filter.
        pool = tf.nn.max_pool(h,
                              ksize=[1, sequence_length - filter_size + 1, 1, 1],
                              strides=[1, 1, 1, 1],
                              padding="VALID",
                              name="pool")
        model_output.append(pool)

# Combine all pooled features into a single flat feature vector.
num_filters_total = num_filters * len(filter_sizes)
self.h_pool = tf.concat(model_output, 3)
self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])
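In plain NumPy, the convolution-and-max-pool step above amounts to sliding a window of `filter_size` words over the sentence matrix, taking a dot product with the filter at each position, and keeping the maximum response. A toy single-filter sketch (all sizes and values are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
sequence_length, embedding_size, filter_size = 7, 4, 3

# One embedded sentence (sequence_length x embedding_size) and one filter.
sentence = rng.standard_normal((sequence_length, embedding_size))
filt = rng.standard_normal((filter_size, embedding_size))

# "VALID" convolution: one response per window position.
responses = np.array([
    np.sum(sentence[i:i + filter_size] * filt)
    for i in range(sequence_length - filter_size + 1)
])

# Max-over-time pooling keeps the single strongest feature per filter.
pooled = responses.max()
```

With `num_filters` filters per filter size, stacking these pooled scalars yields exactly the `num_filters_total`-dimensional feature vector that the softmax layer consumes.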

We will use dropout as a regularizer to give the model a fair amount of generalization, and we will see whether batch normalization adds any value for NLP tasks. We will train with the Adam optimizer and a softmax cross-entropy loss (tf.nn.softmax_cross_entropy_with_logits) and compare prediction accuracy.
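The softmax cross-entropy loss mentioned above can be written out directly. A NumPy sketch for a single example (the logit values and label are made up for illustration):

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Cross-entropy between softmax(logits) and a one-hot true label."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]

logits = np.array([2.0, 0.5, -1.0])  # unnormalized class scores
loss = softmax_cross_entropy(logits, label=0)  # small, since class 0 already scores highest
```

TensorFlow's fused op computes the same quantity but more stably and over a whole batch at once, which is why it is preferred over composing `softmax` and a log-loss by hand.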

We will visualize the results with TensorBoard, and if time permits we will also use a t-SNE projection.

Enjoy and Happy [deep] Learning!!

References:
Convolutional Neural Networks for Sentence Classification (Kim, 2014): https://arxiv.org/abs/1408.5882
Implementing a CNN for Text Classification in TensorFlow: http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/
