TensorFlow Demo

Introduction

In this blog post, we walk through a TensorFlow demo and discuss various cloud offerings from companies such as IBM, Google and others, because the platform wars are picking up momentum. Picking the right vendor and the right technology is essential to being successful.

Kudos to TechTarget for publishing this.

Let’s talk about TensorFlow.

TensorFlow is an open-source library, most commonly used from Python, for training artificial neural networks. It was developed by Google and has become a very successful library for implementing many neural network models, including feed-forward, recurrent, and convolutional neural networks. It can also be used to implement deep learning models such as Boltzmann machines and deep autoencoder networks.

In this post, we show an example of developing a convolutional neural network for sentence classification. We follow the approach presented in Yoon Kim’s paper “Convolutional Neural Networks for Sentence Classification”.

We will work with the Yelp dataset. The Yelp Data Challenge is an online challenge for developers passionate about machine learning and data science. You can download the dataset for free and apply machine learning, data mining, deep learning, social graph mining, natural language processing, and other techniques to it. You can win $5,000 if you come up with a new technique that helps Yelp use the data in a smarter way.

For this demo, we focus on the Yelp rating prediction problem, which we will solve using a deep convolutional neural network. The next section explains the problem in detail.

Problem Definition

The Yelp dataset contains five different JSON files: business, check-in, review, tip, and user. The review file is particularly interesting because it contains customer reviews of different business entities. Each review record has seven fields: an encrypted user_id, review_id, star, date, text, type, and an encrypted business_id. The fields we care about are “text” and “star”.

The “text” field contains the user’s review of a specific business entity in plain text, while the “star” field contains the user’s rating. The rating ranges from 1 to 5 and measures how satisfied the user is with the business: level 1 means the user did not like the service provided, while level 5 means the user liked it very much.

Our problem is: given a user review in plain text, can we predict the user’s rating on the 1–5 scale?

This problem matters because many users type a review but do not fill in the star field, so we need an intelligent way to predict the rating from the review text.

We can make the problem simpler: given the user review, can we estimate whether the user is satisfied? This version is simpler because the answer is binary, either True or False. True means that the user likes the service, while False means that the user does not.

So we model the problem as follows: satisfied users are those who gave a rating of 4 or 5, while all other users gave a rating of 1, 2, or 3.
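The labeling rule above fits in a single function. This is a minimal sketch; the name `is_satisfied` is hypothetical, and rejecting ratings outside 1–5 is our own assumption rather than something the original pipeline is described as doing.

```python
def is_satisfied(star: int) -> bool:
    """Map a 1-5 star rating to the binary satisfaction label used in this demo."""
    if not 1 <= star <= 5:
        raise ValueError(f"star rating must be between 1 and 5, got {star}")
    # Ratings 4 and 5 count as satisfied (True); 1, 2 and 3 as not satisfied (False).
    return star > 3


print([is_satisfied(star) for star in range(1, 6)])  # [False, False, False, True, True]
```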

In the next section, we discuss the pre-processing steps we performed before running the deep convolutional neural network.

Pre-processing

We extracted the first 2,000 reviews from the dataset to train the convolutional neural network, and gave each one a binary label, True or False: reviews with a rating greater than 3 get a True label, while reviews with a rating of 3 or less get a False label. Here is an example of our dataset.

Review                                                          Label
I like This place a lot                                         TRUE
Don’t waste your time in that restaurant                        FALSE
Awesome historic building high on top of the hill in Carnegie.  TRUE
Beautiful restoration of the library and music hall             TRUE
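The pre-processing step (keep the first 2,000 reviews and label a rating greater than 3 as True) can be sketched in plain Python. This assumes the review file stores one JSON object per line with the “text” and “star” fields described earlier; some releases of the dataset may name the rating field differently (for example `stars`), so `star_key` is exposed as a parameter. The function name and the sample records are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def load_labeled_reviews(lines: Iterable[str], limit: int = 2000,
                         star_key: str = "star") -> List[Tuple[str, bool]]:
    """Return up to `limit` (text, label) pairs from JSON-lines review records.

    Assumes one JSON object per line with a "text" field and a star-rating field
    named `star_key`. The label is True when the rating is greater than 3.
    """
    examples: List[Tuple[str, bool]] = []
    for line in lines:
        if len(examples) >= limit:
            break
        if not line.strip():
            continue  # skip blank lines
        record = json.loads(line)
        examples.append((record["text"], record[star_key] > 3))
    return examples


# Hypothetical sample records standing in for the real review file.
sample = [
    json.dumps({"text": "I like This place a lot", "star": 5}),
    json.dumps({"text": "Don't waste your time in that restaurant", "star": 1}),
    json.dumps({"text": "It was fine", "star": 3}),
]
print(load_labeled_reviews(sample, limit=2))
```

With a real file, an open file object can be passed directly, e.g. `with open("review.json", encoding="utf-8") as f: data = load_labeled_reviews(f)`, where the file name is a placeholder. Iterating line by line means only the first 2,000 records are ever parsed.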

Approach

We chose a deep convolutional neural network as the model for learning the relationship between review text and rating. Configured for our task, the network is a binary classifier that outputs either True or False.

Since a neural network is a computational model, we need a way to turn text into numbers. Word2vec is a technique used in natural language processing to convert words into numerical vectors.

With word2vec, every word becomes a vector of real numbers, so we can apply algebra to words. For example, King - Man + Woman ≈ Queen. This works because the offset from Man to King is roughly the same as the offset from Woman to Queen. Word2vec is a very popular and widely used technique in natural language processing.
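To make the analogy concrete, here is a small sketch that computes King - Man + Woman and returns the nearest remaining word by cosine similarity. The 3-dimensional vectors are hypothetical values chosen by hand for illustration, not trained word2vec embeddings, which are learned from a corpus and have many more dimensions. Excluding the three query words from the candidates follows the usual way analogy queries are evaluated.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical hand-made toy vectors; real word2vec vectors are learned, not chosen.
TOY_VECTORS: Dict[str, List[float]] = {
    "king":  [0.9, 0.8, 0.1],
    "man":   [0.1, 0.9, 0.0],
    "woman": [0.1, -0.9, 0.1],
    "queen": [0.9, -0.8, 0.2],
    "apple": [0.0, 0.1, 0.9],
}


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def analogy(a: str, b: str, c: str, vectors: Dict[str, List[float]]) -> str:
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding a, b and c."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


print(analogy("king", "man", "woman", TOY_VECTORS))  # queen
```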

Our training examples are a set of statements, each containing a number of words, so we can generate a matrix for each statement. Figure 1 shows the model architecture and gives an example of a statement with its corresponding matrix.

Figure 1: Model architecture (source: https://arxiv.org/pdf/1408.5882v2.pdf)

In the figure, you can see the statement “wait for the video and do n’t rent it”. Word2vec generates a vector for every word, and stacking these vectors row by row gives each statement a matrix of size n × k, where n is the number of words in the statement and k is the length of a word vector.
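Building the n × k statement matrix amounts to looking up one vector per word and stacking the results. In this sketch the lookup table and k = 4 are hypothetical stand-ins for trained word2vec vectors, and giving out-of-vocabulary words a zero vector is a simplifying assumption of ours, not necessarily what the original pipeline or Kim’s paper does.

```python
from typing import Dict, List

K = 4  # word-vector length k (hypothetical; real word2vec vectors are much longer)

# Hypothetical lookup table standing in for trained word2vec vectors.
EMBEDDINGS: Dict[str, List[float]] = {
    "wait":  [0.1, 0.3, -0.2, 0.5],
    "video": [0.7, -0.1, 0.0, 0.2],
    "rent":  [-0.4, 0.6, 0.3, 0.1],
}


def sentence_matrix(sentence: str, embeddings: Dict[str, List[float]],
                    k: int = K) -> List[List[float]]:
    """Stack one k-dimensional vector per word into an n x k matrix (one row per word)."""
    rows = []
    for word in sentence.split():
        # Simplifying assumption: words missing from the vocabulary get a zero vector.
        rows.append(list(embeddings.get(word, [0.0] * k)))
    return rows


m = sentence_matrix("wait for the video and do n't rent it", EMBEDDINGS)
print(len(m), len(m[0]))  # 9 4
```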

Once we have built these matrices, the convolutional neural network can start learning. In the next section, we discuss how a convolutional neural network learns to classify sentences.

Convolutional Operation

A convolutional neural network has two main layers: the convolutional layer, followed by the pooling layer. We start with the convolutional layer and then discuss the pooling layer.

Convolutional layer:

The convolutional operation involves training filters, each of which is applied to a window of h words to produce a new feature vector. Let’s look at an example with a filter of size 2.

A filter w of size 2 is a matrix of real values of size 2 × k. The filter slides down the statement matrix, and at each step it computes the dot product between the filter values and two successive word vectors from the statement matrix. Each dot product produces a scalar value, so sliding the filter down a statement matrix of n rows generates a vector of length n - 2 + 1 = n - 1. More generally, a filter of size h produces a feature vector of length n - h + 1.
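The sliding-window operation can be sketched in plain Python as a “valid” convolution over the statement matrix. Beyond the plain dot product described above, this sketch adds a bias b and a ReLU nonlinearity, following the form c_i = f(w · x_{i:i+h-1} + b) from Kim’s paper; treat that choice of f as an assumption. The toy matrix and filter values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def convolve(sentence: Matrix, filt: Matrix, bias: float = 0.0) -> List[float]:
    """Slide an h x k filter down an n x k sentence matrix ("valid" convolution).

    Each step takes the dot product of the filter with h successive word vectors,
    adds a bias and applies ReLU, giving a feature vector of length n - h + 1.
    """
    n, h = len(sentence), len(filt)
    features = []
    for i in range(n - h + 1):
        window = sentence[i:i + h]  # h successive word vectors
        dot = sum(w * x
                  for f_row, s_row in zip(filt, window)
                  for w, x in zip(f_row, s_row))
        features.append(max(0.0, dot + bias))  # ReLU nonlinearity
    return features


# Toy 4 x 2 statement matrix (n = 4 words, k = 2) and a 2 x 2 filter (h = 2).
sentence = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, -1.0]]
filt = [[1.0, 1.0], [1.0, -1.0]]
print(convolve(sentence, filt))  # [0.0, 1.0, 3.5]
```

The output has n - h + 1 = 3 entries, matching the length formula above.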

By training several convolutional filters, we generate multiple feature vectors, whose lengths depend on the filter sizes; the number of feature vectors equals the number of convolutional filters we train. In our case, we train 128 filters of size 3, 128 filters of size 4, and 128 filters of size 5, for a total of 384 filters.
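A quick arithmetic check of the filter configuration: 3 filter sizes × 128 filters gives 384 feature vectors, and each size h yields vectors of length n - h + 1. The sentence length n = 20 below is a hypothetical example, and raising an error for sentences shorter than the largest filter is our own guard, not part of the described setup.

```python
from typing import Dict

FILTERS_PER_SIZE = 128       # filters trained for each window size
FILTER_SIZES = (3, 4, 5)     # filter heights h, in words


def feature_vector_lengths(n: int) -> Dict[int, int]:
    """Map each filter height h to the length n - h + 1 of the feature vectors it produces."""
    if n < max(FILTER_SIZES):
        raise ValueError("sentence is shorter than the largest filter")
    return {h: n - h + 1 for h in FILTER_SIZES}


total_filters = FILTERS_PER_SIZE * len(FILTER_SIZES)
print(total_filters)                  # 384
print(feature_vector_lengths(n=20))   # {3: 18, 4: 17, 5: 16}
```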

Once we are done with the convolutional layer, we have 384 feature vectors. In the next section, we discuss the pooling layer.