How to build a custom Sentiment Analysis model with Google AutoML Natural Language

Damon Leong
Google Cloud Platform by Cloud Ace
5 min read · May 13, 2020


Photo by Casey Horner on Unsplash

The main reason we need to build a custom sentiment analysis model is that a computer does not, by itself, understand whether the sentence you express is positive or negative. For example, if you run an open Facebook group that receives more than 10,000 comments per day, you will find it very difficult to check the feedback one by one and decide whether each comment is positive or negative. To automate the process, we need a tool that solves this issue.

An efficient way to do this is to build a custom Sentiment Analysis model with Google Cloud AutoML Natural Language.

First, log in to the Google Cloud Platform Console at https://console.cloud.google.com/ and select Natural Language in the left panel, as in the image below:

You should then see the dashboard. Select AutoML Sentiment Analysis and click Get Started. Next we need to create a new dataset: click the New Dataset button and you will see the screen shown in the image below. Give your dataset a name; the location is where you want to store your dataset, which we leave at the default, Global. Then select the Sentiment Analysis block. Notice the sentiment scale at the bottom of the screen: this is the range of sentiment scores to expect in the dataset. We set 0 as negative, 1 as neutral and 2 as positive. Remember that once you have selected the sentiment scale range, it cannot be changed after the dataset is created.
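
If you prefer to script this step instead of clicking through the console, the dataset can also be created with the google-cloud-automl Python client. The sketch below is only an illustration under my own assumptions: the project ID and dataset name are placeholders, and I assume the us-central1 region and a 0–2 sentiment scale to match the console setup.

```python
# pip install google-cloud-automl
from google.cloud import automl

project_id = "your-project-id"  # placeholder: your GCP project ID

client = automl.AutoMlClient()
project_location = f"projects/{project_id}/locations/us-central1"

# Sentiment scale: scores run from 0 (negative) up to sentiment_max (positive).
# sentiment_max=2 gives the 0 / 1 / 2 scale used in this article.
metadata = automl.TextSentimentDatasetMetadata(sentiment_max=2)
dataset = automl.Dataset(
    display_name="financial_news_sentiment",  # placeholder dataset name
    text_sentiment_dataset_metadata=metadata,
)

# create_dataset returns a long-running operation; wait for the dataset.
created_dataset = client.create_dataset(
    parent=project_location, dataset=dataset
).result()
print("Dataset name:", created_dataset.name)
```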

After we finish creating the dataset, we can start importing a file from the computer or from Cloud Storage, as in the image below.
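
If the CSV is already uploaded to a Cloud Storage bucket, the import can also be triggered from code. This is a rough sketch with the same assumptions as before; the dataset ID and the gs:// path are placeholders.

```python
from google.cloud import automl

project_id = "your-project-id"  # placeholder
dataset_id = "TST0000000000000"  # placeholder: ID of the dataset created above
gcs_path = "gs://your-bucket/financial_news_sentiment.csv"  # placeholder CSV location

client = automl.AutoMlClient()
dataset_full_id = client.dataset_path(project_id, "us-central1", dataset_id)

# Point the import at the CSV in Cloud Storage and wait for it to finish.
gcs_source = automl.GcsSource(input_uris=[gcs_path])
input_config = automl.InputConfig(gcs_source=gcs_source)
response = client.import_data(name=dataset_full_id, input_config=input_config)
print("Data imported.", response.result())
```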

To prepare the dataset, we need to follow the expected format to prevent the import from failing. Below is a demo dataset, modified after downloading it from kaggle.com.

This demo dataset contains the sentiment of financial news headlines from the perspective of a retail investor.

The left column is the data split: train, validation or test. AutoML Natural Language uses the training set to build the model. The model tries multiple algorithms and parameters while searching for patterns in the training data. As the model identifies patterns, it uses the validation set to test those algorithms and patterns. If you leave the left column empty, AutoML Natural Language automatically splits the documents into three sets by default: 80% for training, 10% for validation and 10% for testing.

Remember that if the value in the left column is different from these three labels, the import will fail. The centre column is the text of the sample; rows cannot contain duplicate text or the import will fail. The right column is the sentiment score assigned by a human: 0 is negative, 1 is neutral and 2 is positive. The sentiment score must be an integer value, cannot be a floating-point number, and must fall within your dataset's sentiment score range, or the import will fail. The import takes up to a few minutes to process, and you will get an email once it completes.
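
To avoid these import failures, it can help to check the CSV locally before uploading it. The snippet below is only a sketch: the file names and column names are made up, and I assume the raw Kaggle file has a text column and a positive/neutral/negative label column.

```python
import pandas as pd

# Hypothetical raw file from Kaggle with "sentiment" and "text" columns.
df = pd.read_csv("financial_news_headlines.csv")

# Map the human labels onto our sentiment scale: 0 = negative, 1 = neutral, 2 = positive.
score_map = {"negative": 0, "neutral": 1, "positive": 2}
df["score"] = df["sentiment"].map(score_map).astype(int)

# Duplicate text rows make the import fail, so drop them.
df = df.drop_duplicates(subset="text")

# Leave the split column empty so AutoML divides the data 80/10/10 itself,
# or fill it with TRAIN / VALIDATION / TEST to control the split yourself.
df["split"] = ""

# Scores must be integers inside the declared 0..2 range.
assert df["score"].between(0, 2).all()

# Write the three columns in the order AutoML expects, without a header row.
df[["split", "text", "score"]].to_csv(
    "financial_news_sentiment.csv", index=False, header=False
)
```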

Once the data has been imported successfully, click on the Items tab and you can see the listing as below. You can change the sentiment score of an item by clicking on it.

Then we can begin to train our model: select the Train tab and click Start Training. Training takes up to several hours to complete, depending on the size of the dataset, the nature of the training items and the complexity of the model. You will get an email once the process completes. The image below is a screenshot taken before training completed.
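
Training can also be started programmatically; the sketch below assumes the same google-cloud-automl client as before and uses placeholder IDs and names.

```python
from google.cloud import automl

project_id = "your-project-id"  # placeholder
dataset_id = "TST0000000000000"  # placeholder: ID of the imported dataset

client = automl.AutoMlClient()
project_location = f"projects/{project_id}/locations/us-central1"

# Request a sentiment analysis model trained on the dataset.
metadata = automl.TextSentimentModelMetadata()
model = automl.Model(
    display_name="financial_news_sentiment_model",  # placeholder model name
    dataset_id=dataset_id,
    text_sentiment_model_metadata=metadata,
)

# create_model returns a long-running operation; training can take hours.
response = client.create_model(parent=project_location, model=model)
print("Training operation name:", response.operation.name)
```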

After training completes, you can select the Evaluate tab to see the performance results, as in the image below. The metrics shown are precision and recall. Precision tells us, of all the items the model labelled 0, how many really were 0. Recall, on the other hand, tells us, of all the items in the dataset that should have been labelled 0, what percentage the model actually ended up labelling 0. Another important feature is the confusion matrix, which shows how the real labels of the dataset compare to the labels our model predicted. For example, 53% of the time the model accurately predicted that text with label 0 should have label 0, but 15% of the time it also thought that text with label 1 (those that are just neutral) should be label 0.
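
To make these metrics concrete, here is a small self-contained example with made-up labels (using scikit-learn, not anything returned by AutoML) that computes precision and recall for label 0 and a confusion matrix in the same spirit as the Evaluate tab.

```python
from sklearn.metrics import confusion_matrix, precision_score, recall_score

# Made-up true labels and model predictions on the 0 / 1 / 2 scale.
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 0]
y_pred = [0, 0, 1, 1, 0, 1, 2, 2, 1, 0]

# Precision for label 0: of everything predicted as 0, how much really was 0?
print("precision(0):", precision_score(y_true, y_pred, labels=[0], average=None)[0])

# Recall for label 0: of everything that really was 0, how much was predicted as 0?
print("recall(0):", recall_score(y_true, y_pred, labels=[0], average=None)[0])

# Confusion matrix: rows are the real labels, columns are the predicted labels.
print(confusion_matrix(y_true, y_pred, labels=[0, 1, 2]))
```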

If we click on sentiment score 0, as in the image below, we can see examples of what the model labelled correctly and incorrectly.

Finally, we select the Test & Use tab to start using our model. We just need to type some positive and negative comments and it will show us the sentiment score.
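
The model can also be called from code once it is available for prediction. This is a minimal sketch assuming the google-cloud-automl Python client, with placeholder project and model IDs and an example comment of my own.

```python
from google.cloud import automl

project_id = "your-project-id"  # placeholder
model_id = "TST0000000000000"   # placeholder: ID of the trained model
content = "The quarterly results were much better than expected."

prediction_client = automl.PredictionServiceClient()
model_full_id = automl.AutoMlClient.model_path(project_id, "us-central1", model_id)

# Wrap the comment as a plain-text snippet and ask the model for its sentiment.
text_snippet = automl.TextSnippet(content=content, mime_type="text/plain")
payload = automl.ExamplePayload(text_snippet=text_snippet)
response = prediction_client.predict(name=model_full_id, payload=payload)

for annotation in response.payload:
    # On our scale: 0 = negative, 1 = neutral, 2 = positive.
    print("Predicted sentiment score:", annotation.text_sentiment.sentiment)
```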

Hope you find this article useful. For more information, you can visit https://cloud.google.com/natural-language/
