Using AWS Comprehend and AWS Elasticsearch for NLP

Published in Aubergine Solutions

Sep 1, 2020

Introduction

In this article, we’ll build a quote suggestion system. I’ll be using quotes from the https://programming-quotes-api.herokuapp.com/ API. The program should return similar quotes if the user liked a quote and different quotes if the user disliked it.

The Approach

We’ll need a way to measure how closely each quote is related to the others. Here, the sentiment of each quote is used to find the most closely related ones.

To meet these requirements, we’ll use two AWS services:

AWS Comprehend to get the sentiment of text
AWS Elasticsearch for storing data

The boto3 library for Python will be used to connect to AWS and call these services.

Sentiment analysis

Sentiment analysis is the classification of text by the emotion it expresses, using machine learning. It lets companies gauge how users feel about a product from ratings and social media comments. For example,

“The best minds of my generation are thinking about how to make people click ads.” — Neutral sentiment
“A program that produces incorrect results twice as fast is infinitely slower.” — Negative sentiment
“Walking on water and developing software from a specification are easy if both are frozen.” — Positive sentiment

There are multiple NLP methods for finding the sentiment of text. Instead of building one from scratch, we’ll use AWS Comprehend’s pre-trained models.

AWS Comprehend

AWS Comprehend is a text analysis service that extracts different kinds of insights from natural language. It offers pre-trained models that can be used directly. Here, we’ll use one of them for sentiment analysis.

The boto3 library for Python is used to connect to the AWS Comprehend service. The resulting client provides a detect_sentiment function that calls the sentiment analysis API. Detailed information about this API is available in the AWS documentation.
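A minimal sketch of this call is given here, assuming AWS credentials are already configured in the environment. The helper names `make_comprehend_client` and `detect_quote_sentiment`, and the default region, are illustrative choices rather than anything prescribed by AWS.

```python
def make_comprehend_client(region_name="us-east-1"):
    """Create a Comprehend client (the default region is an assumption)."""
    # boto3 is imported here so the helper below works without boto3 installed.
    import boto3
    return boto3.client("comprehend", region_name=region_name)


def detect_quote_sentiment(client, text, language_code="en"):
    """Call the DetectSentiment API and return the raw response dictionary."""
    return client.detect_sentiment(Text=text, LanguageCode=language_code)
```

With credentials in place, `detect_quote_sentiment(make_comprehend_client(), "some quote")` returns the response dictionary for that quote. Passing the client in as an argument also makes it easy to swap in a stub while testing.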

The response contains a confidence score for each sentiment class (positive, negative, neutral, and mixed). The class with the highest score is taken as the sentiment of the text.
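To make the response shape concrete, here is a small sketch. The dictionary keys follow the DetectSentiment response format, but the numbers are made up for illustration, and `top_sentiment` is a hypothetical helper.

```python
# Illustrative response; the scores are invented for this example.
sample_response = {
    "Sentiment": "NEUTRAL",
    "SentimentScore": {
        "Positive": 0.05,
        "Negative": 0.10,
        "Neutral": 0.80,
        "Mixed": 0.05,
    },
}


def top_sentiment(response):
    """Return the highest-scoring sentiment label and its score."""
    scores = response["SentimentScore"]
    label = max(scores, key=scores.get)
    return label.upper(), scores[label]


print(top_sentiment(sample_response))
```

Comprehend already reports the winning label in the `Sentiment` field, so the helper mainly shows how that label relates to the individual scores.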

AWS Elasticsearch

Elasticsearch stores each quote together with its sentiment so that quotes can be compared with one another. It is a NoSQL document store with search capabilities. Before a document is stored, the quote is merged with the sentiment returned by AWS Comprehend. Then, whenever the user likes a quote, queries are run to find the most closely related quotes based on sentiment. Detailed documentation is available on the Elasticsearch website.

The boto3 library is also used to connect to the Elasticsearch service, which lets us run queries against the database.
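One common way to make this connection is sketched here. It assumes the elasticsearch-py 7.x client and the requests-aws4auth package are installed, with boto3 supplying the credentials used to sign requests. The function names are hypothetical, and the domain endpoint and region are values you would supply yourself.

```python
def build_es_kwargs(host, http_auth):
    """Keyword arguments for an HTTPS connection to the domain endpoint."""
    return {
        "hosts": [{"host": host, "port": 443}],
        "http_auth": http_auth,
        "use_ssl": True,
        "verify_certs": True,
    }


def connect_to_elasticsearch(host, region):
    """Return a client whose requests are signed with the current AWS credentials."""
    # Third-party imports are local so build_es_kwargs can be used without them.
    import boto3
    from elasticsearch import Elasticsearch, RequestsHttpConnection
    from requests_aws4auth import AWS4Auth

    credentials = boto3.Session().get_credentials()
    auth = AWS4Auth(
        credentials.access_key,
        credentials.secret_key,
        region,
        "es",  # service name used when signing Elasticsearch requests
        session_token=credentials.token,
    )
    return Elasticsearch(
        connection_class=RequestsHttpConnection,
        **build_es_kwargs(host, auth),
    )
```

The `host` argument is the domain endpoint without the `https://` prefix.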

Each document in Elasticsearch holds a quote together with its sentiment.
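As a rough sketch of what such a document could look like, the helper below merges a quote with a Comprehend response. The field names (`quote`, `author`, `sentiment`, `sentiment_score`) are assumptions made for this example, and the sample scores are invented.

```python
def build_quote_document(text, author, sentiment_response):
    """Merge a quote with its Comprehend sentiment into one document."""
    return {
        "quote": text,
        "author": author,
        "sentiment": sentiment_response["Sentiment"],
        "sentiment_score": sentiment_response["SentimentScore"],
    }


# Illustrative input; the author name and scores are placeholders.
document = build_quote_document(
    "An example quote.",
    "Example Author",
    {
        "Sentiment": "POSITIVE",
        "SentimentScore": {
            "Positive": 0.90,
            "Negative": 0.02,
            "Neutral": 0.07,
            "Mixed": 0.01,
        },
    },
)
print(document)
```

With an elasticsearch-py 7.x client `es`, a document like this could be stored with `es.index(index="quotes", body=document)`, where the index name `quotes` is again only an assumption.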

After all the quotes and their sentiments have been added to the database, we’ll use queries to get the quote with the closest positive sentiment.
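One way such a query could be built is sketched here. It assumes each document has a `sentiment` field holding the Comprehend label and a `sentiment_score` object holding the per-class scores. Both field names, and the `build_suggestion_query` helper itself, are hypothetical.

```python
def build_suggestion_query(sentiment, liked=True, exclude_id=None, size=5):
    """Build an Elasticsearch query body for suggesting quotes.

    liked=True  -> quotes with the same sentiment, strongest score first.
    liked=False -> quotes with any other sentiment.
    """
    same_sentiment = {"match": {"sentiment": sentiment}}
    must, must_not = [], []
    if liked:
        must.append(same_sentiment)
    else:
        must_not.append(same_sentiment)
    if exclude_id is not None:
        # Do not suggest the quote the user has just rated.
        must_not.append({"ids": {"values": [exclude_id]}})

    bool_query = {}
    if must:
        bool_query["must"] = must
    if must_not:
        bool_query["must_not"] = must_not

    body = {"size": size, "query": {"bool": bool_query}}
    if liked:
        # "POSITIVE" -> "sentiment_score.Positive"
        score_field = "sentiment_score." + sentiment.capitalize()
        body["sort"] = [{score_field: {"order": "desc"}}]
    return body


query = build_suggestion_query("POSITIVE", liked=True, exclude_id="abc", size=3)
print(query)
```

With an elasticsearch-py 7.x client `es`, the body could be sent with `es.search(index="quotes", body=query)`. Sorting by the score of the matching class is only one possible notion of "closest"; a range filter around the liked quote's own score would be another.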

Conclusion

Getting NLP insights is easier and faster with AWS Comprehend. It takes just a few minutes to get it up and running with its pre-trained models. Comprehend offers several other such features for analyzing text using machine learning.