Day 35 of 100DaysofML

Charan Soneji
Published in 100DaysofMLcode
4 min read · Jul 21, 2020

AWS Comprehend. Today I thought of writing a bit about the ML pipeline within the AWS infrastructure. AWS offers a number of ML-based services. I have discussed a few in one of my older blogs, but I thought of covering some more here.

We know that unstructured data is growing exponentially: many applications are running at any given time, each with its own configuration, and almost everything we use collects data at a steady pace, practically every second. This unstructured data can be very useful if it is used and understood properly. In fact, AI is enabling solutions that analyze text with human-like context. One example is a chatbot that interacts with customers using the available data, which is one more application of NLP.

The main reason we are able to get value out of this kind of data is Machine Learning, and AWS Comprehend is a service built for exactly this. It provides 5 main features, all of which work on Deep Learning.

  • The first one is sentiment. Sentiment tells you whether what the user is saying is positive, negative, or neutral. Neutral matters too: if there is no sentiment at all, that might itself be a signal.
  • The next one is entities. This feature goes through the unstructured text, extracts entities, and categorizes them for you. So things like people or organizations are each given a category.
  • The third capability is language detection. For a company that has a multilingual application with a multilingual customer base, you can determine what language the text is in. That way you know whether you have to translate the text or take some other kind of business action on it.
  • The fourth capability is key phrases; think of these as noun phrases. Where entity extraction mostly picks up proper nouns, key phrases catch everything else in the unstructured text, so you can go deeper into the meaning. What were they saying about the person? What were they saying about the organization, for example?
  • The fifth capability is topic modeling. Topic modeling works over a large corpus of documents and helps you organize them by the topics they contain. So it is really nice for organization and information management.
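To make the first four features concrete, here is a minimal sketch of how they could be called through boto3, the AWS SDK for Python. The helper names `make_client` and `analyze_text`, the default region, and the shape of the returned dictionary are my own assumptions for illustration; only the four `detect_*` calls come from the Comprehend API. Topic modeling is left out because it runs as a batch job over a corpus rather than on a single piece of text.

```python
def make_client(region_name="us-east-1"):
    """Create a Comprehend client. Needs boto3 and configured AWS credentials.

    The region is an assumption; use whichever region you work in.
    """
    import boto3  # imported lazily so analyze_text can be tried with a stub client
    return boto3.client("comprehend", region_name=region_name)


def analyze_text(client, text):
    """Run the four real-time Comprehend features on one piece of text."""
    # Language detection: keep the language with the highest confidence score.
    languages = client.detect_dominant_language(Text=text)["Languages"]
    language = max(languages, key=lambda item: item["Score"])["LanguageCode"]

    # Sentiment: the label is POSITIVE, NEGATIVE, NEUTRAL or MIXED.
    # Note: not every detected language is supported by the calls below.
    sentiment = client.detect_sentiment(Text=text, LanguageCode=language)["Sentiment"]

    # Entities: each one comes with a category such as PERSON or ORGANIZATION.
    entities = client.detect_entities(Text=text, LanguageCode=language)["Entities"]

    # Key phrases: the noun phrases found in the text.
    phrases = client.detect_key_phrases(Text=text, LanguageCode=language)["KeyPhrases"]

    return {
        "language": language,
        "sentiment": sentiment,
        "entities": [(entity["Text"], entity["Type"]) for entity in entities],
        "key_phrases": [phrase["Text"] for phrase in phrases],
    }


# Example usage (calls AWS, so it needs credentials and may incur cost):
# result = analyze_text(make_client(), "I really enjoyed the talk by Amazon.")
# print(result)
```

Passing the client in as an argument keeps the function easy to try out with a fake client before pointing it at a real AWS account.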

Let us take an example. Take a set of blog posts from Medium itself and assume that we are giving them to Amazon Comprehend. If we use topic modelling as a service, it analyzes the words used across the posts, works out the topics that run through them, and groups the posts under those topics. This way it helps the user in identifying relevant posts.
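As a rough sketch of the example above, a topic modelling job can be started through boto3 once the posts are stored as text files in S3. The helper names `start_topic_job` and `job_status`, the S3 locations, and the IAM role are all placeholders I am assuming for illustration; the two client calls are the Comprehend API operations for topic detection jobs.

```python
def start_topic_job(client, input_s3_uri, output_s3_uri, role_arn, number_of_topics=10):
    """Start an asynchronous topic modelling job and return its job id.

    `client` is a boto3 Comprehend client. The S3 URIs and the role ARN
    are placeholders that you have to supply from your own account.
    """
    response = client.start_topics_detection_job(
        InputDataConfig={
            "S3Uri": input_s3_uri,
            # Assumes every blog post is stored as its own file.
            "InputFormat": "ONE_DOC_PER_FILE",
        },
        OutputDataConfig={"S3Uri": output_s3_uri},
        # Role that lets Comprehend read the input and write the output.
        DataAccessRoleArn=role_arn,
        NumberOfTopics=number_of_topics,
    )
    return response["JobId"]


def job_status(client, job_id):
    """Return the current status of a topic modelling job, e.g. IN_PROGRESS."""
    details = client.describe_topics_detection_job(JobId=job_id)
    return details["TopicsDetectionJobProperties"]["JobStatus"]
```

The job runs in the background, so in practice you would call `job_status` every so often until it reports `COMPLETED` and then fetch the results from the output location.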

There are a few reasons why this service is valued a lot, mainly these 3 features:

  • Accuracy
  • Continuously trained: The model keeps being trained on new data, with Amazon's experts maintaining it.
  • Easy to use

I tried out Comprehend a bit. If you have a look at the explorer, you can see how the entities and the key phrases have been extracted, which helps us in identifying the topic being discussed. It can also determine the language being used.

If you run topic modelling, you also get 2 CSV files: one (topic-terms.csv) lists the terms that make up each topic, and the other (doc-topics.csv) tells you which topics each document belongs to, which is how you find related articles.
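Here is a small sketch of how those two files could be read with Python's standard library. I am assuming the column headers `topic, term, weight` for topic-terms.csv and `docname, topic, proportion` for doc-topics.csv, so check them against your own output. The function names `top_terms` and `main_topic_per_doc` are made up for this example.

```python
import csv
from collections import defaultdict


def top_terms(topic_terms_file, n=3):
    """Return the n highest-weighted terms for every topic."""
    terms = defaultdict(list)
    for row in csv.DictReader(topic_terms_file):
        terms[row["topic"]].append((float(row["weight"]), row["term"]))
    # Sort each topic's terms by weight, largest first, and keep the top n.
    return {
        topic: [term for _, term in sorted(pairs, reverse=True)[:n]]
        for topic, pairs in terms.items()
    }


def main_topic_per_doc(doc_topics_file):
    """Return the topic with the largest proportion for every document."""
    best = {}
    for row in csv.DictReader(doc_topics_file):
        doc, topic = row["docname"], row["topic"]
        proportion = float(row["proportion"])
        if doc not in best or proportion > best[doc][1]:
            best[doc] = (topic, proportion)
    return {doc: topic for doc, (topic, _) in best.items()}


# Example usage with files extracted from the job output:
# with open("topic-terms.csv", newline="") as f:
#     print(top_terms(f))
# with open("doc-topics.csv", newline="") as f:
#     print(main_topic_per_doc(f))
```

Posts that share the same main topic can then be treated as related articles.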

Below is a short video on the release of Comprehend and a rough overview:

That’s it for today. Thanks for reading. Keep Learning.

Cheers.
