Published in Tekraze

Adding AWS Comprehend to Spring Boot

Amazon Comprehend is a natural language processing (NLP) service that uses machine learning to find insights and relationships in text. No machine learning experience is required.

Steps for Integration

1. Add the AWS Comprehend SDK

Add the dependency below to pom.xml:

<!-- https://mvnrepository.com/artifact/com.amazonaws/aws-java-sdk-comprehend -->
<dependency>
    <groupId>com.amazonaws</groupId>
    <artifactId>aws-java-sdk-comprehend</artifactId>
    <version>1.11.759</version>
</dependency>

2. Create a Java service class and name it as you like, for example AwsComprehendService.java.

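3. Initialize the Comprehend client

import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.services.comprehend.AmazonComprehend;
import com.amazonaws.services.comprehend.AmazonComprehendClientBuilder;

// awsAccessKey, awsSecretKey and awsRegion are assumed to be fields of this service.
AmazonComprehend comprehendClient() {
    log.debug("Initialize Comprehend client");
    BasicAWSCredentials awsCreds = new BasicAWSCredentials(awsAccessKey, awsSecretKey);
    AWSStaticCredentialsProvider awsStaticCredentialsProvider = new AWSStaticCredentialsProvider(awsCreds);
    return AmazonComprehendClientBuilder.standard()
            .withCredentials(awsStaticCredentialsProvider)
            .withRegion(awsRegion)
            .build();
}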

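4. Create a method that detects entities in the text passed to it

import com.amazonaws.services.comprehend.model.DetectEntitiesRequest;
import com.amazonaws.services.comprehend.model.DetectEntitiesResult;
import com.amazonaws.services.comprehend.model.Entity;
import java.util.List;

public List<Entity> detectEntitiesWithComprehend(String text) {
    log.debug("Detecting entities with Amazon Comprehend: {}", text);
    DetectEntitiesRequest detectEntitiesRequest = new DetectEntitiesRequest()
            .withText(text)
            .withLanguageCode("en");
    DetectEntitiesResult detectEntitiesResult = comprehendClient().detectEntities(detectEntitiesRequest);
    // Return the detected entities directly; no intermediate field is needed.
    return detectEntitiesResult.getEntities();
}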

Note: The text limit for this call is 5000 bytes. If your text may be longer, trim it with the method below.

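import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

text = trimByBytes(textToAnalyze, 5000);

String trimByBytes(String str, int lengthOfBytes) {
    byte[] bytes = str.getBytes(StandardCharsets.UTF_8);
    ByteBuffer buffer = ByteBuffer.wrap(bytes);
    // Cut the buffer at the byte limit if the text is longer.
    if (lengthOfBytes < buffer.limit()) {
        buffer.limit(lengthOfBytes);
    }
    CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder();
    // Silently drop a multi-byte character that was cut in half at the limit.
    decoder.onMalformedInput(CodingErrorAction.IGNORE);
    try {
        return decoder.decode(buffer).toString();
    } catch (CharacterCodingException e) {
        // We will never get here.
        throw new RuntimeException(e);
    }
}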
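Trimming throws away everything past the limit. If you want to analyze the whole text instead, one option is to split it into pieces that each fit the limit and send them one by one. Below is a minimal sketch of that idea using only the JDK. Utf8Chunker and its split method are hypothetical names for illustration, not part of the AWS SDK or the gist.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of the AWS SDK.
final class Utf8Chunker {

    private Utf8Chunker() {}

    /** Splits text into pieces whose UTF-8 encoding is at most maxBytes long. */
    static List<String> split(String text, int maxBytes) {
        if (maxBytes < 4) {
            // A single code point can need up to 4 bytes in UTF-8.
            throw new IllegalArgumentException("maxBytes must be at least 4");
        }
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        int i = 0;
        while (i < text.length()) {
            int codePoint = text.codePointAt(i);
            int size = utf8Length(codePoint);
            // Start a new chunk when this code point would exceed the limit.
            if (currentBytes + size > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.appendCodePoint(codePoint);
            currentBytes += size;
            i += Character.charCount(codePoint);
        }
        if (current.length() > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }

    /** Number of bytes UTF-8 uses for one code point. */
    private static int utf8Length(int codePoint) {
        if (codePoint < 0x80) return 1;
        if (codePoint < 0x800) return 2;
        if (codePoint < 0x10000) return 3;
        return 4;
    }
}
```

You could then call Utf8Chunker.split(textToAnalyze, 5000) and pass each piece to the detect method. Keep in mind that a chunk boundary can fall in the middle of a name, and that the offsets returned for a chunk are relative to that chunk, not to the whole text.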

5. We now have the entities as a list. The List<Entity> holds the entities Comprehend detected in the text we passed.
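Each entity in the list carries a confidence score, so a common next step is to drop the low-confidence ones before using the results. Here is a minimal sketch of such a filter. DetectedEntity is a hypothetical stand-in for the SDK's Entity class so that the example runs without AWS dependencies, and EntityFilter and the 0.8 threshold used below are likewise just assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the SDK's Entity, holding only the fields used here.
final class DetectedEntity {
    final String type;
    final String text;
    final float score;

    DetectedEntity(String type, String text, float score) {
        this.type = type;
        this.text = text;
        this.score = score;
    }
}

// Hypothetical helper for illustration; not part of the AWS SDK.
final class EntityFilter {

    private EntityFilter() {}

    /** Returns the entities whose score is at least minScore, in their original order. */
    static List<DetectedEntity> atLeast(List<DetectedEntity> entities, float minScore) {
        List<DetectedEntity> kept = new ArrayList<>();
        for (DetectedEntity entity : entities) {
            if (entity.score >= minScore) {
                kept.add(entity);
            }
        }
        return kept;
    }
}
```

With the real SDK class you would compare entity.getScore() instead of the score field. A call like EntityFilter.atLeast(entities, 0.8f) keeps only the matches Comprehend is fairly sure about; the right threshold depends on your data.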

Sample output:

[
  {
    "score": 0.4398592,
    "type": "ORGANIZATION",
    "text": "JSON",
    "beginOffset": 4930,
    "endOffset": 4934
  },
  {
    "score": 0.98848945,
    "type": "ORGANIZATION",
    "text": "Apple",
    "beginOffset": 4960,
    "endOffset": 4965
  }
]
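The beginOffset and endOffset values tell you where each entity sits in the text you sent, with the end offset pointing just past the last character ("JSON" spans 4930 to 4934). The sketch below cuts the matched text back out of the source string. It assumes the offsets count Unicode code points rather than Java chars, which is worth checking against the API reference; for text without emoji or other supplementary characters the two are the same. EntitySpan is a hypothetical helper name.

```java
// Hypothetical helper for illustration; not part of the AWS SDK.
final class EntitySpan {

    private EntitySpan() {}

    /**
     * Returns the part of sourceText between beginOffset (inclusive) and
     * endOffset (exclusive). Assumption: both offsets count code points.
     */
    static String extract(String sourceText, int beginOffset, int endOffset) {
        // Convert code point offsets to char indexes before calling substring.
        int start = sourceText.offsetByCodePoints(0, beginOffset);
        int end = sourceText.offsetByCodePoints(start, endOffset - beginOffset);
        return sourceText.substring(start, end);
    }
}
```

With the SDK's Entity you would pass entity.getBeginOffset() and entity.getEndOffset(). This is handy for highlighting matches or for checking that the returned text lines up with your input.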

Check the full code in the following GitHub Gist.

aws-comprehend-service.java

This post used the synchronous method for processing. The asynchronous one will follow in a later post. Share your views in the comments below, and feel free to clap if this post helped you. Thanks.

Originally published at Tekraze.com

Balvinder Singh


Senior Software Engineer at thestaffbox.com working as FullStack and DevOps. Tech Blogger at Tekraze.com. Here to share my experiences of coding with you all…