How to process and analyze text using Amazon Comprehend, AWS Lambda and S3 bucket

Vijayaraghavan Vashudevan
3 min read · Jul 28, 2023
Processing the text using Amazon Comprehend

💁 In this use case, we will see how to process and analyze text using Amazon Comprehend, AWS Lambda, and an S3 bucket.

🎯 Overview of Amazon Comprehend

📌 Go to the AWS console and search for Comprehend.

📌 Amazon Comprehend offers various features to process and analyze text:

  • Sentiment Analysis: Comprehend can determine the sentiment expressed in a piece of text, classifying it as positive, negative, neutral, or mixed.
  • Entity Recognition: It identifies and extracts entities such as people, places, organizations, dates, and more from the text.
  • Key Phrase Extraction: The service can automatically identify and extract important phrases or keywords from the given text.
  • Language Detection: Comprehend can identify the dominant language used in the provided text.
  • Topic Modeling: It can analyze a collection of documents and group them into topics based on common themes.
  • Syntax Analysis: The service provides information about the grammatical structure of the text, such as tokenizing it and identifying the part of speech for each word.
  • Document Classification: Comprehend can categorize documents into custom classes based on the content.
Amazon Comprehend
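Before wiring Comprehend into Lambda, you can try it directly from Python with boto3's `detect_sentiment` call. A minimal sketch (the wrapper function name is illustrative; the client is passed in so the function is easy to exercise without AWS credentials):

```python
def get_sentiment(comprehend_client, text, language_code="en"):
    """Return the overall sentiment label for a short piece of text.

    `comprehend_client` is a boto3 Comprehend client, e.g. the result of
    boto3.client("comprehend"). detect_sentiment returns one of
    POSITIVE, NEGATIVE, NEUTRAL, or MIXED plus per-label scores.
    """
    response = comprehend_client.detect_sentiment(
        Text=text, LanguageCode=language_code
    )
    return response["Sentiment"]
```

With a real client, `get_sentiment(boto3.client("comprehend"), "The product works great!")` would typically return `POSITIVE`. Note that `detect_sentiment` is the synchronous API for single documents; the Lambda code later in this post uses the asynchronous job API instead, which suits batches of text in S3.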

๐ŸŽฏ Creation of Lambda function and S3 trigger

💫 Go to the AWS console and search for Lambda. Create a function named customer_sentiment_analysis_function and add the Python code below.

Lambda function
import json
import os
import logging
import datetime
from urllib.parse import unquote_plus

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')
comprehend = boto3.client('comprehend')

# Environment variables configured on the Lambda function
output_bucket = os.environ['OUTPUT_BUCKET']  # bucket where Comprehend writes its results
data_arn = os.environ['DATA_ARN']            # IAM role ARN Comprehend assumes to read/write S3

output_key = "output/comprehend_response.json"


def lambda_handler(event, context):
    logger.info(event)
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 events, so decode them
        key = unquote_plus(record['s3']['object']['key'])

        now = datetime.datetime.now()
        job_uri = f's3://{bucket}/{key}'
        job_name = f'comprehend_job_{now:%Y-%m-%d-%H-%M}'

        try:
            # Start an asynchronous sentiment detection job on the uploaded file
            response_sentiment_detection_job = comprehend.start_sentiment_detection_job(
                InputDataConfig={
                    'S3Uri': job_uri,
                    'InputFormat': 'ONE_DOC_PER_LINE',
                },
                OutputDataConfig={
                    'S3Uri': f's3://{output_bucket}/output/'
                },
                JobName=job_name,
                LanguageCode='<Enter_language_code>',
                DataAccessRoleArn=data_arn,
            )

            sentiment_result = {"Status": "Success", "Info": f"Analysis Job {job_name} Started"}

            # Save the job-start response to S3 for later reference
            s3.put_object(
                Bucket=output_bucket,
                Key=output_key,
                Body=json.dumps(response_sentiment_detection_job, default=str, sort_keys=True, indent=4)
            )

        except Exception as e:
            sentiment_result = {"Status": "Failed", "Reason": json.dumps(e, default=str, sort_keys=True, indent=4)}

    return sentiment_result

💫 In the above code, set the InputFormat and LanguageCode values as per the Amazon Comprehend documentation.

💫 Deploy the code once the necessary changes are made.

💫 Click on Add trigger, then add the S3 bucket and the input folder that was created.

Adding trigger

💫 We will get the notification below once the trigger is configured successfully.

Adding events
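Once the trigger fires, Lambda receives an event whose `Records` entries carry the bucket name and a URL-encoded object key, which is why the handler above calls `unquote_plus`. A small sketch of that extraction step (the sample bucket and key names are illustrative):

```python
from urllib.parse import unquote_plus

def extract_s3_objects(event):
    """Yield (bucket, key) pairs from an S3 event delivered to Lambda."""
    for record in event["Records"]:
        bucket = record["s3"]["bucket"]["name"]
        # S3 URL-encodes object keys in event notifications
        # (e.g. spaces become '+'), so decode before building the S3 URI
        key = unquote_plus(record["s3"]["object"]["key"])
        yield bucket, key

sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "customer-reviews"},
                "object": {"key": "input/reviews+jan.txt"}}}
    ]
}
print(list(extract_s3_objects(sample_event)))
# [('customer-reviews', 'input/reviews jan.txt')]
```

You can paste an event shaped like `sample_event` into the Lambda console's test tab to exercise the function without uploading a file.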

📢 Hands-on Demo

🎯 With this, we will see a demo of how to process text using Amazon Comprehend along with AWS Lambda and an S3 bucket.
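When the asynchronous job finishes, Comprehend writes an archive to the output prefix; for ONE_DOC_PER_LINE input, the extracted output file contains one JSON object per input line, each with a Sentiment label. A hedged sketch of summarizing those results (the sample lines are illustrative; check the Comprehend documentation for the full output schema):

```python
import json
from collections import Counter

def summarize_sentiment(output_lines):
    """Count sentiment labels in a Comprehend sentiment-job output file.

    Each non-empty line is expected to be a JSON object with (at least)
    a "Sentiment" field, one object per analyzed input line.
    """
    counts = Counter()
    for line in output_lines:
        line = line.strip()
        if line:
            counts[json.loads(line)["Sentiment"]] += 1
    return dict(counts)

sample = [
    '{"Line": 0, "Sentiment": "POSITIVE"}',
    '{"Line": 1, "Sentiment": "NEGATIVE"}',
    '{"Line": 2, "Sentiment": "POSITIVE"}',
]
print(summarize_sentiment(sample))
# {'POSITIVE': 2, 'NEGATIVE': 1}
```

In practice you would download and untar the job's output object from the output bucket first, then feed its lines to a helper like this.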

๐ŸŒInstructions to clean up AWS resource to avoid Billing

📌 Delete the S3 bucket created.

📌 Delete the Lambda function created once the trigger is removed.
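The cleanup steps above can also be scripted with boto3. A minimal sketch with injected clients (bucket and function names are placeholders, and `list_objects_v2` is assumed to return all objects in one page, i.e. fewer than 1,000):

```python
def cleanup(s3_client, lambda_client, bucket, function_name):
    """Empty and delete an S3 bucket, then delete a Lambda function."""
    # A bucket must be empty before delete_bucket succeeds
    listing = s3_client.list_objects_v2(Bucket=bucket)
    for obj in listing.get("Contents", []):
        s3_client.delete_object(Bucket=bucket, Key=obj["Key"])
    s3_client.delete_bucket(Bucket=bucket)
    # Remove the Lambda function itself
    lambda_client.delete_function(FunctionName=function_name)
```

With real clients this would be called as `cleanup(boto3.client("s3"), boto3.client("lambda"), "<your-bucket>", "customer_sentiment_analysis_function")`; for buckets with many objects, a paginator or `delete_objects` batch call would be more appropriate.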

Thanks for being patient and following me. Keep supporting 🙏

Clap 👏 if you liked the blog.

For more exercises, please do follow me below ✅!

https://www.linkedin.com/in/vijayaraghavanvashudevan/

#AWS #AWSCommunityBuilder #AWSreSkill #AWSLambda #AmazonComprehend #S3Bucket


Vijayaraghavan Vashudevan

Hi everyone! I am here to publish technical topics for the community, including cloud concepts, Postman, automation, RPA, etc. Please do follow me :)