How to process and analyze text using Amazon Comprehend, AWS Lambda and an S3 bucket
In this use case, we will see how to process and analyze text using Amazon Comprehend, AWS Lambda, and an S3 bucket.
Overview of Amazon Comprehend
Go to the AWS console and search for Comprehend.
Amazon Comprehend offers various features to process and analyze text:
- Sentiment Analysis: Comprehend can determine the sentiment expressed in a piece of text, classifying it as positive, negative, neutral, or mixed.
- Entity Recognition: It identifies and extracts entities such as people, places, organizations, dates, and more from the text.
- Key Phrase Extraction: The service can automatically identify and extract important phrases or keywords from the given text.
- Language Detection: Comprehend can identify the dominant language used in the provided text.
- Topic Modeling: It can analyze a collection of documents and group them into topics based on common themes.
- Syntax Analysis: The service can provide information about the grammatical structure of the text, such as identifying parts of speech, recognizing syntax errors, etc.
- Document Classification: Comprehend can categorize documents into custom classes based on the content.
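Several of these features are also available as synchronous boto3 API calls for short texts. The sketch below is a minimal example of my own (the helper name analyze_text and the injectable client parameter are my additions, not part of this use case) that runs sentiment, entity, and key-phrase detection on a single string:

```python
def analyze_text(text, language_code='en', client=None):
    """Run synchronous Comprehend analyses on a short piece of text."""
    if client is None:
        import boto3  # deferred import so the helper can be tested with a stub client
        client = boto3.client('comprehend')
    return {
        # Overall sentiment: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        'sentiment': client.detect_sentiment(
            Text=text, LanguageCode=language_code)['Sentiment'],
        # Named entities such as people, places, and organizations
        'entities': [e['Text'] for e in client.detect_entities(
            Text=text, LanguageCode=language_code)['Entities']],
        # Important phrases extracted from the text
        'key_phrases': [p['Text'] for p in client.detect_key_phrases(
            Text=text, LanguageCode=language_code)['KeyPhrases']],
    }
```

With AWS credentials configured, analyze_text('I love the fast delivery!') returns a dictionary of results. These synchronous calls suit short texts; the asynchronous analysis jobs used later in this post suit batches of documents stored in S3.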
Creation of the Lambda function and S3 trigger
Go to the AWS console and search for Lambda. Create a function named customer_sentiment_analysis_function and add the Python code below:
import json
import os
import logging
import datetime
from urllib.parse import unquote_plus

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')
comprehend = boto3.client('comprehend')

output_bucket = os.environ['OUTPUT_BUCKET']
data_arn = os.environ['DATA_ARN']
output_key = "output/comprehend_response.json"

def lambda_handler(event, context):
    logger.info(event)
    for record in event['Records']:
        # The uploaded object that fired the S3 trigger
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        now = datetime.datetime.now()
        job_uri = f's3://{bucket}/{key}'
        job_name = f'comprehend_job_{now:%Y-%m-%d-%H-%M}'
        try:
            # Start an asynchronous sentiment detection job on the uploaded file
            response_sentiment_detection_job = comprehend.start_sentiment_detection_job(
                InputDataConfig={
                    'S3Uri': job_uri,
                    'InputFormat': 'ONE_DOC_PER_LINE',
                },
                OutputDataConfig={
                    'S3Uri': f's3://{output_bucket}/output/'
                },
                JobName=job_name,
                LanguageCode='<Enter_language_code>',
                DataAccessRoleArn=data_arn,
            )
            sentiment_result = {"Status": "Success", "Info": f"Analysis Job {job_name} Started"}
            # Save the job-start response to the output bucket for reference
            # (default=str handles any values that are not JSON-serializable)
            s3.put_object(
                Bucket=output_bucket,
                Key=output_key,
                Body=json.dumps(response_sentiment_detection_job, default=str, sort_keys=True, indent=4)
            )
        except Exception as e:
            sentiment_result = {"Status": "Failed", "Reason": json.dumps(e, default=str, sort_keys=True, indent=4)}
    return sentiment_result
In the above code, we need to set the InputFormat and LanguageCode values as per the Comprehend documentation.
Deploy the code once the necessary changes are made.
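To make those two values explicit, here is a small helper of my own (job_config is a hypothetical name, not part of the original function) that builds the keyword arguments for start_sentiment_detection_job. Per the API, InputFormat is either 'ONE_DOC_PER_FILE' or 'ONE_DOC_PER_LINE', and LanguageCode is a language code supported by Comprehend, such as 'en':

```python
def job_config(bucket, key, output_bucket, role_arn,
               language_code='en', input_format='ONE_DOC_PER_LINE'):
    """Build kwargs for comprehend.start_sentiment_detection_job.

    input_format: 'ONE_DOC_PER_LINE' (each line is a document) or
                  'ONE_DOC_PER_FILE' (each file is a single document).
    language_code: a language code supported by Comprehend, e.g. 'en'.
    """
    return {
        'InputDataConfig': {'S3Uri': f's3://{bucket}/{key}',
                            'InputFormat': input_format},
        'OutputDataConfig': {'S3Uri': f's3://{output_bucket}/output/'},
        'LanguageCode': language_code,
        'DataAccessRoleArn': role_arn,
    }
```

The handler could then call comprehend.start_sentiment_detection_job(JobName=job_name, **job_config(...)), keeping the configurable values in one place.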
Click Add trigger, then select the S3 bucket and the input folder that was created.
We will get a notification once the trigger is configured successfully.
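The console does this for you, but the same trigger can also be expressed as an S3 bucket notification configuration. This is a sketch (the function name and prefix are placeholders of mine) of the payload you would pass to s3.put_bucket_notification_configuration:

```python
def s3_trigger_config(function_arn, prefix='input/'):
    """Notification configuration that invokes a Lambda on new objects under a prefix."""
    return {
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:*'],  # fire on any object-created event
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': prefix},  # only objects under input/
            ]}},
        }]
    }
```

Note that the Lambda function must also allow s3.amazonaws.com to invoke it; the console's Add trigger flow adds that resource-based permission automatically.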
Hands-on Demo
With this, let's see a demo of how to process text using Amazon Comprehend along with AWS Lambda and an S3 bucket.
Instructions to clean up AWS resources to avoid billing
Delete the S3 buckets that were created
Delete the Lambda function once the trigger is removed
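If you prefer to script the cleanup, here is a minimal sketch (the clean_up name and the injectable clients are my additions; it assumes your credentials permit these deletions):

```python
def clean_up(bucket_name, function_name, s3=None, lambda_client=None):
    """Empty and delete an S3 bucket, then delete a Lambda function."""
    if s3 is None or lambda_client is None:
        import boto3  # deferred so the helper can be tested with stubs
        s3 = s3 or boto3.resource('s3')
        lambda_client = lambda_client or boto3.client('lambda')
    bucket = s3.Bucket(bucket_name)
    bucket.objects.all().delete()  # a bucket must be empty before it can be deleted
    bucket.delete()
    lambda_client.delete_function(FunctionName=function_name)
```

Run it once per bucket (input and output), then once for the function, and confirm in the console that nothing is left behind.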
Thanks for your patience and for following along. Keep supporting!
Clap if you liked the blog.
For more exercises, please follow me below!
https://www.linkedin.com/in/vijayaraghavanvashudevan/
#AWS #AWSCommunityBuilder #AWSreSkill #AWSLambda #AmazonComprehend #S3Bucket