How to process and analyze text using Amazon Comprehend, AWS Lambda and an S3 bucket
In this use case, we will see how to process and analyze text using Amazon Comprehend, AWS Lambda, and an S3 bucket.
Overview of Amazon Comprehend
Go to the AWS console and search for Comprehend.
Amazon Comprehend offers various features to process and analyze text:
- Sentiment Analysis: Comprehend can determine the sentiment expressed in a piece of text, classifying it as positive, negative, neutral, or mixed.
- Entity Recognition: It identifies and extracts entities such as people, places, organizations, dates, and more from the text.
- Key Phrase Extraction: The service can automatically identify and extract important phrases or keywords from the given text.
- Language Detection: Comprehend can identify the dominant language used in the provided text.
- Topic Modeling: It can analyze a collection of documents and group them into topics based on common themes.
- Syntax Analysis: The service can provide information about the grammatical structure of the text, such as identifying parts of speech, recognizing syntax errors, etc.
- Document Classification: Comprehend can categorize documents into custom classes based on the content.
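Several of these features are also available as synchronous boto3 API calls for short texts. The sketch below is a minimal example of my own (the helper name analyze_text and the injectable client parameter are my additions, not part of this use case) that runs sentiment, entity, and key-phrase detection on a single string:

```python
def analyze_text(text, language_code='en', client=None):
    """Run synchronous Comprehend analyses on a short piece of text."""
    if client is None:
        import boto3  # deferred import so the helper can be tested with a stub client
        client = boto3.client('comprehend')
    return {
        # Overall sentiment: POSITIVE, NEGATIVE, NEUTRAL, or MIXED
        'sentiment': client.detect_sentiment(
            Text=text, LanguageCode=language_code)['Sentiment'],
        # Named entities such as people, places, and organizations
        'entities': [e['Text'] for e in client.detect_entities(
            Text=text, LanguageCode=language_code)['Entities']],
        # Important phrases extracted from the text
        'key_phrases': [p['Text'] for p in client.detect_key_phrases(
            Text=text, LanguageCode=language_code)['KeyPhrases']],
    }
```

With AWS credentials configured, analyze_text('I love the fast delivery!') returns a dictionary of results. These synchronous calls suit short texts; the asynchronous analysis jobs used later in this post suit batches of documents stored in S3.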
Creation of the Lambda function and S3 trigger
Go to the AWS console and search for Lambda. Create a function named customer_sentiment_analysis_function and add the Python code below:
import json
import os
import logging
import datetime
from urllib.parse import unquote_plus

import boto3

logger = logging.getLogger()
logger.setLevel(logging.INFO)

s3 = boto3.client('s3')
comprehend = boto3.client('comprehend')

output_bucket = os.environ['OUTPUT_BUCKET']
data_arn = os.environ['DATA_ARN']
output_key = "output/comprehend_response.json"

def lambda_handler(event, context):
    logger.info(event)
    for record in event['Records']:
        # The uploaded object that fired the S3 trigger
        bucket = record['s3']['bucket']['name']
        key = unquote_plus(record['s3']['object']['key'])
        now = datetime.datetime.now()
        job_uri = f's3://{bucket}/{key}'
        job_name = f'comprehend_job_{now:%Y-%m-%d-%H-%M}'
        try:
            # Start an asynchronous sentiment detection job on the uploaded file
            response_sentiment_detection_job = comprehend.start_sentiment_detection_job(
                InputDataConfig={
                    'S3Uri': job_uri,
                    'InputFormat': 'ONE_DOC_PER_LINE',
                },
                OutputDataConfig={
                    'S3Uri': f's3://{output_bucket}/output/'
                },
                JobName=job_name,
                LanguageCode='<Enter_language_code>',
                DataAccessRoleArn=data_arn,
            )
            sentiment_result = {"Status": "Success", "Info": f"Analysis Job {job_name} Started"}
            # Save the job-start response to the output bucket for reference
            # (default=str handles any values that are not JSON-serializable)
            s3.put_object(
                Bucket=output_bucket,
                Key=output_key,
                Body=json.dumps(response_sentiment_detection_job, default=str, sort_keys=True, indent=4)
            )
        except Exception as e:
            sentiment_result = {"Status": "Failed", "Reason": json.dumps(e, default=str, sort_keys=True, indent=4)}
    return sentiment_result
In the above code, we need to set the InputFormat and LanguageCode values as per the Comprehend documentation.
Deploy the code once the necessary changes are made.
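To make those two values explicit, here is a small helper of my own (job_config is a hypothetical name, not part of the original function) that builds the keyword arguments for start_sentiment_detection_job. Per the API, InputFormat is either 'ONE_DOC_PER_FILE' or 'ONE_DOC_PER_LINE', and LanguageCode is a language code supported by Comprehend, such as 'en':

```python
def job_config(bucket, key, output_bucket, role_arn,
               language_code='en', input_format='ONE_DOC_PER_LINE'):
    """Build kwargs for comprehend.start_sentiment_detection_job.

    input_format: 'ONE_DOC_PER_LINE' (each line is a document) or
                  'ONE_DOC_PER_FILE' (each file is a single document).
    language_code: a language code supported by Comprehend, e.g. 'en'.
    """
    return {
        'InputDataConfig': {'S3Uri': f's3://{bucket}/{key}',
                            'InputFormat': input_format},
        'OutputDataConfig': {'S3Uri': f's3://{output_bucket}/output/'},
        'LanguageCode': language_code,
        'DataAccessRoleArn': role_arn,
    }
```

The handler could then call comprehend.start_sentiment_detection_job(JobName=job_name, **job_config(...)), keeping the configurable values in one place.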
Click Add trigger, then select the S3 bucket and the input folder that was created.
We will get a notification once the trigger is configured successfully.
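The console does this for you, but the same trigger can also be expressed as an S3 bucket notification configuration. This is a sketch (the function name and prefix are placeholders of mine) of the payload you would pass to s3.put_bucket_notification_configuration:

```python
def s3_trigger_config(function_arn, prefix='input/'):
    """Notification configuration that invokes a Lambda on new objects under a prefix."""
    return {
        'LambdaFunctionConfigurations': [{
            'LambdaFunctionArn': function_arn,
            'Events': ['s3:ObjectCreated:*'],  # fire on any object-created event
            'Filter': {'Key': {'FilterRules': [
                {'Name': 'prefix', 'Value': prefix},  # only objects under input/
            ]}},
        }]
    }
```

Note that the Lambda function must also allow s3.amazonaws.com to invoke it; the console's Add trigger flow adds that resource-based permission automatically.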
Hands-on Demo
With this, let's see a demo of how to process text using Amazon Comprehend along with AWS Lambda and an S3 bucket.
Instructions to clean up AWS resources to avoid billing
Delete the S3 buckets that were created
Delete the Lambda function once the trigger is removed
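If you prefer to script the cleanup, here is a minimal sketch (the clean_up name and the injectable clients are my additions; it assumes your credentials permit these deletions):

```python
def clean_up(bucket_name, function_name, s3=None, lambda_client=None):
    """Empty and delete an S3 bucket, then delete a Lambda function."""
    if s3 is None or lambda_client is None:
        import boto3  # deferred so the helper can be tested with stubs
        s3 = s3 or boto3.resource('s3')
        lambda_client = lambda_client or boto3.client('lambda')
    bucket = s3.Bucket(bucket_name)
    bucket.objects.all().delete()  # a bucket must be empty before it can be deleted
    bucket.delete()
    lambda_client.delete_function(FunctionName=function_name)
```

Run it once per bucket (input and output), then once for the function, and confirm in the console that nothing is left behind.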
Thanks for your patience and for following along. Keep supporting!
Clap if you liked the blog.
For more exercises, please follow me below!
https://www.linkedin.com/in/vijayaraghavanvashudevan/
#AWS #AWSCommunityBuilder #AWSreSkill #AWSLambda #AmazonComprehend #S3Bucket