Serverless Data Processing with AWS Step Functions — An Example
Building on the pros and cons of AWS Step Functions as evaluated by Nikola Milicic here, this article presents our working example of the conditional and potentially complex data pipelines that can be achieved serverlessly via AWS Step Functions and their integration with several AWS cloud products, such as AWS Lambda.
The Use Case
In this example use case, we are processing the customer reviews of a company which provides many products and services. The company wants to handle negative reviews quickly and appropriately, but does not have the resources to read each review.
By using an AWS Step Function in the data pipeline for customer reviews, each review can be categorised by metadata, sent to the appropriate sentiment analysis (implemented with AWS Comprehend), send out automatic notifications if negative feedback is detected, write relevant data to the review database, and appropriately handle errors at each stage of this process, all achieved serverlessly.
The State Machine
Step Functions allow us to define a state machine using Amazon States Language (ASL), which is essentially a JSON object that defines the available states of the state machine, as well as the connections between them.
AWS generates a flowchart from our ASL code, which allows us to better visualise the machine, as seen here:
AWS Automatically generates and updates your flowchart as you modify your state machine.
Additionally, each state machine execution provides a colour-coded flowchart of status and outcome, improving the visibility of failure points and paths taken through the machine. An example of this shown below.
The full ASL code is accessible from our GitHub repository, available here.
Contribute to servian/aws-step-function-example development by creating an account on GitHub.github.com
We can use a Lambda function to trigger the state-machine and pass through the file details, upon its landing in the S3 bucket. In our example, we trigger the Lambda function each time a CSV file is dropped in the /reviews folder of our S3 bucket, which in turn triggers the Step Function.
Step machines can also be triggered using other trigger conditions and other methods. For example, periodic triggers can be set using Amazon CloudWatch Events rules.
Reading the Input
In our example, reviews are submitted as single-line CSV files with parameter headers. However, this implementation could easily be extended to other input data formats, perhaps utilising alternative storage solutions from AWS, such as Kinesis Streams or SQS.
The Lambda function below reads the input file and outputs its data.
Product or Service?
Our company wants the capability to address reviews differently based on whether the review is for a product or service. To implement this functionality we use a “Choice” block of the Step Function.
More complex use cases could employ multiple branching classifications, or even complex ML models for classification, stored in S3 and evaluated using Lambda functions.
We perform sentiment analysis on the review data, returning ‘POSITIVE’, ‘NEGATIVE’, or ‘NEUTRAL’. Calling Amazon Comprehend in a Lambda function, we can achieve this with not much more than one line of code:
A more detailed implementation could use other features of Amazon Comprehend to further delineate the review procedure, perhaps by detecting entities or named entities in the customer’s review to determine the most appropriate company department or staff member to alert.
Positive or Negative?
As our use case is focused on handling negative reviews, our pipeline only sends a notification via SNS if the result of the sentiment analysis is “NEGATIVE”. This is implemented using a second “Choice” block in the state machine:
Using a choice block allows easy and transparent error handling, as indicated by the default choice states of SentimentFail (unexpected result from sentiment analysis) and CategorisationFail (unable to categorise the input).
In our example, a notification is sent to the company via email whenever a negative review is placed. Using AWS’s Simple Notification Service (SNS), this is achieved in a single Lambda function call. Additionally, the message contains dynamic information depending on the details of the data being processed.
Writing results to the Database
Whether the result from the sentiment analysis is positive or negative, we want to store the results in a database for future reference. In our example, we use a short Lambda function to write to the database and pass input variables through to the output.
AWS’s DynamoDB is well-integrated into both Step Functions and Lambda, meaning that similar functionality can also be achieved directly through AWS Step Functions.
Using AWS Step Functions and Lambda, we have demonstrated how a serverless data pipeline can be achieved with only a handful of code, with a relatively high complexity ceiling, in-built scaling and parallelisation, and integration with many diverse AWS products. The above example is only a toy example for demonstration purposes, but hopefully it will serve as inspiration for other ideas which can now be achieved serverlessly.