Data processing with Slack and serverless backend on AWS

This article is a walkthrough on how to build a data processing pipeline with Slack and AWS. By definition, data processing means the retrieval, transformation, and classification of information.

The purpose of the system is to organise document retrieval and storage. Information is provided and classified only by users, and no data transformations are performed.

The system’s workflow is a three-step process:

  • data input
  • data transfer into processing/storage unit
  • data processing/storage

Users input data via Slack: they upload info files and annotate them with hashtags in comments. Serverless infrastructure on AWS processes and stores the information. The last piece of the puzzle is a Slack application, which acts as the ‘glue’ between the data source and the backend: it transfers each new chunk of data and triggers its processing.

Slack application

There are a lot of tutorials on how to build a Slack application, so I will highlight only the most important details of the application I created.

I developed a basic application subscribed to the following events:

  • message.channels
  • message.groups
  • message.im
  • message.mpim

Please note that I did not implement the OAuth flow. This means that the Slack application is not available in the App Directory and cannot be used by other teams.

Nevertheless, the backend needed the following credentials to integrate with Slack:

  • Verification token
  • Access token

The verification token is required to confirm the sender of a request. Every request from Slack contains a token, and its value must equal the verification token. Here is an example of a Slack event request payload:

{
  "token": "Jhj5dZrVaK7ZwHHjRyZWjbDl",
  "team_id": "123",
  "api_app_id": "xyz123",
  "event": {
    ...
  },
  "type": "event_callback",
  "authed_users": [
    "AAABBBCCC"
  ],
  "event_id": "EVENTID123",
  "event_time": 1499081899
}
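The token check can be sketched in a few lines. This is a minimal illustration, not the code from the application: the helper name isValidToken is hypothetical, and it assumes the request body has already been parsed into an object like the payload above.

```javascript
// Hypothetical helper: returns true only when the request carries the
// expected verification token. `expectedToken` would normally be read
// from the VERIFICATION_TOKEN environment variable.
function isValidToken(payload, expectedToken) {
  if (!payload || typeof payload.token !== 'string' || !expectedToken) {
    return false;
  }
  return payload.token === expectedToken;
}

// Usage with a payload shaped like the example above.
const examplePayload = { token: 'Jhj5dZrVaK7ZwHHjRyZWjbDl', type: 'event_callback' };
console.log(isValidToken(examplePayload, 'Jhj5dZrVaK7ZwHHjRyZWjbDl'));
console.log(isValidToken(examplePayload, 'some-other-token'));
```

Rejecting a missing or non-string token up front keeps an unset environment variable from accidentally matching an empty request.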

The verification token resides in the Basic Information->App Credentials settings section of the application, where it is called ‘Verification Token’.

The access token is used to fetch files from Slack and to send notifications to Slack. It resides in the OAuth & Permissions->OAuth Tokens & Redirect URLs settings section of the application, where it is called ‘Bot User OAuth Access Token’.

Verification and access tokens were injected into the backend as VERIFICATION_TOKEN and ACCESS_TOKEN environment variables respectively.
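As an illustration of how a function might read these two variables, here is a small sketch. The helper name loadSlackCredentials and the fail-fast behaviour are assumptions of mine, not taken from the original sources.

```javascript
// Hypothetical helper: reads the Slack credentials from an environment
// object (process.env by default) and fails fast when one is missing.
function loadSlackCredentials(env) {
  const source = env || process.env;
  const missing = ['VERIFICATION_TOKEN', 'ACCESS_TOKEN'].filter(
    (name) => !source[name]
  );
  if (missing.length > 0) {
    throw new Error('Missing environment variables: ' + missing.join(', '));
  }
  return {
    verificationToken: source.VERIFICATION_TOKEN,
    accessToken: source.ACCESS_TOKEN,
  };
}
```

Passing the environment in as a parameter keeps the helper easy to exercise in a local test without touching the real process environment.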

Serverless backend on AWS

The serverless infrastructure consists of the following components:

  • API Gateway
  • S3 Bucket
  • SNS topic
  • Four lambda functions
  • Step function
Figure 1. Slack integration with AWS infrastructure

AWS Lambda functions

slack-event-handler is triggered by API Gateway (Figure 1, step 3). It handles Slack registration and Slack message requests. The function always returns a success response to the gateway, except when the request’s token is invalid. In case of other failures, it sends error notifications to SNS (Figure 1, step 4.1). As the last step of the successful case, slack-event-handler invokes a step function, which starts message processing (Figure 1, step 4).

Figure 2. Slack request handling workflow
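The request handling flow could look roughly like the sketch below. It is not the actual slack-event-handler source. It assumes an API Gateway Lambda proxy integration (a JSON string in request.body and a { statusCode, body } response), a 401 status for an invalid token, and a hypothetical deps object that injects the side effects so the logic can run without AWS. The url_verification type and its challenge field come from Slack's Events API registration handshake.

```javascript
// Hypothetical factory. `deps` carries everything with side effects:
//   deps.verificationToken           expected Slack token
//   deps.startProcessing(event, cb)  starts the step function
//   deps.notifyError(err, cb)        publishes an error to SNS
function createHandler(deps) {
  return function handler(request, context, callback) {
    const respond = (statusCode, body) =>
      callback(null, { statusCode: statusCode, body: body || '' });

    let payload = null;
    try {
      payload = JSON.parse(request.body);
    } catch (err) {
      // An unparseable body cannot carry a valid token; payload stays null.
    }

    // The only case that does not return a success response.
    if (!payload || payload.token !== deps.verificationToken) {
      return respond(401, 'Invalid token');
    }

    // Slack registration: echo the challenge back.
    if (payload.type === 'url_verification') {
      return respond(200, payload.challenge);
    }

    // Message request: start processing; report failures to SNS but
    // still answer the gateway with success.
    return deps.startProcessing(payload.event, (err) => {
      if (err) {
        return deps.notifyError(err, () => respond(200));
      }
      return respond(200);
    });
  };
}
```

Injecting startProcessing and notifyError is a design choice for this sketch: it lets the branching be checked locally with plain stub functions.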

The step function orchestrates the slack-file-fetcher and slack-metadata lambda functions (Figure 1, steps 5 and 8). First, slack-file-fetcher fetches info files from Slack (Figure 1, step 6). The source URL is passed by slack-event-handler and comes from the Slack event (see the payload example).
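A file download of this kind might be sketched as follows. The function names are hypothetical, and the sketch assumes that the URL is a private Slack file URL, which Slack serves only when the request carries the access token in an Authorization: Bearer header. Redirects and retries are left out.

```javascript
const https = require('https');

// Builds request options for a private Slack file URL.
function buildDownloadOptions(fileUrl, accessToken) {
  const url = new URL(fileUrl);
  return {
    hostname: url.hostname,
    path: url.pathname + url.search,
    headers: { Authorization: 'Bearer ' + accessToken },
  };
}

// Downloads the file into a Buffer and calls callback(err, buffer).
function fetchFile(fileUrl, accessToken, callback) {
  https
    .get(buildDownloadOptions(fileUrl, accessToken), (res) => {
      if (res.statusCode !== 200) {
        res.resume(); // discard the response body
        return callback(new Error('Unexpected status ' + res.statusCode));
      }
      const chunks = [];
      res.on('data', (chunk) => chunks.push(chunk));
      res.on('end', () => callback(null, Buffer.concat(chunks)));
    })
    .on('error', callback);
}
```

Keeping the option building separate from the network call means the header and path logic can be verified locally without contacting Slack.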

Fetched files are stored in an S3 bucket (Figure 1, step 7). Then slack-metadata is invoked to process the annotations. The processing result is saved as a metadata file in the S3 bucket (Figure 1, step 9). In case of failure, the functions send error notifications to SNS and stop the execution (Figure 1, steps 8 and 10). When slack-metadata completes successfully, it also sends a notification to SNS.
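To make the annotation step concrete, here is a sketch of how hashtags could be pulled out of a comment and turned into a metadata object. The function names, the hashtag pattern, and the shape of the metadata are all assumptions; the real slack-metadata function may differ, and real Slack message text may need extra cleanup first.

```javascript
// Extracts hashtags from a comment, in order of first appearance and
// without duplicates. Tags are kept exactly as the user typed them.
function extractHashtags(comment) {
  const matches = (comment || '').match(/#[A-Za-z0-9_-]+/g) || [];
  const tags = matches.map((tag) => tag.slice(1));
  return Array.from(new Set(tags));
}

// Hypothetical metadata shape: the S3 key of the stored file plus its tags.
function buildMetadata(fileKey, comment) {
  return { file: fileKey, tags: extractHashtags(comment) };
}

console.log(JSON.stringify(buildMetadata('reports/q3.pdf', 'Q3 report #finance #q3')));
```

The resulting object can be serialised with JSON.stringify and written next to the file in the bucket.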

slack-notifier listens to the notifications from SNS and forwards them to Slack (Figure 1, step 11). This is the only function which sends messages to Slack (Figure 1, step 12).
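The forwarding step can be illustrated with a small transformation from an SNS-triggered Lambda event to Slack message payloads. The function name is hypothetical, and the { text } payload shape assumes that SLACK_URL points at an endpoint accepting that format, such as an incoming webhook. The Records[].Sns.Subject and Records[].Sns.Message fields are part of the standard SNS event that Lambda receives.

```javascript
// Converts an SNS-triggered Lambda event into Slack message payloads,
// one per SNS record. The subject, when present, is used as a prefix.
function toSlackMessages(snsEvent) {
  return (snsEvent.Records || []).map((record) => {
    const subject = record.Sns.Subject ? record.Sns.Subject + ': ' : '';
    return { text: subject + record.Sns.Message };
  });
}

const exampleEvent = {
  Records: [{ Sns: { Subject: 'slack-metadata', Message: 'Metadata saved' } }],
};
console.log(JSON.stringify(toSlackMessages(exampleEvent)));
```

Each payload would then be sent to SLACK_URL as a JSON body in an HTTP POST request.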

Environment variables for lambda functions

  • VERIFICATION_TOKEN — Slack verification token
  • ACCESS_TOKEN — Slack access token
  • SLACK_URL — Slack endpoint to send notification messages to
  • SLACK_INTEGRATOR_SNS — AWS SNS topic ARN to send data processing result to
  • SLACK_INTEGRATOR_SF — AWS Step Function ARN to start data processing
  • BUCKET — AWS S3 bucket to store info files and metadata
  • NODE_ENV (optional) — NodeJS (lambda) environment prod|dev|test
  • DEBUG (optional) — allows debug information logging
  • ERROR (optional) — allows error information logging
Table 1. Environment variable dependencies of the AWS Lambda functions

All functions support the DEBUG and ERROR variables. Additional information is logged to CloudWatch when DEBUG or ERROR is active.

NODE_ENV is only used for local testing of file fetching with slack-file-fetcher.
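A logger driven by these two flags might look like the sketch below. The helper name createLogger is hypothetical, and it assumes that any non-empty value makes a flag active. In Lambda, anything written through console ends up in CloudWatch Logs.

```javascript
// Hypothetical logger: writes debug output only when env.DEBUG is set and
// error output only when env.ERROR is set. `sink` defaults to console.
function createLogger(env, sink) {
  const out = sink || console;
  return {
    debug: (...args) => {
      if (env.DEBUG) out.log('[DEBUG]', ...args);
    },
    error: (...args) => {
      if (env.ERROR) out.error('[ERROR]', ...args);
    },
  };
}

// Usage inside a lambda function:
const logger = createLogger(process.env);
logger.debug('Fetching file');
```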

Step function

The step function is very simple. It sequentially calls two lambda functions. Here is the full listing of the step function:

{
  "Comment": "Slack integration",
  "StartAt": "FetchFile",
  "States": {
    "FetchFile": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:slack-file-fetcher",
      "Next": "SaveMetadata"
    },
    "SaveMetadata": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT:function:slack-metadata",
      "End": true
    }
  }
}
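In a definition like this, the input given at the start of an execution goes to FetchFile, and by default the output of FetchFile becomes the input of SaveMetadata. Starting an execution from a lambda function could be sketched as below. The function name is hypothetical, and the stepFunctions argument is assumed to behave like an AWS SDK for JavaScript (v2) AWS.StepFunctions client, whose startExecution call takes a stateMachineArn and a JSON string as input.

```javascript
// Starts the state machine with the Slack event as its input.
// `stepFunctions` is injected so that a test can pass a fake client;
// in a lambda function it would be `new AWS.StepFunctions()`.
function startProcessing(stepFunctions, stateMachineArn, slackEvent, callback) {
  const params = {
    stateMachineArn: stateMachineArn, // e.g. the SLACK_INTEGRATOR_SF value
    input: JSON.stringify(slackEvent), // the input must be a JSON string
  };
  stepFunctions.startExecution(params, callback);
}
```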

AWS policies

In addition to the default lambda and step function policies, one custom policy was created:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "Stmt1",
      "Effect": "Allow",
      "Action": [
        "states:StartExecution"
      ],
      "Resource": [
        "arn:aws:states:REGION:ACCOUNT:stateMachine:*"
      ]
    },
    {
      "Sid": "Stmt2",
      "Effect": "Allow",
      "Action": [
        "sns:Publish"
      ],
      "Resource": [
        "arn:aws:sns:REGION:ACCOUNT:slack-integrator"
      ]
    },
    {
      "Sid": "Stmt3",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::slack-integrator/*"
      ]
    }
  ]
}

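It grants access to S3, SNS, and the step function, and is used by the slack-event-handler, slack-file-fetcher, and slack-metadata lambda functions. Note that s3:ListBucket is a bucket-level action, so it only takes effect if the bucket ARN itself (arn:aws:s3:::slack-integrator, without /*) is also listed as a resource.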

Testing and code structure of lambda functions

In a serverless architecture, testing is even more important than in traditional architectures. Deploying, executing, and monitoring lambda functions is time-consuming and not free, so it’s very important to test as much as possible in a local development environment.

The need to check most of the code locally is reflected in the code structure. Accordingly, I organised the lambda function sources with minimal callback nesting. To orchestrate asynchronous operations, I used the Async library.

Instead of invoking anonymous functions in callbacks, I decided to call exported methods. This allowed me to stub the methods and isolate the tests. In the tests, I confirmed which stubs were called and validated their input parameters. Please see the lambda function sources in the repositories for examples.

This approach allowed me to build a solid foundation of unit tests. Deployment and integration testing were conducted manually.
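The idea can be shown with a small self-contained sketch. It is not taken from the repositories: the fetcher object stands in for a module’s exports, the method names are made up, and the stubs are written by hand rather than with a stubbing library.

```javascript
// Stand-in for a module's `exports` object.
const fetcher = {};

fetcher.download = function (url, callback) {
  callback(new Error('download is not available in this sketch'));
};

fetcher.store = function (data, callback) {
  callback(new Error('store is not available in this sketch'));
};

// The handler reaches each step through `fetcher`, not through a local
// closure, so replacing fetcher.download in a test changes what runs.
fetcher.handler = function (event, context, callback) {
  fetcher.download(event.url, function (err, data) {
    if (err) return callback(err);
    fetcher.store(data, callback);
  });
};

// A test replaces the steps with stubs that record their arguments.
const calls = [];
fetcher.download = (url, cb) => { calls.push(['download', url]); cb(null, 'file-body'); };
fetcher.store = (data, cb) => { calls.push(['store', data]); cb(null, 'stored'); };

fetcher.handler({ url: 'https://example.com/a.txt' }, {}, (err, result) => {
  console.log(err, result, calls);
});
```

Because every step is a named, replaceable method, a test can check both the order of the calls and the parameters passed between them without touching the network or S3.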

Tools

Notes and gotchas

  • AWS shows a warning message when you upload a .zip archive with a lambda function and the archive is too small.
  • A lambda function that handles SNS notifications should execute as fast as possible. The default execution limit for a lambda function is 3 seconds. The execution is considered failed when the timeout is exceeded, which causes a notification delivery retry, so the same lambda function is invoked again.
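One possible way to stay inside the limit is to put a deadline, shorter than the function timeout, on the slow call, so that the handler decides how to finish instead of being cut off. The sketch below is an assumption about how this could be done, not part of the original implementation; callWithDeadline is a hypothetical helper.

```javascript
// Runs `operation(done)` and calls `callback` exactly once: with the
// operation's result, or with a timeout error after `timeoutMs`.
function callWithDeadline(operation, timeoutMs, callback) {
  let finished = false;

  const timer = setTimeout(() => {
    if (finished) return;
    finished = true;
    callback(new Error('Operation timed out after ' + timeoutMs + ' ms'));
  }, timeoutMs);

  operation((err, result) => {
    if (finished) return; // ignore results that arrive after the deadline
    finished = true;
    clearTimeout(timer);
    callback(err, result);
  });
}
```

Inside a handler, the deadline could be derived from context.getRemainingTimeInMillis() minus a safety margin. Whether to report a timeout as a failure (which triggers another delivery) or to log it and return success is a separate decision that depends on how duplicate notifications should be treated.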

Conclusion

This was a walkthrough of my experiment in building a data processing pipeline with Slack and a serverless backend on AWS. Please let me know if the implementation requires a more detailed explanation. Thank you for reading this article.
