How to build simple business reporting from your data set in AWS
A step-by-step guide to connecting assets in a VPC to generate a weekly e-mailed report
It’s always good to know how customers are using your functionality. In my side projects, knowing my customers’ behavior lets me add features to meet their needs. On a small project, you likely don’t have the time or money to put into place a full data pipeline and analytics package. But you can still gain some meaningful insights with a lightweight approach to processing the data you’re already collecting.
In this post, I’ll describe how I generated a weekly report based on information stored in my database. My project allows customers to “like” certain products. A user can see the products that they’ve liked in one space. As this is the primary use case, this collection of liked products is stored in a table indexed by the user ID. But it’s useful for me to know which products are trending — that are getting liked by more users each week. I’m happy to receive this information in a weekly summary report.
My starting infrastructure was a simple one in AWS — an EC2 instance connecting to an RDS Postgres database. The instance and database are inside a VPC to protect access to the database via APIs exposed on the server.
I wanted a script that reads each user from the database, checks their liked products, and outputs a message of likes per product. I want to store this report so I can compare each report to the previous week and view trends. A lambda function triggered by a CloudWatch cron job is an ideal way to handle the compute portion of the task. I chose S3 as a low-cost option to store the history of reports. And I chose an SNS topic to send the notification, as this allows me to publish this report to an e-mail or SMS recipient.
Create an S3 Bucket
First, I started by creating an S3 bucket to hold the history of reports.
- From the S3 dashboard in the AWS Management console, click “Create Bucket”
- The name of the bucket needs to be globally unique. Pick a name that will be unique to you. In my case, I went with the name “garrett-send-followers-reports”
- Click the Create button to create the bucket with default options
4. Update the Bucket Policy for your bucket by clicking on it, then selecting the Permissions tab and clicking the Bucket Policy button. Enter in the following policy to allow your VPC to access this S3 bucket (putting in your VPC ID where noted):
"aws:sourceVpc": "YOUR VPC ID"
5. Click the Save button to save the S3 policy.
Create an SNS topic
Next I created an SNS topic. An SNS topic can be used to send either e-mails or SMS messages.
- In the AWS Management console, go to the SNS dashboard and click Create Topic.
- Fill out the Name of your topic along with the Display name to use as the sender name for generated e-mails.
- Click the Create topic button to create your topic .
4. Once you create the topic , take note of the ARN of this SNS topic. You’ll need that when we create the Lambda function that will publish to this topic.
5. Add a subscription for each recipient by clicking the Create Subscription button.
6. Set the Protocol to either e-mail or SMS, depending on how the recipient will be receiving your message. E-mail is more expensive than SMS, but both are cost effective if you are only notifying a few recipients. The AWS Free Tier allows you to send 1,000 e-mail messages and 1 million SMS messages per month for no cost.
7. Set the endpoint to either an e-mail address or mobile phone number.
8. Click Create Subscription to create the subscription. The recipient will receive a message to confirm that they want to subscribe to this topic.
Create a role for the Lambda function
Before creating the Lambda function, I created an appropriate IAM role for the function. It needed access to CloudWatch to log output and be triggered by a cron job to run on a weekly basis. It also needed access to the SNS topic and S3 bucket that I created above.
- Open the IAM dashboard and create a new policy to publish to your SNS topic. Start by clicking on the Policies link in the left side of the IAM console.
- Click Create Policy to create a new policy, and switch to the JSON editor.
- Enter JSON that looks like the following, substituting in the ARN of the SNS topic you created above:
"Resource": "SNS ARN goes here"
4. Click Review Policy and enter a Name for the policy. In my case, I called it SendFollowersPublish. Click the Create Policy button to create your policy.
5. Repeat these steps to create a policy providing access to the S3 bucket that you created using a JSON that looks like this:
6. Go to the Roles link in the IAM console and click the Create Role button.
7. When asked to choose the service that will use this role, click Lambda and then click the “Next: Permissions” button.
8. Attach the following policies to this role:
- The SNS topic policy created above
- The S3 bucket policy created above
9. Click the “Next: Tags” button to enter tags (these are optional to enter), and then click “Next: Review.” On this screen, enter a name for your role and then click Create Role. In my case, I called my role SendFollowersRole. Take note of the ARN of this role, as you’ll need it when you create the Lambda function in the next step.
Create the Lambda function
The code for my function is unique to my project and beyond the scope of this post. In a nutshell it iterates through users in my database to prepare a summary report to send to SNS. It uses S3 to compare data to the previous run and message which products are trending.
Because the Lambda function needs access to my RDS database, I needed to create it within my VPC. To do this, I used the AWS command-line interface:
aws lambda create-function --function-name SendFollowersFunction --runtime nodejs8.10 --role ARN of SendFollowersRole --zip-file fileb://./Archive.zip --handler index.handler --vpc config SubnetIds=comma-separated-vpc-subnet-ids,SecurityGroupIds=comma-separated-security-group-ids -- memory-size 512
In this command line, I used the ARN from the IAM role that I created in the previous step. I got the appropriate subnet IDs and security group IDs from the VPC dashboard to put into the command line. This line also presumes that the code and modules for the Lambda function are in a zip file named Archive.zip.
Check the IP address of the VPC under the Lambda function Network settings. We need to make sure this is set as an appropriate inbound rule for the RDS database. If it is not, add this as an IP address to the inbound rules.
Create a VPC Endpoint
So now you have a Lambda function created within a VPC that has authorization to publish to an SNS topic. Since it is within a VPC, it can’t communicate with a service outside that environment like SNS. You can use AWS PrivateLink to connect with supported AWS services. With AWS PrivateLink you don’t have to set up an internet gateway, NAT device, or public IP address. To do this, we set up a VPC Endpoint and connect it to (in this case) SNS using the following steps:
- In the VPC dashboard of the AWS Management console, go to the Endpoints link and click Create Endpoint.
- Select com.amazonaws.us-east-1.sns as the service to connect to (or whatever region your VPC is in).
3. Assuming you have a single VPC with a default set of subnets, you can keep the default selections on this page.
4. Click the Create Endpoint button at the bottom of this form.
5. Repeat these steps to create an endpoint to S3.
These steps demonstrate how to build a lightweight report based on data in your database. Because the data is read and processed within a single VPC with serverless architecture, you minimize your cloud and data transfer costs. While this approach works for side projects with hundreds or thousands of records, you’d want to consider more robust data processing solutions for larger-scale data sets.
Most importantly though is the thought process about valuable business insights you can glean from looking at your data from a different vantage point!