🏦 Analyzing my monthly expenses with S3 triggers and AWS Lambda using AWS CDK
Amazon S3 offers event notifications that can send messages to a destination of your choice when specific events occur. You can use these for things like data hygiene, extra security measures, or low-cost serverless processing.
Supported destinations for notifications include:
- Amazon Simple Notification Service (Amazon SNS) lets you fan notifications out to virtually any part of your system architecture.
- An Amazon Simple Queue Service (Amazon SQS) queue is suitable if you expect a high volume of files and events.
- AWS Lambda is a good option when you don't expect a lot of files and don't need to decouple the input and output processing.
What are we building?
You’ll always want to use some form of Infrastructure as Code (IaC), and for that, my preference is AWS CDK. It’s a straightforward way to deploy your resources, and it’s been growing consistently over the last couple of years. We will use the AWS Lambda destination directly, without attaching SQS or SNS in between. Once finished, our Lambda function will print its results to CloudWatch.
Creating the infrastructure with CDK
First, we will initialise the AWS CDK project and open it in our IDE, Visual Studio Code.
Next, we will define our resources in our main CDK stack. These include:
- The AWS S3 bucket where we will put the bank statements.
- The AWS Lambda function that analyses the bank statements and sorts our spending into categories.
- The event source that notifies the AWS Lambda function of any newly created object in our S3 bucket.
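In CDK for Python, a stack wiring these three resources together could look roughly like this. It is a sketch rather than the exact project source: the construct IDs, the `lambda` asset directory, and the handler name `analyzer.handler` are all illustrative.

```python
from aws_cdk import Stack, aws_s3 as s3, aws_lambda as _lambda
from aws_cdk.aws_lambda_event_sources import S3EventSource
from constructs import Construct


class ExpenseAnalyzerStack(Stack):
    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        # The bucket that will receive the monthly bank statements.
        bucket = s3.Bucket(self, "StatementsBucket")

        # The function that analyses each statement; code lives in ./lambda.
        analyzer = _lambda.Function(
            self,
            "AnalyzerFunction",
            runtime=_lambda.Runtime.PYTHON_3_11,
            handler="analyzer.handler",
            code=_lambda.Code.from_asset("lambda"),
        )

        # Invoke the function whenever a new object lands in the bucket.
        # This also grants the function permission to read the bucket.
        analyzer.add_event_source(
            S3EventSource(bucket, events=[s3.EventType.OBJECT_CREATED])
        )
```

Using `S3EventSource` keeps the notification wiring next to the function definition; the alternative is `bucket.add_event_notification` with a `LambdaDestination`, which reads better when the bucket is the focal point.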
Analysing the statements with Python
If we now drop a file in our new S3 bucket, it will trigger an execution of our AWS Lambda function. Now, it’s time to write our code that takes a look at our monthly spending.
Firstly, we will want to download the file from S3 in our Lambda function to read it.
With access to the file, we can iterate over its contents to fetch each record and categorise it. For the purposes of this post, we are doing nothing complex: we have just created a set of categories with keywords that we match against the description of each record.
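That keyword matching can be sketched as follows, assuming the statement is a CSV with `description` and `amount` columns; the category names and keywords here are made up and would need to match your own bank's descriptions:

```python
import csv
import io
from collections import defaultdict

# Hypothetical categories, each matched by keywords in the description.
CATEGORIES = {
    "groceries": ["supermarket", "grocery"],
    "transport": ["fuel", "train", "parking"],
    "subscriptions": ["netflix", "spotify"],
}


def categorise(description):
    """Return the first category whose keywords appear in the description."""
    desc = description.lower()
    for category, keywords in CATEGORIES.items():
        if any(keyword in desc for keyword in keywords):
            return category
    return "other"


def summarise(csv_text):
    """Sum the amounts per category for a CSV bank statement."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[categorise(row["description"])] += float(row["amount"])
    return dict(totals)
```

For example, `summarise("description,amount\nSUPERMARKET A,12.50\n")` puts the 12.50 under `groceries`; anything no keyword matches lands in `other`.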
The Result
The result gives me a better picture of my monthly spending and, more importantly, shows that my AWS bill will not be a big part of it. Thanks to the serverless components and the fact that I’m not doing any computationally heavy lifting, my bill will remain low.
You can imagine more use cases for this solution; popular ones include:
- Creating automatic thumbnails upon image drop
- Automatically moving high-res assets to cold storage while creating and keeping low-resolution variants
- Checking the metadata and security aspects of a file upon creation, before making it available to the rest of your application
Remember This
- Send events to SNS/SQS if you want to decouple the process and allow for more scale. That way, you can increase the files-per-Lambda ratio and be more cost-effective.
- There is no latency guarantee, which makes this setup unreliable for live data use cases.
- Serverless means you pay per invocation, which makes solutions like these very cost-effective: if you are not using it, you are not paying for it.
You can find the complete project source in my repository here.
Are you looking for more?
We’re launching a newsletter soon that will include weekly posts like these. You can find the newsletter and sign up here.