How to Reduce AWS Lambda Costs by 40% Without Writing a Line of Code!

seza akgün
Insider Engineering
8 min read · Nov 7, 2022

Sounds amazing right? Let’s dive deep into it.

Context

The Lambda function we worked on is responsible for validating, transforming, and distributing almost every event Insider collects from thousands of partners worldwide. That means it processes approximately 410 million records every day.

AWS Lambda

AWS Lambda is a serverless, event-driven compute service that lets you run code for virtually any type of application or backend service without provisioning or managing servers. [1]

Why We Use It

There are multiple solid reasons why we use AWS Lambda.

  1. It shoulders availability and scalability concerns on our behalf.
  2. Great integration with other AWS products like SQS, SNS, Kinesis, or API Gateway.
  3. Out-of-the-box logging features with AWS CloudWatch.
  4. It costs less in most cases since it only charges you for execution time.

AWS Lambda Pricing

AWS Lambda's pricing formula is straightforward: (selected memory size) × (execution time). You can check the current per-memory prices at this link. The rates look ridiculously low, but use the service with caution; here is a good example of what can happen.
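As a rough illustration, the formula can be sketched in Python. The per-GB-second rate and the per-request charge below are assumptions based on the published x86 prices at the time of writing; check the pricing page for current values.

```python
# Rough sketch of the Lambda pricing formula: (memory) x (execution time),
# plus a flat per-request charge. Both prices are assumptions based on the
# published x86 rates; check the AWS pricing page for current values.
PRICE_PER_GB_SECOND = 0.0000166667
PRICE_PER_REQUEST = 0.0000002  # $0.20 per 1M requests

def lambda_cost(memory_mb: float, duration_s: float, invocations: int) -> float:
    """Approximate cost in USD for a batch of invocations."""
    gb_seconds = (memory_mb / 1024) * duration_s * invocations
    return gb_seconds * PRICE_PER_GB_SECOND + invocations * PRICE_PER_REQUEST

# 1M invocations at 1024 MB running 200 ms each:
print(round(lambda_cost(1024, 0.2, 1_000_000), 2))  # → 3.53
```

The per-invocation numbers look negligible, which is exactly why high-volume workloads like ours need to watch the aggregate.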

Example Pricing Table for Ohio Region

AWS Lambda Computing Power

As you may have noticed, we never talked about CPUs. That's because you can only configure memory on AWS Lambda, and the more memory you allocate, the more CPU power you get. More memory means a higher per-millisecond cost but also a shorter execution time. Since you are charged for the combination of both, running on more memory can actually be cheaper, especially for CPU-intensive applications.
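To see why more memory can be cheaper, consider a hypothetical CPU-bound function: if doubling memory from 1 GB to 2 GB cuts the duration by more than half, the total cost goes down even though the per-second rate doubles. The durations below are made-up numbers for illustration.

```python
# Hypothetical numbers: doubling memory doubles the per-second rate,
# but a CPU-bound function may finish in less than half the time.
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 rate

def invocation_cost(memory_gb: float, duration_s: float) -> float:
    return memory_gb * duration_s * PRICE_PER_GB_SECOND

cost_1gb = invocation_cost(1.0, 2.0)   # 1 GB, 2.0 s
cost_2gb = invocation_cost(2.0, 0.9)   # 2 GB, 0.9 s (more vCPU, faster)
print(cost_2gb < cost_1gb)  # → True: 2 GB is cheaper despite the higher rate
```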

Here are an example application's costs. The example code calculates prime numbers.

Table: average durations of 1,000 invocations of a prime-computing function at different memory levels
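The benchmark code itself was not published; a minimal stand-in that computes primes (and is CPU-bound in the same way) might look like this:

```python
def primes_up_to(n: int) -> list[int]:
    """Naive trial-division prime search. Deliberately CPU-bound, so its
    duration shrinks as Lambda grants more vCPU along with more memory."""
    primes = []
    for candidate in range(2, n + 1):
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
    return primes

print(len(primes_up_to(100)))  # → 25 primes up to 100
```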

AWS Lambda Configurations

Let’s look at the Lambda configurations that contributed to the cost reduction in our case.

Runtime Architecture
This setting selects the processor architecture that Lambda uses to run the function. The current options are arm64 and x86_64.

Memory
This setting determines the amount of memory available for your Lambda function during invocation.

Timeout
This setting determines how many seconds an invocation can last.

AWS Kinesis

Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application.[2]

AWS Lambda With Kinesis Integration

Integrating Kinesis Data Streams with Lambda is a fairly easy task; you can use the web UI or the CLI to do it.

Here is how to add Kinesis to your Lambda with the CLI.[3]

aws lambda create-event-source-mapping --function-name <your-lambda-name> --event-source-arn <your-kinesis-arn> --batch-size 100 --starting-position LATEST

AWS Lambda-Kinesis Configurations

Although it’s easy to add Kinesis to your Lambda, it can be hard to understand some of the configurations and their effect on cost and performance.

Here are some of the configurations that I find important to know.[4]

Batch Size
The maximum number of records in each batch that Lambda pulls from your stream or queue and sends to your function.
Small batch sizes can increase your ingestion time, and big batch sizes can cost more than you expect because more execution time will be needed.

Maximum Retry Attempts
Discard records after the specified number of retries.
The default is infinite. This means that if you have an error in a new deployment or a faulty record in the stream, your Lambda will be stuck reprocessing the same batch until you notice. This can significantly increase your costs, especially if you did not set a proper timeout for your Lambda.

Parallelization Factor
The number of batches to process from each shard concurrently.
If your Lambda can’t keep up with your Kinesis load, you can increase this setting to get more concurrent executions. Note that it is a trade-off between queue time and cost, and it needs to be used with caution.

Maximum Batching Window in Seconds
The maximum amount of time, in seconds, that Lambda spends gathering records before invoking the function.
It tells Kinesis that it can wait up to X seconds to fill the batch. You can set your batch size to 1,000 and still see Lambda invocations with fewer records, because by default Kinesis invokes the Lambda as soon as records are written to the stream. This setting, again, is a trade-off between queue time and cost.
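A back-of-the-envelope model (with hypothetical numbers) shows how batch size and batching window interact: under a steady arrival rate, whichever limit is hit first triggers the invocation. Edge cases like a zero-second window or the 5-minute window cap are not modeled here.

```python
def records_per_invocation(rate: float, batch_size: int, window_s: float) -> float:
    """Steady-state estimate: an invocation fires when the batch fills
    or the batching window expires, whichever comes first."""
    time_to_fill = batch_size / rate
    return batch_size if time_to_fill <= window_s else rate * window_s

# Hypothetical stream at 200 records/s with batch size 1000:
print(records_per_invocation(200, 1000, 3))   # window expires first → 600.0 records
print(records_per_invocation(200, 1000, 10))  # batch fills first → 1000 records
```

Bigger batches per invocation mean fewer invocations, which is where the cost reduction comes from.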

Cost Optimization on Lambda

After a brief introduction to the basics, we can dive into how we decreased our costs by 40%. We mostly followed empirical methods and incrementally changed settings to see how they affected cost and performance.

Here is the AWS CloudWatch Logs Insights query we used to monitor relative cost changes.

filter @type = "REPORT"
| stats sum(@duration * @memorySize) by bin(1m)
| sort @timestamp desc

This is the cost graph we used during our optimization iterations. It does not give the actual cost, but it does show the changes between iterations.
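The query sums @duration (milliseconds) × @memorySize (MB), so each data point is in MB-milliseconds. If you do want an approximate dollar figure, you can convert a bin's value to GB-seconds; the rate below is an assumed x86 price, so check the pricing page for current values.

```python
PRICE_PER_GB_SECOND = 0.0000166667  # assumed x86 rate; check the pricing page

def mb_ms_to_usd(mb_ms: float) -> float:
    """Convert the query's sum(@duration * @memorySize) into approximate USD."""
    gb_seconds = mb_ms / 1024 / 1000  # MB -> GB, ms -> s
    return gb_seconds * PRICE_PER_GB_SECOND

# e.g. a 1-minute bin totalling 5e9 MB-ms:
print(round(mb_ms_to_usd(5_000_000_000), 4))  # → 0.0814
```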

AWS Lambda Cost Graph in AWS Cloudwatch

We followed the steps below.

  1. Creating a staging Lambda
    We created a second Lambda identical to our production one, but without write access, so it could not affect production.
  2. Connecting Production Kinesis
    We pointed this Lambda at the production Kinesis stream. Note that Kinesis limits how much data you can read from a stream at a given time, but we knew we were well below that limit, so it was OK to use it.
  3. Taking the Baseline Measurements
    This is an essential step for any optimization, without a base, you will not have a sense of direction on where to go in your optimization process.
  4. Empirical Testing
    We started changing settings systematically, wrote everything down, and monitored the cost graph given above. We further increased or decreased a value depending on where the cost graph went. Ideally, we would have created a configuration matrix and tried every combination to be sure we did not get stuck in a local maximum, but that would have taken a lot of time and probably would not have been cost-effective (our intuition was that there would be no local maxima). So we went with a greedy approach: we changed one setting at a time to maximize that setting’s cost reduction, then proceeded to the next one.
  5. Risk and Impact Analysis
    After obtaining the test results, we sorted the configurations from cheapest to most costly, factored in the system delays allowed by our SLAs, and finally factored in the risks involved with the new settings before coming to a conclusion.
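The greedy, one-setting-at-a-time procedure from step 4 can be sketched as a small loop over candidate values. Here `measure_cost` is a made-up stand-in for reading the relative-cost graph in CloudWatch after a test run; the settings, candidates, and cost surface are all hypothetical.

```python
# Sketch of the greedy tuning loop from step 4. `measure_cost` stands in
# for reading the relative-cost graph in CloudWatch after a test run.
def greedy_tune(settings, candidates, measure_cost):
    """For each setting in turn, keep the candidate value that minimizes cost."""
    best = dict(settings)
    for name, values in candidates.items():
        for value in values:
            trial = {**best, name: value}
            if measure_cost(trial) < measure_cost(best):
                best = trial
    return best

# Mock cost surface (hypothetical): cheapest at window=3, factor=2.
def measure_cost(cfg):
    return abs(cfg["parallelization_factor"] - 2) * 0.7 + abs(cfg["batching_window_s"] - 3) * 0.5

initial = {"parallelization_factor": 3, "batching_window_s": 0}
candidates = {"batching_window_s": [0, 1, 3, 5], "parallelization_factor": [1, 2, 3]}
print(greedy_tune(initial, candidates, measure_cost))
```

A full grid search would also explore interactions between settings; the greedy loop only finds a good point along each axis, which matches the trade-off described above.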

Winning Configurations

  1. Memory was unchanged. Higher memory configurations brought very little benefit and high risk. In our case, 2X memory was slightly faster than our current setting, but it also costs twice as much for every millisecond of invocation time; a seemingly innocent deployment with no noticeable performance decrease could cause unexpected costs at the end of the month.
  2. The batch size was unchanged. We observed that it does not affect performance or cost unless it falls below or above certain thresholds.
  3. The parallelizationFactor was changed from 3 to 2. This added a small delay to queue ingestion, but it was at the three-digit-millisecond level, so it was not a big problem. Overall queue ingestion time actually improved around 10 times (from five-digit to four-digit milliseconds) with all the configurations combined.
  4. The maximumBatchingWindowInSeconds was changed from 0 to 3. We have a lot of data to ingest, but since our Lambda runs in parallel (around 70 concurrent invocations at the same time), we needed at least a 1-second batching window. We observed that it only adds a two-digit-millisecond delay to ingestion time while bringing a significant cost reduction.
  5. The runtime architecture was changed from x86_64 to arm64. This resulted in no visible performance change in our case, but AWS charges 25% less for arm64, so we went with it.

These configurations, excluding the runtime architecture change, resulted in around a 16% cost reduction. After factoring in the 25% reduction from the runtime architecture, we ended up with a 40% cost reduction, and the most surprising thing is that we didn’t change the code at all. Welcome to the age of the cloud.
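Assuming the two savings compound multiplicatively (an assumption on my part; the exact accounting depends on how the measurements were taken), a quick sanity check of the combined figure:

```python
# Configuration tuning saved ~16%; arm64 pricing saves a further 25%
# of what remains. Compounding the two multiplicatively:
config_savings = 0.16
arm64_savings = 0.25
combined = 1 - (1 - config_savings) * (1 - arm64_savings)
print(round(combined, 2))  # → 0.37, i.e. in the ballpark of the reported ~40%
```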

Effects of Optimization

Cost reduced by 40%.

AWS Lambda Cost Graph

Concurrent executions dropped. This is mostly due to the combination of the decreased parallelization factor and the increased maximum batching window. This change has no real effect on performance.

AWS Lambda Concurrent Execution Graph

Invocations dropped, also mostly due to the decreased parallelization factor and the increased maximum batching window. This change has no real effect on performance.

Invocation Graph

Duration increased. This is normally an undesirable outcome, but it is the result of having fewer invocations, and the overall change reduced costs, so there is no problem here. It slows ingestion, but ingestion still complies with our SLA, so we accepted it.

(Blue is the minimum and orange is the average duration for a given minute)

Duration Graph

Iterator age did not change, which means the Lambda is still fast enough to ingest the data.

(Spike in the middle is not related to the changes we made here)

Iterator Age Graph

Thanks for reading. Feel free to contact me on LinkedIn for further questions or comments.
