Why do you still need to think about scalability when architecting Serverless apps?

Marin Radjenovic
8 min read · Dec 9, 2022

“Pay as you go” and “unlimited scaling” are the buzzwords most often associated with Serverless. Yet we still need to think about scalability.
This article focuses on how to approach architecting optimized Serverless solutions. It walks you through a few sets of questions to consider when analyzing requirements, so that you can put together an optimized design for your serverless workload.

What to do before you start sketching your solution

Functional requirements (FRs) are the basis of any project. They describe what a solution should deliver to its customers; however, they do not give the full picture.
In many cases, developers and architects start designing against their FRs without first analyzing the non-functional requirements (NFRs). NFRs are very important because they cover expectations such as reliability, security, performance, scalability, cost, and overall sustainability. Most of us have witnessed complaints like “the application is not performing well enough”, “the infrastructure is just too expensive for the amount of work it does”, or “Serverless is just too expensive”. In other words, the application does what it needs to do, but not the way it was expected to.
Those problems mostly come from underestimating, overestimating, or simply neglecting NFRs. Analyzing NFRs is one of the most important steps when architecting an application at scale.

NFRs such as the expected response rate, or the required type of communication (synchronous or asynchronous), can entirely change the architecture of a solution.

Where do I get NFRs?

NFRs are not always as readily available as FRs. Sometimes you get them from the client, but more often from your own analysis of an existing solution. If it is a greenfield project, you will probably need to run the analysis together with the stakeholders and then make an educated guess. Use common sense and realistic predictions so that your optimization of the future architecture is adequate.

NFRs Analysis

To carry out this analysis, I would point out a few questions in three categories. These questions can guide you through properly analyzing a future Serverless workload and create a foundation for your Serverless architecture.

Data type & size questions

  • What kind of data are you processing? — Does your message have only one receiver, or should it be handled by multiple receivers? Is your data a message or an event? Based on that, you can determine the best-suited AWS services. If a message has only one consumer, consider SQS or Kinesis, depending on the purpose.
    If it is an event that should be processed by multiple consumers, then SNS or EventBridge might be your choice (see the fan-out sketch after this list).
  • What is the size of the expected payload? — Knowing the size of your payload is very important when selecting an AWS service. Some services support transferring larger payloads, while others can carry payloads measured only in kilobytes. Either way, messages and events are not intended for transferring large data.
    For instance, a Kinesis record can be up to 1 MB, while SNS and SQS messages can be up to 256 KB. So keep in mind what payload size you need to send. If you need larger payloads, store them as files in an S3 bucket and send the path to the file in the message or event (see the pointer sketch after this list).
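
For the multiple-consumer case above, here is a minimal SNS-to-SQS fan-out sketch using boto3. The topic and queue names are invented for illustration, and the SQS queue policy that permits SNS delivery is omitted for brevity:

```python
import json
import boto3

sns = boto3.client("sns")
sqs = boto3.client("sqs")

# Hypothetical topic and queue names, not from any real system.
topic_arn = sns.create_topic(Name="orders-topic")["TopicArn"]

for name in ("billing-queue", "shipping-queue"):
    queue_url = sqs.create_queue(QueueName=name)["QueueUrl"]
    queue_arn = sqs.get_queue_attributes(
        QueueUrl=queue_url, AttributeNames=["QueueArn"]
    )["Attributes"]["QueueArn"]
    # Each subscribed queue receives its own copy of every published event.
    # (A queue access policy allowing SNS to send is also required; omitted here.)
    sns.subscribe(TopicArn=topic_arn, Protocol="sqs", Endpoint=queue_arn)

sns.publish(TopicArn=topic_arn, Message=json.dumps({"orderId": "123"}))
```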
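
And for oversized payloads, a hedged sketch of the store-in-S3-and-send-a-pointer approach described above (sometimes called the claim-check pattern). The bucket and queue URL are placeholders, and error handling is omitted:

```python
import json
import uuid
import boto3

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

BUCKET = "my-large-payload-bucket"  # assumed to exist already
QUEUE_URL = "https://sqs.eu-west-1.amazonaws.com/123456789012/my-queue"

def send_large_payload(payload: bytes) -> None:
    # Store the oversized payload in S3 under a unique key...
    key = f"payloads/{uuid.uuid4()}.bin"
    s3.put_object(Bucket=BUCKET, Key=key, Body=payload)
    # ...and send only the pointer, which fits well under the 256 KB SQS limit.
    sqs.send_message(
        QueueUrl=QUEUE_URL,
        MessageBody=json.dumps({"bucket": BUCKET, "key": key}),
    )

def receive_large_payload() -> bytes:
    # The consumer resolves the pointer back into the real payload.
    msg = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1)["Messages"][0]
    ref = json.loads(msg["Body"])
    return s3.get_object(Bucket=ref["bucket"], Key=ref["key"])["Body"].read()
```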

Throughput Questions

  • At what pace is the data coming? — It is very important to understand the expected speed of ingesting and processing data. Whatever AWS service you pick to ingest your data (SQS, SNS, Kinesis, DynamoDB, EventBridge, etc.) will scale properly, but your processing might not scale accordingly, and you may see quite high latency.
    Say your app ingests data from source A and source B, and source A produces many times more data than source B. All of the data is streamed via Kinesis to a Lambda function for processing, which means it all ends up in the same pipe. If the buffer is overwhelmed by data from source A, the processing of source B data will experience severe latency, even though source B contributes much less data. In that situation, it is much better to split the data into separate streams by priority.
  • What are the desired response times? — It is very important to understand whether your application has strict response-time requirements, and therefore how your services scale and what latency they add in the process. In the case of Kinesis + Lambda, if your function processes records much more slowly than they arrive (at a constant load), it will fill up the buffer quickly and build up latency over time. You can monitor this through the IteratorAge metric (see the alarm sketch after this list).
  • If data is coming in at high rates, how can you speed up processing? —
    With Kinesis, one option is to process records in batches; if you need batch processing, Kinesis is probably the easiest option. When working with Lambda and Kinesis, you define the batch size that triggers your function. Within a larger batch, individual records take different amounts of time to process, so the total processing time becomes unpredictable. That can lead to a Lambda timeout, which fails the whole batch; Lambda will then retry the same records, which can cause issues for non-idempotent services.
    So you need to either optimize processing or report which records from the original batch were already processed successfully (see the partial-batch sketch after this list).
    You can also use enhanced fan-out, which gives each consumer dedicated read throughput per shard instead of sharing it with other consumers.
    With EventBridge, Lambda invocations are limited by the default per-region quota. You can check the limits at https://docs.aws.amazon.com/eventbridge/latest/userguide/eb-quota.html#eb-invocations-limits
    SQS, on the other hand, scales Lambda consumers by up to 60 additional concurrent invocations per minute, up to 1,000. Be aware that you may hit your account's soft concurrency limit quickly if there is a lot of data in the queue, which can cause other parts of the application to fail. So mind your soft limits and request quota increases beforehand.
  • Is the load constant, or are there spikes? — Most AWS services are built to scale. Kinesis can ingest 1,000 records per second per shard, with a soft limit of 500 shards. SQS can handle a virtually unlimited number of messages (FIFO up to 3,000 per second with batching). However, your processing side may introduce latency, which is fine if the business does not require faster processing.
    If the load is constant, you may think about reserving concurrency on the Lambda side, or even switching to Fargate (if it makes sense price-wise). You can also throttle requests if the processing is not time-critical.
  • Are the spikes predictable or unpredictable? — This question extends the two above. Luckily, with Serverless we no longer need to think much about peaks in our own infrastructure. But we do need to consider the scaling abilities of third-party dependencies (services and libraries) that produce or consume the data. In those cases, try to cache or gracefully degrade: let users know that a certain service is unavailable at the moment, or serve cached data (while notifying the user that the data is stale).
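
A sketch of watching the IteratorAge metric mentioned above, as a CloudWatch alarm created with boto3. The function name and thresholds are placeholders you would tune to your own latency requirements:

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

# Alarm when the stream consumer falls more than a minute behind.
cloudwatch.put_metric_alarm(
    AlarmName="my-stream-consumer-falling-behind",
    Namespace="AWS/Lambda",
    MetricName="IteratorAge",
    Dimensions=[{"Name": "FunctionName", "Value": "my-stream-consumer"}],
    Statistic="Maximum",
    Period=60,                 # one-minute data points
    EvaluationPeriods=5,       # sustained for five minutes
    Threshold=60_000,          # IteratorAge is reported in milliseconds
    ComparisonOperator="GreaterThanThreshold",
    TreatMissingData="notBreaching",
)
```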
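
And a sketch of reporting partial batch successes, as mentioned in the high-rate question above. Assuming the event source mapping has ReportBatchItemFailures enabled, the handler returns the sequence number of the first failed record, so Lambda retries only from that point instead of reprocessing the whole batch. process() stands in for your business logic:

```python
import base64
import json

def handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
            process(payload)  # placeholder for your business logic
        except Exception:
            # Report the failing sequence number; Lambda retries the batch
            # starting from this record rather than failing everything.
            failures.append(
                {"itemIdentifier": record["kinesis"]["sequenceNumber"]}
            )
            break  # Kinesis is ordered: stop at the first failure
    return {"batchItemFailures": failures}

def process(payload: dict) -> None:
    ...  # hypothetical processing step
```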

Fault Tolerance Questions

  • How important is the payload of a given message or event? Is some loss acceptable or not? — This question is very important, because some services guarantee delivery (like SQS), while others, such as SNS, Kinesis, and EventBridge, have weaker delivery guarantees. You should therefore investigate how undelivered messages affect the functioning of your system. SQS ensures that each message is delivered at least once, but it might be delivered more than once, so you need to make sure your function is idempotent.
    If your system is transactional and requires every message to be delivered, and in order, you can use SQS FIFO. SQS FIFO guarantees exactly-once processing, in order, but at a lower throughput than standard SQS.
  • Are your functions idempotent? —
    If you use SQS, you may receive the same message more than once. Say an SQS message carries a payload describing a new user to add to the database: if you receive that message multiple times, you might end up with that many duplicate records.
    An idempotent function is one that, when called more than once with the same input parameters, has no additional effect.
    You need to ensure that your services produce exactly the same result no matter how many times they are called. In HTTP terms this holds for GET, PUT, and DELETE, whereas POST is not considered idempotent (see the idempotency sketch after this list).
  • Are there integrations with other systems, and can they become a bottleneck? — I already touched on this in “Are the spikes predictable or unpredictable?”, and the same applies here: analyze your third-party dependencies, such as libraries and services, and be aware of their shortcomings.
    When core services depend on a third-party service, the performance and reliability of the whole solution are only as good as the weakest link.
    Third-party dependencies expected to cause disruptions (low throughput, frequent reliability issues, and so on) should be replaced with a more reliable service. If no such service exists, there are tactics that help you avoid a bad user experience; see the error-handling question below.
  • Is error handling required? — Sometimes a certain loss of messages is acceptable, but sometimes it is crucial that messages arrive successfully because they matter to the overall functioning of the system. A percentage of messages can be lost in transit for many reasons: deployment errors, human errors, configuration issues, service unavailability, etc.
    There are tactics that can help you save that data, or at least notify users that something is out of order at the moment.
    The circuit-breaker pattern allows, disallows, or partially allows traffic to an overwhelmed or failing service. By failing fast, it can warn users that something is wrong, and it can signal other services to report the issue or mitigate it, for example by temporarily redirecting data to some other storage (see the circuit-breaker sketch after this list).
    A complementary tactic is the dead-letter queue (DLQ). SQS and SNS support DLQs: if a message cannot be delivered, it is stored in the DLQ and stays there until it expires or is deleted. A DLQ can be drained automatically by application logic or manually (see the redrive-policy sketch after this list).
    EventBridge can also use an SQS queue as a DLQ, so the same applies there.
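
To illustrate the “add a new user” example above, here is a hedged sketch of one common way to make such a handler idempotent: a DynamoDB conditional write keyed on the user ID, so a redelivered SQS message becomes a no-op. The table and attribute names are made up:

```python
import json
import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb").Table("users")  # hypothetical table

def handler(event, context):
    for record in event["Records"]:
        user = json.loads(record["body"])
        try:
            # The condition makes the write idempotent: a second delivery of
            # the same message fails the check instead of inserting a duplicate.
            table.put_item(
                Item={"userId": user["id"], "name": user["name"]},
                ConditionExpression="attribute_not_exists(userId)",
            )
        except ClientError as err:
            if err.response["Error"]["Code"] != "ConditionalCheckFailedException":
                raise  # a real failure: let SQS redeliver the message
            # duplicate delivery: already processed, safe to ignore
```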
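
A minimal, generic circuit-breaker sketch (not a production library): after a number of consecutive failures it “opens” and fails fast until a cool-down passes, protecting the struggling dependency and giving you a natural place to report the issue or serve a degraded response:

```python
import time

class CircuitBreaker:
    def __init__(self, max_failures=5, reset_after=30.0):
        self.max_failures = max_failures  # failures before the breaker opens
        self.reset_after = reset_after    # cool-down in seconds
        self.failures = 0
        self.opened_at = None             # timestamp when the breaker opened

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                # Fail fast instead of hammering the overwhelmed service.
                raise RuntimeError("circuit open: failing fast")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()
            raise
        self.failures = 0  # success resets the failure count
        return result
```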
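
And a sketch of wiring a DLQ to a standard SQS queue through a redrive policy, using boto3. The queue names and receive count are placeholders:

```python
import json
import boto3

sqs = boto3.client("sqs")

dlq_url = sqs.create_queue(QueueName="orders-dlq")["QueueUrl"]
dlq_arn = sqs.get_queue_attributes(
    QueueUrl=dlq_url, AttributeNames=["QueueArn"]
)["Attributes"]["QueueArn"]

# After 5 failed receives, a message moves to the DLQ instead of being
# retried forever; it stays there until it expires or is deleted/redriven.
sqs.create_queue(
    QueueName="orders-queue",
    Attributes={
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": "5"}
        )
    },
)
```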

Conclusion

There are probably many more questions you can ask yourself when designing a solution. However, the questions above give you a foundation for setting your mind on a well-designed Serverless architecture.
