Lessons Learned Building a Large Serverless Project on AWS

Maycon Viana Bordin
Published in data.plumbers
7 min read · Aug 29, 2018

Serverless has gained a lot of attention lately, and many predict it is on its way to dominate the market, or at least contribute significantly to the area (if you are interested in a deeper discussion of what serverless is, you can read this thread by Simon Wardley or this great article by Mike Roberts).

With its growth, frameworks started to emerge to improve the development and deployment of serverless applications and the ecosystem around them. Off the top of my head there are the Serverless Framework and Apex, but there are dozens of frameworks out there; you can check them out on this awesome list of everything related to serverless.

In this article I’m going to talk about the Serverless Framework on top of the AWS platform. I could delve into the rationale behind this setup, but the fact is that I was presented with this framework and I really liked the way it worked: it supports plugins, enabling extra functionality, and it lets you define resources in simple YAML.

After the good first impressions of the framework, things started to grow, and limitations and problems began to arise in the framework as well as in the AWS platform.

This article is meant for those who are facing similar issues, or as a cautionary tale for those just starting to sail these waters. Each topic of the article is an issue I faced, along with a description of the solution. Not all topics are about problems; some are just tips on how to organize your project in a better way.

Keep your packages small

Split your project into modules that can be reused. Create a main module with all the basic dependencies that the other modules will need.

This comes from experience using Lambdas written in Scala, as AWS puts a 50 MB limit on a function package (zip or jar file).

You also need to be careful with your total storage for deployments, which is limited to 75 GB. That seems like a lot, but if you have many functions, each one deployed dozens of times, you will reach this limit easily. One way to avoid it is to disable the function versioning that AWS does by default:

provider:
  ...
  versionFunctions: false

Instead, you should version your code. And even if you want to run different versions of a function, you should do so by using flags in the environment variables or on the trigger events.
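As a sketch of what that can look like (the function and flag names here are made up for illustration), a flag in the function’s environment selects the behavior, so a single deployed version covers both cases:

functions:
  processOrders:
    handler: com.example.orders.Handler
    environment:
      # flip this flag per stage instead of pinning a Lambda version
      use_new_parser: "true"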

If you deploy your lambdas a lot, don’t do versioning on AWS

Keeping your packages small is also good because it shortens the cold start time of your functions. If you don’t know what a cold start is: it is basically the time your function takes to start on its first invocation (and for each new concurrent execution). If you want to know more about cold starts and how to avoid them as much as possible, there are a few articles about it.
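One common mitigation from those articles (sketched here with made-up names, not something from this project) is to invoke the function on a schedule so that at least one container stays warm:

functions:
  fn-a:
    handler: com.example.FnA
    events:
      # a scheduled "ping" keeps one container warm between real requests
      - schedule: rate(5 minutes)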

Organize your YAML file

Don’t put everything in the same serverless.yml file. Organize your project into files, one for each lambda, with subdirectories to group related subjects.

resources.yml
serverless.yml
src/
├── config/
│   ├── development.yml
│   └── production.yml
└── functions/
    ├── fn-a.yml
    ├── fn-b.yml
    └── subject-a/
        ├── fn-c.yml
        └── fn-d.yml

The core file continues to be the serverless.yml, but now the file is more readable and your functions are nicely organized into directories.
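For reference, here is roughly how serverless.yml can pull those pieces in, using the file names from the tree above (each referenced file contains the definition of its functions):

functions:
  - ${file(./src/functions/fn-a.yml)}
  - ${file(./src/functions/fn-b.yml)}
  - ${file(./src/functions/subject-a/fn-c.yml)}
  - ${file(./src/functions/subject-a/fn-d.yml)}

resources:
  - ${file(./resources.yml)}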

You will also want to have different configuration files, one for each environment where you are going to deploy your functions.

provider:
  name: aws
  runtime: java8
  stage: ${opt:stage, 'dev'}
  environment: # service-wide environment variables
    stage: ${self:provider.stage}
    service_name: ${self:service}
    bucket: ${file(./src/config/${self:provider.stage}.yml):env.bucket}

You can then use the stage variable to dynamically reference parameters from the configuration file. At deployment time the right configuration file will be loaded, along with its parameters.
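Each stage file then holds only the stage-specific values. A minimal src/config/development.yml matching the env.bucket reference above could look like this (the bucket name is invented):

env:
  bucket: my-service-dev-bucket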

Split things to grow

AWS has a lot of limits that you need to be careful of. Two of them I found out about when my project reached a certain size:

  1. An IAM role policy can have up to 10,240 characters.
  2. A CloudFormation stack can have at most 200 resources.

Fortunately, there’s a plugin to fix each of these limitations. I highly suggest that you use them from the beginning of your project, because if you start using them later on you will need to delete and recreate many things in order to move them into nested stacks.

As the documentation states, some resources such as DynamoDB tables and Kinesis Streams will need to be deleted before you can move them to another stack.

The plugins are these:

serverless-plugin-split-stacks
serverless-plugin-custom-roles
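Once installed, enabling them is just a matter of listing them in the plugins section of serverless.yml:

plugins:
  - serverless-plugin-split-stacks
  - serverless-plugin-custom-roles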

The first one will split your resources within your stack into nested stacks. You can use the default behavior to split per Lambda or per Type. Or you can create your own custom migration by creating a stacks-map.js file at the root of your project:

const stacksMap = require('serverless-plugin-split-stacks').stacksMap;

stacksMap['AWS::IAM::Role'] = { destination: 'Roles' };
stacksMap['AWS::DynamoDB::Table'] = { destination: 'DynamoTables' };
stacksMap['AWS::SNS::Topic'] = { destination: 'Topics' };

The example above will put all IAM roles into a specific nested stack, all DynamoDB tables into another and all SNS topics in yet another stack.

If your project grows larger, splitting resources by type might not be enough. In this case you can start splitting them by subject, like this:

const ServerlessPluginSplitStacks = require('serverless-plugin-split-stacks');

const stacksMap = ServerlessPluginSplitStacks.stacksMap;

...

ServerlessPluginSplitStacks.resolveMigration = function (resource, logicalId, serverless) {
  if (logicalId.startsWith("SubjectOne")) {
    return { destination: 'SubjectOne' };
  }

  // Fallback to default:
  return this.stacksMap[resource.Type];
};

Now, all resources whose logical name starts with SubjectOne will be placed into a specific stack.

One thing I noticed when putting all Kinesis streams into a single nested stack is that once you get to around 32 streams, subscribing to new streams starts to result in “Internal Failure” errors without further explanation. The remedy is to spread the streams across different stacks.

You also need to be aware that CloudFormation has a limit of 200 nested stacks. So, if your project grows even larger, you will need to create a new project.

The second plugin will create one IAM role per lambda function, instead of one role for the whole project. This will create more resources in your stack, but on the other hand it will avoid hitting the 10,240-character limit for role policies.

RequestLimitExceeded

If you have a project with enough lambdas, you might have encountered this error when deploying. And you are not alone.

Your request has been throttled by EC2, please make sure you have enough API rate limit. EC2 Error Code: RequestLimitExceeded. EC2 Error Message: Request limit exceeded. (Service: AWSLambda; Status Code: 400; Error Code: InvalidParameterValueException; Request ID: xxxx).

Personally, I faced this issue while trying to update the environment variables of all lambdas (around 70). I first tried to use the serverless-dependson-plugin to add a DependsOn parameter to each lambda function, forcing CloudFormation to deploy all lambdas sequentially.

This did not work, as it caused circular dependency errors, probably because of the nested stacks. If you don’t use nested stacks, the plugin might work for you.

What solved the issue was undoing all changes to the environment of the lambdas and applying the changes in small batches (~10 lambdas at a time).

This issue is still around after more than two years with no solution from AWS, as it is an issue with CloudFormation itself.

The stack might even get stuck at UPDATE_ROLLBACK_FAILED, requiring the user to call continue-update-rollback in order to retry the rollback; sometimes the user will need to call it several times until it succeeds.

Caching and Reuse

Although Lambdas are meant to be ephemeral, you can have some sort of in-memory cache by using global variables (outside the handler method).

You need to do this carefully, because it might lead to undesired side effects. That’s why it is important to know how the Lambda lifecycle works in order to use it the right way.

From my own experience, you can cache HTTP clients or clients for AWS services without problems. Database connections, on the other hand, might not work correctly, and it is not advised to call a database from within Lambdas at all.
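To illustrate the pattern, here is a minimal Node.js sketch (hypothetical code; the bucket variable matches the environment example earlier): the client lives in a global variable, so it is created once per container and reused across warm invocations:

// Everything outside the handler runs once per container (at cold start)
// and survives across warm invocations.
const AWS = require('aws-sdk');

// Cached service client: create expensive objects here, not in the handler.
const s3 = new AWS.S3();

exports.handler = async (event) => {
  // Warm invocations reuse the cached client above.
  const object = await s3.getObject({
    Bucket: process.env.bucket,
    Key: event.key,
  }).promise();
  return { statusCode: 200, body: object.Body.toString('utf-8') };
};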

Learn from Someone Else’s Mistakes

This is probably something everyone should know, not just in computer science, but in life.

Before starting to bang your head against the wall, or using a hammer on everything that looks like a nail, read the documentation and try to find some guides on best practices and use cases.

To make things easier, here’s a list:

Conclusion

The serverless ecosystem is evolving constantly, so don’t be afraid to make changes. The Serverless Framework has a very active community and a new release every few weeks, so try to run the latest version as much as possible.

And your architecture will also need to evolve constantly. Always monitor your functions, and if something is not working right, rethink the way you are doing things. If necessary, rewrite everything.
