Serverless enterprise-grade multi-tenancy using AWS

Tarek Becker
11 min read · Feb 27, 2018


Motivation

Most enterprise-grade cloud applications must fulfill certain requirements, such as data privacy and protection, security, and high availability. Without these qualities, a cloud application may not be a viable choice for production use.

In this article we focus on one of these qualities: multi-tenancy. Within the SAP ecosystem, we are used to a very convenient way of handling multi-tenancy. The runtime and the programming language ABAP allow writing applications with multi-tenancy support without any additional effort. SAP has been using this approach for many years for ABAP-based business applications, both on-premises and in the cloud.

For building cloud applications, SAP offers the SAP Cloud Platform (SAP CP), which is based on Cloud Foundry. Following the recommended programming model, an application gets enterprise-grade multi-tenancy support without any additional effort. But we, two architects working for SAP, see that customers and partners have already built apps and services using other cloud providers like AWS, GCP, or Azure. Because of that, we want to propose a model for building (serverless) cloud applications on AWS with enterprise-grade multi-tenancy support.

Disclaimer: This post describes our personal experience and our personal views. The views expressed are our own and do not necessarily represent the views of SAP SE. This post is not official guidance or a recommendation of SAP SE.

Info: Example code can be found at the end of this article.

Introduction

This post proposes an enterprise-grade multi-tenancy concept for applications running on AWS using only AWS serverless services.

First, it’s important to define multi-tenancy and differentiate it from a user concept. According to Wikipedia, “a tenant is a group of users who share a common access with specific privileges to the software instance”. For a company like SAP, this means that it offers the same software instance to multiple customers. Each of these customers has its own user base, e.g. its employees.

But what are the requirements for an application with enterprise-grade multi-tenancy support? A couple of SAP’s internal software standards define requirements for multi-tenancy, which can be boiled down to the following list:

  • All data belonging to a specific tenant must not be visible to or accessible by any other tenant.
  • Identities, identity management, and authentication shall be maintainable and configurable per tenant.
  • Password and other security policies shall be configurable per tenant.
  • An application should use the capabilities provided by the platform if possible.

This post proposes a concept which satisfies these requirements using only AWS services.

Multi-tenancy concepts

There are already several multi-tenancy architecture design patterns available, and our concept can be used with any of them. But let us briefly look at three common patterns. To satisfy the requirements mentioned above, the data of the tenants must be separated. Three typical ways of separating data in databases are:

  • Shared-nothing, i.e. a separate database instance for each tenant.
  • Schema-separation, i.e. a separate database schema for each tenant.
  • Shared-everything (aka row-based separation), i.e. using the same database tables for all tenants with a tenant separator column.

If you would like to learn more about these concepts, we recommend reading this article.

Multi-tenancy @ AWS

To explain the concept, consider a simple service that returns a list of pets in a pet store. The data is stored in a DynamoDB table, and the website is hosted using S3’s static website hosting. A typical architecture looks like the following:

Your pet store’s website assets are stored in S3, which serves them as a static website (1). Your API is described in API Gateway (2). The /pets endpoint invokes the Lambda function listAllPets (3). This function simply fetches data from DynamoDB (4).
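To make this concrete, a minimal sketch of the listAllPets function could look like this (Node.js with the AWS SDK v2; the table name pets is an illustrative assumption):

// minimal sketch of the single-tenant listAllPets function;
// the table name "pets" is an illustrative assumption
const AWS = require('aws-sdk');
const db = new AWS.DynamoDB();

exports.handle = async () => {
  const result = await db.scan({ TableName: 'pets' }).promise();
  return { statusCode: 200, body: JSON.stringify(result.Items) };
};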

Thanks to the serverless AWS services, your API scales to millions of requests without any effort. However, right now it can be used by anyone, without any authentication or authorization. Because of that, you would like to add simple user management and an authentication flow: users shall be authenticated before they can access the pets list.

Fortunately, AWS offers a serverless service for that as well. Amazon Cognito lets you manage users and can handle the authentication flow. In addition, it provides a customizable login page:

The great benefit is that Cognito offers everything related to the authentication flow: signup, login, password policies, user attributes, multi-factor authentication, phone number verification, email verification, and so on. None of these has to be implemented by us for our small pet store.

The updated architecture is shown below.

The user retrieves the static website from S3 (1). The client-side JavaScript checks whether the user is logged in, e.g. by checking whether a token is stored in a cookie. If the user is not authenticated, it redirects the user to Cognito’s login page: https://<domain>.auth.<region>.amazoncognito.com/login (2). After a successful login, Cognito redirects the user to our pet store API endpoint /auth/return, providing a one-time authorization code. This code is used to obtain an OAuth ID token from Cognito. The ID token is set as a cookie, and the client-side JavaScript sends it with upcoming requests.
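As an illustration, the client-side check and redirect could look roughly like this (the domain, app client id, and redirect URI are placeholders):

// hypothetical client-side check: redirect to the Cognito hosted UI
// if no ID token cookie is present (domain, client id and redirect URI are placeholders)
if (!document.cookie.includes('id_token=')) {
  window.location.href =
    'https://<domain>.auth.<region>.amazoncognito.com/login' +
    '?response_type=code&client_id=<app-client-id>' +
    '&redirect_uri=https://example.org/auth/return';
}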

At this point the user is authenticated but has not yet retrieved the list of pets. The client therefore calls the /pets endpoint and adds the ID token in the Authorization header (3). API Gateway checks, in conjunction with Cognito, whether the ID token is valid (4). If it is, API Gateway invokes the Lambda function listAllPets (5), which retrieves the data from DynamoDB (6).

Let us briefly recap what we have achieved so far. We built a serverless API with a website hosted on S3. The API is protected against unauthorized access. Users can sign up, reset their password, and log in using Cognito’s UI. What we have built so far is a very typical serverless architecture. In production, you may add additional services like Amazon CloudFront as a CDN or AWS Shield for DDoS mitigation.

So far, our API supports only one pet store. It is multi-user but not yet multi-tenant. You could add additional pet stores by encoding the store in the URL, like /{petstore}/pets. But as described earlier, enterprise-grade multi-tenancy requires more: the user management for each tenant must be separated, the data must be separated, and so on. Because of that, we need to extend our existing architecture to support multi-tenancy.

We need two important enhancements to support multi-tenancy:

  1. Instead of using a single Cognito User Pool, we add one Cognito User Pool per tenant. The user pool of a tenant can be connected to a corporate identity provider (like the SAP IdP) or a social login provider like Facebook or Google (2, 5).
  2. We add a custom authorizer Lambda function (4). Unfortunately, API Gateway can verify JWT tokens against a single Cognito User Pool but not against multiple pools. However, we want to offer a single API, like https://api.example.org/petstore/, where the tenant context is given by the JWT token. Because of that, we need to implement a simple custom authorizer. The custom authorizer checks the token, extracts the tenant-specific context, and passes this information to AWS Lambda (6).

This extension allows tenant-specific authorization with a tenant-specific user pool.

Let us briefly recap again what we have achieved so far. We built a serverless API for our pet store. The API allows only authorized access using a JWT token. The JWT token is issued by a tenant-specific Cognito User Pool. Each tenant has its own user pool so that each tenant manages its own user base, security policies and so on. Additionally, a tenant can connect its user pool to other identity providers, like SAP Cloud Identity. A custom authorizer function checks the validity of a token, extracts the tenant context and passes the information to AWS Lambda.

The Lambda function listAllPets is invoked by API Gateway and should return the pets of the pet store. However, the tenant separation is not enforced: the tenant information is passed to the function, but the developer must take care not to mix up tenant data. That is not enterprise-grade multi-tenancy support yet.

Therefore, we need a further extension of our model to enforce the strict separation of tenants. Ideally, we would invoke the Lambda function listAllPets with an IAM role that grants access only to the tenant-specific data.

Unfortunately, this is not possible because the Lambda execution role is fixed and set during the deployment of the function. We could deploy the function separately for every tenant, but this doesn’t scale well and is not truly a multi-tenant application. First, the cold start problem of Lambda functions applies to every function of every tenant, so we would see a much higher average latency due to the larger number of cold start invocations. Second, deployment and operations get significantly more complex; in fact, we would need to deploy the full stack for every tenant. Our conclusion is that we cannot set the execution role of a Lambda function to a tenant-specific role.

Nevertheless, we stick to the goal of enforcing tenant separation using IAM roles. Therefore, we need two different kinds of roles: first, the execution role of a Lambda function, which is not allowed to access any tenant-specific data but can write logs and so on; second, a tenant-specific role which grants access to the tenant-specific data. Let us assume we use a single DynamoDB table for all tenants, with the tenant id as the hash key. The IAM role of tenant abc then grants access to all rows that have abc set as the tenant id. This is specified by the dynamodb:LeadingKeys condition.

An example of a tenant-specific IAM policy is shown below.
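The policy is sketched from the description above; the table name, account ID, and region are placeholders:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:GetItem",
        "dynamodb:Query",
        "dynamodb:PutItem",
        "dynamodb:UpdateItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "arn:aws:dynamodb:eu-central-1:123456789012:table/pets",
      "Condition": {
        "ForAllValues:StringEquals": {
          "dynamodb:LeadingKeys": ["abc"]
        }
      }
    }
  ]
}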

We want to hide all this inconvenience from the developer. Therefore, we have built a small wrapper around the AWS SDK. Instead of calling const db = new AWS.DynamoDB(), the developer gets an instance by using the following code snippet:

// get a tenant-specific DynamoDB client;
// db can be used like a regular DynamoDB client
ddbfactory.get(tenant).then(db => db.query(params).promise())

The tenant information is forwarded to the DynamoDB client factory. The client factory returns a db instance as a promise. The db instance only has access to the data of the specific tenant. Because of that, it is important that the application code does not cache the db instance across multiple requests: the instance must only be used within a single request, and a new instance must be fetched from the factory for each incoming request. Otherwise, the tenant separation might be violated. The factory caches the tenant-specific instances internally.

The DynamoDB client factory retrieves credentials for the tenant-specific role from the AWS Security Token Service (STS). Next, it creates a DynamoDB client with the tenant-specific credentials. The factory can cache these clients so that STS is not called for every request. The flow that is executed internally is shown in the figure below. Note that, from a developer’s point of view, step 2 is invisible: the developer just requests the DynamoDB client from the factory and uses it like a regular DynamoDB client.

So, let us recap one more time. Our API supports tenant-specific authorization as described earlier. The data is separated in DynamoDB by using the tenant id as the hash key. A tenant-specific role grants access to the data of a tenant, while the execution role of the Lambda function has no permission to access any tenant-specific data. By using the DynamoDB client factory, a tenant-specific client is created that is authorized to access the data of that particular tenant. From a developer’s point of view, the tenant handling is transparent because the client factory takes care of obtaining tenant-specific credentials and granting access to the tenant data.

In this post we only looked at DynamoDB tables and row-based separation, but the client factory could also be used to set the table name, which allows separation at the table level. Obviously, the proposed architecture is not limited to DynamoDB. For instance, S3 allows granting permissions based on the prefix of an object key, i.e. a tenant can only access a certain folder. Furthermore, the usage of Amazon Cognito is optional, too: the custom authorizer could verify a token against any other identity service. We tested this by connecting directly to SAP’s corporate identity provider.
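For example, a tenant-specific role could be restricted to a per-tenant prefix with a policy statement along these lines (the bucket name and prefix are placeholders):

{
  "Effect": "Allow",
  "Action": ["s3:GetObject", "s3:PutObject"],
  "Resource": "arn:aws:s3:::pet-store-data/abc/*"
}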

Tenant on-boarding

Finally, let’s take a look at the tenant on-boarding process. We created an AWS CloudFormation template that handles the tenant on-boarding. In particular, it creates the tenant-specific role with permissions for accessing the tenant-specific data. Additionally, a Cognito User Pool is created and a tenant.json configuration file in S3 is updated. All tenant-aware Lambda functions reload this configuration regularly. By using the CloudFormation template, the set-up process is completely automated, which reduces the likelihood of errors.
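A hypothetical excerpt of such a template could look like the following (resource names, granted actions, and the assume-role principal are illustrative assumptions; the update of the tenant.json configuration is omitted here):

Parameters:
  TenantId:
    Type: String

Resources:
  TenantUserPool:
    Type: AWS::Cognito::UserPool
    Properties:
      UserPoolName: !Sub "tenant-${TenantId}"

  TenantRole:
    Type: AWS::IAM::Role
    Properties:
      RoleName: !Sub "tenant-${TenantId}"
      AssumeRolePolicyDocument:
        Version: "2012-10-17"
        Statement:
          - Effect: Allow
            Principal:
              AWS: !Sub "arn:aws:iam::${AWS::AccountId}:root"
            Action: "sts:AssumeRole"
      Policies:
        - PolicyName: tenant-data-access
          PolicyDocument:
            Version: "2012-10-17"
            Statement:
              - Effect: Allow
                Action:
                  - dynamodb:GetItem
                  - dynamodb:Query
                  - dynamodb:PutItem
                Resource: !Sub "arn:aws:dynamodb:${AWS::Region}:${AWS::AccountId}:table/pets"
                Condition:
                  "ForAllValues:StringEquals":
                    "dynamodb:LeadingKeys":
                      - !Ref TenantId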

Closing Remarks

The presented architecture can be used to achieve enterprise-grade multi-tenancy support on AWS. The tenant separation is almost invisible to the developer, and the IAM policies enforce the data separation. In summary, we believe that, from an architectural point of view, AWS can be used for building multi-tenant (serverless) applications.

I would like to thank my colleagues at SAP, especially Carsten Ziegler, for reviewing this blog post and providing feedback.

Appendix: Code

The appendix briefly explains the relevant code snippets. The first snippet is the custom authorizer Lambda function. Its handle function verifies the JWT token (expiration date, valid issuer, signature) and, if all checks pass, generates a policy that allows executing the API. Finally, the tenant information and the policy are returned to API Gateway. The function only checks whether the tenant and the JWT token are valid; it does not check any authorization constraints. These must be checked by the invoked Lambda function.
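A minimal sketch of what such an authorizer could look like, assuming the npm packages jsonwebtoken and jwks-rsa and a tenant.json configuration that maps tenant ids to their Cognito user pool issuer URLs (all names are illustrative):

// sketch of a custom authorizer; 'jsonwebtoken', 'jwks-rsa' and the
// structure of tenant.json are assumptions made for this illustration
const jwt = require('jsonwebtoken');
const jwksClient = require('jwks-rsa');

// tenant id -> { issuer: 'https://cognito-idp.<region>.amazonaws.com/<pool-id>' }
const tenants = require('./tenant.json');

function getSigningKey(issuer, kid) {
  const client = jwksClient({ jwksUri: `${issuer}/.well-known/jwks.json` });
  return new Promise((resolve, reject) =>
    client.getSigningKey(kid, (err, key) =>
      err ? reject(err) : resolve(key.getPublicKey())));
}

function buildPolicy(principalId, effect, resource, tenant) {
  return {
    principalId,
    policyDocument: {
      Version: '2012-10-17',
      Statement: [{ Action: 'execute-api:Invoke', Effect: effect, Resource: resource }]
    },
    // available to the backend as event.requestContext.authorizer.tenant
    context: { tenant }
  };
}

exports.handle = async (event) => {
  const token = (event.authorizationToken || '').replace('Bearer ', '');
  const decoded = jwt.decode(token, { complete: true });
  if (!decoded) throw new Error('Unauthorized');

  // the token's issuer must belong to a known tenant user pool
  const tenant = Object.keys(tenants)
    .find(t => tenants[t].issuer === decoded.payload.iss);
  if (!tenant) throw new Error('Unauthorized');

  // verify signature, expiration and issuer against the tenant's user pool
  const publicKey = await getSigningKey(decoded.payload.iss, decoded.header.kid);
  const claims = jwt.verify(token, publicKey, { issuer: decoded.payload.iss });

  return buildPolicy(claims.sub, 'Allow', event.methodArn, tenant);
};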

The following snippet shows the Lambda function that is invoked by the API. It gets a tenant-specific DynamoDB client and executes a query. Even if the developer did not set the tenant correctly, the Lambda function could not read any data of another tenant, because the client factory returns a client that can only be used to access the data of this particular tenant. We are working on injecting the tenant filter automatically so that the developer does not have to think about this at all.
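A sketch of such a function, assuming the ddbfactory wrapper described above and a pets table with the tenant id as the hash key (table and attribute names are illustrative):

// sketch of the API-facing Lambda function; table and attribute names are assumptions
const ddbfactory = require('./ddbfactory');

exports.handle = async (event) => {
  // tenant context as passed on by the custom authorizer
  const tenant = event.requestContext.authorizer.tenant;

  const params = {
    TableName: 'pets',
    KeyConditionExpression: 'tenantId = :t',
    ExpressionAttributeValues: { ':t': { S: tenant } }
  };

  // the factory returns a client that may only read this tenant's rows
  const db = await ddbfactory.get(tenant);
  const result = await db.query(params).promise();

  return { statusCode: 200, body: JSON.stringify(result.Items) };
};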

The next snippet shows a very basic implementation of the DynamoDB client factory.
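The sketch below assumes the tenant-specific roles follow a naming convention like tenant-<tenantId> (in practice the role ARN would come from the tenant.json configuration) and ignores credential expiration:

// very basic sketch of the DynamoDB client factory; the role naming
// convention and account id are placeholder assumptions
const AWS = require('aws-sdk');

const sts = new AWS.STS();
const cache = {}; // tenant id -> cached DynamoDB client

exports.get = function (tenant) {
  if (cache[tenant]) return Promise.resolve(cache[tenant]);

  return sts.assumeRole({
    RoleArn: `arn:aws:iam::123456789012:role/tenant-${tenant}`,
    RoleSessionName: `tenant-${tenant}`
  }).promise().then(data => {
    // a client that carries only the tenant-specific, temporary credentials
    const db = new AWS.DynamoDB({
      accessKeyId: data.Credentials.AccessKeyId,
      secretAccessKey: data.Credentials.SecretAccessKey,
      sessionToken: data.Credentials.SessionToken
    });
    cache[tenant] = db; // a real factory would also honor credential expiration
    return db;
  });
};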

Because we use the login UI provided by Amazon Cognito, we need a function that takes the issued code and returns a JWT token. A user accesses the service page, say example.org. The client-side code checks whether the user is logged in; if not, the user is redirected to Amazon Cognito’s login page. The login flow returns a one-time code, which the function shown below then uses to get a valid JWT token.
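A rough sketch of such a function, using Cognito’s /oauth2/token endpoint (the domain, app client id, redirect URI, and cookie handling are placeholder assumptions):

// rough sketch: exchange the one-time code for tokens at Cognito's
// /oauth2/token endpoint; domain, client id and redirect URI are placeholders
const https = require('https');
const querystring = require('querystring');

exports.handle = (event) => new Promise((resolve, reject) => {
  const body = querystring.stringify({
    grant_type: 'authorization_code',
    client_id: '<app-client-id>',
    redirect_uri: 'https://example.org/auth/return',
    code: event.queryStringParameters.code
  });

  const req = https.request({
    hostname: '<domain>.auth.<region>.amazoncognito.com',
    path: '/oauth2/token',
    method: 'POST',
    headers: { 'Content-Type': 'application/x-www-form-urlencoded' }
  }, res => {
    let data = '';
    res.on('data', chunk => (data += chunk));
    res.on('end', () => {
      const tokens = JSON.parse(data);
      // set the ID token as a cookie and send the user back to the website
      resolve({
        statusCode: 302,
        headers: {
          'Set-Cookie': `id_token=${tokens.id_token}; Secure`,
          Location: 'https://example.org/'
        }
      });
    });
  });
  req.on('error', reject);
  req.write(body);
  req.end();
});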
