Multi-tenancy architecture of a relational database

Published in

CyberArk Engineering

4 min readMay 20, 2020

A while back I designed a multi-tenancy system. My initial thought was to share the database resources (single database table, shared connection pool) over the tenants as much as possible, but I quickly realized there are no free meals. Going with the ‘maximum resources sharing approach’ may produce new issues, like data leakage over tenants.

In this post I’d like to propose a method to evaluate the Relational Database appropriate architecture that matches your system’s needs.

The method consists of the following three steps:

1. List of criteria that can affect the architecture selection

2. Some architecture alternatives

3. Decision table — system criteria vs. architecture alternatives

Aspects to consider

First, map your system needs with the criteria below, later a decision table can help you decide which architecture to use. For example, if thousands of tenants are expected, then only a sub-set of the architectures is relevant.

1. High level estimation of total number of tenants is expected — (few, many, unlimited)

2. Tenants Isolation

Data isolation — how far are you willing to go (in terms of effort and cost) to provide data isolation between tenants?
Work load isolation — is there any tolerance for workload impact between tenants? The answer is probably yes

3. Cost effective solution

4. Operational complexity, the following is likely relevant to your system:

Automation for new tenant provisioning & database version upgrade
Database user permission management
Database backup and restore

5. Operations on a single tenant

Restore of single tenant data, due to data loss
Porting tenant between environments

6. Regulation support e.g. compliance with GDPR — you may be required to remove a customer’s data from the database and from old database backups

Logical Solution Architectures

In this section I’ll discuss three possible Logical solution architectures. A Logical architecture, is a database agnostic architecture that can be mapped, in a good approximation, to any relational database vendor.

This analysis was originally made for MySQL, but if you’re using another vendor, then map the MySQL ‘Schema’ term (which repeats itself below) to yours. For example, if your database is MSSQL then the MySQL ‘Schema’ term is mapped to the MSSQL ‘Database’ term.

Note: There are additional possible architectures, which are database provider specific, such as PostgreSQL Row Level Security, these approaches are not covered in this post.

The two main questions to answer regarding the database are:

How to organize the tenant’s data in the database
How the application layer manages the database connection

There is a tight relation between the database and the application layer, the following are assumptions about the application layer:

The application layer is stateless, any application instance can serve any tenant request.
The application connects to the database using a connection pool.

Database schema per tenant, connection pool for each schema

Architecture Main Concept:

Any application instance has a connection pool per schema
Routing data source is used to route the request to the logged-in user schema

Schema per tenant, having a shared connection pool

Main Concept:

Any application instance has a single database connection pool
Infrastructure code sets the active schema (e.g. in MySQL: set schema <user tenant>), prior to user interaction with the database

Single database, discriminator column for data partitioning

Main Concept:

Any application instance has a single database connection pool
Discriminator column, added to any row, to associate with the owned tenant

Comparison table

This table compares the three possible architectures with the list of criteria proposed at the beginning of the post. By mapping your system specifications (e.g. hundreds of tenants are expected), this table can instruct you on which alternatives are feasible.

Notes:

1. For some of the ORM solutions (e.g. Hibernate filters), the SQL where condition (“and tenant_id=..”) can be added automatically but for native queries the filter should be managed manually

2. Sensitive data can be encrypted using per tenant key. In this case, if data is leaking, the data is meaningless for other tenants.
There’s a great post on this topic: https://medium.com/cyberark-engineering/make-client-side-encrypted-data-searchable-by-saas-backend-b11344804ed1

3. Operational complexity is about implementing version upgrade, new tenant provisioning, database permission management and high availability support

4. Dedicated deployment, with few tenants, can be managed for rolling out releases to subset of the population

5. With assumption of many tenants and long duration for database upgrade per tenant (e.g. new Index on a massive table)

Wrapping Up

In this post I’ve introduced a method that can help you decide on the RDB architecture in a multi- tenancy system , it’s important to note that there is no correct architecture, and the decision on which architecture to use depends on your system’s needs. I suggest referring to the Salesforce and Atlassian approaches — two different companies, each company uses a different RDB architecture for their SAAS solution.