Multi-tenant application architecture with Node.js — Express, and PostgreSQL

Larbi Sahli
6 min readJul 19, 2022

--

Recently, I have been working with node.js applications for which I had to create a multi-tenant environment. Let me share my findings with you all.

In today’s era of Software as a Service (SaaS) and cloud-based solutions, applications frequently accommodate various clients, each with their unique datasets. Securing data isolation and privacy becomes crucial within this multi-client framework. The preferred architectural model has historically been multi-tenancy, wherein a single software instance caters to multiple customers.

Multi-tenancy is an architecture wherein a single instance of a software application serves numerous clients. Each client is called a tenant, and with the help of multi-tenant architecture, we can provide each tenant with a shared instance of the infrastructure. This architecture comes with agility and operational cost savings by sharing data storage resources for all tenants instead of replicating those resources. As different tenants will be using the same instance of a service, isolating tenant data is a fundamental responsibility for Software as a Service (SaaS) providers, and with the risks being so great, it is critical to have an effective data isolation plan.

Types of multi-tenant architecture

We have an application with multiple customers (tenants), it has data in the database. The way the relationships between tenant, application and database are managed is through the multi-tenancy model. Three common data partitioning models are used in multi-tenant systems: Database, Schema, and Table.

In Database multi-tenancy, each tenant has its own database.

In Schema multi-tenancy, every tenant has its own database schema.

In Table multi-tenancy, we use a shared database instance and a schema. This allows multiple tenants of an application to isolate their data in one or more tenant-specific tables.

We will be focusing on the Table-based multi-tenancy model, where every row in every table is associated with a partitioning key (usually the tenant identifier) and restricts data access using Row Level Security (RLS for short), Consequently, we don’t need to worry about which schema or database it is connecting to. For this and other security and monitoring reasons, you should have your application connect to the database as a user other than the owner of the database objects because PostgreSQL super users and any role created with the `BYPASSRLS` attribute aren’t subject to table policies. With this in mind, we will create a CRUD database user that can only CREATE, READ, UPDATE, and DELETE.

PostgreSQL’s Multi-Version Concurrency Control (MVCC):

PostgreSQL’s Multi-Version Concurrency Control (MVCC) plays a crucial role in multitenancy due to its ability to manage concurrent access to data efficiently while ensuring data isolation and consistency.

Here’s why MVCC is important in multitenancy:

Isolation: MVCC ensures that each transaction sees a consistent snapshot of the data at the time it started, regardless of other concurrent transactions. This prevents transactions from interfering with each other, enhancing data isolation for individual tenants.

Performance: By allowing read and write operations to occur concurrently without blocking each other, MVCC improves performance in a multitenant environment. Tenants can execute their transactions without waiting for others to complete, meaning that when a write operation is ongoing, read operations are not blocked from accessing the data, leading to better overall system throughput.

Data Consistency: MVCC maintains data consistency by preventing dirty reads, non-repeatable reads, and phantom reads. Each transaction operates on a consistent snapshot of the database.

Scalability: With MVCC, PostgreSQL can scale efficiently in a multitenant environment. It reduces contention for resources by enabling concurrent transactions to proceed independently, allowing for better utilization of system resources as the number of tenants grows.

MVCC in PostgreSQL ensures that multitenant applications can achieve high levels of performance, scalability, and data consistency while maintaining strict data isolation between tenants, making it a critical feature for multitenant database systems.

We will be building a multi-tenancy e-commerce backend API where each tenant has a custom subdomain name and calls the API to request tenants’ specific data. We will use Express.js which is a node.js application framework that will provide us with a boilerplate for our demo application, PostgreSQL as the database. For this, we need a working PostgreSQL server (local/cloud).

Our data partition model for this application is a Table-based multi-tenancy model that saves the most on operational costs. Sharing a database between tenants uses hardware efficiently and reduces your infrastructure code and maintenance overhead.

Pre-requisites:

  • Basic knowledge of Node.js, Express and PostgreSQL.

Setup:

Create a new folder and initialize a node.js project by creating a package.json file with the following command in CLI.

yarn init -y

Install the following packages.

yarn add express dotenv pg concurrently && yarn add -D nodemon typescript ts-node @types/express @types/node @types/pg
  • Express: A back-end web application framework for Node.js.
  • dotenv: dotenv automatically loads environment variables from a .env file into the process.
  • pg: A non-blocking PostgreSQL client for node.js.
  • Concurrently: Run multiple commands concurrently, we will be using it to run tsc --watch and nodemon .
  • nodemon: nodemon will automatically restart our server when it detects and change in our files

Run tsc --init in the CLI to build your tsconfig.json, then add the following configuration:

Then add this to your package.json.

Database schema:

In the init.sql file, we will be creating the following tables with a Row-level Security policy:

In the table tenants, we will be storing our tenants’ information. Table products is used for storing products with a tenant identifier (tenant_id) where each tenant can own one to multiple products, a one-to-many relationship between tenants and products tables.

Indexing
Keep in mind that PostgreSQL automatically creates indexes on primary keys and unique constraints, but not on the referencing side of foreign key relationships. With this in mind, we should create an index on the tenant identifier tenant_id against the products table to enhance our database performance.

Clustered index:
In PostgreSQL, a clustered index serves to organize rows within a table’s storage pages based on the values of a specific column, such as tenant_id. This indexing strategy is particularly beneficial for minimizing I/O operations and enhancing performance, especially when frequently accessing data filtered by tenant_id.

By clustering rows based on tenant_id, all rows with the same tenant_id are physically stored together on the same storage pages. This reduces the need for multiple I/O operations when querying data for a specific tenant, as the database engine can access all relevant rows from a single page.

With fewer I/O operations required, query execution times are typically improved, leading to enhanced performance for tenant-specific operations. This approach can be particularly advantageous in multi-tenant applications where data segregation by tenant is crucial for efficiency and security.

To implement a clustered index based on tenant_id, you can utilize the CLUSTER command in PostgreSQL. This command rearranges the physical storage of table rows according to the specified index, effectively clustering rows based on the index's order.

RLS:
RLS helps enforce security and privacy requirements by ensuring that users can only view or modify data that they are authorized to access,
we created a policy that allows users to see only the rows in a table where the tenant_id column matches their own tenant_id. This ensures that users can only access data relevant to their tenant or organization.
Note that inset_config(setting_name, new_value, is_local) we set is_local to TRUE. This means that the changed setting will only apply to the current transaction, ensuring a high level of isolation and concurrency for our application to be able to handle multiple transactions or processes simultaneously without interference or data corruption.

Database Schema

Database Configuration

Create a database.ts file. This is where we'll configure the pg library to make a connection to the database and manage our tenants. We will be using set_config() a system function for temporarily setting a configuration parameter current_tenant_name and current_tenant_id.

current_tenant_name is the first parameter to get our tenant’s information using the subdomain name, tenants table has an RLS policy on the (store_name) column.

current_tenant_id is the second parameter used to have access to our products table, product table has RLS policy on the (tenant_id) column.

set_config('app.current_alias', $1, TRUE) the TRUE value (is_local) in set_config means that the setting should be local to the current transaction or session since all of our database operations are inside a transaction for security reasons aka RLS.

database.ts

Setting Up a Basic Server

Let’s say each client creates a store name (store1) we give them a subdomain name (store1.app.com) for requesting data, in our incoming request we can get the store name by using req.subdomains.

Conclusion

I hope this post has helped you understand one of the architectures used in multi-tenant applications. This implementation can help you get started with an application which you can scale in the future.

Complete code: GitHub

--

--

Larbi Sahli

Full Stack Engineer | React | Next.js | Typescript | GraphQL | PostgreSQL, I love open source and startups.