Our Multi-tenancy Journey with Postgres Schemas and Apartment

Brad Robertson
Influitive Crafters
6 min read · Nov 29, 2016

Multi-tenancy is a hot topic these days. My definition of a multi-tenant application is a software system that serves multiple customers, each with their own isolated view of their own data in that system. Each customer and their data is often referred to as a tenant, hence multi-tenancy.

In an earlier blog post on sharding multi-tenant applications, Craig dives into the type of tenancy that naturally occurs in any system that has to support multiple customer accounts (or in his example, a store). In his example, all records under a particular tenant get a foreign key tenant_id which logically associates the rows in other tables with that tenant. Isolation happens at the query level, where all products (for instance) would be scoped using the tenant_id of the particular store you’re interested in.
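To make that concrete, here’s a minimal sketch of what column-scoped tenancy looks like in ActiveRecord. The Product model, the tenant_id column and current_store are illustrative stand-ins, not code from Craig’s post:

class Product < ActiveRecord::Base
end

# Every query is explicitly scoped by the tenant's foreign key;
# current_store is whatever object your request resolved the tenant to.
Product.where(tenant_id: current_store.id)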

When I joined Influitive, they had already gone the multi-tenancy route, but with a different isolation level using PostgreSQL schemas. This is a slightly different mechanism of multi-tenancy, in which tables no longer store references to their tenant. Instead, when we add a new tenant, we create a new Postgres schema and create (duplicate) all the tables that store a customer’s data inside that schema. Queries made on behalf of a tenant set the schema search_path so that all data fetched for that request is implicitly scoped to that tenant.
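In code, the difference is that nothing needs to be scoped per query; you point the connection at the tenant’s schema once per request. Here’s a rough sketch of the underlying mechanics, assuming a made-up tenant called acme (Apartment wraps this same idea behind a nicer interface):

conn = ActiveRecord::Base.connection
conn.execute('CREATE SCHEMA "acme"')   # a new tenant gets its own schema, with its own copy of every tenanted table
conn.schema_search_path = "acme"       # set once for the current request
Product.count                          # Postgres resolves the bare "products" table via the search_path, i.e. acme.products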

The diagram ends up looking like this:

[Diagram: one Postgres schema per tenant, each containing its own copy of the application’s tables.]

We’re going to dive into our experience of using Postgres schemas for multi-tenancy (and the gem we wrote, Apartment) and some of the lessons we’ve taken away from it.

In the beginning, after finalizing the Apartment interface, this strategy provided a HUGE benefit to us. We could bring on new customers, simultaneously add new features and not worry about data isolation. I’ve heard countless stories of startups where data leaked between customers and this was simply not something we had to worry about.

The real problems came as we attempted to scale:

Migrations

As our customer base grew to 100+ customers and our application grew to 100+ tables, we started to notice things slowing down. I don’t have precise numbers to back this up, but we found a direct correlation between migration time and both the number of Postgres schemas and the size of the tables being altered in those schemas. The more schemas and the larger the tables, the longer migrations took.

It’s well known that index changes to large tables can take time and even cause table locking. Ideally, for things like column additions, you’d get constant O(1) performance. Under a separate schemas approach to tenancy, you now get O(N) performance where N=# tenants. Now when you get into less predictable changes like index additions, it’s a hell of a lot worse. I don’t know exactly how to represent that in Big O, but I think it would look something like O(WTFxN).
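To illustrate why, here’s roughly what running migrations looks like under schema-per-tenant. This is a simplified sketch of the kind of loop Apartment performs, with an illustrative tenant_names list and the Rails 4-era Migrator API:

tenant_names.each do |tenant|                    # N tenants
  ActiveRecord::Base.connection.schema_search_path = tenant
  ActiveRecord::Migrator.migrate("db/migrate")   # the same ALTER TABLE / CREATE INDEX runs once per tenant
end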

Migrations started to become the bane of our existence, which meant that deploying started to become a hassle. No one wants friction in their deployment process, especially as we’re attempting to deploy daily or more frequently.

Database Resources

I’m not an expert on the inner workings of PostgreSQL, but the amount we’ve needed to scale out our main production database under this tenancy strategy seems to dramatically outpace any of our other services that use column scoping for tenancy. My guess is there’s an upper limit (if not a hard one, then at least a soft, recommended one) on the number of tables, indexes, etc. that you can store within one Postgres database. We’re running an RDS r3.4xl, which costs about $3k monthly, to house a database that should otherwise be able to live on something much smaller. We haven’t dug into this exactly, but I’m quite sure the sheer number of tables we have is an issue.

Client Memory Bloat

This one is relevant to Ruby and, more specifically, ActiveRecord (though possibly any other library with a similar implementation). Although some fixes have been made, the root cause remains largely unfixed. ActiveRecord, on connecting to the database, iterates through all tables and stores metadata about the columns in order to properly map Postgres data types to Ruby ones. Unfortunately, the way this happens is by iterating through all tables in all schemas, then caching what it finds. This unnecessarily bloats the running client, given that the types are exactly the same across all tenants, but we’re unable to configure ActiveRecord to map through just a single schema.

Right now, the moment any of our Ruby processes connects to the database, it grows to about 500 MB in memory right off the bat. Other services with similar amounts of data, but without schema-based tenanting, do NOT have this problem. This, of course, will continue to get worse with each customer (tenant/schema) we add to the system.
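A back-of-envelope sketch of why this scales with tenant count; every number below is an illustrative guess, not a measurement:

tenants           = 100
tables_per_tenant = 100
columns_per_table = 15
bytes_per_column  = 300   # rough guess at one cached column object (name, sql_type, default, ...)

tenants * tables_per_tenant * columns_per_table * bytes_per_column
# => 45_000_000, i.e. roughly 45 MB of column metadata alone.
# With a single shared schema, the same math comes out to roughly 450 KB.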

Record Identification

One major drawback of schemas as tenants is that your sequence generators reside independently in each tenant. This means that if you have a tenanted users table, you now have one user with id=1 in every tenant (and likewise for every other generated sequence id). If you ever try to join across these or report globally on all of this, you’re going to get conflicts. Furthermore, you can run into permissions issues if you ever copy this data to other systems without scoping things to a tenant id.
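A small sketch of what that collision looks like; the schema and table names are made up:

ActiveRecord::Base.connection.execute(<<-SQL)
  SELECT 'tenant_a' AS tenant, id, email FROM tenant_a.users WHERE id = 1
  UNION ALL
  SELECT 'tenant_b' AS tenant, id, email FROM tenant_b.users WHERE id = 1;
SQL
# Two different people, both legitimately id = 1. That's fine inside a tenant,
# and ambiguous the moment you aggregate, join or copy data across tenants.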

What To Do

The above-mentioned problems have caused us to mostly abandon our separate-schemas approach to multi-tenancy. For all services we build going forward, we use a more traditional column-scoped approach and have written our own wrappers that effectively mimic the per-request tenanting approach that Apartment gave us. We haven’t open sourced anything, as the implementations are pretty use-case specific, but there’s no shortage of documentation on how to achieve this type of data isolation with your ORM of choice.
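For flavour, here’s a minimal sketch of the general shape of such a wrapper. This is not our actual implementation, just a thread-local current tenant combined with a default scope (names are illustrative):

module CurrentTenant
  def self.id=(tenant_id)
    Thread.current[:tenant_id] = tenant_id
  end

  def self.id
    Thread.current[:tenant_id]
  end
end

class ApplicationRecord < ActiveRecord::Base
  self.abstract_class = true
  # Every query on a tenanted model is automatically scoped to the current tenant.
  default_scope { CurrentTenant.id ? where(tenant_id: CurrentTenant.id) : all }
end

# A piece of middleware or a before_action sets the tenant once per request,
# much like Apartment's per-request schema switch used to:
CurrentTenant.id = authenticated_account_id

The ORM specifics differ, but the idea is the same: resolve the tenant once at the edge of the request, and let a scope keep every query inside it.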

I’d like to close with a few recommendations, based on changes we’ve made, that may help those who have already gone the route we did.

Choose the right tool for the right job

Initially, we blindly threw all customer data into their separate schemas’ tables with no thought for the type of data being stored. If you ever catch yourself talking about things like “events”, “logs”, “transactions”, etc. (namely, anything that suggests high-volume writes), consider a more appropriate tool, such as a distributed database like Citus or Cassandra, or an event log like Kafka with projections. This can solve a lot of the migration/index headaches you might otherwise incur.

Create tenants from a reputable source

Currently the Apartment gem uses the Rails schema.rb for generating new tenants. This is a mistake. This file changes locally when running migrations, and it represents exactly what that developer has in their local database. If a feature branch (for example) has experimental database columns added, those can end up sneaking out to production if you inadvertently commit the schema.rb changes to your mainline development branch (it’s happened to us more than we’d like to admit). Those experimental columns then get added to the next tenant that’s created. You now have a heterogeneous schema across tenants! This will break migrations if that experimental column becomes real on your next deploy, because the add_column call will fail on the tenants where the column already exists.
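One hedged way to do it, sketched below: build a new tenant from the migration history that has actually shipped, rather than from whatever schema.rb a developer last dumped locally. This isn’t Apartment’s API, just the general idea (Rails 4-era Migrator call, illustrative method name):

def create_tenant(name)
  conn = ActiveRecord::Base.connection
  conn.execute(%(CREATE SCHEMA "#{name}"))
  conn.schema_search_path = name
  # Build the tenant's tables from the deployed migrations,
  # not from a developer's local schema.rb.
  ActiveRecord::Migrator.migrate("db/migrate")
ensure
  ActiveRecord::Base.connection.schema_search_path = "public"
end

A SQL structure dump generated in CI, or a dedicated template schema that only migrations ever touch, gets you the same property: new tenants come from something your deploy pipeline produced, not from a developer’s machine.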

Use UUIDs

As mentioned above, tenanting with sequence ids means that you don’t have globally unique identifiers for objects in your system. This is particularly troublesome if you start aggregating data across tenants (for reporting, for instance), or attempt to join across tenants with only a sequence id to rely on for identification. To that end, we’ve added a uuid column to most of our tables, and now use sequence ids only for cursor-based pagination.
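A sketch of the kind of migration we mean; pgcrypto’s gen_random_uuid() is one option, and uuid-ossp’s uuid_generate_v4() works just as well:

class AddUuidToUsers < ActiveRecord::Migration
  def change
    enable_extension "pgcrypto"
    add_column :users, :uuid, :uuid, default: "gen_random_uuid()"
    add_index  :users, :uuid, unique: true
  end
end

The sequence id sticks around for ordering and cursor-based pagination; the uuid becomes the identifier you expose, join on and report against across tenants.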

Conclusion

I hope the above lessons give you some insight into your next design for a multi-tenant application. At this point, I cannot recommend the Postgres schema approach, given the headaches we’ve seen above. I hope this post helps you all avoid some of the pitfalls that we’ve encountered and are still digging ourselves out of.
