We’ve got too many DynamoDB tables

Simon Stevens
AppLearn Engineering
Nov 3, 2021
const totalDynamoDbTables = tenants.count * microservices.count * environments.count;

With our product being Software as a Service, tenant isolation is an important aspect of our architecture. We use silos to provide dedicated resources to each tenant, which allows us to meet varied compliance requirements, avoid impact from busy tenants, or “noisy neighbours”, and track costs at a per-tenant level.

Silo isolation comes at a cost, however. Sometimes that cost is literal, because dedicated resources are inherently less efficient. In our case, silo isolation also creates scaling and onboarding automation problems.

Releasing amazing new features like Tooltip Sequencing and Discovery Analytics drives our ever-expanding array of microservices, and we have many more on our roadmap. With each microservice responsible for its own data and the number of tenants growing naturally, the growth of per-tenant resources like DynamoDB tables and S3 buckets is unsustainable.

While there is nothing physically stopping us from growing resources forever (soft limits can be raised, and hard limits can be worked around with per-tenant accounts), managing them becomes a burden at scale.

As a Software Architect, I anticipate future problems like this and work within engineering to develop plans well ahead of time, ensuring that the best solutions can be developed.

In this case there were a variety of solutions we felt were viable, but we wanted to hear about the experiences of other companies in our situation. This made meeting with our AWS Solution Architect and the AWS SaaS Factory team to discuss our options invaluable, and gave us confidence that Hybrid Tenant Isolation was the right solution for our needs.

A hybrid model will see the bulk of our tenants’ resources pooled, with provision for siloing resources for tenants that have specific compliance requirements, or are particularly noisy. This approach allows us to get some of the best of both worlds. The cost efficiency of having most resources pooled is great, and brings further benefits in the form of business logic and permissions.
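As a rough illustration of the hybrid model, the choice between a pooled and a siloed resource can be reduced to a single lookup per tenant. This is a minimal sketch, not our actual implementation: the `TenantConfig` type, the `resolveTableName` function, and the table naming scheme are all hypothetical.

```typescript
// Hypothetical types and naming scheme, for illustration only.
type Isolation = "pooled" | "siloed";

interface TenantConfig {
  tenantId: string;
  isolation: Isolation;
}

// Pooled tenants share one table per service and environment;
// siloed tenants get a dedicated table with the tenant ID in its name.
function resolveTableName(
  service: string,
  environment: string,
  tenant: TenantConfig
): string {
  const base = `${service}-${environment}`;
  return tenant.isolation === "siloed"
    ? `${base}-${tenant.tenantId}`
    : `${base}-pooled`;
}

const pooledTenant: TenantConfig = { tenantId: "acme", isolation: "pooled" };
const siloedTenant: TenantConfig = { tenantId: "bank", isolation: "siloed" };

console.log(resolveTableName("analytics", "prod", pooledTenant)); // analytics-prod-pooled
console.log(resolveTableName("analytics", "prod", siloedTenant)); // analytics-prod-bank
```

With a lookup like this, the number of tables grows with the number of siloed tenants rather than with every tenant.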

Consistent business logic

When resources are pooled, each resource type lends itself to a different method of defining tenant boundaries. Silos provide their own boundaries, so there is no need to define them within business logic.

Applying the boundary definition chosen for pools inside silos as well keeps access patterns consistent and reduces engineering complexity. It is unlikely to cause problems, and if nothing else, the cognitive load of multiple access patterns for every resource is something AppLearn engineers thank us for avoiding.

For example, adding tenantId attributes to a table is unnecessary when a silo dictates an entirely separate table, but it certainly does no harm and allows business logic to be unchanged irrespective of which resource is used.
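A minimal sketch of that idea follows, assuming a table whose partition key is the `tenantId` attribute. The `TooltipItem` shape, the `toItem` function, and the sort key format are hypothetical and only serve to show that the item looks the same wherever it is stored.

```typescript
// Hypothetical item shape: tenantId is the partition key, sk the sort key.
interface TooltipItem {
  tenantId: string; // redundant in a silo, but harmless and keeps logic uniform
  sk: string;
  title: string;
}

// Builds the item the same way for pooled and siloed tables.
function toItem(tenantId: string, tooltipId: string, title: string): TooltipItem {
  // Reject anything that is not a plain identifier before it reaches a key.
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`invalid tenantId: ${tenantId}`);
  }
  return { tenantId, sk: `TOOLTIP#${tooltipId}`, title };
}

// Only the target table differs between pooled and siloed tenants.
const item = toItem("acme", "42", "Welcome tour");
console.log(item.tenantId, item.sk);
```

Because the key always carries the tenant, moving a tenant between a pool and a silo would not require changes to reads or writes in this sketch.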

Productive permissions

Fine-grained access control allows us to enforce isolation, ensuring that we never unintentionally cross tenant boundaries. Being able to scope permissions down to the business logic that defines tenant boundaries is critical to preventing incidents, but this must take the realities of engineering into account.

For example, an engineer who unintentionally crosses tenant boundaries is not at fault. Instead, this is a systemic failure, as the engineer should never have been able to make this mistake. We must write shared logic that scopes access to resources, ensuring that the mistake of a single engineer cannot result in crossing tenant boundaries.
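One common way to express this kind of scoping on AWS is an IAM policy that uses the `dynamodb:LeadingKeys` condition key, which restricts access to items whose partition key value matches. The sketch below builds such a policy document in one shared place. It assumes the partition key value is the tenant ID; the `tenantScopedPolicy` function, the type names, and the example ARN are hypothetical.

```typescript
// Minimal policy types, just enough for this sketch.
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
  Condition: Record<string, Record<string, string[]>>;
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Shared helper: every caller gets the same tenant-scoped policy shape.
function tenantScopedPolicy(tableArn: string, tenantId: string): PolicyDocument {
  if (!/^[A-Za-z0-9_-]+$/.test(tenantId)) {
    throw new Error(`invalid tenantId: ${tenantId}`);
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        // Scan is deliberately omitted: it is not limited to one partition key.
        Action: [
          "dynamodb:GetItem",
          "dynamodb:Query",
          "dynamodb:PutItem",
          "dynamodb:UpdateItem",
          "dynamodb:DeleteItem",
        ],
        Resource: [tableArn],
        // Only items whose partition key equals the tenant ID are reachable.
        Condition: {
          "ForAllValues:StringEquals": { "dynamodb:LeadingKeys": [tenantId] },
        },
      },
    ],
  };
}

// Hypothetical ARN, for illustration only.
const policy = tenantScopedPolicy(
  "arn:aws:dynamodb:eu-west-1:123456789012:table/analytics-prod-pooled",
  "acme"
);
console.log(JSON.stringify(policy, null, 2));
```

A document like this could, for instance, be supplied as a session policy when assuming a role, so that the credentials handed to business logic are already limited to a single tenant.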

Shared access control also gives us the opportunity to place these algorithms under a microscope, exposing them to more thorough peer-review and periodic auditing.

In this case, placing all of your eggs in one basket is best practice.

Combining pooled storage with selective silos allows us to manage less and benefit from economies of scale. We’re really excited to adopt this approach and reap its rewards.
