Single vs. Multi-Tenant: Having our cake and eating it too?

Published in Engineering at Alfa · 7 min read · Nov 14, 2023

*Photo by Karolina Grabowska:* *https://www.pexels.com/photo/fruit-jelly-hearts-on-pink-background-4016528/*

Historically, managed service providers (MSPs) have often been contracted by finance companies to operate mission-critical enterprise software using an outsourcing model. This model is built on custom infrastructure setup and extensive training of dedicated teams to operate an unfamiliar system. We often found that, because of the custom nature of the engagement, these providers didn’t invest in automation for setup, monitoring or responding to problems. The end result was an outsourced service that was expensive and suffered from long lead times for changes, and that therefore offered a low overall quality of service and poor value for money. The main benefit was the customised single-tenant approach to deployment.

When we first designed and architected Alfa Cloud, we set out to offer our customers all of the benefits of a single-tenant Software-as-a-Service (SaaS) product — physical data segregation, consistent performance and upgrade flexibility — whilst avoiding legacy MSP pitfalls by leveraging multi-tenant techniques for deployment and operation.

A typical modern enterprise uses many SaaS products for a wide variety of use cases, from payroll to travel to source code management. The vast majority of these will be multi-tenant applications where there is no customer control over releases, and storage is shared with the provider’s other customers. Data segregation is generally applied in the application tier, through user- and group-based authorisation in code. This consumption model is well suited to single-purpose applications that do not store end-customer data and where absolute resilience is not a critical part of the offering. It offers a usage-based pricing model which is generally easy for customers to predict, and allows the service provider to take advantage of the elasticity of modern public cloud platforms to simplify operation and reduce costs by scaling to overall demand across a fleet of shared servers.

A single-tenant SaaS product offers many benefits to the customer, but presents significant challenges to the provider of the service. To get the benefits of scalability and avoid a combinatorial explosion of configurations, each isolated tenant must be repeatedly deployed in exactly the same way, whilst still allowing for differences in scaling parameters according to expected load. Coordinated infrastructure maintenance, monitoring and response must be highly automated to allow for growth without compromising quality.

One definition, many deployments

The requirement for repeatable deployments has been met by the emergence of Infrastructure-as-Code (IaC) products such as HashiCorp’s Terraform and cloud-specific variants such as AWS CloudFormation. These products are all based on two things:

A code-based definition of the set of resources which make up a service and how these resources interact to provide the service. For example, an internet-facing load balancer and the SSL certificate which is attached to it.
A set of parameters defined in the code-based definition and provided to the infrastructure-as-code platform at deployment time.

These parameters allow for strictly controlled variations in the deployed infrastructure to support different customer requirements, without deviating from the blueprint defined in the code-based definition. For example, parameters could control the minimum and maximum number of servers running in an environment, or the IP ranges from which users can connect. These are expected to vary between customers and even between production and sub-production environments. The deployed infrastructure will then use these values when creating the required resources.
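To make the idea of one definition with controlled variations concrete, here is a minimal sketch in Python. It is not Alfa's implementation: the class name, the field names and the parameter keys (`MinServers`, `MaxServers`, `AllowedCidrs`) are all hypothetical. The only detail taken from CloudFormation itself is the `ParameterKey`/`ParameterValue` shape that the service accepts for stack parameters.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentParameters:
    """Per-environment values; the template they feed never changes.

    Field names are hypothetical and chosen to mirror the examples in the text.
    """

    min_servers: int
    max_servers: int
    allowed_cidrs: tuple[str, ...]

    def __post_init__(self) -> None:
        # Reject values that would deviate from what the blueprint allows.
        if not 1 <= self.min_servers <= self.max_servers:
            raise ValueError("expected 1 <= min_servers <= max_servers")
        for cidr in self.allowed_cidrs:
            # Raises ValueError if the range is malformed.
            ipaddress.ip_network(cidr)


def to_cloudformation_parameters(params: EnvironmentParameters) -> list[dict[str, str]]:
    """Render the values in the ParameterKey/ParameterValue list shape."""
    return [
        {"ParameterKey": "MinServers", "ParameterValue": str(params.min_servers)},
        {"ParameterKey": "MaxServers", "ParameterValue": str(params.max_servers)},
        {"ParameterKey": "AllowedCidrs", "ParameterValue": ",".join(params.allowed_cidrs)},
    ]


# Two environments share one template and differ only in their parameters.
production = EnvironmentParameters(2, 10, ("203.0.113.0/24",))
sub_production = EnvironmentParameters(1, 2, ("203.0.113.0/24", "198.51.100.0/24"))
```

Validating the values before they reach the infrastructure-as-code engine is one way to keep variation between tenants strictly inside the limits the template was designed for.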

The following diagram illustrates a fundamental problem with using different regions, accounts and VPCs for single-tenant customers, each of which is deployed using different configurations. In this example, there are four customers spread across two regions, each of which has a production and sub-production account containing a set of environments inside dedicated VPCs. The infrastructure is managed by the CloudFormation service in each region, where configuration is provided from object storage but is spread across many accounts without any central coordination.

A diagram showing the challenges with deploying different configurations to many different deployments across many AWS accounts.

Bringing it all together

If we want to achieve our aim of a unified single-tenant product, we need to implement some way to audit, persist and deploy changes to a tenant’s configuration and to introduce central management of the code definitions, known as templates. Changes to the templates and/or configuration are the triggers for CloudFormation to enact infrastructure changes automatically. Such changes might be made when patching compute resources or databases across one or many environments, or controlling which Alfa Systems features are deployed in specific environments, or even upgrading Alfa Systems to the latest version. Therefore it’s vital that we have complete confidence in the changes being made, and that we are in control of the release of any changes to our isolated tenants.

These requirements are met through a custom orchestration platform which lies at the centre of Alfa Cloud. This doesn’t replace the use of a standard infrastructure-as-code engine or take any part in runtime operation of the service. It exists to manage configuration sets and templates, pushing these to the CloudFormation service in the appropriate region and monitoring the changes being made. It also provides us with a single pane of glass through which we can view patch levels in every tenant and abstract over features to ensure container and infrastructure configuration changes are managed consistently and correctly.

The Alfa Cloud platform is shown in the diagram below, along with the UI- and API-based integration options for use within the Alfa internal network.

A diagram showing the way Alfa Cloud tackles issues with deploying different configurations to many different deployments across many AWS accounts.

An example of a platform feature requiring coordination of container and infrastructure configuration is the use of Amazon’s Simple Queue Service (SQS). This requires extensive container configuration in Alfa Systems, an S3 bucket per queue, IAM roles and networking rules to limit access to allowed IP ranges, and of course configuration of the SQS service itself.

In our tooling we can present admin users with a simple UI, allowing them to configure the feature as needed. Those inputs are then persisted and passed to our templates at deployment time. The screenshot below shows the configuration for a single SQS queue with an IAM key and secret to be created during deployment. The customer can then access these credentials via our customer portal, without anyone having to retrieve them manually and pass them on.

A screenshot of the SQS configuration within the Alfa Cloud control plane.

Through this platform, we have also incorporated the use of CloudFormation Change Sets. Change Sets are a mechanism for simulating the effects of a set of changes (templates and parameters) without actually applying the changes. Using Change Sets in our deployment pipelines allows us to verify changes automatically ahead of time, and forces an authorised user to confirm that changes to stateful resources, such as databases, can proceed. By incorporating this level of control into our change management process through technical guard rails, we ensure that change is well understood and deployments are, above all else, predictable across tenants.
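The approval gate described above could be sketched roughly as follows. This is an illustration under stated assumptions rather than the platform's actual logic: the function names, the `approved_by` argument and the choice of which resource types count as stateful are hypothetical. The input mirrors the `Changes` list that CloudFormation returns when a Change Set is described, where each entry carries a `ResourceChange` with `Action`, `LogicalResourceId` and `ResourceType`; the sketch itself makes no AWS calls.

```python
from __future__ import annotations

# Assumption: which resource types are treated as stateful is a policy choice.
STATEFUL_RESOURCE_TYPES = frozenset(
    {
        "AWS::RDS::DBInstance",
        "AWS::RDS::DBCluster",
        "AWS::S3::Bucket",
    }
)


def changes_needing_approval(changes: list[dict]) -> list[str]:
    """Return the logical IDs of stateful resources touched by a change set."""
    flagged = []
    for change in changes:
        resource = change.get("ResourceChange", {})
        if resource.get("ResourceType") in STATEFUL_RESOURCE_TYPES:
            flagged.append(resource["LogicalResourceId"])
    return flagged


def can_proceed(changes: list[dict], approved_by: str | None) -> bool:
    """Allow execution if nothing stateful changes or a user has confirmed."""
    if not changes_needing_approval(changes):
        return True
    return approved_by is not None


# Hand-written sample in the shape of a described change set.
sample_changes = [
    {
        "Type": "Resource",
        "ResourceChange": {
            "Action": "Modify",
            "LogicalResourceId": "AppServerGroup",
            "ResourceType": "AWS::AutoScaling::AutoScalingGroup",
            "Replacement": "False",
        },
    },
    {
        "Type": "Resource",
        "ResourceChange": {
            "Action": "Modify",
            "LogicalResourceId": "Database",
            "ResourceType": "AWS::RDS::DBInstance",
            "Replacement": "True",
        },
    },
]
```

In a real pipeline the approver would be checked against an authorisation model rather than passed in as a plain string; the point of the sketch is only that the decision can be derived mechanically from the simulated changes before anything is applied.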

The platform also supports the bulk rollout of a change to multiple environments at once via a wizard-style selection. This allows us to configure all aspects of the deployment along with the set of tenants we wish to deploy to. This feature is used routinely for server patching, and can be used in an emergency to quickly roll out changes to non-production and then production environments in batches.

A screenshot of the deployment audit screen within the Alfa Cloud control plane.
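The batched rollout order mentioned above, non-production first and then production, can be sketched as a simple planning function. The `Environment` type, its fields and the `batch_size` parameter are hypothetical names introduced for illustration; the real platform's selection and scheduling logic is not described in this article.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """A deployable environment belonging to one tenant (hypothetical model)."""

    tenant: str
    name: str
    production: bool


def plan_batches(environments: list[Environment], batch_size: int) -> list[list[Environment]]:
    """Split environments into batches, with every non-production batch first."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    non_production = [env for env in environments if not env.production]
    production = [env for env in environments if env.production]
    batches = []
    # Batches never mix the two groups, so production starts only after
    # all non-production batches have been planned.
    for group in (non_production, production):
        for start in range(0, len(group), batch_size):
            batches.append(group[start:start + batch_size])
    return batches


selected = [
    Environment("tenant-a", "prod", True),
    Environment("tenant-a", "uat", False),
    Environment("tenant-b", "prod", True),
    Environment("tenant-b", "uat", False),
    Environment("tenant-b", "dev", False),
]
rollout_plan = plan_batches(selected, batch_size=2)
```

With the sample selection above, the plan contains two non-production batches followed by one production batch, so a problem found in an early batch can stop the rollout before any production environment is touched.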

Conclusion

We believe that our approach to tenant management offers us the best balance of single- and multi-tenant approaches. Our customers benefit from knowing that their infrastructure and data are isolated from other customers, helping them to simplify compliance with auditors and regulators and control their upgrade cadence. By leveraging these multi-tenant techniques, Alfa can efficiently manage a large number of tenants through a common interface, the result of which is a higher quality of service when compared to traditional single-tenant MSPs. All of this is made possible through the use of Infrastructure-as-Code, combined with our own investment in a best-of-breed platform focused on our specific business needs.

We used AWS’s CloudFormation and an application built using the same foundation as our Alfa Systems enterprise platform — this gave us a shortcut to building APIs, authentication and a UI design system. However, the same can be done using Terraform (or an open-source alternative such as Pulumi) or even a UI directly invoking AWS APIs via Lambdas.

The specific technologies aren’t as important as the architecture, processes and design to get the best of both multi- and single-tenant SaaS models.

Written by Alex Barnes