Implementing multi-tenancy in a production system

Alejandro Capdevila
Published in Mercadona Tech, Sep 14, 2022

Introduction

Hello there! My name is Alejandro, and I’m a backend developer at Mercadona Tech. Shortly after joining the company, I became part of the Supply team, a very young development team (its longest-serving backend developer had joined just one month before me) that faced the challenge of migrating one of our applications to a multi-tenant environment.

In this article, I will explain how we approached this challenge, which tools we used and how we implemented the change, as well as the conclusions we reached after the whole process.

Our goal

One of the responsibilities of my team is the development of an application called V6, designed to help with various day-to-day management tasks in Mercadona stores.

Initially, each store had its own systems, which meant that every time we wanted to deploy the application to a new store, it involved relatively extensive work to prepare the necessary environment.

Such a task can be handled without problems as long as the number of deployments is small. But at the time of writing, Mercadona has more than 1,650 stores, and repeating that work for each of them is practically impossible, or at least impossible without involving an excessive number of people in the process. In other words, we could not scale.

But there is no need to panic. Fortunately, a lot has already been invented, and there are techniques designed for exactly this situation.

Multi-tenancy

In terms of software architecture, multi-tenancy means that a single instance of an application serves different tenants, a tenant being a group of users or other systems with the same access privilege level.

In our environment, this means that several stores would be served by the same V6 deployment, or tenant. Having several stores share the resources of a single tenant greatly simplifies deployment.

While I will focus on the code-level details of migrating an application to multi-tenancy, I don’t want to forget the enormous work done by our colleagues in SRE to manage and prepare the systems for deploying the stores.

Let’s get to work

Making a structural change to an application that is already in production is not trivial. An error could corrupt the data in our database and disrupt the day-to-day work of the stores.

This forced us to be extremely cautious when applying the necessary changes. For that reason, we decided to follow the parallel changes pattern, relying on feature flags.

First steps

But what does all this mean at the code level? What did we actually have to do? Put very briefly, we had to add a new field, the store identifier, to practically every table in our database, and make each action in our application behave according to the store executing it.

That is, we had to move from our initial inventory model to a model that also included the center (the store identifier).
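As a rough illustration, here is a minimal plain-Python sketch of that change, with dataclasses standing in for the Django models so it runs without a project. The fields other than the center (`product_id`, `quantity`) are hypothetical; the point is only that every row now records which store it belongs to.

```python
from dataclasses import dataclass
from typing import List


# Before: each store had its own database, so rows did not need to say
# which store they belonged to. (product_id and quantity are hypothetical.)
@dataclass
class LegacyInventory:
    product_id: int
    quantity: int


# After: one shared database, so every row carries its store ("center").
@dataclass
class Inventory:
    center_id: int
    product_id: int
    quantity: int


rows: List[Inventory] = [
    Inventory(center_id=1, product_id=10, quantity=5),
    Inventory(center_id=2, product_id=10, quantity=7),
]

# The same product can now hold independent stock for each store.
center_2_stock = [row.quantity for row in rows if row.center_id == 2]
```

In Django, the equivalent would be one more field on the model plus a migration to add the column.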

As my colleague Juanjo already detailed in his article, since we had to review the application from top to bottom anyway, we applied the boy scout rule and moved the application logic of some views into actions. For more details on hexagonal architecture, I recommend this Codely course. In other words, instead of views that contained the logic themselves, we ended up with views that simply delegate to an action.
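A minimal sketch of that refactor, in plain Python so it runs on its own: `InventoryRepository` is a hypothetical stand-in for the Django ORM, and the view functions return plain dicts instead of HTTP responses. All names here are illustrative, not the team's actual code.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Inventory:
    product_id: int
    quantity: int


class InventoryRepository:
    """Hypothetical in-memory stand-in for the Django ORM."""

    def __init__(self, rows: List[Inventory]) -> None:
        self._rows = rows

    def all(self) -> List[Inventory]:
        return list(self._rows)


# Before: the view holds the business logic itself.
def stock_view_before(repository: InventoryRepository) -> Dict[int, int]:
    totals: Dict[int, int] = {}
    for row in repository.all():
        totals[row.product_id] = totals.get(row.product_id, 0) + row.quantity
    return totals


# After: the use case lives in an action with a single entry point...
class GetStockAction:
    def __init__(self, repository: InventoryRepository) -> None:
        self._repository = repository

    def execute(self) -> Dict[int, int]:
        totals: Dict[int, int] = {}
        for row in self._repository.all():
            totals[row.product_id] = totals.get(row.product_id, 0) + row.quantity
        return totals


# ...and the view only builds the action and returns its result.
def stock_view_after(repository: InventoryRepository) -> Dict[int, int]:
    return GetStockAction(repository).execute()
```

With the logic in an action, later changes such as scoping queries by store can be made in one place rather than in every view that needs the data.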

This case is particularly simple and the change took little effort, but other actions were much more complex, and updating them was far from trivial. This is where django-constraints comes into play.

Django-constraints

It is a library designed to work with Django that allows us to add constraints to our models and have the application act accordingly. Since the base library did not have all the functionality we needed, our coworkers took it upon themselves to extend it to cover our use cases. Without their help, it would have taken us at least twice as long.

But how does it work? Quite simply. First, we update our model to include the constraints we want (in our case, the center).

Our model now inherits from ConstraintedModel and declares the constraints we want to apply to it. The action that uses it stays the same as it was before multi-tenancy; we just had to add one extra line with constraints(...).
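The snippet below is not django-constraints itself, whose API we do not reproduce here. It is a toy, self-contained stand-in (`ToyConstrainedModel` and `toy_constraints` are hypothetical names) that imitates the idea described above: the model declares which fields it is constrained by, and the action adds a single line that activates the constraint, so every query inside it is filtered by center. Raising when a constraint is missing is a design choice of this sketch.

```python
from contextlib import contextmanager
from contextvars import ContextVar
from dataclasses import dataclass
from typing import Any, ClassVar, Dict, Iterator, List, Mapping, Tuple

# Toy stand-in for django-constraints (NOT its real API): the constraints
# currently active, e.g. {"center_id": 1}. Never mutated in place.
_active: ContextVar[Mapping[str, Any]] = ContextVar("active_constraints", default={})


@contextmanager
def toy_constraints(**values: Any) -> Iterator[None]:
    """Activate constraints for every query run inside the `with` block."""
    token = _active.set({**_active.get(), **values})
    try:
        yield
    finally:
        _active.reset(token)


class ToyConstrainedModel:
    """Subclasses declare the fields every query must be filtered by."""

    constrained_by: ClassVar[Tuple[str, ...]] = ()
    table: ClassVar[List[Any]]  # each subclass provides its own rows

    @classmethod
    def filter(cls) -> List[Any]:
        active = _active.get()
        missing = [name for name in cls.constrained_by if name not in active]
        if missing:
            # Fail loudly instead of silently returning every store's rows.
            raise LookupError(f"missing constraints: {missing}")
        return [
            row for row in cls.table
            if all(getattr(row, name) == active[name] for name in cls.constrained_by)
        ]


@dataclass
class Inventory(ToyConstrainedModel):
    center_id: int
    product_id: int
    quantity: int
    constrained_by: ClassVar[Tuple[str, ...]] = ("center_id",)
    table: ClassVar[List["Inventory"]] = []


class GetStockAction:
    def execute(self, center_id: int) -> Dict[int, int]:
        totals: Dict[int, int] = {}
        with toy_constraints(center_id=center_id):  # the one extra line
            for row in Inventory.filter():
                totals[row.product_id] = totals.get(row.product_id, 0) + row.quantity
        return totals
```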

With all this, we are ready to take all our actions and move them to multi-tenancy, right? Hold your horses; there is no need to rush. Remember that this application was already in production, with several stores using it daily. It would be irresponsible to start transforming all the actions on the fly, as we could easily introduce an error, and correcting it would not be trivial.

Parallel changes

Parallel changes is a pattern for changing a contract in a backward-compatible way. It divides the whole process into three phases: expand, migrate and contract.

Expand phase

The first phase is the expand phase. In it, the legacy code coexists with the new code, and a feature flag decides which of the two is executed.

I cannot stress enough the importance of testing and following methodologies such as TDD. Tests were our safety net. They told us whether we had broken any functionality or everything was running smoothly. They allowed us to move forward slowly but surely, feature after feature, without fear of making mistakes. In this team, nothing goes to production without tests. Period.

Therefore, we duplicated all the existing tests (and, by the way, we took the opportunity to revise and extend them) to verify the new code’s correct behavior.
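As a sketch of what that duplication can look like with the standard `unittest` module (the two stock functions are hypothetical stand-ins for a legacy action and its multi-tenant counterpart), each legacy test class gets a multi-tenant twin that repeats its cases and adds the ones multi-tenancy introduces. Following TDD, the multi-tenant class would be written first and fail until the new code exists.

```python
import unittest
from typing import Dict, List, Tuple


def legacy_stock(rows: List[Tuple[int, int]]) -> Dict[int, int]:
    """Legacy code: one database per store; rows are (product_id, quantity)."""
    totals: Dict[int, int] = {}
    for product_id, quantity in rows:
        totals[product_id] = totals.get(product_id, 0) + quantity
    return totals


def multitenant_stock(rows: List[Tuple[int, int, int]], center_id: int) -> Dict[int, int]:
    """New code: shared table; rows are (center_id, product_id, quantity)."""
    totals: Dict[int, int] = {}
    for row_center, product_id, quantity in rows:
        if row_center == center_id:
            totals[product_id] = totals.get(product_id, 0) + quantity
    return totals


class LegacyStockTest(unittest.TestCase):
    def test_sums_quantities_per_product(self) -> None:
        self.assertEqual(legacy_stock([(10, 5), (10, 2), (11, 1)]), {10: 7, 11: 1})


class MultitenantStockTest(unittest.TestCase):
    # Same case as the legacy class, now with every row tagged with a center...
    def test_sums_quantities_per_product(self) -> None:
        rows = [(1, 10, 5), (1, 10, 2), (1, 11, 1)]
        self.assertEqual(multitenant_stock(rows, center_id=1), {10: 7, 11: 1})

    # ...plus the behavior that only exists with multi-tenancy.
    def test_ignores_other_centers(self) -> None:
        rows = [(1, 10, 5), (2, 10, 100)]
        self.assertEqual(multitenant_stock(rows, center_id=1), {10: 5})
```

Run it with `python -m unittest`. Keeping the two classes separate means the legacy tests can later be deleted as a single unit.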

Once we had failing (red) tests covering all the different use cases, we moved on to the implementation, with a feature flag deciding whether the legacy code or the new multi-tenant code runs.

Using a feature flag gave us a lot of robustness. In case of any error in the new code, we could quickly disable the flag so that the original code ran again, all without the need for a new deployment.
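A minimal sketch of the expand-phase branching, assuming a hypothetical in-memory `FeatureFlags` store and a hypothetical flag name; in a real system the flag would live in a feature-flag service or database so it can be flipped at runtime. Rows are `(center_id, product_id, quantity)` tuples standing in for the inventory table.

```python
from typing import Dict, List, Set, Tuple

Row = Tuple[int, int, int]  # (center_id, product_id, quantity)


class FeatureFlags:
    """Hypothetical in-memory flag store; a real one is toggled without deploying."""

    def __init__(self) -> None:
        self._enabled: Set[str] = set()

    def enable(self, name: str) -> None:
        self._enabled.add(name)

    def disable(self, name: str) -> None:
        self._enabled.discard(name)

    def is_enabled(self, name: str) -> bool:
        return name in self._enabled


class GetStockAction:
    FLAG = "multitenant_stock"  # hypothetical flag name

    def __init__(self, rows: List[Row], flags: FeatureFlags) -> None:
        self._rows = rows
        self._flags = flags

    def execute(self, center_id: int) -> Dict[int, int]:
        # Expand phase: both paths live side by side; the flag picks one per call.
        if self._flags.is_enabled(self.FLAG):
            return self._execute_multitenant(center_id)
        return self._execute_legacy()

    def _execute_legacy(self) -> Dict[int, int]:
        # Legacy path: the database belongs to a single store, so no filtering.
        return self._sum(self._rows)

    def _execute_multitenant(self, center_id: int) -> Dict[int, int]:
        return self._sum([row for row in self._rows if row[0] == center_id])

    @staticmethod
    def _sum(rows: List[Row]) -> Dict[int, int]:
        totals: Dict[int, int] = {}
        for _, product_id, quantity in rows:
            totals[product_id] = totals.get(product_id, 0) + quantity
        return totals
```

Disabling the flag immediately routes calls back to `_execute_legacy`, which is the rollback-without-deploy property described above.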

Migrate phase

During the expand phase, customers (stores, in our case) continued to use the legacy code, and the new implementations were completely transparent to them.

But once the new code is ready to be used, it is time to start migrating customers to multi-tenant.

First, a new tenant was created with V6 deployed, and a single lab store was migrated. While having a single store does not make it an actual multi-tenant environment, it was helpful for verifying that all the actions and data behaved as expected.

Once we felt comfortable, the moment of truth came: to have a second store make use of that same tenant. Thanks to the efforts described above, the test was a complete success. V6 had been migrated to multi-tenant!

However, not everything went smoothly, as we encountered some problems during the migration. Due to some issues with the foreign keys of the tables, we could not migrate the data from the original databases to the tenant database. In our case, it was not a severe problem, as the application hardly saves any data. But all those who are about to embark on a similar adventure should keep this in mind.

Contract phase

Euphoric after our triumph, we were nonetheless aware that our work was not finished. The whole expand process had meant writing a vast amount of code: between test and production code, we practically doubled the total number of lines. But that’s not a bad thing. Up to that point, this code had assured us that each of the steps we implemented worked correctly. Plus, all this “noise” was a constant reminder that we couldn’t just keep it.

Once we had verified that the application worked correctly in a multi-tenant environment, it was time to clean house. Because of how we had prepared the code, removing all the duplication that was no longer needed was really easy.

After that cleanup, only the multi-tenant code remained in production.

But it doesn’t stop there. Notice that we had to pass the center parameter to the execute method, and from there on to every child function that needs it. This can be a real nuisance.
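The following plain-Python sketch (a hypothetical `GetLowStockAction` over `(center_id, product_id, quantity)` rows) shows the problem: `_stock_per_product` never uses `center_id` itself, yet it must accept it just to forward it to `_rows_for`.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, int, int]  # (center_id, product_id, quantity)


class GetLowStockAction:
    """Hypothetical action: products whose stock in a store is below a threshold."""

    def __init__(self, rows: List[Row], threshold: int) -> None:
        self._rows = rows
        self._threshold = threshold

    def execute(self, center_id: int) -> List[int]:
        stock = self._stock_per_product(center_id)
        return sorted(p for p, qty in stock.items() if qty < self._threshold)

    def _stock_per_product(self, center_id: int) -> Dict[int, int]:
        # This helper never uses center_id itself; it only forwards it.
        totals: Dict[int, int] = {}
        for _, product_id, quantity in self._rows_for(center_id):
            totals[product_id] = totals.get(product_id, 0) + quantity
        return totals

    def _rows_for(self, center_id: int) -> List[Row]:
        return [row for row in self._rows if row[0] == center_id]
```

Every new helper in the chain adds another `center_id` parameter, which is what makes this pattern tedious as actions grow.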

That is why we found a way to avoid this problem: using a context. Since every request to our views passes through a middleware, we can take advantage of it to store the value of the center in that context.

There are two interesting points here. The first is the use of threading.local(), which stores data at the thread level, so each request, handled on its own thread, sees only its own values. The second is that, since every client request must include the center_id, we raise an exception if it has not been provided by the time we need to use it.

With all this in place, our actions read the center from the context instead of receiving it as a parameter.

This already looks better, since we no longer have to drag the center from one side of our code to the other. But we can go a step further. Since all our actions now include the line with constraints(...), we can take advantage of the decorator pattern and move that line into a decorator.

This removes the dependency on django-constraints from our actions and avoids duplicating the line with constraints(...) throughout our code. In the end, each action contains only its own logic, marked with the decorator.

As for the tests, we simply had to remove the class containing the legacy tests.

Conclusions

The migration to multi-tenant was a success despite our initial insecurity about being such a young team (young in terms of time on the team, that is; some of us are starting to show gray hair).

The use of patterns like parallel changes, methodologies like TDD, feature flags, and tools like django-constraints allowed us to tackle the problem knowing that, in case of an incident, a single click would get everything working as it should again.
