Compliant Data Access in MariaDB & MySQL: Enforcing Tenant Separation for Debug Queries

Published in

Inside Personio

8 min readJun 2, 2021

Like most Software as a Service (SaaS) companies, Personio is built on a multi-tenant architecture. While this allows scalability and sustainable cost reduction, it also comes with its own set of issues. Today we’ll be talking about the primary issue we face: ensuring tenant separation is properly enforced.

The first challenge in enforcing this separation is on the application side, where we need to make sure that data from one tenant cannot be accessed by another tenant. For that, all data entries are tagged with a unique tenant ID and always queried with this ID. This is done within the application code and is monitored to ensure tenant separation.

A second challenge is that our engineers sometimes require read access to production data in order to debug customer issues. Since Personio stores sensitive data, access is requested and granted on an as-needed basis. Our engineers were requesting access using a manual ticketing system. To automate this process we implemented a tool to grant and audit access to single tenant’s data on a request basis.

In this article, we will explain how we achieved tenant separation in MariaDB/MySQL for read-only queries, allowing our engineers to safely access production data for a single company. We’ll also run through some additional processes we’ve implemented to ensure security, auditability, and compliance.

State of the art

At Personio, we use two types of relational database management systems (RDBMS): MariaDB and PostgreSQL. Achieving tenant separation in PostgreSQL databases is relatively easy, given that the engine supports row-level security (RLS). For us, a tenant is a company identified by company_id, so we added this column to all tables that contain personally identifiable information (PII) or tenant data. From there we configured RLS to enforce the function of all queries to those tables giving a “where” clause with a single company_id.

Now the problem comes with MariaDB or MySQL databases, as those RDBMS don’t have native support for RLS. We searched for alternatives to RLS that might be available for MariaDB or MySQL but weren’t able to find much. The most advanced and helpful information we found was this 2015 blog post on row-level security in MariaDB 10, which became our starting point in implementing single-tenant, read-only queries.

(We did evaluate the option to parse the SQL queries and implement the tenant separation ourselves. While it might have seemed like a good idea initially, SQL allows many different syntaxes, which makes the parsing difficult, unreliable, and too insecure for our requirements.)

Expectations

Let’s take a simplified version of our database as an example. We have two tables:

companies → List of companies identified by an ID and with a name.
employees → List of employees identified by an ID and with a foreign key company_id.

And let’s assume we have the following data in our database:

SELECT * FROM companies;
+---------+------------+
| id      | name       |
+---------+------------+
|      1  | foo        |
|      2  | bar        |
+---------+------------+SELECT * FROM employees;
+---------+------------+
| id      | company_id |
+---------+------------+
|     1   | 1          |
|     2   | 1          |
|     3   | 1          |
|     4   | 2          |
|     5   | 2          |
+---------+------------+

The expectation is that each tenant should only be able to see the rows that belong to them. That means that a user within the company ID 1 will only see the employees with ID 1, 2, and 3.

Implementation

The idea to achieve tenant separation is to query views instead of the target tables. Those views then return the appropriate rows and filter out the ones that don’t belong to the current tenant, thanks to a function. For that, each tenant has its own database user that is created at the beginning of the workflow. That user then connects to another database (named tenant_separation, below) that only contains the views, instead of directly querying the tables.

The diagram below shows the overall process:

The workflow is:

An engineer is granted access to the company ID 1 and the user company_1 is created (if it doesn’t already exist).
The database user company_1 queries the views in the tenant_separation database.
The view queries the values from the matching Personio tables.
All the results are returned to the view. At this point there is no tenant separation filtering — this is done in the next step.
The views use the function tenant_separation_authorization.company_access_check to verify for each row that the company ID can be accessed by the user, thanks to the tenant_separation_authorization.company_access table. This is where tenant separation filtering occurs.
Only rows matching the company ID are returned to the company_1 user.

Currently, we only have the Personio database, which has two tables: companies and employees. Let’s start by adding two new databases:

CREATE DATABASE IF NOT EXISTS tenant_separation;
CREATE DATABASE IF NOT EXISTS tenant_separation_authorization;

Then, we need a table to keep track of which user has access to which company ID:

CREATE TABLE IF NOT EXISTS tenant_separation_authorization.company_access
(
  id INT AUTO_INCREMENT PRIMARY KEY,
  db_user VARCHAR(50) UNIQUE,
  company_id int(10) unsigned NOT NULL
);

In this table, we will list the technical users and the company_id they can access. For instance, we could have:

SELECT * FROM tenant_separation_authorization.company_access;
+----+------------+------------+
| id | db_user    | company_id |
+----+------------+------------+
| 34 | company_1  |         1  |
+----+------------+------------+

This table will then be used in this function, which will verify if the user connected is allowed to access the company_id and for each row returned by the view (see below) if the company_id linked to that user is allowed to see that row:

DELIMITER //
CREATE FUNCTION IF NOT EXISTS tenant_separation_authorization.company_access_check (sess_user VARCHAR(50), tbl_company_id int(10) unsigned)
RETURNS BOOLEAN
NOT DETERMINISTIC
READS SQL DATA
SQL SECURITY INVOKER
BEGIN
    SELECT company_id
    INTO @v_company_id_check
    FROM tenant_separation_authorization.company_access
    WHERE db_user = substring_index(sess_user, '@', 1) AND company_id = tbl_company_id;IF @v_company_id_check = tbl_company_id THEN
        RETURN true;
    ELSE
        RETURN false;
    END IF;
END
//
DELIMITER ;

Let’s break this function down:

Function argument → The function takes the caller name (session user) and the ID used for tenant separation (see the view creation below for more details).
NOT DETERMINISTIC→ return different results depending on the input.
READS SQL DATA→ Read but does not modify data.
SQL SECURITY INVOKER→ The function will be evaluated using the caller privileges.
SELECT company_id INTO @v_company_id_check FROM tenant_separation_authorization.company_access WHERE db_user = substring_index(sess_user, ‘@', 1) AND company_id = tbl_company_id→ Store the company_id that the current user is allowed to access into v_company_id_check.
IF @v_company_id_check = tbl_company_id THEN RETURN true→ The function returns true if the ID passed is matching what the user is allowed to access, otherwise returns false.

Finally, we need one view for each table of the personio database:

CREATE
SQL SECURITY DEFINER
VIEW tenant_separation.companies
AS
    SELECT *
    FROM personio.companies tbl
    WHERE tenant_separation_authorization.company_access_check(SESSION_USER(), tbl.id)
WITH CHECK OPTION;CREATE
SQL SECURITY DEFINER
VIEW tenant_separation.employees
AS
    SELECT *
    FROM personio.employees tbl
    WHERE tenant_separation_authorization.company_access_check(SESSION_USER(), tbl.company_id)
WITH CHECK OPTION;

Those views use SESSION_USER() to get the current username and pass the ID used for the tenant separation ( id for companies & company_id for employees).

The last step is to create the technical user:

CREATE USER IF NOT EXISTS 'company_1'@'%' IDENTIFIED BY 'xxxxxxxxxxxxxxxx' REQUIRE SSL;
GRANT SELECT ON tenant_separation.* TO 'company_1'@'%';
INSERT INTO tenant_separation_authorization.company_access(db_user, company_id) VALUES ('company_1', 1);

That user will then connect to the tenant_separation database:

mysql -u company_1 tenant_separation

and execute queries that will only return the rows within the tenant scope:

SELECT * FROM companies;
+---------+------------+
| id      | name       |
+---------+------------+
|      1  | foo        |
+---------+------------+SELECT * FROM employees;
+---------+------------+
| id      | company_id |
+---------+------------+
|     1  | 1           |
|     2  | 1           |
|     3  | 1           |
+---------+------------+

The creation of those technical users must be automated when needed. Look forward to a deeper dive into this in an upcoming article!

Limitation

The main limitation we noticed with this solution is that views expire. When a view is created with SELECT *, the database engine takes a snapshot of the current columns and saves them (see this documentation for reference). When a new column is added, the view will still work but this column will not be returned in the result. And when an existing column is dropped, the view stops working. Also, a new process needs to be put into place to create the view when a new table is created.

To work around that, it’s possible to recreate views automatically with triggers when the schema of the tables evolves, or they can be recreated periodically with a script based on information_schema.

Performance

If we take a step back and look at the efficiency of this implementation, we can see that this solution doesn’t scale very well because every row returned by the query needs to be examined by the company_access_check function. Performance can degrade quickly if the query is against a company with a large number of results.

We have done some real-life tests to see how much impact this has on performances:

Scenario:

In the first scenario, we can see that it takes around 50x longer for the view in tenant_separation to go through the 8,000 matching rows.
In the second scenario, we can see that it takes around 3x longer for the view in tenant_separation to go through the 225 matching rows. Here the impact is more negligible, as we only have a reduced number of rows returned.

To mitigate the performance impact, it’s important to limit the scope of the queries executed and to make sure to add the tenant separation on the “where” clause.

Another optimization is to give more power to the engine optimizer by moving the filtering code from the function to each view. (This solution is described in the MariaDB blog post linked above.) We have not yet implemented that option, as performance is good enough for our use case at this time.

Final thoughts

In an upcoming article, we’ll talk about how we implemented an automated process to grant production access to our engineers, so stay tuned!