Querying multiple MySQL databases at once

Valentin Iljaž
The Startup
Published in
6 min readJan 30, 2020

Imagine you are building a SaaS product (e-commerce) where every tenant lives in its own little database (MySQL). On the one hand, we can ensure that the tenant’s data is actually theirs. On the other hand, we have built a system that uses sharding out of the box. Win, win, or …?

The SaaS product is gaining traction, tenant by tenant. You have amounted 100s of tenants, and now you would like to know how these tenants are using your software. How many products have they added, how many orders have their users placed, etc.? In order to find those figures, you need to query every single database of every tenant. Hmm, what do to?

What if we could create a table that would somehow contain the tables of all tenants? Then we could write queries against this one colossal table. After we did the same for every other table, we could query all these tables at once, writing JOINs, GROUP BYs, etc. However, how can we build such a table, you ask!

Well, we can take a look at the arsenal that the RDBMS (MySQL) gives us. VIEWs are such a tool; they are meant to present a different view of the underlying data structures. So, if we could build a view that would contain the tables of all the tenants, then we would be to a good start. All these tables would need to be joined together, like a union of all the tables.

Moreover, that is precisely what we will be building, a UNION of all the tenants’ tables represented under one VIEW. When we repeat the process for every table, we can have all the databases under the tips of our fingers.

So, how would such a VIEW look like:

Each VIEW contains all the columns from the original tables with one additional column called “tenant.” The column contains the name of the tenant’s DB from which the record is taken. On the one hand, you can easily know whose data you are looking at. However, on the other hand, you need to be careful while doing JOINs, since you must also add this column to the JOIN conditions:

Now, all that remains is to create such a VIEW for every table in the database. We can use any programming language for the task, but what is fun in that. We want to use Stored procedures because:

  • no dependencies (executable environment is already in the RDBMS),
  • updating the VIEWs is as simple as running an SQL command from your favourite client.

The game plan is simple; first, we build stored procedure that can create a VIEW of a single table; then, we iterate over all tables, creating a view in each step.

Stored procedure for creating a view of a single table

The stored procedures takes 2 arguments:

  • tbl_name: name of the table for which we need to create a VIEW
  • db_pattern_re: regular expression for filtering out the databases that will be included in the VIEW

A) First, we need to declare a cursor that will be used to iterate over all DBs included in the VIEW. To filter out only the wanted DBs, we use db_pattern_re regular expression. CONTINUE HANDLER will make sure that the loop stops after iterating over all found DBs.

B) Initialise the variable, all_dbs_view, that will hold our full CREATE VIEW statement. The variable can be pretty long (LONGTEXT), depending on the number of filtered out DBs.

C) Then we open the cursor and start iterating over each filtered out DB. The IF statement will check for all_dbs_done variable in each step. The variable will set be to 1 when CONTINUE HANDLER is triggered.

D) The first iteration does not need to prepend UNION ALL, so we skip it; all subsequent iterations will prepend UNION ALL. Next line is the money shot; it concatenates the actual SELECT statement that will include all the records from the table from specific tenant’s DB:

SELECT "tenants_DB" AS tenant, t.* FROM tenants_DB.some_table AS t

E) Before creating the VIEW, we must make sure that we delete an existing one if it exists.

F) Now, we actually run the statement that will create the VIEW.

We can execute the procedure for creating a VIEW of a single table:

CALL update_table_view("sample_table", "tenant_[0-9]+");

Stored procedure for looping over all the tables

The stored procedure takes 2 arguments:

  • db_first: name of a database which will be used to list all tables; it can be any DB in your bag of DBs
  • db_pattern_re: regular expression for filtering out the databases that will be included in the VIEW

A) We declare a cursor that is used to iterate over all tables found in database db_first. CONTINUE HANDLER will make sure that the loop stops after iterating over all found tables.

B) We open the cursor and start iterating over each found table. The IF statement will check for all_tbls_done variable in each step. The variable will set be to 1 when CONTINUE HANDLER is triggered.

C) Once we have the value of the current table stored in variable cur_tbl, we can execute the stored procedure that will create a VIEW specifically for cur_tbl.

When we execute the stored procedure:

CALL update_all_views("tenant_1", "tenant_[0-9]+");

we get:

  • a list of all the VIEWs that represent each table from database tenant_1;
  • each VIEW is essentially a UNION of the same table across all tenants.

That is it!

The pros

What are the benefits of using this method:

  • no dependencies (all is run within MySQL server)
  • can be used from any MySQL client
  • know and familiar data model (basically the same)
  • easy to set up, easy to update, easy to use, easy to reason about
  • can run any arbitrary query

The cons

Moreover, what are the disadvantages:

  • while joining the tables, you must add the extra condition
  • queries are not super fast (depending on the number of tenants) but fast enough
  • does not scale beyond 100s of tenants

Conclusion

In our current setup, we have cca. 340 tables and cca. 250 tenants. We have not done any testing yet to see how far can this setup scale. At the moment, queries are responsive, and it is quite fun to use it. We have found out many interesting things, just by running these simple queries.

Even though it would not be practical to use this setup for thousands or millions of tenants, it can still be useful on a subset of your tenants e.g., tenants from 34 through 76:

CALL update_all_views("tenant_34", "tenant_(3[4-9]|[4-6][0-9]|7[0-6])$");

So, to wrap up, it is a very convenient setup for running an ad-hoc analysis of all your tenants. When all you need is to just run some simple queries; just to try some things out.

The caveats

Like for every software solution, there are these little caveats that we have not mentioned above.

1) When executing the actual SQL command for creating a VIEW, the MySQL server might throw an exception like this one:

Prepared statement needs to be re-prepared

Our workaround was this:

SET GLOBAL table_definition_cache = 2800;

Your actual number (2800) will probably be somewhat different, depending on the number of your tenants.

2) When doing queries with a condition on a specific tenant:

SELECT * FROM sample_table WHERE tenant = "tenant_1"

You might get an exception (depending on how you created the database):

Illegal mix of collations (utf8mb4_general_ci,COERCIBLE) and (utf8mb4_unicode_ci,IMPLICIT) for operation ‘=’

The solution is to force the encoding (collation) of the database name while creating the VIEW. Add a call to CONVERT function in your SELECT statement:

SELECT CONVERT("tenants_DB" USING utf8mb4) AS tenant, t.* FROM tenants_DB.some_table AS t

Instead of utf8mb4, use whatever you set for the encoding when you created the database.

3) Connection timeout might also be a problem. Creating all these VIEWs can take some time; 5 minutes in our case. So, make sure you set ample timeout for connections in your client.

4) It is best if you prepare the VIEWs and run queries on the replica server so that you do not put an extra burden on your production servers. Queries can become heavy quite quickly.

5) For testing the above method, we used MySQL Community Server 5.7.20.

Image credit: Photo by Kolar.io on Unsplash

Check out my latest project WEBACUS.dev

The missing calculator for developers consisting of many diverse operations that are needed by developers in their everyday work. Instead of searching Google and learning the interface for each operation, Webacus provides consistent and simple interface enabling fluid workflow.

The main focus of the calculator is its INTEROPERABILITY — being able to chain all operations into a stream and producing the result. No more COPY & PASTE between operations while working with the calculator.

Try WEBACUS

--

--