Transparent datasource routing in Spring
Level: medium / advanced
The Spring framework is said to be extremely flexible. But what does that mean? Its authors claim that Spring allows you to adjust almost any piece of behavior to your needs and make the framework work entirely for you. It’s no different with the DataSource interface, which can be implemented in any manner you like. The question is: how do you manage multiple datasources when the endpoint depends on the user?
Use case
A while back I was working on a medium-size SaaS B2B system for clients like banks or call centers. One of our key concerns was data privacy: our users were crazy about it. Most of them didn’t accept the idea of mixing their data with other clients’ data within the same database, so logical separation was not an option. We thus decided to go for physical data separation. Each client had their own discrete database, totally separated from the others.
In this case, three challenges were identified:
- How to deal with routed datasources?
- How to manage them dynamically at runtime for each new client?
- How to make it transparent in code?
Databases that can be added or removed by the user are not a common problem faced by developers, but handling them becomes extremely important when dealing with multiple DBs. What is more, we wanted to get rid of the human factor, so that no developer would need to be aware of per-client datasource routing while implementing new features. How did I solve that? Let’s dig deeper.
Solution
To route between datasources, every client has to be stored along with its database URL. Because this client-to-database lookup has to offer extremely fast reads, we decided to go for DynamoDB with a local cache. DB instances for clients are created in advance. New clients are stored as they register in the app, in the following steps:
- Get a random database from the databases pool
- Store the client with a database endpoint
- Send an event to the router with the new client
- Add a brand new DB to the pool
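The registration steps above can be sketched roughly as follows. This is only an illustration under assumptions: `ClientRegistry`, the in-memory collections, and the listener callback are hypothetical stand-ins for the DynamoDB-backed store and the router event, neither of which is shown in this article.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical sketch: in-memory stand-in for the client store and the pool of pre-created databases.
class ClientRegistry {
    private final List<String> freeDatabases = new ArrayList<>();           // endpoints created in advance
    private final Map<String, String> clientToEndpoint = new ConcurrentHashMap<>();
    private final BiConsumer<String, String> routerListener;               // receives (clientDomain, endpoint)
    private final Random random = new Random();

    ClientRegistry(List<String> initialPool, BiConsumer<String, String> routerListener) {
        this.freeDatabases.addAll(initialPool);
        this.routerListener = routerListener;
    }

    // replacementEndpoint is the freshly created database that refills the pool.
    synchronized String register(String clientDomain, String replacementEndpoint) {
        if (freeDatabases.isEmpty()) {
            throw new IllegalStateException("No free database in the pool");
        }
        if (clientToEndpoint.containsKey(clientDomain)) {
            throw new IllegalArgumentException("Client already registered: " + clientDomain);
        }
        // 1. Get a random database from the pool
        String endpoint = freeDatabases.remove(random.nextInt(freeDatabases.size()));
        // 2. Store the client with its database endpoint
        clientToEndpoint.put(clientDomain, endpoint);
        // 3. Send an event to the router with the new client
        routerListener.accept(clientDomain, endpoint);
        // 4. Add a brand new DB to the pool
        freeDatabases.add(replacementEndpoint);
        return endpoint;
    }

    synchronized int freeCount() {
        return freeDatabases.size();
    }

    String endpointOf(String clientDomain) {
        return clientToEndpoint.get(clientDomain);
    }
}
```

Refilling the pool as the last step keeps its size constant, so the next registration does not have to wait for a database to be provisioned.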
Then, every request is processed as follows:
- Add the client key (e.g. name) to the request header
- Read the client key before the Spring controller is invoked and store it in a thread-local
- Process the request as usual
- Get the current client domain, retrieve the DB connection string and connect to the DB in a routing datasource (DataSourceRouter in the code further on), which implements the standard DataSource interface
With this approach, the developer does not have to be aware of which client is currently being served, except for asynchronous calls or actions that span more than one client. Let’s now see how it’s coded.
Transparency
A servlet container processes each request in a single thread. Spring uses ThreadLocal to store information about the request, and so did I. Using @ControllerAdvice and @InitBinder makes it easy to save client data.
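import java.util.Optional;
import org.springframework.web.bind.annotation.ControllerAdvice;
import org.springframework.web.bind.annotation.InitBinder;
import org.springframework.web.bind.annotation.RequestHeader;

@ControllerAdvice
public class DomainBinder {

    // The header name is given explicitly so it does not depend on compiled parameter names
    @InitBinder
    public void bindDomain(@RequestHeader("domain") Optional<String> domain) {
        domain.ifPresent(ThreadLocalClientResolver::setClientDomain);
    }
}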
This example shows how to get the client domain from the request header, but we can extract this data from the path, a cookie, or any other part of the request as well. An @InitBinder method can also take HttpServletRequest as a parameter, which gives access to any request data.
To store a client domain, I use ThreadLocalClientResolver, which is a simple wrapper for ThreadLocal:
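import static java.util.Optional.empty;

import java.util.Optional;
import org.apache.commons.lang3.StringUtils; // assuming Apache Commons Lang 3

public class ThreadLocalClientResolver {

    private static final ThreadLocal<String> CLIENT_DOMAIN = new ThreadLocal<>();

    // Returns the domain bound to the current thread, or empty if none is set
    public static Optional<String> resolve() {
        String domain = CLIENT_DOMAIN.get();
        if (StringUtils.isBlank(domain)) return empty();
        return Optional.of(domain);
    }

    public static void setClientDomain(String clientDomain) {
        CLIENT_DOMAIN.set(clientDomain);
    }

    public static String getClientDomain() {
        return CLIENT_DOMAIN.get();
    }

    // Removes the value bound to the current thread
    public static void cleanClientDomain() {
        CLIENT_DOMAIN.remove();
    }
}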
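One thing worth keeping in mind: servlet containers typically reuse threads from a pool, so a value left in a ThreadLocal can still be there when the same thread serves the next request. That is presumably what cleanClientDomain is for, although this article does not show where it is called. Below is a minimal, framework-free sketch of the effect. `DomainHolder` is a hypothetical stand-in for the resolver, and a single-thread executor stands in for the container's thread pool.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for ThreadLocalClientResolver
class DomainHolder {
    private static final ThreadLocal<String> DOMAIN = new ThreadLocal<>();

    static void set(String domain) { DOMAIN.set(domain); }
    static String get() { return DOMAIN.get(); }
    static void clean() { DOMAIN.remove(); }
}

class PooledThreadLeakDemo {

    // Runs two consecutive "requests" on the same pooled thread and
    // returns the domain that the second request finds in the thread-local.
    static String secondRequestSees(boolean cleanAfterFirst) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable firstRequest = () -> {
                DomainHolder.set("client-a");
                try {
                    // ... request handling would happen here ...
                } finally {
                    if (cleanAfterFirst) DomainHolder.clean();
                }
            };
            Callable<String> secondRequest = DomainHolder::get;
            pool.submit(firstRequest).get();
            return pool.submit(secondRequest).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(secondRequestSees(false)); // prints "client-a": the value leaked
        System.out.println(secondRequestSees(true));  // prints "null": the value was removed
    }
}
```

In a web application, the same try/finally idea could live in a servlet filter or a handler interceptor that runs after every request; that wiring is an assumption and is not part of the original setup described here.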
Now we can get the domain in almost any part of the code, BUT NOT in @Async code. An asynchronous function runs on a different thread, and the ThreadLocal variables ARE NOT copied to that thread.
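The effect is easy to reproduce without Spring. The following sketch uses a plain thread instead of @Async, and `AsyncVisibilityDemo` is a hypothetical name used only for illustration.

```java
import java.util.concurrent.atomic.AtomicReference;

class AsyncVisibilityDemo {
    private static final ThreadLocal<String> CLIENT_DOMAIN = new ThreadLocal<>();

    // Sets the domain on the calling thread and returns what a freshly started thread sees.
    static String seenByNewThread(String domain) {
        CLIENT_DOMAIN.set(domain);
        AtomicReference<String> seen = new AtomicReference<>("unset");
        Thread worker = new Thread(() -> seen.set(CLIENT_DOMAIN.get()));
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            CLIENT_DOMAIN.remove();
        }
        return seen.get();
    }

    public static void main(String[] args) {
        // The worker thread has its own, empty copy of the thread-local
        System.out.println(seenByNewThread("client-a")); // prints "null"
    }
}
```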
Async functions
To use ThreadLocal variables in async code, we need to pass their values to the async function explicitly and set them as thread-local variables on the new thread. To simplify this operation, create a function that takes the action as a parameter:
public static <T> T call(String domain, Callable<T> action) throws Exception {
    // Remember the previous domain so it can be restored afterwards
    String old = ThreadLocalClientResolver.getClientDomain();
    try {
        ThreadLocalClientResolver.setClientDomain(domain);
        return action.call();
    } finally {
        ThreadLocalClientResolver.setClientDomain(old);
    }
}
This approach can be used to execute an action for all clients or to switch the context temporarily. For example:
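public void runForAllClients(Callable<?> task) {
    clientDao.findAll().forEach(c -> {
        try { call(c.getDomain(), task); } // call(...) declares a checked Exception
        catch (Exception e) { throw new IllegalStateException(e); }
    });
}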
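For asynchronous work, the same idea can be applied by capturing the domain on the submitting thread and re-applying it on the worker thread. The sketch below assumes a plain ExecutorService rather than @Async, and `DomainContext`, `withCurrentDomain` and `AsyncPropagationDemo` are hypothetical names introduced for illustration.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical stand-in for the thread-local resolver plus the call(...) helper
class DomainContext {
    private static final ThreadLocal<String> CLIENT_DOMAIN = new ThreadLocal<>();

    static String current() { return CLIENT_DOMAIN.get(); }
    static void set(String domain) { CLIENT_DOMAIN.set(domain); }
    static void clear() { CLIENT_DOMAIN.remove(); }

    // Runs the action with the given domain, then restores the previous one.
    static <T> T call(String domain, Callable<T> action) throws Exception {
        String old = CLIENT_DOMAIN.get();
        try {
            CLIENT_DOMAIN.set(domain);
            return action.call();
        } finally {
            if (old == null) CLIENT_DOMAIN.remove(); else CLIENT_DOMAIN.set(old);
        }
    }

    // Captures the submitting thread's domain now and re-applies it when the task runs.
    static <T> Callable<T> withCurrentDomain(Callable<T> action) {
        String captured = CLIENT_DOMAIN.get();
        return () -> call(captured, action);
    }
}

class AsyncPropagationDemo {

    // Returns the domain that a task running on a worker thread observes.
    static String domainSeenByWorker(String domain) {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        DomainContext.set(domain);
        try {
            Callable<String> task = DomainContext.withCurrentDomain(DomainContext::current);
            return pool.submit(task).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            DomainContext.clear();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(domainSeenByWorker("client-a")); // prints "client-a"
    }
}
```

In a Spring application, a similar wrapper could probably be plugged into a ThreadPoolTaskExecutor through a TaskDecorator, so that @Async methods get the domain without any change to their code. That wiring is an assumption and was not part of the solution described here.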
Dynamic datasource routing
We know how to set the client before the controller function. To make the code fully transparent, it’s crucial to provide the client datasource just before the database call. My solution is to use AbstractRoutingDataSource from the org.springframework.jdbc.datasource.lookup package. Initialising it is as simple as:
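import java.util.HashMap;
import java.util.Map;
import javax.sql.DataSource;
import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

public class DataSourceRouter extends AbstractRoutingDataSource {

    public DataSourceRouter(Map<String, DataSource> clientsDataSources) {
        // setTargetDataSources expects a Map<Object, Object>
        setTargetDataSources(new HashMap<>(clientsDataSources));
    }

    @Override
    protected String determineCurrentLookupKey() {
        return ThreadLocalClientResolver.getClientDomain();
    }
}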
This class extends AbstractRoutingDataSource, which extends AbstractDataSource, which implements javax.sql.DataSource, so it can be provided like a usual datasource to Hibernate or jOOQ, which gives you full transparency. The target database is determined each time a connection is requested, just before the actual call to the endpoint.
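To make the "determined on every call" part more tangible, here is a simplified, framework-free sketch of the routing idea. It is not Spring's implementation: `KeyRouter` is a hypothetical class that routes to arbitrary targets instead of real datasources, and the `register` method only illustrates how a new client could be added at runtime.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Simplified stand-in for the routing idea; not Spring's AbstractRoutingDataSource.
class KeyRouter<T> {
    private final Map<String, T> targets = new ConcurrentHashMap<>();
    private final Supplier<String> currentKey;

    KeyRouter(Map<String, T> initialTargets, Supplier<String> currentKey) {
        this.targets.putAll(initialTargets);
        this.currentKey = currentKey;
    }

    // Can be called at runtime, e.g. when a new client registers.
    void register(String key, T target) {
        targets.put(key, target);
    }

    // The key is looked up on every call, so the result follows
    // whatever client is current at that moment.
    T resolve() {
        String key = currentKey.get();
        T target = (key == null) ? null : targets.get(key);
        if (target == null) {
            throw new IllegalStateException("No target registered for key: " + key);
        }
        return target;
    }
}
```

With a thread-local based key supplier, two threads calling `resolve()` at the same time can get different targets from the same router instance, which is what makes the routing invisible to the calling code.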
Initialize Datasource
Before you route to any database, you need to create the router at application start, like any other datasource. It can be done as follows:
public DataSourceRouter create(List<Client> clientsData) {
    // Key each datasource by the client domain, which is also the routing lookup key
    Map<String, DataSource> clientDataSources = clientsData.stream()
            .collect(toMap(Client::getDomain, c -> create(c)));
    return new DataSourceRouter(clientDataSources);
}
This approach does not affect the connection pool or any other datasource features. The create function uses an Apache DBCP2 datasource with connection pooling:
public static DataSource create(Client c) {
    PoolableConnectionFactory factory = new PoolableConnectionFactory(
            new DriverManagerConnectionFactory(c.endpoint, c.user, c.password), null);
    GenericObjectPool<PoolableConnection> connectionPool = new GenericObjectPool<>(factory);
    connectionPool.setConfig(poolConfig(maxTotalConnections, maxIdleConnections));
    connectionPool.setAbandonedConfig(abandonedConfig());
    // The factory needs a reference back to the pool it serves
    factory.setPool(connectionPool);
    factory.setMaxOpenPreparedStatements(maxTotalConnections);
    factory.setConnectionInitSql(singletonList(INIT_SQL));
    return new PoolingDataSource<>(connectionPool);
}
Summary
Routing datasources is a feature that comes with Spring out of the box thanks to the AbstractRoutingDataSource class, which implements DataSource. To make it transparent for developers, add a client identifier to a thread-local and read it in the determineCurrentLookupKey method.