Make Drupal 8 Read-only
These experiments were successfully carried out on Drupal 8.7.11, and the setup is currently in use on Drupal 8.9.
In this article I’ll explain how (and why) we’ve made our Drupal installation read-only.
Motivations
The first (reasonable) question is: why would anyone want a read-only Drupal?
I’ll first explain our current architecture and why we need a read-only Drupal.
Our website is hosted on multiple servers. The main server (aka “main”) is not reachable from outside our corporate network, and only a few specific employees have access to its admin area, with limited rights. The other servers act as “fronts”: they are publicly available and serve our website to the Internet, but the whole administration area is inaccessible on them.
The legacy architecture we inherited when we started working on the website had no primary/replica replication in place. Every time a modification was made on the main, the database had to be cloned to all the fronts through a deployment. It was cumbersome and involved many people, both on our team and in the IT department. For those reasons, we usually deployed only once a week, which was far from sufficient for the editors.
The reason behind this one-way flow is simple: if a front is attacked and corrupted, the main stays safe.
When we asked for a real primary/replica setup, we had to keep the same constraint: information can only flow from the main to the fronts, never the other way around. Since a primary/replica setup is not possible when the replicas are modified (which Drupal constantly does), we had to make the fronts read-only.
In fact, read-only fronts make sense security-wise: it’s harder to hack a website whose database cannot be written to. Without write access to the database, a whole range of attack vectors become harder or even impossible to exploit (privilege escalation, database corruption, …). Furthermore, defacing existing pages (other than by replacing assets) or injecting malicious code into them (typically for XSS) requires editing the cache directly.
Such a modification would not last long: the cache is regularly cleared and can easily be purged by hand once the defacement is noticed. Not to mention that editing the cache directly requires either a very specific vulnerability or direct access to the server.
Last, but not least: since the fronts are simple copies of the main, a corrupted front can be removed and replaced with a new instance. As the main is protected from direct access, we can consider it safe and create new fronts from it whenever we need to.
But how?
That’s the whole point of this article: making Drupal read-only is anything but simple or straightforward.
Drupal loves its database. Almost everything we can think of is stored in it. Drupal loves to cache stuff too (render cache, entity cache, container cache, and so on). And some core modules love to use non-cache tables as cache, too (which is not convenient at all).
Cache is naturally generated dynamically when rendering a page, and so are logs. What is less common is storing semaphores in the database, or generating some locale properties and translations dynamically.
Drupal is definitely heavy machinery.
Move a lot of things out of the Database
The first things to address, the most obvious and the easiest ones, are the cache, the logs, and the semaphores.
Let’s start with the cache.
Use Redis to do caching
We’ve decided to use Redis as a local cache for the read-only instances of Drupal, since Drupal still needs to store its cache somewhere.
To install Redis, I recommend going straight to the README file of Drupal’s Redis module repository.
In our case it’s really easy, since Redis is installed locally:
// settings.php
$settings['container_yamls'][] = 'modules/redis/example.services.yml';
$settings['redis.connection']['host'] = 'redis';
$settings['redis.connection']['port'] = NULL;
$settings['cache']['default'] = 'cache.backend.redis';
// You can also use cache_prefix if you have a multisite website.
$settings['cache_prefix'] = '<site name>_';
Now all the caches used by Drupal and the modules are (or should be) in Redis. Unfortunately, as I discovered, Drupal still tries to write the container into the database. So let’s address this.
Container outside the DB
After some research (mostly setting breakpoints to see where it crashed), I discovered that Drupal has two bootstrap processes. The second one is the obvious one: the bootstrap of the whole application. But the first one, which happens before it, is a kind of pre-bootstrap in which Drupal uses a lightweight container in charge of building the full, cached container.
After a deep dive into the DrupalKernel (see here and here for more details), I had to set the bootstrap_container_definition setting to tell Drupal to store the container in Redis.
// settings.php
// Note: $app_name is not a Drupal variable; it is assumed to be defined
// earlier in settings.php (for instance, set to the site name).
$settings['bootstrap_container_definition'] = [
  'services' => [
    'cache.container' => [
      'factory' => ['@cache.container_factory', 'get'],
      // The double colon (::) avoids collisions with any other cache entry.
      'arguments' => [$app_name . '::container'],
    ],
    'cache.container_factory' => [
      'class' => 'Drupal\redis\Cache\CacheBackendFactory',
      'arguments' => ['@redis.factory', '@cache_tags_provider.container', '@serialization.phpserialize'],
    ],
    'serialization.phpserialize' => [
      'class' => 'Drupal\Component\Serialization\PhpSerialize',
    ],
    'redis.factory' => [
      'class' => 'Drupal\redis\ClientFactory',
    ],
    'cache_tags_provider.container' => [
      'class' => 'Drupal\redis\Cache\RedisCacheTagsChecksum',
      'arguments' => ['@redis.factory'],
    ],
  ],
];
OK, now the container is correctly stored in Redis, along with the rest of the cache. Next step: the logs.
Logs outside the DB
This one is easy: uninstall the Database Logging core module.
To keep the logs available (which is good practice), Drupal’s documentation is very clear: https://www.drupal.org/docs/8/api/logging-api/overview.
We simply created a local_logs module that writes the logs to files, in a format our monitoring team can ship to Kibana. I won’t go into detail here and will let you have a look at the documentation.
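The local_logs module itself is not shown here. To give a rough idea of what such a logger might do, here is a minimal standalone sketch that assumes one JSON object per line is what the log shipper feeds into Kibana. The class name JsonLineFileLogger and the field names are hypothetical. A real Drupal logger would implement Psr\Log\LoggerInterface (typically through Drupal’s RfcLoggerTrait), be registered as a service tagged `logger`, and convert Drupal’s @/%/: placeholders with the logger.log_message_parser service. All of that is left out so the sketch runs without Drupal.

```php
<?php

/**
 * Hypothetical file logger for the fronts: appends one JSON document per line,
 * a format log shippers commonly forward to Elasticsearch/Kibana.
 * Standalone sketch: it mirrors the PSR-3 log() signature but does not
 * implement Psr\Log\LoggerInterface, so it runs without Drupal.
 */
final class JsonLineFileLogger
{
    private string $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    /**
     * Appends a single entry; PSR-3 style {key} placeholders are replaced
     * with the matching scalar values from $context.
     */
    public function log(string $level, string $message, array $context = []): void
    {
        $replacements = [];
        foreach ($context as $key => $value) {
            if (is_scalar($value)) {
                $replacements['{' . $key . '}'] = (string) $value;
            }
        }

        $entry = [
            'timestamp' => gmdate('c'),
            'level' => $level,
            'message' => strtr($message, $replacements),
        ];

        // LOCK_EX keeps lines from interleaving when several PHP workers log at once.
        file_put_contents(
            $this->path,
            json_encode($entry, JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR) . PHP_EOL,
            FILE_APPEND | LOCK_EX
        );
    }
}

// Usage: log to a temporary file (a real front would use a fixed path).
$logFile = tempnam(sys_get_temp_dir(), 'front_log_');
$logger = new JsonLineFileLogger($logFile);
$logger->log('warning', 'Page {path} not found', ['path' => '/missing']);
```

Writing one self-describing JSON object per line means the shipper doesn’t need a custom parsing pattern, and appending with a lock avoids broken lines under concurrent requests.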
Last but not least…
Semaphore out of the DB
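I’m not really proud of this one, but since there is no interaction whatsoever with the fronts in our configuration (no user can log in and the databases are read-only), semaphores don’t make much sense there. We therefore replace both lock services with core’s NullLockBackend, which never actually locks anything and always reports success.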
# services.yml
services:
  lock:
    class: Drupal\Core\Lock\NullLockBackend
    tags:
      - { name: backend_overridable }
    lazy: true
  lock.persistent:
    class: Drupal\Core\Lock\NullLockBackend
    tags:
      - { name: backend_overridable }
All the obvious tasks are now done: caches, logs, and semaphores have been taken care of. Unfortunately, it’s still not working as it should.
Let’s dive into the not-so-obvious places where Drupal (or some modules) write to the database.
Handle writing operations that are not easy to deal with
Surprisingly, Drupal sometimes uses non-cache tables as a kind of cache. I’m quite sure it’s not by design; nevertheless, we have to deal with it.
Handle key_value
Some values are written dynamically to the key_value table. By dynamically, I mean during the display of a page rather than as a consequence of a settings change. Furthermore, once some of those values are present in the key_value table, they are no longer computed; when they are removed, they are recomputed and rewritten to the database.
In those cases, the key_value table is used as a cache, but without the “traceable” capability (no cache tags) and with no defined lifespan.
Since the values stored in the key_value table may represent a significant amount of computing time, ignoring those writes the way we ignore the semaphores would cost performance. However, we also have to handle the fact that those values might be changed by the main.
In this case, the solution is to decorate the key_value store and record locally (in Drupal’s cache, i.e. in Redis) the modifications made to the key_value table, storing both the new value and the value it overrides. The decorator works as follows:
For write operations:
- Store the new value and the overridden value.
For delete operations:
- Store the new value (in this case: empty/deleted) and the overridden value that was meant to be deleted.
For read operations:
- When the decorator has no entry, return the value from the DB (always written by the main).
- When the decorator has an entry and the overridden value stored with it is the same as the one in the key_value table, return the value stored in the decorator.
- When the decorator has an entry but the overridden value stored with it differs from the one in the key_value table, erase the decorator’s entry and return the value from the key_value table. This means the value has been changed by the main, which takes priority.
The resulting code is not trivial, but it lets us keep good performance and get consistent results when values are updated on the main.
# services.yml
services:
  sq_readonly.keyvalue:
    class: Drupal\sq_readonly\KeyValueStore\KeyValueDecoratorFactory
    decorates: keyvalue
    arguments: ['@sq_readonly.keyvalue.inner', '@cache.sq_readonly_keyvalue']
  sq_readonly.keyvalue_expirable:
    class: Drupal\sq_readonly\KeyValueStore\KeyValueExpirableDecoratorFactory
    decorates: keyvalue.expirable
    arguments: ['@sq_readonly.keyvalue_expirable.inner', '@cache.sq_readonly_keyvalue_expirable']
  # Since the storage is done in the Redis cache, we have to create
  # the bins too.
  cache.sq_readonly_keyvalue:
    class: Drupal\Core\Cache\CacheBackendInterface
    factory: cache_factory:get
    arguments: ['sq_readonly_keyvalue']
    tags:
      - { name: cache.bin }
  cache.sq_readonly_keyvalue_expirable:
    class: Drupal\Core\Cache\CacheBackendInterface
    factory: cache_factory:get
    arguments: ['sq_readonly_keyvalue_expirable']
    tags:
      - { name: cache.bin }
For the sake of readability, the KeyValue code is not directly in this article but available in this gist.
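The gist is not reproduced here. Below is a simplified, standalone sketch of the decision rules listed above. It is not the actual sq_readonly decorator: plain PHP arrays stand in for both the key_value table and the Redis cache bin, and the class name is hypothetical. It also treats a missing key and a NULL value as the same thing, which a real implementation would need to distinguish.

```php
<?php

/**
 * Hypothetical overlay applying the read/write/delete rules described above.
 * $db stands in for the key_value table (replicated from the main) and
 * $overlay for the Redis cache bin local to the front.
 */
final class ReadonlyKeyValueOverlay
{
    /** @var array<string, mixed> Public only so a demo can simulate the main. */
    public array $db;

    /** @var array<string, array{value: mixed, overridden: mixed, deleted: bool}> */
    private array $overlay = [];

    public function __construct(array $db)
    {
        $this->db = $db;
    }

    public function set(string $key, $value): void
    {
        // Keep the new value and the DB value it overrides.
        $this->overlay[$key] = ['value' => $value, 'overridden' => $this->db[$key] ?? null, 'deleted' => false];
    }

    public function delete(string $key): void
    {
        // A local delete is stored too, tied to the DB value it hides.
        $this->overlay[$key] = ['value' => null, 'overridden' => $this->db[$key] ?? null, 'deleted' => true];
    }

    public function get(string $key, $default = null)
    {
        $dbValue = $this->db[$key] ?? null;

        if (!array_key_exists($key, $this->overlay)) {
            // No local entry: the value written by the main is used.
            return $dbValue ?? $default;
        }

        if (!self::same($this->overlay[$key]['overridden'], $dbValue)) {
            // The main changed the value since the front overrode it: the main wins.
            unset($this->overlay[$key]);
            return $dbValue ?? $default;
        }

        return $this->overlay[$key]['deleted'] ? $default : $this->overlay[$key]['value'];
    }

    /** Compares serialized forms, so equal arrays/objects match even if not identical. */
    private static function same($a, $b): bool
    {
        return serialize($a) === serialize($b);
    }
}
```

The overlay stores the overridden value next to the local one. That is what lets a front detect, at read time and without any message from the main, that the main has since written something newer.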
locales_source
This table keeps track of the Drupal version associated with each translatable string. I’m not quite sure what purpose this serves, but since it writes to the database, it has to be taken care of.
The solution here is to extend StringDatabaseStorage and disable writing operations. The performance loss in this case is negligible.
# services.yml
services:
  locale.storage:
    class: Drupal\sq_readonly\Locale\ReadonlyStringDatabaseStorage
    arguments: ['@database']
    tags:
      - { name: backend_overridable }
Fallback for everything else
Plenty of modules try to write to the database at “runtime” (when an anonymous visitor displays a page). For example, the redirect_404 module keeps track of every 404 that happens on the website. It’s not possible to override every module with custom logic, and we want to be able to extend the website and install contrib modules without having to deal with read-only customization.
The ultimate safeguard is to prevent every non-SELECT operation on the database. A website that works, even if more slowly, is better than a broken one.
To do so, we decorate the database connection: every query that is not a SELECT is logged instead of being executed (and therefore does not crash the page).
# services.yml
services:
  database:
    class: Drupal\Core\Database\Connection
    factory: Drupal\sq_readonly\Database\ReadonlyDatabase::getConnection
    arguments: [default]
  database.replica:
    class: Drupal\Core\Database\Connection
    factory: Drupal\sq_readonly\Database\ReadonlyDatabase::getConnection
    arguments: [replica]
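The ReadonlyDatabase class is not shown in the article. Here is a minimal standalone sketch of its core idea, assuming every statement goes through a single query method: SELECT statements pass through, and everything else is recorded and skipped. The class name, the \Closure executor standing in for the real connection, and the regular expression are all assumptions. Drupal’s actual Connection API has more entry points, such as insert(), merge() or startTransaction(), which a real implementation would also need to cover.

```php
<?php

/**
 * Hypothetical read-only guard: forwards SELECT statements to the real
 * executor and records every other statement instead of running it.
 */
final class ReadonlyQueryGuard
{
    private \Closure $executor;

    /** @var string[] Statements that were skipped (to be sent to the logs). */
    private array $skipped = [];

    public function __construct(\Closure $executor)
    {
        $this->executor = $executor;
    }

    public static function isReadQuery(string $sql): bool
    {
        // Leading whitespace is ignored; the keyword match is case-insensitive.
        return preg_match('/^\s*SELECT\b/i', $sql) === 1;
    }

    /** Returns the executor's result for reads, NULL for skipped writes. */
    public function query(string $sql, array $args = [])
    {
        if (self::isReadQuery($sql)) {
            return ($this->executor)($sql, $args);
        }
        // Never touch the database: just remember what was attempted.
        $this->skipped[] = $sql;
        return null;
    }

    /** @return string[] */
    public function getSkipped(): array
    {
        return $this->skipped;
    }
}
```

The check is deliberately conservative. Anything that does not start with SELECT counts as a write, so a read prefixed with an SQL comment would also be skipped. In a read-only front, wrongly skipping is the safer failure mode. Returning NULL is enough for this sketch, but a real decorator has to return whatever callers of Drupal’s API expect.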
Automatic switch between read-only and not read-only
Since the databases are synchronized between the main and the fronts, we want only the fronts, not the main, to be read-only.
To do so, we use a flag stored in a file on the file system. An environment variable would also work.
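Here is one way to wire this up, sketched under assumptions. The flag file path, the SQ_READONLY environment variable name and the readonly.services.yml file name are all hypothetical. The idea is that the decision happens in settings.php, so the main and the fronts can share the same code base and database while only the fronts load the read-only overrides.

```php
<?php

/**
 * Hypothetical settings.php helper: TRUE when this instance is a read-only front.
 * A non-empty environment value wins; otherwise the presence of the flag file decides.
 */
function sq_is_readonly_front(string $flagFile, ?string $envValue): bool
{
    if ($envValue !== null && $envValue !== '') {
        // "1", "true", "on" and "yes" (case-insensitive) count as TRUE.
        return filter_var($envValue, FILTER_VALIDATE_BOOLEAN);
    }
    return is_file($flagFile);
}

// Possible use at the end of settings.php (names are assumptions):
// $env = getenv('SQ_READONLY');
// if (sq_is_readonly_front('/etc/drupal/readonly.flag', $env === false ? null : $env)) {
//   $settings['container_yamls'][] = $app_root . '/' . $site_path . '/readonly.services.yml';
// }
```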
Clear the cache on the fronts
Drupal actively invalidates its cache using tags that compose (a mechanism known as bubbling). For example, when a page is generated, the cache tags of every block in the page are added to the page itself. When a block or a content type is modified in the admin area, Drupal invalidates every cache entry carrying the related tags.
Since editing is not done on the front instances, their caches are not cleared, and users are served stale content.
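To make the problem concrete, here is a deliberately simplified model of tag-based invalidation. It loosely follows the checksum approach used for Drupal’s cache tags: each tag has an invalidation counter, and an entry stays valid only while the sum of its tags’ counters is unchanged. It is not Drupal code, and the class and method names are made up. In this setup each front keeps its own counters in its own Redis, so an invalidation on the main never reaches them.

```php
<?php

/** Simplified model of tag-based cache invalidation (not Drupal's API). */
final class TaggedCache
{
    /** @var array<string, int> Invalidation counter per tag. */
    private array $tagCounters = [];

    /** @var array<string, array{data: mixed, tags: string[], checksum: int}> */
    private array $entries = [];

    private function checksum(array $tags): int
    {
        $sum = 0;
        foreach ($tags as $tag) {
            $sum += $this->tagCounters[$tag] ?? 0;
        }
        return $sum;
    }

    public function set(string $cid, $data, array $tags): void
    {
        $this->entries[$cid] = ['data' => $data, 'tags' => $tags, 'checksum' => $this->checksum($tags)];
    }

    /** Returns NULL on a miss or when one of the entry's tags was invalidated. */
    public function get(string $cid)
    {
        $entry = $this->entries[$cid] ?? null;
        if ($entry === null || $entry['checksum'] !== $this->checksum($entry['tags'])) {
            return null;
        }
        return $entry['data'];
    }

    public function invalidateTags(array $tags): void
    {
        foreach ($tags as $tag) {
            $this->tagCounters[$tag] = ($this->tagCounters[$tag] ?? 0) + 1;
        }
    }
}

// Bubbling: the page's tags are the union of its blocks' tags.
$blockTags = [['node:1'], ['block_content:2', 'config:system.menu.main']];
$pageTags = array_values(array_unique(array_merge(...$blockTags)));

// A front caches the page in its own cache, with its own tag counters.
$front = new TaggedCache();
$front->set('page:/home', 'rendered home page', $pageTags);

// An editor saves node 1 on the main: only the main's counters change,
// so the front keeps serving its stale copy.
$main = new TaggedCache();
$main->invalidateTags(['node:1']);
```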
I see two ways of addressing it:
The easy way — Cron
If you don’t need your website to be updated the moment content is edited, a simple cron job clearing the whole Redis cache on each front (for example every 5 minutes) should be enough.
Even better, if you are behind a CDN that supports the stale-while-revalidate cache directive (see here for more information), your visitors will not notice the cache being rebuilt at all.
The naive way would be to set Drupal’s cache lifetime to 5 minutes. Unfortunately, at the time this work was done, this wasn’t possible because of known issues (see https://www.drupal.org/docs/8/api/cache-api/cache-max-age#s-limitations-of-max-age):
- #2352009: [pp-3] Bubbling of elements’ max-age to the page’s headers and the page cache
- #2449749: Add #cache[‘downstream-ttl’] to force expiration after a certain time and fix #cache[‘max-age’] logic by adding #cache[‘age’]
- #2835068: PageCache caching uncacheable responses (violating HTTP/1.0 spec) + D8 intentionally disabling HTTP/1.0 proxies = WTF
- #2951814: Always set X-Drupal-Cache and X-Drupal-Dynamic-Cache headers, even for responses that are not cacheable
The hard way — Front notifications
Since an update on the main only clears the affected parts of the cache, and only on the main, we need to report those clears to the fronts.
The first step is to create a cache decorator that notifies the fronts (which means Drupal has to know every front it is working with). Then we have to create a controller, on each front, that listens to those notifications and clears the cache accordingly.
Create a Cache Decorator
We’ve had our share of decorators in this article. This one records every update/delete operation performed on the cache and sends them to the fronts once the onKernelTerminate event is fired. Using the onKernelTerminate event ensures the connection with the client (in this case, the person responsible for the cache update) is already closed, so the admin interface does not freeze while the messages are sent to the fronts.
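The decorator used in the article lives in a gist. For illustration only, here is a standalone sketch of the buffering pattern. Invalidations are recorded during the request and sent in one batch at the end, which is where a kernel.terminate listener would call flush(). The simplified SimpleCacheBackend interface, the class names, the JSON payload shape and the sender closure are all assumptions. They are not Drupal’s CacheBackendInterface or the sq_readonly code.

```php
<?php

/** Minimal cache contract for this sketch (Drupal's CacheBackendInterface is larger). */
interface SimpleCacheBackend
{
    public function delete(string $cid): void;

    public function invalidateTags(array $tags): void;
}

/**
 * Hypothetical decorator on the main: forwards every call to the real backend
 * and buffers what the fronts will need to clear.
 */
final class NotifyingCacheDecorator implements SimpleCacheBackend
{
    private SimpleCacheBackend $inner;
    private string $bin;
    /** @var array{cids: string[], tags: string[]} */
    private array $pending = ['cids' => [], 'tags' => []];

    public function __construct(SimpleCacheBackend $inner, string $bin)
    {
        $this->inner = $inner;
        $this->bin = $bin;
    }

    public function delete(string $cid): void
    {
        $this->inner->delete($cid);
        $this->pending['cids'][] = $cid;
    }

    public function invalidateTags(array $tags): void
    {
        $this->inner->invalidateTags($tags);
        // Deduplicate so each tag is sent once per request.
        $this->pending['tags'] = array_values(array_unique(array_merge($this->pending['tags'], $tags)));
    }

    /**
     * To be called from a kernel.terminate listener, once the response has been
     * sent. $send(string $url, string $jsonPayload) performs the actual HTTP POST.
     */
    public function flush(array $frontUrls, \Closure $send): void
    {
        if ($this->pending['cids'] === [] && $this->pending['tags'] === []) {
            return;
        }
        $payload = json_encode(['bin' => $this->bin] + $this->pending, JSON_THROW_ON_ERROR);
        foreach ($frontUrls as $url) {
            $send($url, $payload);
        }
        $this->pending = ['cids' => [], 'tags' => []];
    }
}
```

Batching keeps the number of HTTP calls per request to one per front, however many cache operations the edit triggered.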
Listen to notification and clear the caches in the front
On the front side, a controller has to be created. It receives the cache clear requests from the main and, when called, clears every key of every cache that the main’s cache decorator asked it to clear.
Securing the connection between the main and the fronts is not a problem: since they share the same database, they can easily share a secret through Drupal’s configuration. The controller then ensures that the secret is correct before proceeding to apply the cache clear requests.
# services.yml
services:
  sq_readonly.cache_factory:
    class: Drupal\sq_readonly\Cache\CacheBackendDecoratorFactory
    decorates: cache_factory
    arguments: ['@sq_readonly.cache_factory.inner', '@sq_readonly.cache_clearer']
And for the router:
# routing.yml
sq_readonly.clear:
  path: '/api/sq-readonly/v1/update'
  defaults:
    _title: 'Update the cache'
    _controller: '\Drupal\sq_readonly\Controller\APIController::update'
  methods: [POST]
  requirements:
    _permission: 'access content'
For the sake of readability, the code of the notification cache cleaning system is available in this gist.
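On the front side, the controller logic boils down to three steps: check the shared secret, decode the payload, and clear what was requested. Below is a framework-free sketch of that logic. In the real module it would live in APIController::update() and read the secret from Drupal’s configuration. The article does not say how the secret travels (header or body field), so here it is a plain parameter. The JSON body format and the $clear closure are also assumptions. hash_equals() makes the secret comparison constant-time.

```php
<?php

/**
 * Hypothetical handler for POST /api/sq-readonly/v1/update.
 * Returns an HTTP status code; $clear is called with the bin name,
 * the cache IDs and the tags to invalidate.
 */
function sq_handle_cache_update(string $expectedSecret, string $providedSecret, string $body, \Closure $clear): int
{
    // Constant-time comparison; an empty configured secret never matches.
    if ($expectedSecret === '' || !hash_equals($expectedSecret, $providedSecret)) {
        return 403;
    }

    $payload = json_decode($body, true);
    if (!is_array($payload) || !is_string($payload['bin'] ?? null)) {
        return 400;
    }

    // Keep only string entries, whatever the sender put in the lists.
    $cids = array_values(array_filter((array) ($payload['cids'] ?? []), 'is_string'));
    $tags = array_values(array_filter((array) ($payload['tags'] ?? []), 'is_string'));
    $clear($payload['bin'], $cids, $tags);

    return 204;
}
```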
Our way
In our case, we started the hard way and created a cache decorator and a controller (as explained above). It worked sometimes, but not every time. Unfortunately, we didn’t have time to find out why the fronts weren’t always notified by the main.
We then backed up the cache clearing process with a cron job running every 5 minutes.
Eventually, we removed the cache decorator entirely: the cron job is much easier to maintain, and it’s enough for our needs.
In conclusion
Making Drupal read-only is far from trivial, and it is barely documented. Some core modules behave strangely, for instance by using the key_value store as a cache. A global fallback is also needed for contrib modules.
Strictly speaking, the only steps required for a functional read-only Drupal are moving the cache and the container cache into Redis, plus the global fallback. Every other step can be considered nice to have (although some may say the logs are a hard requirement).
As a result, we can now run as many “independent” fronts as required, with no impact on the main (except for the database synchronization), and the main is less exposed to attacks.
This solution was initially deployed on Drupal 8.7.11 at the beginning of 2020. Almost a year and many updates later, it still works well on Drupal 8.9.9.