Make Drupal 8 Read-only

By Elendev · Published in Swissquote Tech Blog · 11 min read · Dec 11, 2020

Those experiments were successfully done on Drupal 8.7.11 and are currently used on Drupal 8.9.

In this article I’ll explain how (and why) we’ve made our Drupal installation read-only.

Motivations

The first (reasonable) question, probably asked by a non-negligible part of the readers, is: why?

I’ll first explain our current architecture and why we have to have a read-only Drupal.

Our website is hosted on multiple servers. The main server (aka “main”) is not available outside of our corporate network. Only some specific employees have access to the admin area, with limited rights. The other servers are used as “front”: they are publicly available and provide our website to the Internet, but the whole administrator area is not accessible.

The legacy architecture we inherited when we started working on the website had no primary/replica feature in place. Every time a modification was done on the main, we had to clone the database to all the fronts through a deployment. It was cumbersome, many people were involved, both on our team and in the IT department. For those reasons, we usually made a deployment only once a week, which was far from sufficient for the editors.

The reason behind this separation between main and fronts is simple: if a front is attacked and corrupted, the main is safe.

When we asked to have a real primary/replica solution, we had to keep the same constraint: the information can only go from the main to the front, but not the other way around. Since it’s not possible to have a primary/replica if the replicas are modified (which is the case with Drupal), we had to make the fronts read-only.

In fact, having read-only fronts makes sense security-wise: it’s harder to hack a website when it’s not possible to write into its database. Without write access to the database, a whole range of attack vectors become harder or even impossible to execute (privilege escalation, database corruption, …). Furthermore, defacing existing pages (except by replacing assets) or inserting malicious code into them (typically for XSS) requires editing the cache directly.

Edits made directly to the cache don’t last, since the cache is regularly cleared and can easily be purged by hand as soon as the defacement is noticed. Not to mention that editing the cache directly requires either a very specific vulnerability or direct access to the server.

Last, but not least: since the fronts are simple copies of the main, the moment a front is corrupted it’s possible to remove it and replace it with a new instance. The main being protected from direct access, we can consider it safe and create new fronts from it whenever we need to.

But how?

That’s the whole point of this article: making Drupal read-only is anything but simple or straightforward.

Drupal loves its database. Almost everything we can think of is stored in it. Drupal loves to cache stuff too (render cache, entity cache, container cache, and so on). And some core modules love to use non-cache tables as a cache, too (which is not convenient at all).

The cache is a natural thing that has to be generated dynamically when rendering a page. So are the logs. What is less common in databases are the semaphores, and what is usually not generated dynamically are some locale properties and some translations.

Drupal is definitely heavy machinery.

Move a lot of things out of the Database

The first things to address, the most obvious and the easiest ones, are the cache, the logs, and the semaphores.

Let’s start with the cache.

Use Redis to do caching

We’ve decided to use Redis as a local cache for the read-only instances of Drupal, since Drupal still needs to store its cache somewhere.

To install Redis, I recommend going straight to the README file of Drupal’s Redis module repository.

In our case, it’s really easy, Redis is installed locally.

// settings.php
$settings['container_yamls'][] = 'modules/redis/example.services.yml';
$settings['redis.connection']['host'] = 'redis';
$settings['redis.connection']['port'] = NULL;
$settings['cache']['default'] = 'cache.backend.redis';
// You can also use cache_prefix if you have a multi site website
$settings['cache_prefix'] = '<site name>_';

Now all the caches used by Drupal and the modules are (or should be) in Redis. Unfortunately, as I discovered, Drupal still tries to write the container into the database. So let’s address this.

Container outside the DB

After some research (mostly breakpoints to see where it crashes), I discovered that there are two bootstrap processes in Drupal. The second one is the obvious one: the bootstrap of the whole application. But the first one, happening before it, is a kind of pre-bootstrap where Drupal uses a light container in charge of building the whole cached container.

After a deep dive into the DrupalKernel (see here and here for more details), I had to set the bootstrap_container_definition setting to tell Drupal to store the container in Redis.

// settings.php
$settings['bootstrap_container_definition'] = [
  'services' => [
    'cache.container' => [
      'factory' => ['@cache.container_factory', 'get'],
      'arguments' => [$app_name . '::container'], // $app_name must be defined earlier in settings.php; the double colon (::) avoids collisions with any other cache entry
    ],
    'cache.container_factory' => [
      'class' => 'Drupal\redis\Cache\CacheBackendFactory',
      'arguments' => ['@redis.factory', '@cache_tags_provider.container', '@serialization.phpserialize'],
    ],
    'serialization.phpserialize' => [
      'class' => 'Drupal\Component\Serialization\PhpSerialize',
    ],
    'redis.factory' => [
      'class' => 'Drupal\redis\ClientFactory',
    ],
    'cache_tags_provider.container' => [
      'class' => 'Drupal\redis\Cache\RedisCacheTagsChecksum',
      'arguments' => ['@redis.factory'],
    ],
  ],
];

OK, now the container is correctly stored in Redis, along with the rest of the cache. Next step: the logs.

Logs outside the DB

This one is easy: uninstall the Database Logging core module.

To keep the logs available (which is a good practice), the documentation of Drupal is really clear: https://www.drupal.org/docs/8/api/logging-api/overview.

We’ve simply created a local_logs module that stores the logs in files, in a format that allows our monitoring team to ingest them into Kibana. I won’t go into detail here and will let you have a look at the documentation.
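As an illustration only (the local_logs module itself is not shown in this article), a file logger that writes one JSON document per line could look like the sketch below. The class name `JsonLineFileLogger`, the field names and the file path are assumptions. In a real Drupal module, the class would implement `Psr\Log\LoggerInterface` and be registered as a service tagged `logger`, as described in the Logging API documentation.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: appends log records to a file as JSON lines.
 * Not the actual local_logs module; names and fields are assumptions.
 */
final class JsonLineFileLogger
{
    /** @var string Path of the log file. */
    private $path;

    public function __construct(string $path)
    {
        $this->path = $path;
    }

    /**
     * Formats one record as a single-line JSON document.
     */
    public function format(string $level, string $message, array $context = []): string
    {
        // Replace PSR-3 style {placeholders} with scalar context values.
        $replacements = [];
        foreach ($context as $key => $value) {
            if (is_scalar($value)) {
                $replacements['{' . $key . '}'] = (string) $value;
            }
        }

        return json_encode([
            'timestamp' => gmdate('c'),
            'level' => $level,
            'message' => strtr($message, $replacements),
        ], JSON_UNESCAPED_SLASHES | JSON_THROW_ON_ERROR);
    }

    public function log(string $level, string $message, array $context = []): void
    {
        // FILE_APPEND | LOCK_EX keeps concurrent requests from interleaving lines.
        file_put_contents(
            $this->path,
            $this->format($level, $message, $context) . PHP_EOL,
            FILE_APPEND | LOCK_EX
        );
    }
}
```

One JSON document per line is a common choice because log shippers can parse each line independently, without knowing anything about Drupal.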

Last but not least…

Semaphore out of the DB

I’m not really proud of this one, but since there are no interactions whatsoever with the fronts in our configuration (no user can log in and the databases are read-only), the use of semaphores doesn’t make much sense.

// services.yml
services:
  lock:
    class: Drupal\Core\Lock\NullLockBackend
    tags:
      - { name: backend_overridable }
    lazy: true
  lock.persistent:
    class: Drupal\Core\Lock\NullLockBackend
    tags:
      - { name: backend_overridable }

Now we have finished all the obvious tasks; caches, logs, and semaphores have been taken care of. Unfortunately, it’s still not working as it should.

Let’s dive into the not-so-obvious parts where Drupal (or some modules) writes into the database.

Handle writing operations that are not easy to deal with

Surprisingly, Drupal sometimes uses non-cache tables as a kind of cache. I’m quite sure it’s not by design, but we have to deal with it nevertheless.

Handle key_value

Some values are dynamically written in the key_value table. By dynamically, I mean during the display of a page instead of as a consequence of a setting modification. Furthermore, when some of those values are present in the key_value table, they are not computed anymore, and when they are removed they are then re-computed and re-written into the database.

In those cases, the key_value table is used as a cache but without the “traceable” capability, and with no lifespan defined.

Since the values stored in the key_value table might represent a certain amount of computing time, ignoring it the same way we ignore the semaphores will result in a loss of performance. However, we have to handle the fact that those values might be changed by the main.

In this case, the solution is to decorate the key_value store and write locally (using Drupal’s cache in Redis) the modifications done to the key_value table (both the new value and the overridden value). The logic of the decorator is the following:

For write operations:

Store the new value and the overridden value.

For delete operations:

Store the new value (in this case: empty/deleted) and the overridden value aimed at being deleted.

For reading operations:

When no entry is available in the decorator, the value returned is the one from the DB (always written by the main).
When a value is available in the decorator and the overridden value stored with it is the same as the one in the key_value table, return the value stored in the decorator.
When a value is available in the decorator but the overridden value stored with it is not the same as the one in the key_value table, erase the value stored in the decorator and return the value of the key_value table. This means the value has been changed by the main, and the main takes priority.

The resulting code is not trivial, but it allows us to keep good performance and to get consistent results when values are updated on the main.

# services.yml
sq_readonly.keyvalue:
  class: Drupal\sq_readonly\KeyValueStore\KeyValueDecoratorFactory
  decorates: keyvalue
  arguments: ['@sq_readonly.keyvalue.inner', '@cache.sq_readonly_keyvalue']

sq_readonly.keyvalue_expirable:
  class: Drupal\sq_readonly\KeyValueStore\KeyValueExpirableDecoratorFactory
  decorates: keyvalue.expirable
  arguments: ['@sq_readonly.keyvalue_expirable.inner', '@cache.sq_readonly_keyvalue_expirable']

# Since the storage is done in the Redis cache, we have to create
# the bins too
cache.sq_readonly_keyvalue:
  class: Drupal\Core\Cache\CacheBackendInterface
  factory: cache_factory:get
  arguments: ['sq_readonly_keyvalue']
  tags:
    - { name: cache.bin }

cache.sq_readonly_keyvalue_expirable:
  class: Drupal\Core\Cache\CacheBackendInterface
  factory: cache_factory:get
  arguments: ['sq_readonly_keyvalue_expirable']
  tags:
    - { name: cache.bin }

For the sake of readability, the KeyValue code is not directly in this article but available in this gist.
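Independently of the gist, the write, delete and read rules listed earlier can be made concrete with a small, Drupal-free sketch. Everything here is an assumption made for illustration: the class name `KeyValueOverlay`, the in-memory `$database` array standing in for the key_value table, and the `$overlay` array standing in for the Redis cache bin. It also treats a missing key and a `null` value as the same thing, which the real code may not do.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical, Drupal-free sketch of the overlay logic described above.
 */
final class KeyValueOverlay
{
    /** @var array Stand-in for the key_value table; public so the demo can simulate replication. */
    public $database = [];

    /** @var array Stand-in for the local cache bin: key => [new, overridden, deleted]. */
    private $overlay = [];

    public function __construct(array $database)
    {
        $this->database = $database;
    }

    public function set(string $key, $value): void
    {
        // Remember the new value and the database value it overrides.
        $this->overlay[$key] = [
            'new' => $value,
            'overridden' => $this->database[$key] ?? null,
            'deleted' => false,
        ];
    }

    public function delete(string $key): void
    {
        // A deletion is stored like a write, with a "deleted" marker.
        $this->overlay[$key] = [
            'new' => null,
            'overridden' => $this->database[$key] ?? null,
            'deleted' => true,
        ];
    }

    public function get(string $key, $default = null)
    {
        $current = $this->database[$key] ?? null;

        if (isset($this->overlay[$key])) {
            $entry = $this->overlay[$key];
            if ($entry['overridden'] === $current) {
                // The database has not changed since the local write.
                return $entry['deleted'] ? $default : $entry['new'];
            }
            // The main replicated a new value: it wins, drop the local entry.
            unset($this->overlay[$key]);
        }

        return $current ?? $default;
    }
}

// Usage: a local write is served until the main changes the underlying row.
$store = new KeyValueOverlay(['theme' => 'old']);
$store->set('theme', 'local');           // get('theme') now returns 'local'
$store->database['theme'] = 'from-main'; // simulated replication from the main
$value = $store->get('theme');           // local entry is discarded, 'from-main' wins
```

Storing the overridden value next to the new one is what lets a front detect that the main has moved on without any extra signalling between the two.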

locales_source

This table is used to keep track of a specific Drupal version for a given translatable string. I’m not quite sure what the purpose of this is, but since Drupal tries to write to it, it has to be taken care of.

The solution here is to extend StringDatabaseStorage and disable writing operations. The performance loss in this case is negligible.

# services.yml
locale.storage:
  class: Drupal\sq_readonly\Locale\ReadonlyStringDatabaseStorage
  arguments: ['@database']
  tags:
    - { name: backend_overridable }

[Embedded gist: ignore every write operation]

Fallback for everything else

There are plenty of modules that try to write to the database at “runtime” (when a simple user displays a page). For example, the module 404_redirect keeps track of every 404 that happens on the website. It’s not possible to override every module with custom logic, and we want to be able to extend the website and install contrib modules without having to deal with read-only customization.

The ultimate safeguard is to prevent every non-SELECT operation in the database. It’s better to have a slower working website than a broken website.

To do so, we decorate the connection and log every attempt to execute a query that is not a SELECT, without actually executing it (and thus without crashing).

# services.yml
database:
  class: Drupal\Core\Database\Connection
  factory: Drupal\sq_readonly\Database\ReadonlyDatabase::getConnection
  arguments: [default]

database.replica:
  class: Drupal\Core\Database\Connection
  factory: Drupal\sq_readonly\Database\ReadonlyDatabase::getConnection
  arguments: [replica]

Since it’s not possible to override the connection directly, we first have to override the whole Database object.

[Embedded gist: ignore every non-read operation]
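The core idea of that fallback (let SELECT through, log and skip everything else) can be sketched without Drupal. This is not the code of the ReadonlyDatabase class referenced in the service definition; the class name `ReadonlyQueryGuard`, the `$execute` callable standing in for the real connection, and the choice to return `null` for a refused statement are all assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: lets SELECT statements through and records, then
 * skips, everything else. $execute stands in for the real connection.
 */
final class ReadonlyQueryGuard
{
    /** @var callable */
    private $execute;

    /** @var string[] Statements that were refused (a real module would log them). */
    public $refused = [];

    public function __construct(callable $execute)
    {
        $this->execute = $execute;
    }

    public static function isReadQuery(string $sql): bool
    {
        // Strip leading whitespace, opening parentheses and /* comments */.
        $stripped = preg_replace('~^(\s|\(|/\*.*?\*/)+~s', '', $sql) ?? $sql;

        // \b ensures "SELECT" is a whole word, not a prefix.
        return (bool) preg_match('/^SELECT\b/i', $stripped);
    }

    public function query(string $sql, array $args = [])
    {
        if (!self::isReadQuery($sql)) {
            // Refuse silently instead of throwing, so the page still renders.
            $this->refused[] = $sql;
            return null;
        }

        return ($this->execute)($sql, $args);
    }
}

// Usage with a fake executor that only reports what it was asked to run.
$guard = new ReadonlyQueryGuard(function (string $sql, array $args) {
    return 'executed';
});
$read = $guard->query('SELECT nid FROM node');
$write = $guard->query('INSERT INTO watchdog (message) VALUES (:m)', [':m' => 'x']);
```

What to hand back for a refused statement is an open design choice: callers that expect a statement object need an empty stand-in rather than `null`. A keyword check like this is also only a first filter, since a statement such as `SELECT ... FOR UPDATE` still starts with SELECT.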

Automatic switch between read-only and not read-only

Since the databases are synchronized between main and fronts, we want to be able to have only the fronts as read-only and not the main.

To do so, we use a flag in a file stored in the file system. Using an environment variable is also an option.
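A minimal sketch of such a switch is shown below. The function name `is_readonly_instance`, the environment variable `DRUPAL_READONLY`, the flag path and the `readonly.services.yml` file name are hypothetical; only the `container_yamls` setting comes from Drupal itself.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: decides whether this instance runs read-only.
 * An explicit environment value wins; otherwise the flag file decides.
 */
function is_readonly_instance(string $flagFile, string $envName = 'DRUPAL_READONLY'): bool
{
    $env = getenv($envName);
    if ($env !== false && $env !== '') {
        return in_array(strtolower($env), ['1', 'true', 'yes', 'on'], true);
    }

    // No environment override: the mere presence of the file is the flag.
    return is_file($flagFile);
}

// Possible use in settings.php (paths are assumptions):
//
// if (is_readonly_instance('/etc/drupal/readonly.flag')) {
//     $settings['container_yamls'][] = __DIR__ . '/readonly.services.yml';
// }
```

Keeping the switch outside the database matters here: since the database is identical on the main and on the fronts, the flag has to live somewhere that is not replicated.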

Clear the cache on the fronts

Drupal actively invalidates its cache through a system of composable cache tags. For example, when a page is generated, the cache tags of every block it contains are added to the page itself. When a block or a content type is modified in the admin area, Drupal clears every cache entry related to its tags.

Since the editing operations are not done on the front instances, their cache is not cleared, leading to stale content being presented to the users.

I see two ways of addressing it:

The easy way — Cron

If you don’t need to have your website updated exactly when the content is edited, a simple cron clearing the whole Redis cache in the front (for example every 5 minutes) should be enough.

Even better, if you are behind a CDN that supports the stale-while-revalidate cache directive (see here for more information) your visitors will not see the cache rebuild process at all.

The naive way would be to set Drupal’s cache duration to 5 minutes. Unfortunately, at the time this work was done, it wasn’t possible because of known issues: https://www.drupal.org/docs/8/api/cache-api/cache-max-age#s-limitations-of-max-age

The hard way — Front notifications

Since some parts of the cache are cleared when an update is made, we need to report this to the fronts.

The first step is to create a cache decorator that will notify the fronts (in this case, Drupal has to know every front it is working with). Then we have to create a controller that will listen to those notifications and clear the cache.

Create a Cache Decorator

We have had our fair share of decorators in this article. This one stores every update/delete operation done in the cache and sends them to the fronts as soon as the onKernelTerminate event is fired. Using the onKernelTerminate event ensures that the connection with the client (in this case, the person responsible for the cache update) is already closed, so we avoid freezing the admin interface while the messages are sent to the fronts.
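The “collect now, send after the response” idea can be sketched as follows. This is a simplified illustration limited to cache tags, not the decorator from the gist: the class name `InvalidationQueue`, the `$send` callable standing in for the HTTP call to a front, and the front URLs are assumptions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: collects cache tag invalidations during a request
 * and sends them in one batch per front once the response is delivered.
 */
final class InvalidationQueue
{
    /** @var array De-duplicated cache tags, stored as keys. */
    private $tags = [];

    /** @var callable Receives (string $front, string[] $tags). */
    private $send;

    public function __construct(callable $send)
    {
        $this->send = $send;
    }

    public function invalidateTags(array $tags): void
    {
        // Using tags as array keys removes duplicates for free.
        foreach ($tags as $tag) {
            $this->tags[$tag] = true;
        }
    }

    /**
     * Meant to be called from a kernel terminate listener.
     */
    public function flush(array $fronts): void
    {
        if ($this->tags === []) {
            return;
        }

        $tags = array_keys($this->tags);
        $this->tags = [];

        foreach ($fronts as $front) {
            ($this->send)($front, $tags);
        }
    }
}

// Usage with a fake sender that only records what would be sent.
$sent = [];
$queue = new InvalidationQueue(function (string $front, array $tags) use (&$sent) {
    $sent[] = [$front, $tags];
});
$queue->invalidateTags(['node:1', 'node_list']);
$queue->invalidateTags(['node:1']);
$queue->flush(['https://front-1.example', 'https://front-2.example']);
```

Batching at the end of the request means one message per front instead of one per invalidation, and a slow front only delays work that happens after the editor already has a response.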

Listen to notification and clear the caches in the front

On the front side, a controller has to be created. It’s in charge of receiving the cache clear requests from the main. When it’s called, it clears every key of every cache that the main’s cache decorator asked it to clear.

Securing the connection between the main and the fronts is not a problem: since they share the same database, they can easily share a secret through Drupal’s configuration. The controller then ensures that the secret is correct before proceeding to apply the cache clear requests.

# services.yml
sq_readonly.cache_factory:
  class: Drupal\sq_readonly\Cache\CacheBackendDecoratorFactory
  decorates: cache_factory
  arguments: ['@sq_readonly.cache_factory.inner', '@sq_readonly.cache_clearer']

And for the router:

# routing.yml
sq_readonly.clear:
  path: '/api/sq-readonly/v1/update'
  defaults:
    _title: 'Update the cache'
    _controller: '\Drupal\sq_readonly\Controller\APIController::update'
  methods: [POST]
  requirements:
    _permission: 'access content'

For the sake of readability, the code of the notification cache cleaning system is available in this gist.
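As for the secret check performed by the controller, one possible variation (not necessarily what the gist does) is to never send the secret itself and to sign the request body with it instead, using an HMAC. The function names `sign_payload` and `is_authentic` and the payload shape are assumptions; `hash_hmac()` and `hash_equals()` are standard PHP functions.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: the main signs the request body with the shared
 * secret; the front recomputes the signature before clearing anything.
 */
function sign_payload(string $body, string $secret): string
{
    return hash_hmac('sha256', $body, $secret);
}

function is_authentic(string $body, string $signature, string $secret): bool
{
    // hash_equals() compares in constant time to avoid timing leaks.
    return hash_equals(sign_payload($body, $secret), $signature);
}

// Usage: the main computes the signature, the front verifies it.
$secret = 'shared-secret-from-configuration';
$body = json_encode(['tags' => ['node:1', 'node_list']]);
$signature = sign_payload($body, $secret);
$accepted = is_authentic($body, $signature, $secret);
```

Because the route only requires the `access content` permission, a check of this kind is what actually keeps anonymous visitors from triggering cache clears.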

Our way

In our case, we started the hard way and created a cache decorator and a controller (as explained above). It worked sometimes, but not every time. Unfortunately, we didn’t have time to find out why the fronts weren’t always notified by the main.

We decided to enforce the cache clean process by adding a cron, executed every 5 minutes.

In the end, we decided to remove entirely the usage of the cache decorator: the cron is way easier to maintain and it’s enough for our needs.

In conclusion

Making Drupal read-only is far from trivial, and it isn’t really documented either. Some core modules behave strangely, for example by using the key_value store as a cache. A global fallback is also needed for contrib modules.

In the end, the only required steps to have a functional read-only Drupal are moving the cache and the container cache into Redis, and the global fallback. Every other step can be considered nice to have (some may say the logs are a hard requirement though).

We are now able to have as many “independent” fronts as required, with no impact on the main (except for the database synchronization), and the main is less prone to attack.

This solution was initially deployed on Drupal 8.7.11 at the beginning of 2020. Almost a year and many updates later, it’s still working pretty well on Drupal 8.9.9.