Understanding Persistence in Redis — AOF & RDB + on Docker

Raphael De Lio
Redis with Raphael De Lio
Aug 22, 2022

Twitter | LinkedIn | YouTube | Instagram
This content is also available as a YouTube video. You can watch it here.

If you have read my story on 10 things you didn’t know about Redis — From a Message Broker to a Graph Database and already have your own server running locally, as explained in How to run Redis locally in a Docker Container and manage it with Redis Insight and Redis CLI, you are ready to enable Redis to persist its data to disk.

Redis has two persistence mechanisms. Let’s go through each of them to understand how they work, what their advantages and disadvantages are, and how we can use them.

Redis Database

Redis Database, or RDB, is a persistence mechanism in which Redis saves point-in-time snapshots of the dataset to disk.

If the server instance goes down, these snapshots can be used to restore a previous database state. The interval at which snapshots are taken is configurable. For example, you can configure your database to take a snapshot every minute if at least 10 changes have happened in the dataset, or every 5 minutes if at least 1000 changes have happened.

How it works

Snapshots work like a time machine. If you copy each snapshot somewhere safe before it is replaced, you can keep as many as you wish, for as long as you wish, and use them to restore the database to the moment any of them was taken in case of disaster.

By default, Redis stores the snapshot in a binary file named dump.rdb, and this file is replaced every time a new snapshot is created.

Redis takes snapshots by calling fork(), which splits it into a parent and a child process. The child writes the dataset to a new, temporary RDB file and, when it’s done, replaces the old file with the new one.
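To make the fork-based approach concrete, here is a minimal Python sketch of the same idea, assuming a POSIX system where os.fork() is available. It is not Redis’s implementation: the dataset is a plain dict, the snapshot is JSON rather than the RDB format, and bgsave_sketch is a hypothetical name. What it shows is that the child sees the data as it was at fork time, writes it to a temporary file, and then atomically renames it over the previous snapshot, while the parent returns immediately.

```python
import json
import os
import tempfile


def bgsave_sketch(dataset: dict, path: str) -> int:
    """Hypothetical helper: fork a child that snapshots `dataset` to `path`.

    Returns the child's PID; the parent does not wait for the write.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: it has its own copy-on-write view of memory,
        # frozen at the moment of the fork.
        status = 1
        try:
            tmp = f"{path}.tmp-{os.getpid()}"
            with open(tmp, "w") as f:
                json.dump(dataset, f)
                f.flush()
                os.fsync(f.fileno())
            os.replace(tmp, path)  # atomically replace the old snapshot
            status = 0
        finally:
            os._exit(status)  # never return into the parent's code path
    return pid


data = {"firstKey": "I'm number one"}
target = os.path.join(tempfile.mkdtemp(), "snapshot.json")
child = bgsave_sketch(data, target)
data["firstKey"] = "changed after the fork"  # the parent keeps serving writes
os.waitpid(child, 0)
with open(target) as f:
    print(json.load(f))  # {'firstKey': "I'm number one"}
```

Copy-on-write is what lets the parent keep modifying data without disturbing the child’s view. It is also why fork() itself can take noticeable time on large datasets: the process’s page tables still have to be copied.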

Advantages

  • RDB has little impact on server performance. The main process only has to fork, and the child process takes care of all the writing to disk, so the parent process can keep serving clients. The fork itself, however, can be expensive, as explained under Disadvantages.
  • Faster restarts. With large datasets, RDB restores faster than AOF, the other persistence mechanism, which we’ll cover later in this story.
  • Compact backup files. The content of the dump file is very compact and can be transferred to other storage such as Google Cloud Storage or Amazon S3.

Disadvantages

  • You can lose minutes of data. Snapshots are usually configured to run every five minutes or more, so anything written since the last snapshot is lost if Redis stops unexpectedly. Snapshotting more often is costly: with a large dataset or a slow CPU, fork() can be time-consuming and can make Redis stop serving clients for anywhere from a few milliseconds to a second.

Working with RDB

To take a snapshot, you can either configure Redis to take one automatically after N seconds if at least M changes have been made to the dataset, or take a manual snapshot yourself. Let’s go through a few commands:

Automatic snapshots

Automatic snapshots are enabled by default unless your redis.conf file says otherwise.

To check the configuration, you can run:
CONFIG GET save

Which should return:

1) "save"
2) "3600 1 300 100 60 10000"

The default configuration will create a new snapshot:

  • After 3600 seconds (an hour) if at least 1 change was performed.
  • After 300 seconds (5 minutes) if at least 100 changes were performed.
  • After 60 seconds if at least 10000 changes were performed.
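The rules above can be read as: take a snapshot as soon as any (seconds, changes) pair is satisfied. The following Python sketch is a simplified model of that check, assuming it compares the time since the last successful save and the number of changes since then against each pair; it ignores details such as retrying after a failed background save. parse_save and snapshot_due are hypothetical names.

```python
def parse_save(value: str) -> list[tuple[int, int]]:
    """Turn a `save` string such as "3600 1 300 100" into (seconds, changes) pairs."""
    nums = [int(p) for p in value.split()]
    if len(nums) % 2:
        raise ValueError("save expects <seconds> <changes> pairs")
    return list(zip(nums[0::2], nums[1::2]))


def snapshot_due(rules, seconds_since_save: int, changes_since_save: int) -> bool:
    """True if any rule is met: enough changes AND more than `seconds` elapsed."""
    return any(
        changes_since_save >= changes and seconds_since_save > seconds
        for seconds, changes in rules
    )


default = parse_save("3600 1 300 100 60 10000")
print(default)                          # [(3600, 1), (300, 100), (60, 10000)]
print(snapshot_due(default, 301, 150))  # True: the 300 s / 100 changes rule fires
print(snapshot_due(default, 120, 150))  # False: no rule is satisfied yet
print(snapshot_due(parse_save(""), 99999, 99999))  # False: save "" disables snapshots
```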

To override this configuration, you can either change the config file or run the CONFIG SET command, as in:

CONFIG SET save "120 1"

The example above asks Redis to create a new snapshot every 2 minutes if at least one change has been made to the dataset.

You can also turn it off by running:

CONFIG SET save ""

Manual Snapshot: Save Command

The SAVE command performs a synchronous save of the dataset, producing a point-in-time snapshot of all the data inside the Redis instance.

According to the documentation, you almost never want to call SAVE in production because it blocks all other clients; BGSAVE is recommended instead. However, if something prevents Redis from forking the background-saving child (for instance, an error in the fork() system call), SAVE can be a good last resort to dump the latest dataset.

When you run SAVE, you should see OK as a response. The time to respond is O(N), where N is the total number of keys in the database.

Manual Snapshot: BGSave command

This operation is asynchronous: BGSAVE returns immediately (redis-cli prints Background saving started) while the snapshot is created in the background.

Redis will fork its process, the parent will continue to serve the clients and the child will dump the RDB file.

However, an error is returned if a background save is already running, or if another background process is running, specifically an in-progress AOF rewrite.

For the AOF rewrite case, you can use BGSAVE SCHEDULE instead. It also returns immediately (redis-cli prints Background saving scheduled), and the snapshot is created at the next opportunity, once the rewrite has finished.
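The rules described for BGSAVE and BGSAVE SCHEDULE can be summarized as a small decision function. This Python sketch models only the cases mentioned in this story; it is not Redis code, and the function name and outcome labels are hypothetical.

```python
from enum import Enum


class BgsaveOutcome(Enum):
    STARTED = "background save started"
    SCHEDULED = "background save scheduled"
    ERROR_SAVE_RUNNING = "error: a background save is already in progress"
    ERROR_OTHER_CHILD = "error: an AOF rewrite is running"


def bgsave_outcome(schedule: bool, rdb_child_running: bool,
                   aof_rewrite_running: bool) -> BgsaveOutcome:
    """Hypothetical model of how BGSAVE [SCHEDULE] reacts to the server state."""
    if rdb_child_running:
        # SCHEDULE does not help here: a snapshot is already being written.
        return BgsaveOutcome.ERROR_SAVE_RUNNING
    if aof_rewrite_running:
        # Only the SCHEDULE variant waits for the rewrite to finish.
        return BgsaveOutcome.SCHEDULED if schedule else BgsaveOutcome.ERROR_OTHER_CHILD
    return BgsaveOutcome.STARTED  # nothing else running: fork right away


for use_schedule in (False, True):
    outcome = bgsave_outcome(use_schedule, rdb_child_running=False, aof_rewrite_running=True)
    print(use_schedule, outcome.name)
```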

Last Save command

This command returns the Unix time of the last successful snapshot.

You can run it with: LASTSAVE.

And it will return: (integer) 1660310189
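LASTSAVE is commonly used to check whether a BGSAVE has completed: read LASTSAVE, issue BGSAVE, then poll until the value changes. The following Python sketch shows that loop plus a conversion of the integer reply into a date. It does not talk to Redis itself: get_lastsave is a hypothetical callable you would back with your own client, and both function names are illustrative.

```python
import time
from datetime import datetime, timezone
from typing import Callable


def lastsave_to_datetime(ts: int) -> datetime:
    """Convert the integer returned by LASTSAVE into a UTC datetime."""
    return datetime.fromtimestamp(ts, tz=timezone.utc)


def wait_for_new_save(get_lastsave: Callable[[], int], before: int,
                      timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Poll LASTSAVE until it differs from `before` or `timeout` seconds pass.

    `get_lastsave` is any callable returning LASTSAVE's integer reply
    (hypothetical here: wrap your Redis client's call yourself).
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_lastsave() != before:
            return True
        time.sleep(interval)
    return False


print(lastsave_to_datetime(1660310189))  # 2022-08-12 13:16:29+00:00
```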

AOF

The append-only file (AOF) is another persistence mechanism: it logs every write operation the server receives. At server startup, the log is replayed to reconstruct the original dataset.

The commands are logged using the same format as the Redis protocol.
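As an illustration of that format, here is a minimal Python sketch that encodes a command the way it appears in the AOF: a *N line with the number of arguments, then, for each argument, a $M line with its length in bytes followed by the argument itself, every line ending in \r\n. encode_command is a hypothetical name, not part of any Redis client.

```python
def encode_command(*args: str) -> bytes:
    """Encode a command as a RESP array of bulk strings (the AOF format)."""
    out = [f"*{len(args)}\r\n".encode()]  # *N: number of arguments
    for arg in args:
        data = arg.encode("utf-8")
        # $M: length in bytes (not characters), then the payload itself.
        out.append(f"${len(data)}\r\n".encode() + data + b"\r\n")
    return b"".join(out)


print(encode_command("SET", "firstKey", "I'm number one"))
# b"*3\r\n$3\r\nSET\r\n$8\r\nfirstKey\r\n$14\r\nI'm number one\r\n"
```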

How it works

When Redis finishes executing a write command, it appends the command to the end of the server’s aof_buf buffer in protocol format (RESP, the protocol that clients and the server use to talk to each other over the network).

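When the written data is actually forced to disk with fsync() is determined by the appendfsync setting, which can be:

  • always: fsync after every write to the AOF (safest, but slowest)
  • everysec: fsync once per second, so you can lose about one second of writes (safe, better performance) (default)
  • no: never fsync explicitly and let the operating system decide when to flush, typically about every 30 seconds on Linux (unsafe, best performance)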

flushAppendOnlyFile(), the internal function that writes the contents of aof_buf to the AOF file, then uses this setting to decide when to call fsync().
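The interplay between aof_buf and appendfsync can be modeled in a few lines. This Python sketch is a simplified, single-threaded model, assuming each flush first write()s the buffer to the file and then fsyncs according to the policy; real Redis differs in details (for example, it runs the everysec fsync in a background thread). AofWriterSketch is a hypothetical class; aof_buf and appendfsync are the names used in the text.

```python
import os
import tempfile
import time


class AofWriterSketch:
    """Simplified, single-threaded model of aof_buf plus appendfsync (not Redis code)."""

    POLICIES = ("always", "everysec", "no")

    def __init__(self, path, appendfsync="everysec", clock=time.monotonic):
        if appendfsync not in self.POLICIES:
            raise ValueError(f"appendfsync must be one of {self.POLICIES}")
        self.policy = appendfsync
        self.clock = clock
        self.buf = bytearray()  # plays the role of aof_buf
        self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self.last_fsync = clock()
        self.unsynced = False  # written to the OS but not yet fsynced
        self.fsync_calls = 0

    def feed(self, command: bytes) -> None:
        """After each write command: only append it to the in-memory buffer."""
        self.buf += command

    def flush(self) -> None:
        """Write the buffer to the file, then fsync if the policy says so."""
        data = bytes(self.buf)
        while data:  # os.write may write fewer bytes than requested
            written = os.write(self.fd, data)
            data = data[written:]
        if self.buf:
            self.buf.clear()
            self.unsynced = True
        if not self.unsynced:
            return
        now = self.clock()
        if self.policy == "always" or (
            self.policy == "everysec" and now - self.last_fsync >= 1.0
        ):
            os.fsync(self.fd)  # force the data from the OS cache to the disk
            self.fsync_calls += 1
            self.last_fsync = now
            self.unsynced = False
        # With "no", the OS alone decides when the data reaches the disk.

    def close(self) -> None:
        os.close(self.fd)


path = os.path.join(tempfile.mkdtemp(), "appendonly.aof")
writer = AofWriterSketch(path, appendfsync="always")
writer.feed(b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n")
writer.flush()
print(writer.fsync_calls)  # 1
writer.close()
```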

Advantages

  • It’s durable. Since every write operation is appended to the file, data loss is unlikely: with the default everysec policy, you can lose about one second of writes.
  • It’s reliable. Even if the log ends with a half-written command (because the disk filled up, for example), the redis-check-aof tool can easily fix it.
  • It’s flexible. Even if you run FLUSHALL, which deletes all keys from the database, you can still recover as long as the file hasn’t been rewritten: stop Redis, edit the file to remove the command, and restart the server.

Disadvantages

  • Size of the file. AOF files are usually bigger than the equivalent RDB for the same dataset.
  • Performance. Depending on the fsync policy, AOF can be slower than RDB.

Enabling

You can turn on the AOF in your redis.conf file by setting:

appendonly yes

Or by running the command:

CONFIG SET appendonly yes

Running this command generates the file automatically. However, CONFIG SET only changes the running server, so for the file to be replayed on server startup, the setting must also be in the configuration file.

And that’s it. Whenever you restart your database, it will replay the commands on the file automatically and recreate its original state.

AOF Rewrite

Every operation that modifies your dataset is appended to the AOF file, which means the file keeps growing.

When the file gets too big, Redis rewrites it from scratch into a new file. It builds the new file from the data in memory rather than by reading the old file, so the new file contains only the commands needed to rebuild the current dataset.

Once the rewrite is finished, the old file will be overwritten by the new file.
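The idea behind the rewrite can be sketched in Python. This is a simplified model, assuming only string keys and the SELECT, SET, DEL and FLUSHALL commands; replay and rewrite are hypothetical names. Redis never replays the old file during a rewrite: the dict built by replay() here simply stands in for the dataset Redis already holds in memory, and rewrite() emits one command per key from that state.

```python
def replay(log):
    """Build {db_index: {key: value}} by applying a command log in order."""
    dbs, db = {}, 0
    for name, *args in log:
        name = name.upper()
        if name == "SELECT":
            db = int(args[0])
        elif name == "SET":
            dbs.setdefault(db, {})[args[0]] = args[1]
        elif name == "DEL":
            for key in args:
                dbs.get(db, {}).pop(key, None)
        elif name == "FLUSHALL":
            dbs.clear()
        else:
            raise ValueError(f"command not modelled in this sketch: {name}")
    return dbs


def rewrite(dbs):
    """Emit a compact log: one SELECT per non-empty db, one SET per key."""
    out = []
    for db in sorted(dbs):
        if dbs[db]:
            out.append(("SELECT", str(db)))
            out.extend(("SET", key, value) for key, value in dbs[db].items())
    return out


log = [
    ("SELECT", "0"),
    ("SET", "firstKey", "I'm number one"),
    ("SET", "secondKey", "I'm number two"),
    ("SET", "firstKey", "I'm still number one"),
]
for command in rewrite(replay(log)):
    print(command)
```

The overwritten value of firstKey disappears from the output because only the current state is written out, which is exactly why a rewritten AOF is smaller.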

Manual Rewrite with BGREWRITEAOF

Redis triggers the rewrite automatically, but you can also trigger it manually by running the following command:

BGREWRITEAOF

If a background save is in progress, the rewrite will be scheduled to run once it finishes.
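The automatic trigger is controlled by two redis.conf settings, auto-aof-rewrite-percentage (default 100) and auto-aof-rewrite-min-size (default 64mb). The following Python sketch is a simplified model of that check, assuming Redis rewrites once the file is larger than the minimum size and has grown by at least the given percentage since the last rewrite; should_auto_rewrite is a hypothetical name.

```python
MB = 1024 * 1024


def should_auto_rewrite(current_size: int, base_size: int,
                        percentage: int = 100, min_size: int = 64 * MB) -> bool:
    """Simplified model of the automatic AOF rewrite trigger.

    current_size: current AOF size in bytes.
    base_size: AOF size right after the last rewrite (or at startup).
    percentage / min_size: auto-aof-rewrite-percentage / auto-aof-rewrite-min-size.
    """
    if percentage == 0:  # 0 disables automatic rewrites
        return False
    if current_size <= min_size:  # small files are never rewritten automatically
        return False
    base = base_size or 1
    growth = current_size * 100 // base - 100  # growth since the last rewrite, in %
    return growth >= percentage


print(should_auto_rewrite(100 * MB, 50 * MB))  # True: the file doubled (100% growth)
print(should_auto_rewrite(90 * MB, 50 * MB))   # False: only 80% growth
print(should_auto_rewrite(60 * MB, 10 * MB))   # False: below the 64mb minimum
```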

Digging into the file

Let’s start by editing our redis.conf file, enabling AOF and disabling RDB.
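For reference, enabling AOF and disabling RDB takes only a couple of redis.conf directives (appendonly yes and save ""). This Python sketch renders such a file and creates the data directory used later in the Docker section of this story. render_redis_conf and write_setup are hypothetical helpers, and the demo writes into a temporary directory rather than a real path.

```python
import tempfile
from pathlib import Path


def render_redis_conf(appendonly: bool, save_rules: list[tuple[int, int]],
                      appendfsync: str = "everysec", rdb_preamble: bool = True) -> str:
    """Build redis.conf text from the persistence settings discussed here."""
    lines = [
        f"appendonly {'yes' if appendonly else 'no'}",
        f"appendfsync {appendfsync}",
        f"aof-use-rdb-preamble {'yes' if rdb_preamble else 'no'}",
    ]
    if save_rules:
        # One "save <seconds> <changes>" line per rule.
        lines += [f"save {seconds} {changes}" for seconds, changes in save_rules]
    else:
        lines.append('save ""')  # an empty save rule disables RDB snapshots
    return "\n".join(lines) + "\n"


def write_setup(base: Path, conf_text: str) -> Path:
    """Create <base>/data (for the data volume) and write <base>/redis.conf."""
    (base / "data").mkdir(parents=True, exist_ok=True)
    conf_path = base / "redis.conf"
    conf_path.write_text(conf_text)
    return conf_path


demo_dir = Path(tempfile.mkdtemp()) / "local-redis"
conf_file = write_setup(demo_dir, render_redis_conf(appendonly=True, save_rules=[]))
print(conf_file.read_text(), end="")
```

On your own machine, you would pass the directory you mount into the container instead of the temporary one.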

Then, let’s connect to Redis Insight and use the Workbench to set two keys:

Now, let’s do a cat on our file:

*N is the number of arguments of the command, and $M is the length, i.e. the number of bytes, of each argument.

In our case, Redis executed:

  • SELECT 0
  • SET firstKey “I’m number one”
  • SET secondKey “I’m number two”
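To see how *N and $M fit together, here is a minimal Python parser for the plain AOF format, assuming the file contains only RESP commands (no RDB preamble) with \r\n line endings. parse_aof is a hypothetical name; it raises an error on a half-written trailing command, which is the situation redis-check-aof is meant to repair.

```python
def parse_aof(data: bytes) -> list[list[str]]:
    """Parse plain AOF bytes into a list of commands (lists of arguments)."""
    commands, pos = [], 0
    while pos < len(data):
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at byte {pos}")
        end = data.index(b"\r\n", pos)  # ValueError if the line is cut off
        argc = int(data[pos + 1:end])  # *N: number of arguments
        pos = end + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at byte {pos}")
            end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:end])  # $M: argument length in bytes
            start, stop = end + 2, end + 2 + length
            if data[stop:stop + 2] != b"\r\n":
                raise ValueError(f"truncated argument at byte {start}")
            args.append(data[start:stop].decode("utf-8"))
            pos = stop + 2
        commands.append(args)
    return commands


sample = (b"*2\r\n$6\r\nSELECT\r\n$1\r\n0\r\n"
          b"*3\r\n$3\r\nSET\r\n$8\r\nfirstKey\r\n$14\r\nI'm number one\r\n")
print(parse_aof(sample))  # [['SELECT', '0'], ['SET', 'firstKey', "I'm number one"]]
```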

Now, let’s edit the first key:

And let’s check the file again:

We can see that Redis logged all our commands:

  • SELECT 0
  • SET firstKey “I’m number one”
  • SET secondKey “I’m number two”
  • SET firstKey “I’m still number one”

However, is the first SET command still required to rebuild the database? Not really, it has already been overwritten by the third SET command. Before this file gets very big, let’s ask Redis to rewrite it:

And let’s read the file again:

At first, this seemed odd to me. I was expecting the AOF file to look something like:

*2
$6
SELECT
$1
0
*3
$3
SET
$9
secondKey
$14
I'm number two
*3
$3
SET
$8
firstKey
$20
I'm still number one

However, it became:

(binary content; only the readable fragments are shown)
REDIS0009 · redis-ver 6.2.7 · redis-bits · ctime · used-mem · aof-preamble · firstKey I'm still number one · secondKey I'm number two

This is the header of an RDB file, so I did the following experiment:

  1. Stopped Redis Server
  2. Renamed appendonly.aof to dump.rdb
  3. Edited redis.conf, turned off AOF and turned on RDB again
  4. Started the server again
  5. The server was able to use dump.rdb, which turned out to be a complete RDB file rather than just a header, to recreate the database.

This made me believe that:

  1. The AOF file can be a mix of RDB and AOF content
  2. BGREWRITEAOF doesn’t rewrite the AOF file as a list of commands but writes a snapshot instead

And thanks to Lior Kongo, who answered my question on Stack Overflow, I was able to confirm it.

According to the documentation:

When rewriting the AOF file, Redis is able to use an RDB preamble in the AOF file for faster rewrites and recoveries. When this option is turned on the rewritten AOF file is composed of two different stanzas:

[RDB file][AOF tail]

When loading, Redis recognizes that the AOF file starts with the “REDIS” string and loads the prefixed RDB file, then continues loading the AOF tail.
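That check is easy to reproduce when inspecting a file by hand. This Python sketch classifies the first bytes of an AOF, assuming an RDB header of the form b"REDIS" followed by a four-digit format version (b"REDIS0009" in the output above). aof_kind and inspect_aof are hypothetical helpers.

```python
def aof_kind(head: bytes):
    """Classify the first bytes of an AOF file.

    Returns ("rdb-preamble", version) for an RDB header such as b"REDIS0009",
    ("aof", None) for plain RESP commands, and ("unknown", None) otherwise.
    """
    if head.startswith(b"REDIS") and head[5:9].isdigit():
        return "rdb-preamble", int(head[5:9])
    if head.startswith(b"*"):
        return "aof", None
    return "unknown", None


def inspect_aof(path: str):
    """Read just the 9-byte header of the file at `path` and classify it."""
    with open(path, "rb") as f:
        return aof_kind(f.read(9))


print(aof_kind(b"REDIS0009\xfa\tredis-ver"))  # ('rdb-preamble', 9)
print(aof_kind(b"*2\r\n$6\r\nSELECT"))        # ('aof', None)
```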

You can turn it off by setting the configuration on redis.conf:

aof-use-rdb-preamble no

Editing the AOF file

The AOF file can also be edited easily, as long as it hasn’t been rewritten by BGREWRITEAOF or by an automatic rewrite.

Let’s run FLUSHALL to delete all our data and see whether we can get it back by:

  1. Stopping the database
  2. Editing the AOF file and removing the FLUSHALL command
  3. Restarting the database

I will start by adding two keys again:

I can see both of them are in my database:

Now, I’m going to run the FLUSHALL command:

And now, all of my keys are gone:

You can see all commands are still in my AOF file, so let’s edit it and remove the last one:

And after restarting the server, it’s like the FLUSHALL command never ran:
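The manual edit can also be scripted. This Python sketch assumes a single plain AOF file (no RDB preamble, as with aof-use-rdb-preamble no or before any rewrite), and that Redis is stopped while the file is changed; keep a backup copy first. drop_trailing_flushall and fix_aof_file are hypothetical helpers, not part of Redis or redis-check-aof. The file is cut just before the last command, so every earlier byte is left untouched.

```python
def iter_commands(data: bytes):
    """Yield (start_offset, args) for each RESP command in a plain AOF."""
    pos = 0
    while pos < len(data):
        start = pos
        if data[pos:pos + 1] != b"*":
            raise ValueError(f"expected '*' at byte {pos}")
        end = data.index(b"\r\n", pos)
        argc = int(data[pos + 1:end])
        pos = end + 2
        args = []
        for _ in range(argc):
            if data[pos:pos + 1] != b"$":
                raise ValueError(f"expected '$' at byte {pos}")
            end = data.index(b"\r\n", pos)
            length = int(data[pos + 1:end])
            pos = end + 2
            args.append(data[pos:pos + length])
            pos += length + 2  # skip the payload and its trailing \r\n
        yield start, args


def drop_trailing_flushall(data: bytes) -> bytes:
    """Return `data` without its final command if that command is FLUSHALL."""
    if data.startswith(b"REDIS"):
        raise ValueError("RDB preamble found; this sketch handles plain AOF only")
    last = None
    for last in iter_commands(data):
        pass
    if last is not None and last[1] and last[1][0].upper() == b"FLUSHALL":
        return data[:last[0]]  # cut the file just before the FLUSHALL command
    return data


def fix_aof_file(path: str) -> bool:
    """Rewrite `path` in place; run only while Redis is stopped. True if changed."""
    with open(path, "rb") as f:
        original = f.read()
    fixed = drop_trailing_flushall(original)
    if fixed != original:
        with open(path, "wb") as f:
            f.write(fixed)
    return fixed != original
```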

Should I use AOF or RDB?

If you want a degree of durability comparable to PostgreSQL or other disk-based databases, you should turn on both persistence mechanisms.

AOF will make sure your data is durable and safe while RDB will allow you to keep a smaller file and restart your database faster.

However, it all depends on your use case. If you can tolerate a few minutes of data loss, you can turn off AOF and rely on RDB alone; if you can tolerate losing all your data, you can turn off both AOF and RDB.

Enabling persistence from Docker

If you have been following my previous articles, you know that I have been running my experiments with Redis Server and Redis Insight inside a Docker container.

Docker containers are ephemeral by default: data written inside a container is lost when the container is removed and recreated.

To enable persistence for your container, you need to configure a volume. To do it, I first created a local directory at /tmp/local-redis/data. Then, I added the following option to the docker run command:

-v /tmp/local-redis/data:/data

To load a custom redis.conf file into the container, I placed the file at /tmp/local-redis/redis.conf and added the following option to the docker run command:

-v /tmp/local-redis/redis.conf:/redis-stack.conf

In the end, you will have a command like:

docker run -d --name redis-stack -p 6379:6379 -p 8001:8001 -v /tmp/local-redis/data:/data -v /tmp/local-redis/redis.conf:/redis-stack.conf redis/redis-stack:latest

This spins up your container with one volume that keeps your server’s data on disk and another that injects the configuration file into the server.


Contribute

Writing takes time and effort. I love writing and sharing knowledge, but I also have bills to pay. If you like my work, please, consider donating through Buy Me a Coffee: https://www.buymeacoffee.com/RaphaelDeLio

Or by sending me BitCoin: 1HjG7pmghg3Z8RATH4aiUWr156BGafJ6Zw

Follow Me on Social Media

Stay connected and dive deeper into the world of Redis with me! Follow my journey across all major social platforms for exclusive content, tips, and discussions.

