Managing PostgreSQL backups and replication for very large databases — Part 2

leboncoin tech Blog · Mar 7, 2019

By Flavio Gurgel (Database Administrator)

Coming back to this subject after the first part of this series, let’s look at some practical examples of how we handle our backups at leboncoin.

Restore tests

Let’s assume that you have defined your backup strategy, frequency and retention policies, and that backup jobs are in place. Everything seems safe on a disk or tape somewhere.

An incident strikes hard and you just lost a database. You look at your handy backups and one of these may happen to you:

  • the person that wrote the backup job didn’t write a restore job
  • the person that wrote the backup procedure didn’t write a restore procedure
  • the restore job just plain doesn’t work
  • the restore procedure is full of errors
  • all jobs/procedures work but the restored database doesn’t start
  • the restored database doesn’t have the most recent/wanted data
  • the restored database is corrupt
  • you just don’t know where to start and start screaming/crying
  • put your restore nightmare here

I won’t stress this much more: everybody needs to test restores. It’s mandatory. It’s good for your health; not having tests can really be life-threatening for your data. Restore tests should use exactly the same automated job or written procedure that would be used in case of incident. These exercises can be long, tedious or expensive, and you need a spare server to run your tests.

More than the risk mitigation that you achieve with restore testing (which may already be enough to justify it), you can actually profit from restore tests:

  • restore on staging servers so you can test your ready-for-production code, schema migrations, etc. on fresh data
  • restore on development servers so developers can work on fresh data
  • restore on a data lake staging server so ETL code runs on a fresh copy of your data and doesn’t impact production servers
  • recreate standby servers (PostgreSQL specific), so you test two procedures/jobs in a single shot

The frequency of restore testing depends on your needs and possibilities; a minimum viable option would be to test every time you update your database binary versions or change the database schema.

At leboncoin, the most complicated database restore is tested every week. The job is fully automatic. The restored database is used for staging and Quality Assurance purposes.
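To give an idea of what such an automated test can look like, here is a minimal sketch (not leboncoin’s actual job) that recovers the latest basebackup with barman, starts the restored cluster on a spare port and runs a sanity query. The server name, paths, port, database and query are placeholders to adapt to your environment.

```python
#!/usr/bin/env python3
"""Minimal restore-test sketch: recover the latest barman basebackup,
start the restored cluster, and run a sanity query against it."""
import subprocess

BARMAN_SERVER = "main-db"          # hypothetical barman server name
RESTORE_DIR = "/restore/main-db"   # scratch directory on the test host
RESTORE_PORT = "5499"              # throwaway port for the restored cluster

def run(cmd):
    print("+", " ".join(cmd))
    subprocess.run(cmd, check=True)

# 1. Recover the latest basebackup plus the WAL it needs into the scratch dir.
run(["barman", "recover", BARMAN_SERVER, "latest", RESTORE_DIR])

# 2. Start the restored cluster on a side port and wait until it accepts connections.
run(["pg_ctl", "-D", RESTORE_DIR, "-o", f"-p {RESTORE_PORT}", "-w", "start"])

# 3. Sanity check: the test only counts as a success if real, recent data is there.
run(["psql", "-p", RESTORE_PORT, "-d", "appdb", "-c",
     "SELECT max(created_at) FROM ads;"])  # hypothetical table/column

# 4. Stop and clean up so the next weekly run starts fresh.
run(["pg_ctl", "-D", RESTORE_DIR, "-m", "fast", "stop"])
```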

How dumps are taken at leboncoin

We have a “dump server” set up that can connect to all production databases and is authorized to read them. Every day, this server dumps all transactional databases on a schedule, encrypts them on the fly so we don’t have clear-text data at rest, and the encrypted files are then stored on disk and sent to a cloud provider.

Having dumps on local disks and in the cloud is important because we get fast access to the latest dumps (cached on disk) and offsite storage of the older ones (cloud), for the very rare case of a major catastrophe like the complete destruction of the two geographically separated datacenters.

This dump server is well protected from unauthorized access. It is also capable of restoring individual database dumps anywhere in the infrastructure.
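As an illustration only, a stripped-down version of such a dump job could look like the sketch below: pg_dump is piped straight into GPG so nothing is written unencrypted, and the encrypted file is kept locally and shipped to object storage. The database names, GPG recipient, bucket and the use of the AWS CLI are assumptions, not a description of our actual tooling.

```python
#!/usr/bin/env python3
"""Sketch of a daily dump job: pg_dump piped into GPG so no clear-text
data touches the disk, then the encrypted file is shipped to object
storage while a local copy is kept for fast access."""
import subprocess
from datetime import date

DATABASES = ["appdb", "accounts"]        # placeholder database names
GPG_RECIPIENT = "backups@example.com"    # placeholder public key
BUCKET = "s3://example-dumps"            # placeholder object storage

for db in DATABASES:
    outfile = f"/dumps/{db}-{date.today()}.dump.gpg"

    # pg_dump in custom format, piped into gpg: encryption happens on the fly.
    dump = subprocess.Popen(["pg_dump", "-Fc", "-h", "prod-db", db],
                            stdout=subprocess.PIPE)
    with open(outfile, "wb") as f:
        subprocess.run(["gpg", "--encrypt", "--recipient", GPG_RECIPIENT],
                       stdin=dump.stdout, stdout=f, check=True)
    dump.stdout.close()
    if dump.wait() != 0:
        raise RuntimeError(f"pg_dump failed for {db}")

    # Ship the encrypted file offsite; the local copy stays as a fast cache.
    subprocess.run(["aws", "s3", "cp", outfile, f"{BUCKET}/{db}/"], check=True)
```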

Logical backup retention

Dumps are human-readable text (or can easily be transformed into it), they are convenient for later auditing and can be restored even after several years into newer versions of PostgreSQL, or even into another database system with a few manual changes. So dumps are the way to go for long-term retention. In leboncoin’s case, the retention period is defined by GDPR rules.

Every business situation and regulatory need is different. Some examples:

  • banking transactions may need 5 years of dump retention for transaction auditing
  • car manufacturing safety data may need as many as 15 years of regulatory retention

Discuss dump retention within your company to define your scenario. Decide your dump frequency and storage needs (disk, tape, optical media, cloud) for both the short and the long term.
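For the short-term/long-term split described above, local pruning can be as simple as the sketch below, which keeps recent encrypted dumps on disk and relies on the cloud copies for anything older. The directory and the 30-day window are arbitrary examples, not our actual policy.

```python
#!/usr/bin/env python3
"""Example local retention: delete encrypted dump files older than
LOCAL_RETENTION_DAYS, assuming the cloud copy covers long-term needs."""
import time
from pathlib import Path

DUMP_DIR = Path("/dumps")       # placeholder local dump directory
LOCAL_RETENTION_DAYS = 30       # arbitrary example, not a recommendation

cutoff = time.time() - LOCAL_RETENTION_DAYS * 86400
for f in DUMP_DIR.glob("*.dump.gpg"):
    if f.stat().st_mtime < cutoff:
        print(f"removing local copy {f} (cloud copy is kept)")
        f.unlink()
```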

How physical backups are taken at leboncoin

Physical backups are made with the barman tool. They need a lot of disk space but not many CPU cycles.

To calculate the disk space needed for physical backups, we need to know:

  • the size of the database on disk
  • the number of basebackups we want to have
  • the total size of all WAL files since the earliest basebackup

Of all the numbers above, the database size on disk is the easiest to know. The number of basebackups to keep is a decision driven by factors like the estimated time to restore, which itself depends on how much WAL is generated. Heavily modified databases (lots of write operations: INSERT, DELETE, UPDATE) generate more WAL, and PostgreSQL takes time to replay all that WAL during a restore. Let’s look at some examples:

  • Case 1: a database of 100 GB that generates 10 GB of WAL per day would take approximately the same time to restore the basebackup as to replay 10 days of generated WAL.
  • Case 2: a database of 10 GB that generates 100 GB of WAL per day would be 10 times slower to replay a day’s worth of WAL than to restore the basebackup.
  • It’s clear that in case 2 we need to basebackup more often than in case 1.

With that in mind, at leboncoin some databases have basebackups taken every day, others once a week, and we even have some very large databases with very few changes per day that are basebackuped only once every 15 days.

Be aware that it’s also possible to basebackup more than once a day, for instance for a very small database with a lot of changes. That can greatly reduce your time to restore.
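A back-of-the-envelope helper like the one below can turn these numbers into a disk budget; it simply multiplies out the figures discussed above and ignores WAL compression on the barman side, so treat it as a rough sizing aid only.

```python
#!/usr/bin/env python3
"""Back-of-the-envelope sizing for a physical backup host: space for N
basebackups plus all the WAL generated since the oldest one."""

def backup_disk_gb(db_size_gb, wal_per_day_gb, basebackups, interval_days):
    # WAL must be kept from the oldest basebackup up to now.
    wal_window_days = basebackups * interval_days
    return basebackups * db_size_gb + wal_window_days * wal_per_day_gb

# Case 1: big database, little WAL (100 GB, 10 GB of WAL per day), weekly basebackups.
print(backup_disk_gb(100, 10, basebackups=2, interval_days=7))   # 340 GB

# Case 2: small database, lots of WAL (10 GB, 100 GB of WAL per day), daily basebackups.
# Even with daily basebackups the WAL dominates the disk budget, which is
# also why this kind of database should be basebackuped more often.
print(backup_disk_gb(10, 100, basebackups=2, interval_days=1))   # 220 GB
```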

Another interesting thing about physical backups: since we have two datacenters and data is replicated from one to the other, we need backups in both to be geographically safe. Our barman servers work in pairs, each server having a peer in the other datacenter; both receive the complete WAL stream from the same origin (sometimes in a cascaded fashion) but basebackups are taken alternately.

Figure: a minimal two barman/two database servers strategy for geographically protected production environments. Each barman receives the full WAL stream, even if cascaded, but basebackups are taken on alternating schedules to avoid double pressure on production. Note that this architecture can differ depending on network capacity and other production constraints.

Since we have several database servers in production, we also have several barman servers to cope with the disk space and CPU needs: while receiving WAL, some CPU is consumed by the stream receiver (pg_receivewal under the hood for barman) and by compression. Each barman pair is shared amongst a couple of database clusters.
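One simple way to implement alternating basebackups between a barman pair (not necessarily how we do it) is to let each host decide from the ISO week number whether it is its turn to run barman backup, as in this sketch; the hostnames and the server name are placeholders.

```python
#!/usr/bin/env python3
"""Alternating basebackups between a pair of barman hosts: both always
receive the WAL stream, but only one runs `barman backup` each week."""
import datetime
import socket
import subprocess

PAIR = ["barman-dc1", "barman-dc2"]   # hypothetical hostnames, one per datacenter
SERVER = "main-db"                    # barman server name to back up

week = datetime.date.today().isocalendar()[1]
my_turn = PAIR[week % 2] == socket.gethostname()

if my_turn:
    subprocess.run(["barman", "backup", SERVER], check=True)
else:
    print("peer datacenter takes this week's basebackup")
```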

Physical backups retention

Physical backups are useful mostly, if not only, for disaster recovery. It doesn’t make sense to keep several months or years of basebackups plus the WAL stream on disk.

Dumps are better and cheaper to store for the long term as we saw above.

Physical backup retention should be discussed in terms of how to recover from a disaster as quickly as possible, and how far into the past we should be able to restore with Point in Time Recovery (possible only with physical backups, not dumps).

Some business cases with long term Point in Time Recovery needs do exist. Fortunately, these cases are rare. It’s doable, but it needs lots of disk/tape space to keep all the basebackups and the corresponding WAL stream.

Usually, a couple of days or weeks of physical backup retention is enough. It will depend only on your business rules and disk space.

At a minimum, we’d say that two basebackups and the corresponding WAL on disk are needed. This way, you can still restore your database if an incident strikes while a new basebackup is being taken. The maximum will usually be dictated by your backup disk capacity.

Avoid storage that is too slow for physical backups, like tapes or cloud object storage: in case of incident, it can keep your production stopped for too long.

Estimating time to basebackup

With physical backups, the time to basebackup is usually a matter of hardware capacity, namely disk speed and network bandwidth; the slower of the two defines the basebackup time:

  • with disks capable of a 10 MB/s transfer rate (including controller capacity), a basebackup of a 10 GB database will take around 17 minutes to finish.
  • with the same disks, if network bandwidth is limited to approximately 1 MB/s (a gross approximation for a 10 Mbps interface card) between the database server and the barman server, the time rises to 170 minutes.
  • be aware that faster network cards can saturate disk capacity and severely affect queries running on the server.
  • fast disks can saturate network cards too, with different impacts on your production depending on your hardware and setup.
  • a separate network interface card (and corresponding network capacity) dedicated to backups can be necessary for 24/7 production sites without off-peak hours.
  • normal production usage of disks and network will also affect the calculated times.
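The arithmetic behind these examples is just the size divided by the slower of the two throughputs; a tiny helper makes it easy to plug in your own numbers.

```python
#!/usr/bin/env python3
"""Transfer-time estimate for a basebackup: size divided by the slower
of disk and network throughput, as in the examples above."""

def basebackup_minutes(size_gb, disk_mb_s, network_mb_s):
    bottleneck = min(disk_mb_s, network_mb_s)     # the slower link wins
    return size_gb * 1024 / bottleneck / 60

print(basebackup_minutes(10, disk_mb_s=10, network_mb_s=1000))  # ~17 minutes
print(basebackup_minutes(10, disk_mb_s=10, network_mb_s=1))     # ~170 minutes
```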

What if a basebackup takes too long?

It happens, deal with it.

We have some databases of several terabytes that will need a couple of days to basebackup. Just plan your disk capacity and network bandwidth for that and you should be ok. PostgreSQL keeps working normally while taking a basebackup.

If a very large database has few modifications and generates little WAL compared to its size on disk, you’ll be fine taking basebackups at large intervals.

Estimating time to restore

Similar to the basebackup, the time to restore depends on your hardware capacity in exactly the same way, but the total time is higher than for the basebackup because we also have to replay the WAL stream.

Caution: the calculation below is less accurate because WAL replay needs some processing and its writes are random; use this example only as a gross estimation.

  • 100 GB basebackup + 50 GB worth of WAL to restore, considering 10 MB/s disk bandwidth and similar network capacity
  • this will take 170 minutes to restore the basebackup + 85 minutes to replay the WAL stream
  • consider that, if your disks are the spinning type, WAL replay will be between 2 and 4 times slower: count up to 340 minutes to replay the WAL!
  • WAL decompression on the barman side can slow down replay if you have fast SSD storage on the database side; consider turning WAL compression off in barman.
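The same arithmetic, extended with a replay slowdown factor for random writes, reproduces the figures above; again, this is a rough estimation aid, not a precise model.

```python
#!/usr/bin/env python3
"""Rough restore-time estimate: basebackup transfer plus WAL replay,
with a slowdown factor for random writes on spinning disks."""

def restore_minutes(base_gb, wal_gb, throughput_mb_s, wal_slowdown=1.0):
    base = base_gb * 1024 / throughput_mb_s / 60                  # sequential copy
    replay = wal_gb * 1024 / throughput_mb_s / 60 * wal_slowdown  # random writes
    return base + replay

# 100 GB basebackup + 50 GB of WAL at 10 MB/s, SSD-like replay.
print(restore_minutes(100, 50, 10))                  # ~255 minutes (170 + 85)
# Same restore on spinning disks, assuming replay is 4x slower.
print(restore_minutes(100, 50, 10, wal_slowdown=4))  # ~510 minutes (170 + 340)
```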

Estimating time for pg_dump or restore from a dump

It’s very difficult, almost impossible, to estimate this, because it all depends on the type of dump (compression ratio, parallelism) and on the database schema (lots of indexes can be fast to dump but very slow to restore).

You have only one option: do it. Dump your real production databases and test the restore procedures. Try different configurations and benchmark them yourself. Every database is different, so you’ll only know by testing all of them.
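A benchmark can be as simple as timing a parallel dump and a parallel restore, as in the sketch below; the database names, the scratch database and the number of jobs are placeholders, and the directory format (-Fd) is used because it is the one that supports parallel pg_dump.

```python
#!/usr/bin/env python3
"""Time a parallel dump and a parallel restore of a real database to get
figures you can trust, instead of guessing."""
import subprocess
import time

SOURCE_DB = "appdb"          # placeholder: your production (or copy) database
SCRATCH_DB = "restore_test"  # throwaway database for the restore timing
DUMP_DIR = "/tmp/appdb.dir"  # directory-format dump (required for parallel dump)
JOBS = "4"                   # try several values and compare

def timed(label, cmd):
    start = time.monotonic()
    subprocess.run(cmd, check=True)
    print(f"{label}: {time.monotonic() - start:.0f} s")

timed("pg_dump", ["pg_dump", "-Fd", "-j", JOBS, "-f", DUMP_DIR, SOURCE_DB])
subprocess.run(["createdb", SCRATCH_DB], check=True)
timed("pg_restore", ["pg_restore", "-j", JOBS, "-d", SCRATCH_DB, DUMP_DIR])
```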

Not finished yet — we need a third part

Why? Because we have to talk about replication. Replication has a lot to do with backups, because a physical backup is the starting point of replication.

Ready? Stay tuned ;)
