How to Resolve pg_wal Running Out of Disk Space: Steps to Recover Your Downed Database

Kemal Öz
Jun 4, 2024


PostgreSQL is one of the most widely used open-source relational database management systems, known for its robustness and rich feature set. When the disk holding the pg_wal directory fills up, however, PostgreSQL can no longer write WAL and the database goes down. Recovering involves a few steps: stopping the database services, backing up the WAL files, and removing the WAL files that are no longer needed. In this article, we walk through each step with the exact commands to run.

Stopping the Patroni Service

Patroni is an HA (High Availability) solution for PostgreSQL that manages replication and automates failover. Before touching pg_wal, stop Patroni so that it does not try to restart PostgreSQL or promote a standby while you work.

systemctl stop patroni   # run this on every node
# If you don't use Patroni, stop the PostgreSQL service instead (unit name varies, e.g. postgresql or postgresql-16)

This command stops the Patroni service on the node it runs on and, under normal operation, the PostgreSQL instance it manages as well. Run it on every node, primary (master) and standbys alike, so that the entire cluster is stopped.
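Doing this by hand on each node is easy to get wrong during an outage. The sketch below loops over the nodes over SSH. The host names, passwordless SSH with sudo rights, and the helper names run and patroni_all are assumptions for illustration only. It defaults to a dry run that only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch: stop (or start) Patroni on every cluster node over SSH.
# ASSUMPTIONS (hypothetical): the host names in NODES, passwordless SSH
# with sudo rights, and a systemd unit named "patroni".
# DRY_RUN defaults to 1, which only prints the commands; set DRY_RUN=0 to run them.
set -euo pipefail

NODES=("pg-node1" "pg-node2" "pg-node3")   # hypothetical host names
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [[ "$DRY_RUN" == "1" ]]; then
    echo "$*"
  else
    "$@"
  fi
}

# Apply "stop" or "start" to the patroni unit on each node in turn.
patroni_all() {
  local action="$1" node
  if [[ "$action" != "stop" && "$action" != "start" ]]; then
    echo "usage: patroni_all stop|start" >&2
    return 1
  fi
  for node in "${NODES[@]}"; do
    run ssh "$node" sudo systemctl "$action" patroni
  done
}

patroni_all stop
```

Set DRY_RUN=0 to actually execute the commands. The same function called with start brings Patroni back up once the cleanup is finished.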

Backing Up WAL Files

WAL (Write-Ahead Logging) is a crucial component of PostgreSQL’s durability and consistency mechanisms. WAL files record every change made to the database so that these changes can be replayed during recovery. Before deleting anything from pg_wal, copy its contents somewhere safe. The destination must be on a different filesystem with enough free space, since the one holding pg_wal is already full.

cp -r /pg_wal /PGS_BACKUP/wall/

The command above copies the /pg_wal directory, including all of its files, into /PGS_BACKUP/wall/. If a file you later remove turns out to be needed, you can restore it from this copy.
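A slightly more defensive version of this copy is sketched below as a bash function. It checks that the destination filesystem has room before copying and compares file counts afterwards. It uses cp -a, which, unlike cp -r, also preserves ownership (when run as root), permissions and timestamps. The function name backup_wal is hypothetical, and GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Sketch: copy a WAL directory to a backup location, but only if the
# destination filesystem has room, then check that every file arrived.
# ASSUMPTIONS: GNU coreutils; backup_wal is a hypothetical helper name.
set -euo pipefail

backup_wal() {
  local src="$1" dest="$2" need avail copy n_src n_dst
  mkdir -p "$dest"

  # Space the copy needs vs. space free on the destination filesystem (KiB).
  need=$(du -sk "$src" | awk '{print $1}')
  avail=$(df -Pk "$dest" | awk 'NR==2 {print $4}')
  if (( avail <= need )); then
    echo "not enough space in $dest: need ${need}K, have ${avail}K" >&2
    return 1
  fi

  # -a keeps ownership (as root), permissions and timestamps, unlike plain -r.
  cp -a "$src" "$dest/"

  # Every regular file in the source should now exist in the copy.
  copy="$dest/$(basename "$src")"
  n_src=$(find "$src" -type f | wc -l)
  n_dst=$(find "$copy" -type f | wc -l)
  if (( n_dst < n_src )); then
    echo "copy incomplete: $n_src files in source, $n_dst in $copy" >&2
    return 1
  fi
  echo "copied $n_src files to $copy"
}
```

For example: backup_wal /pg_wal /PGS_BACKUP/wall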

Retrieving Control Data

With the cluster stopped, use the pg_controldata command to identify the latest checkpoint. Its REDO location tells you which WAL file recovery will start from, and therefore which older files can be removed in the next steps.

pg_controldata -D /pg_data/data/

This command provides detailed information about the PostgreSQL control data. From its output, you can find the line specifying the latest checkpoint’s REDO WAL file:

Latest checkpoint's REDO WAL file:    000000110000012C00000002

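000000110000012C00000002 is the WAL file that contains the checkpoint’s REDO point. Crash recovery starts replaying from there, so this file and every later one must be kept; only files older than it are candidates for removal.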
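To avoid copying the file name by hand, the REDO WAL file can be extracted from the pg_controldata output. The helpers below are a sketch, and redo_wal_file and describe_wal_name are hypothetical names. The second one splits the 24-hex-digit name into its timeline, log and segment parts, which shows that 000000110000012C00000002 belongs to timeline 17 (0x11).

```shell
#!/usr/bin/env bash
# Sketch: pull the REDO WAL file name out of pg_controldata output and
# split it into its three 8-hex-digit parts.
# ASSUMPTION: redo_wal_file and describe_wal_name are hypothetical helpers,
# and pg_controldata is run with LC_ALL=C so the label is in English.
set -euo pipefail

# Reads pg_controldata output on stdin, prints the REDO WAL file name.
redo_wal_file() {
  sed -n "s/^Latest checkpoint's REDO WAL file:[[:space:]]*//p"
}

# A WAL segment name is 24 hex digits: timeline, log, segment (8 each).
describe_wal_name() {
  local name="$1" re='^[0-9A-F]{24}$'
  if [[ ! "$name" =~ $re ]]; then
    echo "not a WAL segment name: $name" >&2
    return 1
  fi
  printf 'timeline=%d log=0x%s seg=0x%s\n' \
    "$((16#${name:0:8}))" "${name:8:8}" "${name:16:8}"
}
```

For example: LC_ALL=C pg_controldata -D /pg_data/data/ | redo_wal_file. Setting LC_ALL=C keeps the labels in English, since pg_controldata translates its output under other locales.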

Cleaning Up WAL Files

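Under normal operation, PostgreSQL removes or recycles old WAL files after each checkpoint. When something prevents this, such as a failing archive_command or a replication slot whose consumer has stopped, WAL files pile up in pg_wal until the disk is full. With the server stopped, files older than the REDO WAL file are no longer needed for crash recovery and can be removed to free space.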

Simulating Cleanup

Before performing the actual cleanup, it’s prudent to simulate the process to see which files would be deleted:

pg_archivecleanup -n /pg_wal 000000110000012C00000002

The -n option indicates a dry run, where the command only lists the files that would be removed, without actually deleting them.
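The dry-run output can also tell you how much space the cleanup would free. This sketch assumes -n prints one full file path per line; summarize_candidates is a hypothetical helper that reads that list and adds up the file sizes.

```shell
#!/usr/bin/env bash
# Sketch: total up the files a pg_archivecleanup dry run would remove.
# ASSUMPTION: the input has one file path per line, as printed by -n;
# summarize_candidates is a hypothetical helper name.
set -euo pipefail

summarize_candidates() {
  local path size count=0 bytes=0
  while IFS= read -r path; do
    if [[ -z "$path" ]]; then
      continue
    fi
    if [[ ! -f "$path" ]]; then
      echo "skipping (not a regular file): $path" >&2
      continue
    fi
    size=$(wc -c < "$path")
    count=$((count + 1))
    bytes=$((bytes + size))
  done
  # MiB figure is rounded down.
  echo "$count files, $((bytes / 1024 / 1024)) MiB would be freed"
}
```

For example: pg_archivecleanup -n /pg_wal 000000110000012C00000002 | summarize_candidates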

Performing Cleanup

Once you’re sure about the files to be deleted, you can proceed with the actual cleanup:

pg_archivecleanup -d /pg_wal 000000110000012C00000002

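pg_archivecleanup deletes files by default; the -d option only turns on debug output on stderr, which is useful here because it reports each file as it is removed. The command removes every WAL file in /pg_wal that precedes 000000110000012C00000002, while that file and all later ones are kept. Once enough space has been freed, start Patroni again on every node (systemctl start patroni, or your PostgreSQL service if you do not use Patroni).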
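Because a mistyped file name deletes the wrong range of files, the cleanup can be wrapped in a small guard that refuses to run unless the argument looks like a WAL segment name and that file actually exists in the directory. safe_wal_cleanup is a hypothetical helper, not part of PostgreSQL, and it does not check that the server is stopped.

```shell
#!/usr/bin/env bash
# Sketch: run pg_archivecleanup only after basic sanity checks.
# ASSUMPTION: safe_wal_cleanup is a hypothetical helper; the server must
# already be stopped on every node, which this function does not check.
set -euo pipefail

safe_wal_cleanup() {
  local wal_dir="$1" redo="$2" re='^[0-9A-F]{24}$'
  # The cutoff must look like a WAL segment name (24 hex digits).
  if [[ ! "$redo" =~ $re ]]; then
    echo "refusing: '$redo' is not a WAL segment name" >&2
    return 1
  fi
  # The REDO file itself must be present, or the name is probably wrong.
  if [[ ! -f "$wal_dir/$redo" ]]; then
    echo "refusing: $wal_dir/$redo does not exist" >&2
    return 1
  fi
  # pg_archivecleanup deletes by default; -d only adds debug output on stderr.
  pg_archivecleanup -d "$wal_dir" "$redo"
}
```

For example: safe_wal_cleanup /pg_wal 000000110000012C00000002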

Why These Commands Are Important

  1. Backup and Recovery: WAL files are essential for database recovery after a failure. The copy taken before cleanup means that any file removed by mistake can be restored.
  2. Disk Space Management: When the disk holding pg_wal fills up, PostgreSQL can no longer write WAL and shuts down. Removing the WAL files that are no longer needed frees the space it needs to start again.
  3. Database Integrity: Stopping every node before touching pg_wal ensures that no PostgreSQL process is writing WAL, and that no failover is triggered, while files are being removed.
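Cleanup fixes the outage after the fact; a simple usage check, run for example from cron, can warn you before the disk holding pg_wal fills up again. The function name check_wal_fs and the 80% threshold in the usage example are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Sketch: warn before the WAL filesystem fills up (e.g. from cron).
# ASSUMPTIONS: check_wal_fs is a hypothetical helper name, and df
# supports -P (POSIX output format).
set -euo pipefail

# Prints a warning and returns 1 when usage of the filesystem holding
# $1 is at or above $2 percent; otherwise prints the current usage.
check_wal_fs() {
  local dir="$1" threshold="$2" used
  used=$(df -P "$dir" | awk 'NR==2 {gsub("%", "", $5); print $5}')
  if (( used >= threshold )); then
    echo "WARNING: filesystem holding $dir is ${used}% full (threshold ${threshold}%)" >&2
    return 1
  fi
  echo "ok: ${used}% used"
}
```

For example: check_wal_fs /pg_wal 80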

By following these steps, you can bring a PostgreSQL database back online after pg_wal has filled its disk. Stopping the Patroni service, backing up WAL files, and removing the ones that are no longer needed are tasks every database administrator should be comfortable with to keep a PostgreSQL environment healthy and robust. For more detailed and technical articles like this, keep following our blog on Medium. If you have any questions or need further assistance, feel free to reach out in the comments below or contact me directly.
