Creating backup and restoring PostgreSQL on EC2 Server using AWS S3

Rupesh Desai
Amazon Web Services
4 min read · Oct 17, 2021
PostgreSQL EC2 Backup

One of the reasons you really want an RDS instance is the reliability of the managed service and the automatic snapshots it takes of your data, which let you back up and restore at your convenience.

There are specific utilities that PostgreSQL has built in to do backups. In this article, we will review different types of backups you can take with PostgreSQL.

Types of Data Backups using PostgreSQL

You can plan and take the following types of backup using PostgreSQL:

  1. SQL Dump (used for specific cases): A SQL dump is the process of creating a SQL text file that, when run again on a server, recreates the database structure and its records exactly as they were at the time of the dump. The SQL dump method is the common way to take a full database backup and restore it. It is used not only for backing up data, but also for moving data off the server.
  • Depending on the size of the database, the SQL dump may take a significant amount of time to produce, and restoring it can take just as long.
  • A SQL dump can target a single database or every database on the server.
  • Ideally, you should take SQL dumps at specific intervals so that you always have a dataset copy for restoring to a particular point in time.
  • Note that for large datasets the dump can be piped through gzip, which keeps the output file manageable.
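The SQL-dump workflow above can be sketched as a small shell script (a sketch, not a canonical recipe: the database name `mydb` and the bucket `my-backup-bucket` are placeholders, and it assumes the `aws` CLI is installed and configured with credentials that can write to the bucket):

```shell
#!/bin/sh
# Placeholder names -- adjust for your setup.
DB_NAME="mydb"
S3_BUCKET="my-backup-bucket"
BACKUP_FILE="${DB_NAME}_$(date +%Y-%m-%d).sql.gz"

# The guard lets the script no-op on machines without PostgreSQL installed.
if command -v pg_dump >/dev/null 2>&1; then
  # pg_dump writes a plain-SQL dump to stdout; piping through gzip
  # compresses it in flight, keeping large dumps manageable.
  pg_dump "$DB_NAME" | gzip > "$BACKUP_FILE"

  # Copy the compressed dump off the instance into S3.
  aws s3 cp "$BACKUP_FILE" "s3://${S3_BUCKET}/postgres/${BACKUP_FILE}"
fi
```

To restore, stream the dump back into `psql`, e.g. `aws s3 cp s3://my-backup-bucket/postgres/mydb_2021-10-17.sql.gz - | gunzip | psql mydb`. To dump every database on the server at once, use `pg_dumpall` in place of `pg_dump`.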

2. File System Level Backup (Not Recommended): A file-system-level backup copies the data directory where PostgreSQL stores its files and archives it.

Creating a backup of the file system simply isn’t as practical as a SQL dump or the other methods mentioned below, for the following reasons.

  • You need to shut down the service to make a copy of the data, which in most cases isn’t practical because it requires a significant amount of downtime.
  • The on-disk layout is complex, and if you try to restore only certain parts of the database, it might not work, or simply fail. The long answer, from the PostgreSQL docs:

“If you have dug into the details of the file system layout of the database, you might be tempted to try to back up or restore only certain individual tables or databases from their respective files or directories. This will not work because the information contained in these files is not usable without the commit log files, pg_clog/*, which contain the commit status of all transactions. A table file is only usable with this information. Of course it is also impossible to restore only a table and the associated pg_clog data because that would render all other tables in the database cluster useless. So file system backups only work for complete backup and restoration of an entire database cluster.” — PostgreSQL docs
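For completeness, a file-system-level backup is just an archive of the whole data directory taken while the server is stopped. A sketch (the data directory path is an assumption for a Debian/Ubuntu install of PostgreSQL 13; find yours with `SHOW data_directory;`):

```shell
#!/bin/sh
# Assumed default path; your PGDATA may differ.
PGDATA="${PGDATA:-/var/lib/postgresql/13/main}"

if [ -d "$PGDATA" ]; then
  # The server must be fully stopped first: a copy taken while
  # PostgreSQL is running is not guaranteed to be consistent.
  sudo systemctl stop postgresql

  # Archive the entire cluster directory; as the docs quoted above
  # explain, partial copies of individual tables are unusable.
  sudo tar -czf /tmp/pgdata-backup.tar.gz \
    -C "$(dirname "$PGDATA")" "$(basename "$PGDATA")"

  sudo systemctl start postgresql
fi
```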

3. Continuous Archiving and Point-in-Time Recovery: PostgreSQL continuously writes WAL (write-ahead log) files, and combining a file-system backup with archived WAL files is the best way to get point-in-time recovery.

  • We do not need a perfectly consistent file system backup as the starting point. Any internal inconsistency in the backup will be corrected by log replay (this is not significantly different from what happens during crash recovery). So we do not need a file system snapshot capability, just tar or a similar archiving tool.
  • Since we can combine an indefinitely long sequence of WAL files for replay, continuous backup can be achieved simply by continuing to archive the WAL files. This is particularly valuable for large databases, where it might not be convenient to take a full backup frequently.
  • It is not necessary to replay the WAL entries all the way to the end. We could stop the replay at any point and have a consistent snapshot of the database as it was at that time. Thus, this technique supports point-in-time recovery: it is possible to restore the database to its state at any time since your base backup was taken.
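Enabling continuous archiving comes down to a few `postgresql.conf` settings (a sketch: the bucket name is a placeholder, and the `archive_command` assumes the `aws` CLI is installed and configured for the OS user PostgreSQL runs as):

```
# postgresql.conf -- continuous WAL archiving to S3 (sketch)
wal_level = replica
archive_mode = on
# %p expands to the path of the WAL file to archive, %f to its file name.
archive_command = 'aws s3 cp %p s3://my-backup-bucket/wal/%f'
```

With archiving in place, take the base backup itself with `pg_basebackup`; the archived WAL files then cover everything that happens after it.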

If we continuously feed the series of WAL files to another machine that has been loaded with the same base backup file, we have a warm standby system: at any point, we can bring up the second machine and it will have a nearly-current copy of the database.
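Restoring to a point in time then means: restore a base backup into a fresh data directory, tell PostgreSQL how to fetch archived WAL files, and name the target time. A sketch for PostgreSQL 12 and later, where recovery is triggered by an empty `recovery.signal` file in the data directory (the S3 prefix and the timestamp are placeholders):

```
# postgresql.conf -- point-in-time recovery settings (sketch)
# %f is the WAL file name being requested, %p the local path to restore it to.
restore_command = 'aws s3 cp s3://my-backup-bucket/wal/%f %p'
recovery_target_time = '2021-10-17 03:00:00'
```

Create `recovery.signal` in the data directory and start the server; replay stops once the target time is reached, leaving a consistent snapshot of the database as of that moment.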

Other third-party backup methods

  • BarmanS3 leverages the AWS CLI and backs up your PostgreSQL instance using the WAL method. It has great documentation and is very easy to set up.
  • pgBackRest is an open-source, community-maintained backup tool for PostgreSQL. The project also provides an S3 connector that saves backup files directly to S3.
  • pghoard is an open-source, community-maintained tool for backing up PostgreSQL data. It supports multi-part upload of backup data to S3, which makes file transfers faster.
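As an illustration of the third-party route, pgBackRest keeps its repository in S3 through a few lines of `/etc/pgbackrest.conf` (a sketch: the bucket, region, stanza name, and data directory path are placeholders, and credentials can also be supplied via IAM instance roles instead of access keys):

```
# /etc/pgbackrest.conf -- storing the backup repository in S3 (sketch)
[global]
repo1-type=s3
repo1-path=/pgbackrest
repo1-s3-bucket=my-backup-bucket
repo1-s3-endpoint=s3.amazonaws.com
repo1-s3-region=us-east-1

[main]
pg1-path=/var/lib/postgresql/13/main
```

After the stanza is created with `pgbackrest stanza-create`, backups and archived WAL go straight to the bucket.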

Summary

There is no generic suggestion on which type of backup to use: the backup type, frequency, and storage options all need to be tailored to your specific needs, and no single answer fits everyone. You should evaluate the transactions that occur during the day/week/month. How critical is your data? Do you really need incremental backups? There are many other questions you need to think about and answer before moving forward.

Let me know what else you would like to know. I hope you enjoyed this article.
