Automated PostgreSQL 12 backups in k8s using pg_dump and .pgpass
Over the past several months I was tasked with backing up all the databases in our production on-premises k8s cluster by creating CronJobs that automatically run backups at defined daily intervals. Some databases have proven more challenging than others in this context. PostgreSQL 12 happens to be a little different from the others in how it expects credentials to be supplied in an automated context.
First, a little background of the environment I’m operating within.
I’m using Ubuntu 22.04 LTS with three control plane nodes and four workers. There’s a FreeNAS export (or TrueNAS if you’re really awesome and current, but any NFS export will do) that we’ll call backups00.storage.com. I’m also running k8s 1.26.7 throughout the cluster.
You will want some level of control over DNS (Domain Name Services) if you’re going to use a fully qualified domain name like I demonstrated above, rather than, say, an IPv4 address like 10.0.0.52. DNS is out of the scope of this discussion.
I’m also going to assume that if you have a working k8s cluster, you have a working kubectl installation for the next steps.
First, let’s create a secret that we’ll mount as the hidden file “.pgpass” in our backup container.
echo "testhostname:5432:testdatabasename:testusername:testdatabasepassword" | base64
PostgreSQL: Documentation: 14: 34.16. The Password File
To understand why we’re creating that value, you need to understand the requirements for issuing the pg_dump command against your remote PostgreSQL database. Go ahead and read that brief document from the Elephant source 🐘.
The summary of what you need to know for now: your database hostname, the ingress port data goes in and out on, the name of the database on the host, a username with access to the database, and that user’s password. Port 5432 is generally the default that most deployments will use.
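As a quick sanity check, you can build the .pgpass line from those five fields and confirm the base64 value round-trips cleanly. This is a local sketch using the placeholder values from above; `-w 0` is a GNU coreutils flag that disables line wrapping (omit it on macOS, where it’s the default behavior).

```shell
# Build the .pgpass line from the five required fields (placeholders)
PGPASS_LINE="testhostname:5432:testdatabasename:testusername:testdatabasepassword"

# Encode it for the Secret manifest; -w 0 (GNU coreutils) disables line wrapping
ENCODED=$(printf '%s' "$PGPASS_LINE" | base64 -w 0)
echo "$ENCODED"

# Decode it back to make sure nothing was mangled in transit
printf '%s' "$ENCODED" | base64 -d
```

Using printf instead of echo avoids baking a trailing newline into the encoded value, though libpq tolerates one in .pgpass either way.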
Make sure you’re in the correct namespace; it could be different depending on how you’ve set up your postgresql container:
kubectl config set-context --current --namespace=postgresql
Next, create a .yaml file, take the contents of the last command (yes, yours will be different 😁), and replace the value of .pgpass with whatever your output was:
apiVersion: v1
kind: Secret
metadata:
  name: pgpass-secret
type: Opaque
data:
  .pgpass: cG9zdGdyZXa6NTQzMjpnaXRlYTpnYXRoOmcxdHBhc3M=

Save the file and apply the secret.
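Before applying, you can sanity-check that the value you pasted into `data:` decodes back to a well-formed .pgpass line. This is a hedged sketch: the string below is a stand-in for your actual base64 output, and the field count check assumes the password contains no literal colons.

```shell
# Stand-in for the base64 string you pasted into pgpass_secret.yaml
SECRET_DATA=$(printf '%s' "testhostname:5432:testdatabasename:testusername:testdatabasepassword" | base64 -w 0)

# Decode it and count the colon-separated fields; a valid .pgpass line has exactly five
FIELDS=$(printf '%s' "$SECRET_DATA" | base64 -d | awk -F: '{ print NF }')
echo "$FIELDS"   # prints 5

# After applying, you can read the secret back out of the cluster the same way:
# kubectl get secret pgpass-secret -o jsonpath='{.data.\.pgpass}' | base64 -d
```

Note that a password containing literal colons has to be backslash-escaped in .pgpass, which would also throw this simple field count off.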
kubectl apply -f pgpass_secret.yaml
Another assumption: for this to work, you should have a PostgreSQL 12 database with ingress set up, listening for requests on TCP 5432, in a namespace relative to your deployment. A good example of an application with this shape is gitea, which needs a database like PostgreSQL 12 to run. Here’s a link to gitea’s backup and restore recommendations.
Here’s the sample job definition:
apiVersion: batch/v1
kind: CronJob
metadata:
  name: postgres12-backup
  namespace: postgres12
spec:
  schedule: "0 0 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: backup-storage
              nfs:
                server: backups00.storage.com
                path: /database_backups
            - name: pgpass-volume
              secret:
                secretName: pgpass-secret
                defaultMode: 384
          containers:
            - name: postgres12-backup
              image: ubuntu/postgres:12-20.04_edge
              command:
                - "/bin/sh"
                - "-c"
                - "pg_dump -U testusername -h testhostname -p 5432 -d testdatabasename | gzip > /mnt/postgres12/testdatabasename_backup-$(date '+%Y-%m-%d-%H-%M').sql.gz 2> /mnt/postgres12/backup_error.log"
              env:
                - name: PGPASSFILE
                  value: /.pgpass
              resources:
                requests:
                  cpu: 500m
                  memory: 1Gi
              volumeMounts:
                - name: backup-storage
                  mountPath: /mnt
                - name: pgpass-volume
                  mountPath: /.pgpass
                  subPath: .pgpass
          restartPolicy: OnFailure

Let’s break it down:
volumes:
  - name: backup-storage
    nfs:
      server: backups00.storage.com
      path: /database_backups
  - name: pgpass-volume
    secret:
      secretName: pgpass-secret
      defaultMode: 384

The first volumes definition, name: backup-storage, is pretty self-explanatory: it specifies that we want our NFS export backups00.storage.com, with the path /database_backups, available for mounting.
In the example above, the defaultMode field within the volumes section of the Pod specification sets the permissions for the volume to 384 (equivalent to 0600 in octal notation). The defaultMode value is applied to all files within the Secret volume mount.
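If the 384 looks opaque: when defaultMode is written as a plain decimal integer (as it must be in JSON manifests), 384 decimal is exactly 0600 octal, i.e. read/write for the owner only. A quick shell demonstration of the conversion in both directions:

```shell
# Decimal 384 rendered as octal: rw for owner, nothing for group/other
printf '%o\n' 384   # prints 600

# And the reverse: octal 0600 evaluated as a decimal integer
echo $(( 0600 ))    # prints 384
```

Libpq refuses to use a .pgpass file that is readable by group or world, which is why this particular mode matters.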
containers:
  - name: postgres12-backup
    image: ubuntu/postgres:12-20.04_edge
    command:
      - "/bin/sh"
      - "-c"
      - "pg_dump -U testusername -h testhostname -p 5432 -d testdatabasename | gzip > /mnt/postgres12/testdatabasename_backup-$(date '+%Y-%m-%d-%H-%M').sql.gz 2> /mnt/postgres12/backup_error.log"

You can use whatever image you want as long as it has the psql tools installed; I’ve had varying success with other operating systems. If you want to get fancier with the commands you’re running, take a look at this wiki for backing up PostgreSQL on Linux, which I found helpful for context on how to format the actual pg_dump command. Even though .pgpass defines the user, database, host, and port, we still have to specify them in our command; .pgpass only supplies the password for a connection whose fields match one of its lines. I’m piping the standard output through gzip into a directory I made on my mount named postgres12. The rest of the command is a mixture of shell expansion with the date command and error-output logging.
env:
  - name: PGPASSFILE
    value: /.pgpass
resources:
  requests:
    cpu: 500m
    memory: 1Gi
volumeMounts:
  - name: backup-storage
    mountPath: /mnt
  - name: pgpass-volume
    mountPath: /.pgpass
    subPath: .pgpass

The env section sets the PGPASSFILE environment variable discussed earlier in the documentation.
volumeMounts connects our NFS export to our mount path at /mnt so that our pg_dump command can output successfully.
Finally, pgpass-volume connects the pgpass-secret to an actual file named .pgpass as defined by the mountPath and the subPath.
Setting the defaultMode and working out the octal notation took me a while personally, but I’m just beginning my Kubernetes voyages, so it’s not surprising that I don’t know what I don’t know.
Troubleshooting:
Good luck. I’m sadly not available to help you much further with this, friend.
If you’re really lost and really want to get going on this, my advice is to start at the basics and take the command:
pg_dump -U testusername -h postgres -p 5432 -d testdatabasename | gzip > postgresdb_backup-$(date '+%Y-%m-%d-%H-%M').sql.gz
See if you can issue the command locally from an interactive Ubuntu container running in the same namespace. You’ll be prompted for a password interactively with this method. If this works and makes a usable backup, add another layer like your output storage, and see if you can back it up directly there instead of to a locally attached disk.
Then, move on to attempting to use .pgpass (don’t forget about the 0600 permissions and the environment variable defining its location!), and then secret creation and so forth, until you work your way up to letting the cron job run without interaction.
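That .pgpass step can be rehearsed locally before you involve the Secret at all. A minimal sketch, using /tmp and the placeholder credentials (in the pod, PGPASSFILE points at /.pgpass instead); the `stat -c` flag assumes GNU stat:

```shell
# Create a throwaway .pgpass and lock it down the way libpq requires
PGPASSFILE=/tmp/.pgpass_test
printf '%s\n' "testhostname:5432:testdatabasename:testusername:testdatabasepassword" > "$PGPASSFILE"
chmod 0600 "$PGPASSFILE"
export PGPASSFILE

# Verify the mode; libpq silently ignores a .pgpass with group/world access
stat -c '%a' "$PGPASSFILE"   # prints 600 on GNU stat
```

With PGPASSFILE exported and the mode at 0600, the same pg_dump command that prompted you interactively should now run without a password prompt.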
Oftentimes, you’ll need to confirm the exact database name from the environment of the database container. In that case, exec into the database container and run the command env to see what the variables are set to for things such as HOST, INGRESS_PORT, and the USER for the database. There is usually a variable for the database password as well.
kubectl exec -it postgresql12podname -- /bin/sh
That command, with your supplied pod name, will get you an interactive shell on your database pod so you can gather some more information to get started.
