My Backup Strategy
Losing access to important data is an infuriating experience.
Just today, I spent five minutes writing a small C program, only to overwrite the source file with the new executable when I compiled it:
gcc solution.c -o solution.c
While rewriting the code wasn’t a hassle, losing other data would hurt far more: among my digital hoard are electronic copies of contracts, precious family photos and many thousands of lines of source code.
A few months ago, I started to take backing up my data more seriously; at the time, losing my laptop would have resulted in a data apocalypse, which would be far worse than just a ~£250 bill for a replacement.
After some thought, I came up with four requirements for my ideal backup system:
- Cheap. My budget is <$5 per month.
- Offsite & online. Any possibility that all copies of the data could be destroyed simultaneously should be ruled out, and I should be able to update and access the backed up data from anywhere.
- Secure. I need to be able to back up my sensitive or private data.
- Selective. I don’t want to back up everything; lots of stuff on my computer is ephemeral (temporary files, random experiments, whimsical screenshots etc), so I only want to back up the important bits, and not worry about the rest.
Using a combination of Git and Tarsnap, I think my current solution satisfies all of these requirements. Source code always lives in a Git repository that is pushed to a remote service such as Bitbucket or GitHub, and everything else is backed up to Tarsnap.
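The Git half of this only works if every repository really does have a remote. As a minimal sketch (the function name `repos_without_remote` is my own invention, and it only checks that a remote is configured, not that everything has been pushed), something like this would list the repositories under a directory that have nowhere to push to:

```shell
# List Git repositories under a directory that have no remote configured.
# Hypothetical helper: it does not check whether local commits are pushed.
repos_without_remote() {
    find "$1" -type d -name .git 2>/dev/null | while read -r gitdir; do
        repo=$(dirname "$gitdir")
        # `git remote` prints one configured remote per line, or nothing.
        if [ -z "$(git -C "$repo" remote)" ]; then
            printf '%s\n' "$repo"
        fi
    done
}
```

Running it against the directory that holds my projects (for example `repos_without_remote ~/code`, where the path is just a placeholder) prints one line per repository that still needs a remote.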

Tarsnap is marketed as a secure, efficient online backup service. Quirks such as having only a command line client, or requiring that the client be built from source, mean it’s definitely not an ‘easy to use’ product. It does, however, have a number of significant advantages:
- It’s very cheap; it only costs $0.25/month to store 1GB of data. Since data is compressed and de-duplicated before being stored, the size of the backup is often surprisingly small, giving rise to a similarly small bill.
- It’s secure. All data is encrypted with key files, meaning that only I (or somebody with the encryption keys) can access the stored data. The client is open source, was written by a well known and respected computer scientist, has been heavily peer reviewed, and is covered by a bug bounty program.
- It’s both offsite (all data is stored in Amazon S3) and selective (I upload files and folders manually).
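To put the storage price in context, here is a rough sketch of the arithmetic, using only the $0.25 per GB per month figure quoted above. The `monthly_cost` helper is hypothetical and deliberately ignores any other charges (such as bandwidth) that might apply:

```shell
# Estimate the monthly storage bill for a given number of stored gigabytes,
# assuming a flat rate of $0.25 per GB per month and no other charges.
monthly_cost() {
    awk -v gb="$1" 'BEGIN { printf "%.2f\n", gb * 0.25 }'
}

# Size after compression and de-duplication, in GB (an example figure).
monthly_cost 1.2
```

At that rate, a $5 monthly budget would cover about 20GB of stored data.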
The only downside of Tarsnap is that the keys it generates to encrypt and decrypt the stored data become a liability. If an attacker gets hold of them, they have unfettered access to all of my data, and if I ever lose them, I’m locked out of everything!
Of course, the solution is to encrypt and back up the encryption keys themselves. I created separate keys for reading and writing to Tarsnap using the following commands:
tarsnap-keymgmt --outkeyfile write.key -w --passphrased --passphrase-mem 2G --passphrase-time 200 master.key
tarsnap-keymgmt --outkeyfile read.key -r --passphrased --passphrase-mem 2G --passphrase-time 200 master.key
These are encrypted with a long password (as suggested by jabberwock) and sit on each of my laptops. As for the master encryption key, I used the following commands to split it into multiple QR codes that I then printed and stored in a secure location:
split -b 1000 master.key # Produces xaa, xab, xac...
for f in x??; do qrencode -o "$f.png" < "$f"; done # One QR code per piece: xaa.png, xab.png...
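Before trusting the paper copies, it seems worth checking that the pieces really do join back into the original file. The sketch below is an assumption about how one might do that, not something Tarsnap provides: `check_roundtrip` is a made-up helper that splits a copy of the file in a temporary directory and compares the concatenated pieces against it.

```shell
# Check that a file survives being split into 1000-byte pieces and rejoined.
# Works on a copy in a temporary directory, so the original is never touched.
check_roundtrip() {
    tmp=$(mktemp -d) || return 1
    cp "$1" "$tmp/original" || return 1
    (
        cd "$tmp" || exit 1
        split -b 1000 original          # writes xaa, xab, xac, ...
        cat x?? | cmp -s - original     # the glob sorts in the order split wrote
    )
    status=$?
    rm -rf "$tmp"
    return $status
}
```

Calling `check_roundtrip master.key` returns success when the pieces reassemble byte for byte. Restoring from paper would additionally mean scanning each code and concatenating the results in the same order, which this check does not cover.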
Having distinct keys for reading and writing is good because it prevents command line accidents, such as deleting an archive by specifying the wrong tar flag: neither key carries delete permission.
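In day-to-day use, the two keys map onto the two operations roughly as follows. This is a sketch rather than my exact setup: the function names, the dated archive naming scheme and the `TARSNAP` override variable are all illustrative assumptions, and because the keys are passphrased, Tarsnap will prompt for the passphrase on each run.

```shell
# Hypothetical wrappers around tarsnap; TARSNAP can be overridden to point
# at a different binary.
TARSNAP="${TARSNAP:-tarsnap}"

# Create a dated archive of one directory using the write-only key.
backup_dir() {
    name="$1"
    dir="$2"
    "$TARSNAP" --keyfile write.key -c -f "$name-$(date +%Y-%m-%d)" "$dir"
}

# Extract a named archive into a target directory using the read-only key.
restore_archive() {
    archive="$1"
    target="$2"
    "$TARSNAP" --keyfile read.key -x -f "$archive" -C "$target"
}
```

For example, `backup_dir photos ~/Pictures` would create an archive named after today’s date, and `restore_archive` with that archive name and an existing target directory would bring it back.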
Finally, the whole caboodle is very good in terms of cost; I spend less than $0.30 per month to back up my family photos and some important documents, and all of my source code is backed up for free using Git remotes.