I recently migrated my personal backup from Backblaze Backup to B2 (an S3-style online file storage service, also by Backblaze).
I had been using Backblaze Backup for about four years. While the service and the client are excellent and make it easy for anyone to set up backups, I think it’s time to move to a solution I have more control over, for the following reasons:
- Encryption. Security is a very hot topic nowadays, and you really need to trust your backup software. By default, Backblaze uploads nearly all files in my user folder, which includes my browser cookies. In the wrong hands, those cookies could be used to impersonate me on many websites. I want a backup system where, by cryptographic design, the backup provider cannot access or decrypt my data. By default, Backblaze uses an encryption key provided by them, which means that they can decrypt all of my data. They also offer an option where I specify my own key, but last I checked, I would need to give them the key whenever I attempt a restore, which is a no-no. (When I do a restore, the encrypted data should be downloaded to my computer and decrypted locally, not decrypted on their systems.)
- Pricing. Backblaze charges $5 per month for unlimited backup (down to about $4 with a two-year subscription), which was a very good deal back then. However, storage technology has improved while my data has not grown that much. I only have around 70 GB of backups, so with B2’s pay-per-use pricing I would only pay $0.35 per month for storage and $1.40 each time I download my backup.
Searching For a New Solution
The problem with switching to B2 is that I need an alternative backup client that runs on my computer and uploads to B2. The requirements are:
- Encryption. As mentioned before, my backups need to be encrypted.
- Forever Incremental. To save bandwidth and storage, only the changed parts of files should be uploaded. It should not work like the traditional design, where you take one full backup followed by incremental backups based on it, and then can never delete any of them because they all depend on each other. The solution should be able to prune old backups as needed without re-uploading everything.
- Designed for Personal Backup. The client should be usable on personal computers that are not turned on all the time. It should intelligently choose when to back up (e.g. when not on battery power) and run any backups that were skipped. Having a good GUI is a plus.
- Natively Support Cloud Storage. While this might seem obvious, I came across some great backup software that does not support cloud storage at all (only local files or network mounts). Backing up locally and syncing the changes later is not acceptable, as I do not have the disk space to store everything twice. Using FUSE (mounting cloud storage as a local drive) is also not acceptable, because downloading from cloud storage costs money and should be avoided as much as possible. Developer-oriented cloud storage also has nifty features such as checksum verification for uploads, which makes sure that uploads were not corrupted in transit. That is the only guarantee we have that the backup was done correctly, since we will not routinely download the data to check.
- Open Backup Format. Open-source software is preferred since I can always fix something if it is broken, but failing that, the software should at least have an open backup format so in the worst case, I can write my own software to decrypt and restore the data.
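To make the upload checksum point concrete: as far as I know, B2's upload API expects the SHA-1 of the content alongside each upload and rejects the upload if it does not match what arrived. The client side of that check can be sketched as below. The function name `sha1_of_file` and the 1 MiB read size are my own illustrative choices, not part of any backup client.

```python
import hashlib


def sha1_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-1 of a file, reading it in fixed-size pieces.

    Hypothetical helper for illustration: a client would send this digest
    with the upload so the storage side can detect corruption in transit.
    """
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"".
        for block in iter(lambda: f.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()
```

Reading in pieces keeps memory use flat even for very large files, which matters when the backup set contains multi-gigabyte items.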
I had heard of Arq Backup for a few years but had never tried it. Being a native macOS app, it has a very nice UI and meets all the requirements mentioned above. While some people on Reddit mentioned reliability issues, I haven’t seen any while testing, so I hope those issues have been fixed.
The interesting thing to mention here is how “incremental backup” is implemented. The website mentions an “rsync-style rolling checksum”, but I do not think the term is accurate. The magic behind it is data-dependent variable-size block splitting, which is best described by Tarsnap’s presentation.
The way Arq (and similar software) works is that each file is split into blocks, and each backup is an index pointing to those blocks. Each time a backup is performed, only new blocks need to be uploaded. (In other words, a full backup is made every time, but de-duplication is used to reduce the data size.) Pruning old backups can be done simply by finding blocks that are no longer referenced and deleting them.
Arq Backup is a little bit pricey at $49.99 (and another $29.99 for lifetime upgrades). However, factoring in the cost savings, I get a pretty good two-year ROI from switching.
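The idea can be sketched in a few lines of Python. This is a toy illustration of data-dependent block splitting plus a de-duplicating block store, not Arq's actual format or algorithm. The window size, the chunk size limits, the polynomial rolling hash, and the names `split_chunks` and `BlockStore` are all assumptions made for the example. Encryption and compression are left out.

```python
import hashlib
import random

WINDOW = 16           # bytes covered by the rolling hash (illustrative)
MASK = (1 << 6) - 1   # cut when the low 6 bits are zero: small chunks for the demo
MIN_CHUNK = 16        # never cut before the window is full
MAX_CHUNK = 1024      # force a cut so chunks cannot grow without bound
BASE = 257
MOD = (1 << 31) - 1


def split_chunks(data):
    """Split bytes into variable-size chunks at content-defined boundaries."""
    chunks = []
    start = 0
    h = 0
    top = pow(BASE, WINDOW - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i - start >= WINDOW:
            # Slide the window: drop the oldest byte before adding the new one.
            h = (h - data[i - WINDOW] * top) % MOD
        h = (h * BASE + byte) % MOD
        length = i - start + 1
        if (length >= MIN_CHUNK and (h & MASK) == 0) or length >= MAX_CHUNK:
            chunks.append(data[start:i + 1])
            start = i + 1
            h = 0
    if start < len(data):
        chunks.append(data[start:])
    return chunks


class BlockStore:
    """In-memory stand-in for remote storage holding blocks and backup indexes."""

    def __init__(self):
        self.blocks = {}     # block hash -> block bytes
        self.snapshots = {}  # backup name -> ordered list of block hashes

    def backup(self, name, data):
        """Store a full backup; return how many blocks actually had to be uploaded."""
        uploaded = 0
        index = []
        for chunk in split_chunks(data):
            digest = hashlib.sha256(chunk).hexdigest()
            if digest not in self.blocks:
                self.blocks[digest] = chunk
                uploaded += 1
            index.append(digest)
        self.snapshots[name] = index
        return uploaded

    def restore(self, name):
        return b"".join(self.blocks[d] for d in self.snapshots[name])

    def prune(self, name):
        """Delete a backup, then delete blocks no remaining backup references."""
        del self.snapshots[name]
        live = {d for index in self.snapshots.values() for d in index}
        dead = [d for d in self.blocks if d not in live]
        for d in dead:
            del self.blocks[d]
        return len(dead)


if __name__ == "__main__":
    rng = random.Random(0)
    original = bytes(rng.getrandbits(8) for _ in range(20000))
    edited = b"a few new bytes" + original  # shifts every later byte

    store = BlockStore()
    print("blocks uploaded, first backup:", store.backup("monday", original))
    print("blocks uploaded, second backup:", store.backup("tuesday", edited))
    print("blocks deleted by pruning monday:", store.prune("monday"))
    print("tuesday still restores:", store.restore("tuesday") == edited)
```

Because the cut points depend on the bytes themselves rather than on fixed offsets, inserting data near the start of a file only changes the blocks around the insertion. Fixed-size blocks would all shift and have to be uploaded again.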
Cost of 2 years of Backblaze Backup: $95
Cost of new solution:
- Arq Backup: $49.99
- Lifetime Upgrades: $29.99
- 2 years of storing 100 GB of data: $12 ($0.005/GB/month)
- Assuming I download the backup once: $2 ($0.02/GB)
- Total: $93.98
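The arithmetic behind these numbers is simple enough to check in a few lines. The sketch below uses only the prices quoted in this post. The function names are mine, and the prices will certainly change over time.

```python
# Prices as quoted in this post (USD); treat them as a snapshot, not current rates.
B2_STORAGE_PER_GB_MONTH = 0.005
B2_DOWNLOAD_PER_GB = 0.02
ARQ_LICENSE = 49.99
ARQ_LIFETIME_UPGRADES = 29.99
BACKBLAZE_TWO_YEARS = 95.00


def b2_cost(size_gb, months, downloads=1):
    """Storage for the whole period plus a number of full downloads."""
    storage = size_gb * B2_STORAGE_PER_GB_MONTH * months
    download = size_gb * B2_DOWNLOAD_PER_GB * downloads
    return storage + download


def arq_total(size_gb, months, downloads=1):
    """One-time Arq purchases plus B2 usage over the period."""
    return ARQ_LICENSE + ARQ_LIFETIME_UPGRADES + b2_cost(size_gb, months, downloads)


if __name__ == "__main__":
    print(f"Arq + B2, 100 GB, 2 years: ${arq_total(100, 24):.2f}")
    print(f"Backblaze Backup, 2 years: ${BACKBLAZE_TWO_YEARS:.2f}")
```

Since the Arq license and lifetime upgrades are one-time costs, only the B2 usage recurs after the first two years.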
For the record, here are the other programs I considered and why I did not use them. Note that they are all server-oriented and need to be scheduled using cron or similar software. All of them also support encryption.
A fork of attic, which was known as the holy grail of backups. It supports compression, block-based incremental and is open-source. However, the only remote backup it supports is SSH. Rsync.net provides an attic-specific package for $0.03/GB/month, which is considerably higher than B2’s $0.005/GB/month.
Restic supports block-based incremental and cloud storage, but does not support compression for its backups. Its developers also seem to be very concerned (paranoid-level) with cryptography. While that’s not necessarily a bad thing, it causes new features (e.g. compression) to be developed slowly because of all the cryptographic analysis for levels of privacy I do not require. (I want people to not be able to read my backups; I don’t really care if they know whether I have a specific file or not.)
While compression might not be very useful for desktop backups (most of my files are pictures), it can be very useful for server backups where you’re backing up a lot of text data.
Tarsnap supports block-based incremental and compression. However, it is not free software and you must use it with Tarsnap’s storage, which costs a whopping $0.25/GB/month.
HashBackup does block-based incremental, compression and cloud storage. However, it is not open-source and the pricing has not been announced yet, making its future uncertain.
Duplicacy supports block-based incremental, compression and cloud storage. The CLI version’s source code is available and it is free for personal use. However, it is not open-source and there are charges for commercial use.
The GUI has a personal license fee of $20 for the first year and $5 for each following year. However, renewal is not automatic and you’ll (probably) have to pay the full license fee again if you forget to renew. The GUI also seems minimal and is limited to setting up cron scheduling, rather than the system integration Arq Backup provides.
An honorable mention, since it’s the one I currently use to back up my server. It supports various cloud storage providers and compression. However, it only supports traditional full-plus-incremental backups. Now that I know about the possible alternatives, I’m tempted to switch it over to restic (with hacked-in compression) or Duplicacy at some point.