DIY Encrypted Cloud Backup Using Raspberry Pi
--
The floods in Chennai washed away many people’s books and photos. Those stories finally motivated me to get serious about an off-site digital backup solution.
I keep two copies of my photos on two external hard disks, manually syncing them every few weeks. However, since both disks are in the same place (my house), I’m not fully covered in the event of a natural disaster.
What I need is another copy of my data far away from my house to protect against flooding and the like. This is also known as geographic redundancy or off-site backup.
One way to solve this problem is to outsource it to some company. But what’s the fun in that?
Overview
I use a pair of geographically distant Raspberry Pis, one at my house and one at my brother’s house. I’ll call the Pi at my house the on-site or local one, and the one at my brother’s house the off-site or remote one. Each has a USB external drive attached to it. I use Bittorrent Sync to keep these disks synchronized. That’s the gist.
My setup has a few nice things on top of the basics:
- Privacy: only encrypted versions of my files are present on the off-site Pi.
- Expandability: you can easily add another disk down the road to store more files.
- Safety: Bittorrent Sync keeps revisions of old files, so you have some protection against accidental deletion. To make recovery from such errors easier, I wrote BTSync Rewind, an open-source tool that presents a Time Machine-like point-in-time snapshot view of your Bittorrent Sync repository.
Those with some Linux experience should be able to replicate my setup with the instructions below. I gloss over the standard steps such as installing the Raspberry Pi, setting up ssh remote access, and getting Bittorrent Sync going.
Disclaimer: I tried everything myself, and made every effort to ensure that the commands and advice below are correct. I provide the reasoning behind each step so that you can make an informed decision. But ultimately you read and follow the procedures below at your own risk. I am not responsible for any losses you suffer by following these instructions.
My Hardware and Software
- Edimax Wifi USB adapter (recommended)
- Plugable 4-port powered USB hub (you may not need this)
- First-generation Raspberry Pi model B (other models should also work)
- SD card, external USB disk drive, shoebox to keep things.
- Raspbian GNU/Linux
- Bittorrent Sync
Watch the Watts aka Wiring for Reliability
The first-generation Raspberry Pi that I have can’t supply a lot of power to its USB ports. It freezes when I attach the wifi adapter or external drive directly.
I attach the wifi adapter and hard disk to a powered USB hub that I connect to the Pi. I use one port from the hub to serve as the Pi’s power supply.
Connecting the wifi adapter and hard disk to the same USB hub lowers disk and wifi performance, but it’s fast enough for me and has been super-stable.
The Raspberry Pi B+ and Pi 2 can supply significantly more power. If you have one of those, you can avoid the powered hub. Make sure you get a nice big power supply like the iPad one (2A or more at 5V). Good explanation of the improved power supply and discussion on maxing out the ports.
Installation
Raspbian
I followed the stock instructions: https://www.raspberrypi.org/downloads/raspbian/
Also install the following packages. We choose the simpler ‘msmtp-mta’ package rather than the default ‘postfix’ that Raspbian chooses.
sudo apt-get install mdadm lvm2 encfs udisks msmtp-mta
Enabling Wifi
This was problematic in the past, but all drivers are built in now, so it should be a breeze: http://www.savagehomeautomation.com/projects/raspberry-pi-installing-the-edimax-ew-7811un-usb-wifi-adapte.html
Installing Bittorrent Sync
Go to https://www.getsync.com/platforms/desktop and get the ARM version. Install this on your Raspberry Pi. Try syncing a few files to make sure it’s working first.
Depending upon your home firewall setup, you may need to open up the TCP port at one of the ends. Look under preferences in the web UI.
Update /etc/rc.local to start Bittorrent Sync on startup. Remember to run it as user ‘pi’ (prefix the command with ‘sudo -u pi’).
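Here is a minimal sketch of what that rc.local entry might look like. The install path /home/pi/btsync/btsync is an assumption (use wherever you unpacked the ARM build), and ‘start_btsync’ is just a hypothetical helper name. The stock rc.local runs under ‘sh -e’ and ends with ‘exit 0’, so these lines need to go above ‘exit 0’; the ‘|| true’ keeps a failed start from aborting the rest of the script.

```shell
# Hypothetical helper: start Bittorrent Sync as the unprivileged 'pi' user.
# $1 is the path to the btsync binary (an assumed location, adjust to yours).
start_btsync() {
    bin="$1"
    if [ ! -x "$bin" ]; then
        echo "btsync binary not found at $bin" >&2
        return 1
    fi
    # rc.local runs as root, so drop privileges before launching.
    sudo -u pi "$bin"
}

# '|| true' so a missing binary does not stop the rest of rc.local.
start_btsync /home/pi/btsync/btsync || true
```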
Creating Expandable Disk Architecture
Next we’ll lay the foundations of a highly-expandable disk architecture. We’ll start with just one disk, but we’ll be able to easily add more disks, and easily replace the disk when it starts wearing out. I recommend doing this.
We’ll use Linux LVM (Logical Volume Manager) to allow us to add disks later. We’ll use the RAID mirroring system in a slightly unconventional way to make it easy to replace disks.
Linux RAID mirroring has the cool ability to create a mirrored RAID array with some disks “missing” at creation time. Such an array starts out as a “degraded” array. Although it sounds useless, this will come in handy later.
We create a mirrored RAID array consisting of our one disk and a “missing” disk. We then format the RAID array as an LVM “physical volume”. Multiple physical volumes can be tied together into a “volume group” that is as big as the total space in all the physical volumes it contains. Finally, we can assign space, called “logical volumes”, from volume groups and create file systems on logical volumes.
Creating RAID Array
RAID can work on raw disks without any partitioning, so we won’t bother making partitions, saving ourselves the hassles of GPT vs. MBR, large partitions, primary vs. secondary etc. We’ll assume that an external disk exists at address /dev/sda in our examples.
Notes: You need to type the word “missing” exactly as given in the command below. ‘sudo lsblk’ is your friend if you want to confirm the device address.
sudo mdadm --create --verbose /dev/md0 --level=mirror --raid-devices=2 /dev/sda missing
You’ll get a question about whether you really want to use the entire disk instead of a partition. Answer yes.
mdadm: Note: this array has metadata at the start and
may not be suitable as a boot device. If you plan to
store ‘/boot’ on this device please ensure that
your boot-loader understands md/v1.x metadata, or use
--metadata=0.90
mdadm: size set to <somenumber>
Continue creating array? y <---- This is your answer.
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md0 started <---- This is what you want to see.
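Because the array was created with one member “missing”, it should come up degraded: /proc/mdstat shows one ‘U’ (present) and one ‘_’ (absent) slot for it. As a rough sketch, the hypothetical helper below (my own name, not part of mdadm) lists the arrays in that state. It reads /proc/mdstat unless you pass another file.

```shell
# Print the name of every md array whose member status contains a "_",
# i.e. at least one slot of the mirror is empty.
degraded_arrays() {
    awk '
        # Array header line, e.g. "md0 : active raid1 sda[0]"
        /^md[0-9]+ :/ { name = $1 }
        # Status line, e.g. "... [2/1] [U_]"
        name != "" && /\[[U_]*_[U_]*\]/ { print name; name = "" }
    ' "${1:-/proc/mdstat}"
}

# Usage on the Pi:
#   degraded_arrays
```

Right after the create command above, I would expect this to list the new array. Seeing it there is normal for this setup, not a sign of trouble.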
Setting up LVM
These are the four commands:
sudo pvcreate /dev/md0
sudo vgcreate datavg /dev/md0
sudo lvcreate --extents '+100%FREE' --name datalv datavg
sudo mkfs.ext4 -L datalv -m 0 /dev/datavg/datalv
Here’s what the commands do. First, formatting /dev/md0 as a LVM physical volume:
sudo pvcreate /dev/md0
You should see:
Physical volume “/dev/md0” successfully created
Setting up a trivial volume group named “datavg” with only the /dev/md0 physical volume:
sudo vgcreate datavg /dev/md0
You should see:
/proc/devices: No entry for device-mapper found
Volume group “datavg” successfully created
Now create a logical volume named ‘datalv’ to which we give all the space in the volume group:
sudo lvcreate --extents '+100%FREE' --name datalv datavg
Creating a file system to store files. We set a label (-L option) so that this disk can be automatically mounted at a fixed location, /media/datalv.
sudo mkfs.ext4 -L datalv -m 0 /dev/datavg/datalv
You should see something like:
mke2fs 1.42.12 (29-Aug-2014)
Discarding device blocks: done
Creating filesystem with nnnnnn 1k blocks and kkkkkkk inodes
Filesystem UUID: xxxxxxxx-xxxx-xxxx-dddd-aaabbbcccddd
Superblock backups stored on blocks:
<some numbers>
Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done
Mounting LVM Logical Volume on Startup (Recommended Way)
Add the following line to /etc/rc.local before the line that starts Bittorrent Sync. The chown command sets the default Raspbian user as the owner of the data disk.
sudo udisks --mount /dev/datavg/datalv && chown pi:pi /media/datalv
Now the logical volume will show up at /media/datalv on startup.
Mounting LVM Logical Volume on Startup (Old School Way)
Create the folder where the file system we created above will appear:
sudo mkdir /media/datalv
Add the following line to the file /etc/fstab. The number of spaces between each word doesn’t matter; I like to use one. The ‘nobootwait’ option makes sure the Pi continues booting even if the disk array can’t be activated. Without it, the Pi will wait endlessly at a terminal prompt asking what to do, preventing networking from starting, and you’ll need networking to debug this remotely.
/dev/datavg/datalv /media/datalv ext4 defaults,nobootwait 0 0
Add this line to /etc/rc.local somewhere near the top to give ownership of the disk to the default Raspbian user:
chown pi:pi /media/datalv
Creating Sync Folder
Finally let’s create a folder within /media/datalv where we’ll put the files to be sync’d:
sudo mount /media/datalv
sudo mkdir /media/datalv/btsync-src
sudo chown pi:pi /media/datalv/btsync-src
Reboot to make sure everything works:
sudo shutdown -r now
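Once the Pi is back up, a quick check along these lines can confirm that the disk really got mounted. This is only a sketch: ‘data_disk_ready’ is a hypothetical helper name, and it assumes the mount point and folder names used above.

```shell
# Succeeds only if the given directory is a mount point and contains
# the btsync-src folder created earlier.
data_disk_ready() {
    mount_dir="${1:-/media/datalv}"
    mountpoint -q "$mount_dir" && [ -d "$mount_dir/btsync-src" ]
}

if data_disk_ready; then
    echo "data disk is mounted and btsync-src is present"
else
    echo "data disk is NOT ready; check 'sudo lsblk' and /proc/mdstat" >&2
fi
```

The same test could also guard the Bittorrent Sync line in /etc/rc.local, so that sync is not started when the disk failed to mount.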
Adding a Second Disk (Future Reference)
You’ve leveled up and started shooting 4K video in 3D. In just two months, you’ve almost filled up your first disk. Congratulations! Time to add another disk.
First attach the second disk to the powered USB hub. It doesn’t matter what the capacity is; it’s your choice. The basic idea is to create a second RAID array of two disks, with one disk “missing”. Then format it as an LVM physical volume. Then add it to the volume group. Finally, resize the logical volume and expand the file system.
Assuming your second disk was assigned the device address /dev/sdb:
sudo mdadm --create --verbose /dev/md1 --level=mirror --raid-devices=2 /dev/sdb missing
sudo pvcreate /dev/md1
sudo vgextend datavg /dev/md1
sudo lvresize --resizefs --extents '+100%FREE' datavg/datalv
Replacing a Disk (Future Reference)
It’s been three years and everything has been working well so far. However, you received an email alert from your Raspberry Pi that one of the disks, /dev/sda, is nearing its end of life according to SMART data. Time to replace it (coming next: article for how to set up alerting). The replacement drive needs to be at least the size of the failing disk.
This is when the fake RAID array is going to pay off. And oh, drives have gotten bigger and cheaper, so you’re going to replace the 1TB drive with a 4TB one. Don’t worry, you won’t be stuck with 3TB of wasted space. :)
Note: Irritatingly, although you created a RAID device named /dev/md0, it might have been renamed to something else after a reboot. There are ways to force the name to persist, but I personally like to force as few settings as possible. Use ‘cat /proc/mdstat | grep sda’ to find out which RAID array contains the failing drive, /dev/sda. Let’s say the new name is /dev/md126.
Now plug the new drive into the powered USB hub. Use ‘sudo lsblk’ to find out what the drive’s device address is. We’ll assume it got /dev/sdc. With this drive too, we’re not going to bother with partitioning.
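If you’d rather get just the array name instead of eyeballing the grep output, something like the following sketch can pull it out of /proc/mdstat. ‘array_for_disk’ is a hypothetical helper of my own naming; it assumes whole-disk members such as ‘sda’, as used throughout this article.

```shell
# Print the md array whose member list contains the given disk name.
# $1: disk name without /dev/ (e.g. "sda"); $2: optional mdstat file.
array_for_disk() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                if (index($i, "[") == 0) continue   # skip "active", "raid1", ...
                member = $i
                sub(/\[.*$/, "", member)            # "sda[0]" -> "sda"
                if (member == disk) print $1
            }
        }
    ' "${2:-/proc/mdstat}"
}

# Usage on the Pi:
#   array_for_disk sda
```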
Add the new drive to the array containing the failing drive:
# Your device addresses will probably differ; substitute your own.
sudo mdadm --add /dev/md126 /dev/sdc
You should see a brief message:
mdadm: added /dev/sdc
The Pi will immediately start copying data to the new drive. Doing ‘cat /proc/mdstat’ will show the progress:
$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md126 : active raid1 sdc[2] sda[0]
8384448 blocks super 1.2 [2/1] [U_]
[==>..................] recovery = 14.3% (1200000/8384448) finish=0.4min speed=240000K/sec
Wait for recovery to finish. It may take a few hours. Then it’s time to retire the failing drive. Before unplugging it from the hub, mark the drive as failed and then tell the Pi to ignore it:
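Rather than re-running ‘cat /proc/mdstat’ by hand, you could poll for the recovery line. The sketch below is one way to do that; both function names are hypothetical, and the parsing assumes the ‘recovery = NN.N%’ format shown in the output above.

```shell
# Print the recovery percentage found in mdstat-style text,
# or nothing if no recovery is in progress.
recovery_percent() {
    sed -n 's/.*recovery = *\([0-9.]*\)%.*/\1/p' "${1:-/proc/mdstat}"
}

# Report progress once a minute until the recovery line disappears.
wait_for_recovery() {
    while pct=$(recovery_percent "$@") && [ -n "$pct" ]; do
        echo "recovery at ${pct}%"
        sleep 60
    done
    echo "no recovery in progress"
}

# Usage on the Pi:
#   wait_for_recovery
```

mdadm also has a ‘--wait’ option that blocks until resync or recovery activity on the given array has finished, which may be all you need: ‘sudo mdadm --wait /dev/md126’.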
sudo mdadm --fail /dev/md126 /dev/sda
sudo mdadm --remove /dev/md126 /dev/sda
sync
After the commands finish, you can unplug the failing drive from the USB hub. Double check that you’re pulling out the correct disk. It’s highly likely you’ll suffer extreme data loss if you pull the wrong disk (the ‘sync’ at the end tries to reduce the damage if you do). If possible, power down the Pi to ensure you don’t unplug a drive that’s being used.
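One way to double check is to match the device address against the model and serial number printed on the drive’s label. On most Linux systems the symlink names under /dev/disk/by-id include that information, so a small helper like this sketch can list them for a given device. ‘ids_for_disk’ is a hypothetical name, and the exact link names vary by drive and USB enclosure.

```shell
# Print the by-id names that point at the given device node.
# $1: device path (e.g. /dev/sda); $2: optional directory of symlinks.
ids_for_disk() {
    dev="$1"
    dir="${2:-/dev/disk/by-id}"
    for link in "$dir"/*; do
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$dev" ]; then
            basename "$link"
        fi
    done
}

# Usage on the Pi:
#   ids_for_disk /dev/sda
```

‘sudo lsblk -d -o NAME,SIZE,MODEL,SERIAL’ gives a similar overview in table form.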
You’re still only using 1TB of the 4TB on the new disk. Time to use all the space on the drive.
First we expand the RAID array to use the entire disk, and then expand the LVM physical volume to fill the RAID array. Finally, we resize the logical volume and file system to use the newly created free space (same command that we used when we added a new disk above).
sudo mdadm --grow /dev/md126 --size=max
sudo pvresize /dev/md126 # automatically detects size
sudo lvresize --resizefs --extents '+100%FREE' datavg/datalv
All done.
Coming next: Setting up encryption so that the remote side only has encrypted data.