Back up your data to the AWS cloud with a UNIX command-line tool
As we know, necessity is the mother of invention. I needed a CLI tool to back up my local machine's data to the AWS cloud with a single command.
We create an ec2-backup tool by writing a Python script, using the subprocess, os, boto3, and argparse modules.
The ec2-backup tool performs a backup of the given directory into Amazon
Elastic Block Storage (EBS). This is achieved by creating a volume of
the appropriate size, attaching it to an EC2 instance and finally copying
the files from the given directory onto this volume.
ec2-backup will create an instance suitable to perform the backup, attach
the volume in question and then back up the data from the given directory. Afterwards, ec2-backup will terminate the instance it created.
ec2-backup assumes that the user has set up their environment for general use with the EC2 tools and ssh(1) without any special flags on the command line. That is, the user has a suitable section in their ~/.ssh/config file to ensure that running 'ssh ec2-instance.amazonaws.com' succeeds. To accomplish this, the user has created an SSH key pair named 'ec2-backup' and configured their SSH setup to use that key to access instances in EC2 without any additional settings.
ec2-backup does, however, allow the user to add custom flags to the ssh(1) commands it invokes via the EC2_BACKUP_FLAGS_SSH environment variable.
By default the tool creates a volume with a size double that of the given directory, or the user may provide a volume ID with the [-v] option. We create the optional and positional arguments for our tool using argparse in Python:
ec2-backup [-h] [-v volume-id] dir
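A minimal sketch of that argument parsing, plus the doubled-size default; the helper name required_volume_gib is my own, and it totals file sizes with os.walk, which only approximates on-disk usage:

```python
import argparse
import os

def parse_args(argv=None):
    # Positional dir plus optional -v volume-id, matching the synopsis:
    #   ec2-backup [-h] [-v volume-id] dir
    parser = argparse.ArgumentParser(
        prog='ec2-backup',
        description='Back up a directory to an EBS volume.')
    parser.add_argument('-v', metavar='volume-id', dest='volume_id',
                        help='use this existing volume instead of creating one')
    parser.add_argument('dir', help='directory to back up')
    return parser.parse_args(argv)

def required_volume_gib(path):
    # Default volume size: double the directory size, rounded up to
    # at least 1 GiB.
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass
    return max(1, -(-2 * total // (1024 ** 3)))
```

When no -v is given, required_volume_gib(args.dir) supplies the size for the volume we create.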
First, we find the availability zone of the provided volume and spin up an EC2 instance in the same availability zone. We check the instance's state by reading the required attributes with boto3. Once the instance's state turns to running, the volume gets attached to the instance.
# Find the availability zone of the volume
def get_volume_az(volumeid):
    return subprocess.getoutput(
        'aws ec2 describe-volumes --volume-ids ' + volumeid +
        ' --query "Volumes[*].[AvailabilityZone]" --output text')

# Build the command to create an instance in that availability zone
def create_instance_cmd(az):
    return ('aws ec2 run-instances --image-id ami-0565af6e282977273 '
            '--count 1 --instance-type t1.micro --key-name ec2-backup '
            '--output json --placement AvailabilityZone=' + az)

# Use boto3 to find the state of the instance
# (this can also be done using json.loads())
def get_instance_state(EC2InstanceId):
    ec2 = boto3.resource('ec2')
    ec2instance = ec2.Instance(EC2InstanceId)
    return ec2instance.state['Name']

# Build the command to attach the volume to the instance
def attach_volume_cmd(volume_id, instance_id):
    return ('aws ec2 attach-volume --volume-id ' + volume_id +
            ' --instance-id ' + instance_id + ' --device /dev/sdf')
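The "wait until running" step can be sketched as a small polling loop around get_instance_state; the timeout and interval values here are my own choices, not part of the original tool:

```python
import time

def wait_for_state(get_state, instance_id, target='running',
                   timeout=300, interval=5):
    # Poll get_state(instance_id) until it returns target, or give up
    # after timeout seconds.
    deadline = time.time() + timeout
    while time.time() < deadline:
        if get_state(instance_id) == target:
            return True
        time.sleep(interval)
    return False
```

Calling wait_for_state(get_instance_state, instance_id) before attaching the volume ensures the instance is ready.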
Now we find the public DNS of the instance and ssh into it. Over ssh, we archive the directory with 'tar' and pipe it to the 'dd' command to copy the data directly onto the raw volume; by reversing the pipe we can retrieve the data.
You'd read data from the raw device. If your program emulates this pipeline:

tar cf - dir | ssh instance "dd of=/dev/whatever"

then restoring data would be similar to this command:

ssh instance "dd if=/dev/whatever" | tar xf -
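The same pipeline can be sketched in Python with subprocess; the host and device path are placeholders, the helper names are my own, and EC2_BACKUP_FLAGS_SSH is honored as described above:

```python
import os
import subprocess

def backup_command(dir_path, host, device):
    # tar the directory locally and stream it over ssh into dd,
    # writing straight onto the raw volume on the instance.
    flags = os.environ.get('EC2_BACKUP_FLAGS_SSH', '')
    return ('tar cf - ' + dir_path + ' | ssh ' + flags + ' ' + host +
            ' "sudo dd of=' + device + '"')

def restore_command(host, device):
    # Reverse the pipe: read the raw device remotely, untar locally.
    flags = os.environ.get('EC2_BACKUP_FLAGS_SSH', '')
    return ('ssh ' + flags + ' ' + host + ' "sudo dd if=' + device +
            '" | tar xf -')

def run(command):
    # shell=True so the pipe is interpreted by the shell.
    return subprocess.call(command, shell=True)
```

For example, run(backup_command(args.dir, 'ubuntu@' + dns, '/dev/xvdf')) performs the copy in one shot.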
Alternatively, we can use 'scp' to copy the archive into the home directory on the instance, create a file system on the volume, mount it, and move the archive to the mount point. Later we can detach the volume, attach it to another instance, and retrieve the data.
return ('ssh -o StrictHostKeyChecking=no ubuntu@' + dns +
        ' "sudo mkdir ~/bac ; sudo mkfs.ext4 /dev/xvdf ;'
        ' sudo mount /dev/xvdf ~/bac/ ; sudo mv ~/backup.tar ~/bac/ ;'
        ' cd ~/bac/ ; sudo tar xvf backup.tar ; echo $(ls ~/bac)"')
Finally, we write a function to terminate the instance after copying the data.
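A sketch of that teardown step, reusing the same subprocess pattern as the earlier snippets (the function names here are my own):

```python
import subprocess

def terminate_cmd(instance_id):
    # Build the aws ec2 terminate-instances command for our instance.
    return ('aws ec2 terminate-instances --instance-ids ' +
            instance_id + ' --output json')

def terminate_instance(instance_id):
    # Run it once the data has been copied onto the volume, so the
    # temporary instance does not keep accruing charges.
    return subprocess.getoutput(terminate_cmd(instance_id))
```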
The following examples illustrate common usage of this tool.
To back up the entire file system:
$ ec2-backup /
To create a complete backup of the current working directory, using the defaults, to the volume with the ID vol-1a2b3c4d, possibly overwriting any data previously stored there (note the dot):
$ ec2-backup -v vol-1a2b3c4d .
Summary: that's it! We have our data backed up to the AWS cloud with a single CLI command. I hope you find it useful :)
Here is the source code for the tool as I implemented it: