Automated Azure Backup of Polkadot and Kusama Chain Database

Jim Farley, CertHum
May 8, 2021

If you’ve deployed a validator on Kusama or Polkadot, you’ll be aware that the initial syncing of the database can take some time. One way to speed up this initial sync is to import a backup copy of the database and run the validator as a full node (pruned).

MIDL.dev maintains publicly accessible copies of chain databases for both Polkadot and Kusama on their polkashots.io site, which is a great resource for getting started.

However, at some point a validator operator should implement backups of the databases on their own infrastructure. This will ensure that the operator always has access to a copy of the databases and will not be dependent on a third party.

What follows is how CertHum has automated the backup process, storing copies on low cost Azure Blob object storage.

If you don’t have an Azure account, it’s easy to set one up, and new accounts usually receive a $200 credit to use in the first month. There are plenty of guides online for creating an account; pay extra attention to the security configuration of the account, such as setting up MFA.

Once you have your account, and a subscription to deploy resources, set up a new Storage Account in the subscription where you will store your backups.

When setting up the Storage Account, I used Standard performance and Locally Redundant Storage (LRS). The latter keeps three copies of the data within a single data center in the region.

If that data center fails, you could lose your backup. I don’t see the need for Zone-Redundant or Geo-Redundant Storage: in the event of a problem, it’s easy enough to set up another backup in a different Azure zone or region, so it’s not worth the additional cost.

On the Advanced settings, make sure to disable Blob public access and change the access tier to ‘Cool’ (you likely won’t be retrieving the data often, so no use in paying more for the ‘Hot’ tier).

In Network settings, change the access to ‘Public endpoint (selected networks)’ to further restrict access to the storage. Also configure ‘Internet routing’ — there’s no need to pay extra for Microsoft network routing in this case. Finally, on the data protection settings, I disabled the soft delete options.

You can see a summary of my settings in the following picture:

Azure Blob Settings
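If you prefer scripting the setup over clicking through the portal, roughly the same storage account can be created with the Azure CLI. This is only a sketch: the resource group, account name, and location below are placeholders you’d replace with your own values.

# Sketch: create the storage account with the Azure CLI instead of the portal.
# <RESOURCE_GROUP>, <STORAGE_ACCOUNT>, and the location are placeholders.
az storage account create \
  --name <STORAGE_ACCOUNT> \
  --resource-group <RESOURCE_GROUP> \
  --location westeurope \
  --kind StorageV2 \
  --sku Standard_LRS \
  --access-tier Cool \
  --https-only true \
  --allow-blob-public-access false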

Once you’ve set that up, click where it says ‘Selected networks’, or where it says ‘Networking’ under the ‘Security + networking’ section on the configuration pane to the left.

Here, you want to add the IP addresses of the validator nodes that will be writing the backups and downloading copies of the database. Also add the IP address of your local desktop. These settings restrict network access to the storage to the IPs you just configured.
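The same firewall rules can be set from the Azure CLI if you prefer; again, the account name, resource group, and IP address are placeholders.

# Sketch: deny access by default, then allow specific public IPs.
az storage account update \
  --name <STORAGE_ACCOUNT> \
  --resource-group <RESOURCE_GROUP> \
  --default-action Deny
az storage account network-rule add \
  --account-name <STORAGE_ACCOUNT> \
  --resource-group <RESOURCE_GROUP> \
  --ip-address <VALIDATOR_PUBLIC_IP>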

Next, create a container where you will store your backups.

Navigate to the ‘Data storage’ section of the left configuration pane and select ‘Containers’. Create a new container which will require you to enter a unique name. Once it’s created, you will need to create the shared access signature. This will be used by your nodes to connect to the container.
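For reference, the container can also be created from the CLI; the container name below is a placeholder for whatever unique name you chose.

# Sketch: create the backup container (requires suitable permissions on the account).
az storage container create \
  --name <CONTAINER_NAME> \
  --account-name <STORAGE_ACCOUNT> \
  --auth-mode login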

Right click on your new container, select ‘Generate SAS’, and set the expiration date far into the future. If you’ve restricted access by network as described above, you can leave the ‘Allowed IP Addresses’ setting empty. An SAS can’t be changed later, so this lets you manage access through the permitted networks rather than having to issue a new SAS whenever you want to allow a new IP address.

Finally, make sure you keep only HTTPS access enabled, then generate your SAS token and URL.

CAUTION: Your URI and SAS token are unique and provide access to your resources. Copy them now, keep them secure, and don’t share them; treat them like your other secrets, such as private SSH keys.
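If you’d rather generate the SAS from the CLI than from the portal, something along these lines should work; the permissions and expiry shown are assumptions you’d adjust to your own needs.

# Sketch: generate a container-level SAS (read/write/list), HTTPS only, long expiry.
az storage container generate-sas \
  --account-name <STORAGE_ACCOUNT> \
  --name <CONTAINER_NAME> \
  --permissions rwl \
  --expiry 2030-01-01 \
  --https-only \
  --output tsv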

Now that you’ve got your Blob set up, you’ll need to configure your validator node which will be pushing out the backups.

Log into your server (this setup uses Ubuntu 18.04), then download and extract AzCopy, Microsoft’s command-line utility for copying data to and from Azure Storage.

cd ~
wget -O azcopy_v10.tar.gz https://aka.ms/downloadazcopy-v10-linux && tar -xf azcopy_v10.tar.gz --strip-components=1
sudo rm NOTICE.txt azcopy_v10.tar.gz
sudo chmod a+rwx azcopy
sudo mv azcopy /usr/local/bin/

You can find more info on AzCopy in Microsoft’s documentation.
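A quick sanity check that the binary is installed and on your path:

azcopy --version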

Next, make a backup directory to store your backup:

mkdir ~/backup

Then, create a bash script which will automate the tasks of backing up your database. This example uses a Kusama database.

Caution: This script will stop your validator service. I only use it on my backup validator node; be sure to disable the cron job if you bring the backup node into production.

sudo nano /usr/local/bin/db-backup.sh

Update the following config with your info and paste it into your script:

#!/bin/bash
####################################
#
# Backup RocksDB
#
####################################
#Chain
chain="kusama"
chaindb="ksmcc3"
#user for backup target directory - could create new backup user
user="<YOUR BACKUP USER>"
# Azure BLOB target URI.
azure_uri='<YOUR AZURE URI __ KEEP IT SECRET, KEEP IT SAFE>'
# Source RocksDB Directory.
backup_files="/home/polkadot/.local/share/polkadot/chains/$chaindb/db"
# Target is the backup directory under the user's home.
dest="/home/$user/backup"
# Target filename.
archive_file="$chain-backup.tar.bz2"
#Stop polkadot.service
systemctl stop polkadot.service
# Print stop status message.
echo "Polkadot service stopped"
# Delete last backup
rm -f "$dest/$archive_file"
# Print status message.
echo "Deleted prior backup"
# Print status message.
echo "Backing up $backup_files to $dest/$archive_file"
date
echo
# Backup the files using tar.
tar -cvaf "$dest/$archive_file" "$backup_files"
# Print end status message.
echo
echo "Backup finished"
date
#Start polkadot.service
systemctl start polkadot.service
# Print start status message.
echo "Polkadot service started"
# Print upload status message.
echo "Uploading to Azure BLOB"
#Sent to BLOB
azcopy copy "$dest/$archive_file" "$azure_uri"
# Print completion message.
echo "Upload Complete"

Change the permissions of your script, and give it a test run.

sudo chmod u+x /usr/local/bin/db-backup.sh
sudo /usr/local/bin/db-backup.sh

If all worked well, you should see the database being archived and then uploaded to your Azure Blob container.

Once the script finishes, you’ll see a copy of the backup in the backup directory on your server. You should also see the file you just uploaded in the Azure portal, in the container you created earlier.

Database backup showing in the Blob container
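You can also confirm the upload from the command line; azcopy list takes the same container SAS URL used in the script.

azcopy list "<YOUR AZURE URI>"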

At this point, I would run through the whole process of downloading and restoring the database with the following steps:

1 — Stop the polkadot service

2 — Delete the tar.bz2 file you just created with tar

3 — Purge the live database (or delete it from the database location under the polkadot user)

4 — Download the backup copy you just made using AzCopy (reverse the source and destination locations in the azcopy command; see the sketch after this list)

5 — Uncompress the file to the target location, e.g.:

sudo -H -u polkadot tar -xvf kusama-backup.tar.bz2 -C /home/polkadot/.local/share/polkadot/chains/ksmcc3/db --strip-components=8

6 — Start the polkadot service and confirm the database was imported
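For the download step, the azcopy invocation is simply the upload command with source and destination swapped. The storage account, container, backup user, and SAS token below are placeholders.

# Sketch: pull the backup archive from the Blob container back down to the node.
# The blob URL is the container URL plus the archive name, with the SAS token appended.
azcopy copy "https://<STORAGE_ACCOUNT>.blob.core.windows.net/<CONTAINER_NAME>/kusama-backup.tar.bz2?<SAS_TOKEN>" "/home/<YOUR BACKUP USER>/backup/kusama-backup.tar.bz2"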

Finally, let’s set up a cron job to run every week, or at whatever frequency you want. Add the following lines to your crontab (note the PATH line: cron runs the script as root with a minimal environment, so the path to azcopy has to be set explicitly):

#Add path to AzCopy
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games:/snap/bin
#Run RocksDB backup job once per week
15 15 * * 6 bash /usr/local/bin/db-backup.sh
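Because the script stops and starts the systemd service, these lines belong in root’s crontab, which you can open with:

sudo crontab -e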

Now you have a weekly backup writing to Azure Blob at low cost, and you know you’ll always have one available to restore in a pinch.

There are other additions you can make to this such as setting up a monitor to ensure it runs every week, and creating a script to restore the backup, but these are out of scope for this story.

What changes would you make to this deployment? I would love to hear about improvements in the comments — feedback is always welcome.
