Set Up a DB Backup System in Minutes Using Bitbucket Pipelines’ Scheduler

Shane Fast
Published in BACIC
4 min read · Sep 3, 2017
Just get it done!

As with just about every new feature that Atlassian introduces in Bitbucket Pipelines, I had to try the scheduler out.

For context, our team was producing daily data backups but wanted to upgrade to hourly backups for a variety of reasons. I had been using a simple cron job for this, but found three main benefits to offloading it onto Pipelines:

  1. More tasks are centralized into our code base. It makes it much easier for other people on the team to learn how it works and make changes if needed.
  2. Notification ready for Slack. Since I already had pipeline notifications set up, our team had a pulse on the backup system right off the bat. If anything fails, we know immediately!
  3. Compliance ready. With the history of pipelines stored for your codebase, you can easily grab links and screenshots to show auditors that you are performing periodic backups. Handy to make audits like SOC 2 easier, for example.

Build steps

What follows is an example of a 6-hour rotation of MongoDB backups (everything older than 6 hours is removed).

Before continuing, I assume your team already has pipeline notifications and SSH keys set up. Your use case may differ, especially if you need longer-term retention tied to volume storage, but this is not a bad place to start if you don’t have any backup system currently.

  1. Create the following custom pipeline:
pipelines:
  custom:
    create-backup-hourly:
      - step:
          script:
            # SET KEY TO DATABASE NODE
            - mkdir -p ~/.ssh
            # umask 077 so the key file is created readable only by the build user
            - umask 077
            - echo "$MY_SSH_KEY" | base64 --decode > ~/.ssh/id_rsa
            # CREATE DATABASE BACKUP
            - ssh $USERNAME@$DB_IP "mongodump --username $DB_USERNAME --password $DB_PASSWORD --authenticationDatabase admin --out=/path/to/backup/dump-$(date +%F)-$(date +%H)"
            # REMOVE BACKUP DIRECTORIES OLDER THAN 6 HOURS (360 MINUTES)
            - ssh $USERNAME@$DB_IP "find /path/to/backup/ -maxdepth 1 -type d -mmin +360 -exec rm -r {} \;"

Here we are simply accessing the database node’s command line and running a mongodump command to create a quick backup of the database. The output folder name uses Linux date/time syntax (%F gives the current YYYY-MM-DD, and %H gives the current UTC hour [0–23]).
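To see what folder name that produces, here is a rough sketch. The backup_dir_name helper and the fixed timestamp are made up for illustration and are not part of the pipeline, and the -d @SECONDS form assumes GNU date.

```shell
#!/bin/sh
# Hypothetical helper: build the dump folder name for a given Unix timestamp.
# Assumes GNU date (the -d @SECONDS form); -u forces UTC.
backup_dir_name() {
  echo "dump-$(date -u -d "@$1" +%F)-$(date -u -d "@$1" +%H)"
}

# 1504450800 is 2017-09-03 15:00:00 UTC
backup_dir_name 1504450800   # prints dump-2017-09-03-15
```

Because the pipeline's ssh command is wrapped in double quotes, the two date calls are expanded in the build container before the command is sent, so the hour in the name comes from the container's clock rather than the database node's.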

The final command searches the top level of your backup directory and deletes any directory last modified more than 6 hours (360 minutes) ago, so don’t leave anything important here (like persistent daily backups).
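If you want to watch the rotation happen before pointing it at real backups, a sketch like the following runs the same find expression against a throwaway directory. The rotate_backups helper and the folder names are hypothetical, and touch -d assumes GNU coreutils.

```shell
#!/bin/sh
# Hypothetical helper wrapping the rotation command from the pipeline step.
# $1 = backup directory, $2 = maximum age in minutes.
rotate_backups() {
  find "$1" -maxdepth 1 -type d -mmin +"$2" -exec rm -r {} \;
}

# Demo on a throwaway directory (assumes GNU touch for the -d option).
demo=$(mktemp -d)
mkdir "$demo/dump-old" "$demo/dump-new"
touch -d '7 hours ago' "$demo/dump-old"   # pretend this dump is 7 hours old

rotate_backups "$demo" 360
ls "$demo"   # prints dump-new
```

One thing worth knowing: the starting directory itself also counts as a match for -type d. In the pipeline it was just written to by mongodump, so its modification time is fresh, but adding -mindepth 1 to the find command would rule it out entirely.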

As a word of caution, always test these commands before running them in a production scenario. A trick that I always use is placing “echo” at the beginning of the quoted remote command when testing:

- ssh $USERNAME@$DB_IP "echo find /path/to/backup/ -maxdepth 1 -type d -mmin +360 -exec rm -r {} \;"

This will just print out the command that would have been executed on the server rather than actually executing it.

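As a final note, always wrap the remote command in double quotes in pipeline ssh commands so that the pipeline’s environment variables are expanded before the command is sent to the server. With single quotes, the literal variable names would be sent instead, and the database node has no values for them.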
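The difference is easy to reproduce locally. In this sketch, fake_remote is a made-up stand-in for ssh: it runs the command string in a fresh shell where DB_USERNAME is not set, which is roughly the situation on the database node. The value backup_user is a placeholder.

```shell
#!/bin/sh
# Stand-in for "ssh host CMD": run CMD in a new shell that does not have
# DB_USERNAME in its environment, like a server that lacks the pipeline's variables.
fake_remote() {
  env -u DB_USERNAME sh -c "$1"
}

DB_USERNAME=backup_user   # placeholder value for the demo

# Double quotes: expanded locally, so the "remote" side receives the value.
fake_remote "echo user=$DB_USERNAME"   # prints user=backup_user

# Single quotes: the literal text $DB_USERNAME is sent and expands to nothing.
fake_remote 'echo user=$DB_USERNAME'   # prints user=
```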

2. Set the environment variables for the database IP address, username, database username, and database password in Bitbucket settings (DB_IP, USERNAME, DB_USERNAME, and DB_PASSWORD, respectively), along with MY_SSH_KEY, the base64-encoded private key that the script decodes.

3. Create a schedule that runs your custom pipeline every hour.

You can tell that I learned the lesson about double quotes the hard way

Generally, each hourly run consumes only about 5 build seconds, which adds up to roughly 60–70 build minutes per month.
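The arithmetic behind that estimate is quick to check. This sketch assumes usage is counted in actual seconds per run (how Bitbucket rounds build time is not covered here), and the monthly_build_minutes helper is made up for the example.

```shell
#!/bin/sh
# Rough monthly usage: seconds per run x 24 runs per day x days, converted to minutes.
# $1 = build seconds per run, $2 = days in the month. Integer division rounds down.
monthly_build_minutes() {
  echo $(( $1 * 24 * $2 / 60 ))
}

monthly_build_minutes 5 30   # prints 60
monthly_build_minutes 5 31   # prints 62
```

Runs that take a second or two longer push the total toward the upper end of the range.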

It certainly helps me sleep at night

The final result, an hourly success notification, is a bit of an “Everything is OK!” alarm at first, but it can be adjusted to be less annoying.

Once everything is set up, feel free to check your backup directory to see if your 6-hour history is there. You may occasionally have 7 hours if a pipeline run takes a few extra seconds provisioning a server, but it shouldn’t be an issue.

This solution is not a full measure to ensure backup consistency but is a handy lightweight method to serve as a quick stopgap. It can help avoid losing more data than necessary.

If you like tying in notifications, try out the new scheduler feature for Bitbucket Pipelines, especially for repetitive tasks. Even if a task is automated, you don’t want to spend time checking in manually if you don’t have to.

If you found this valuable or entertaining, please follow the blog, and I’ll continue to post more tech goodness. Thanks for reading!
