Lightweight (MariaDB) Database Backup to OpenStack Swift Containers

Bogdan Enache
METRO SYSTEMS Romania
10 min read · Nov 28, 2018

During my job I have been involved in the design, deployment and operation of various IT systems. Most of these systems had one thing in common: they held persistent, mission-critical data on which the customer’s business depends, so this data needs to be protected.

The bulk of this persistent data is stored in databases, so it became increasingly obvious that a custom backup solution was needed for the various databases.

This solution had to fulfill certain well-set criteria:

  • reliable, scalable and fast storage medium
  • lightweight app — very small disk and memory footprint, and low CPU usage
  • very few extra dependencies
  • easily configurable
  • runs on Linux
  • extensible — can easily accommodate new database or OS flavors
  • no-nonsense

Since most of our current applications run on OpenStack and we have dedicated tenants with available storage, Swift was considered as a possible storage medium, among other options.

“But what is OpenStack Swift?” you might ask. Well, as its documentation says ¹:

“Swift is a highly available, distributed, eventually consistent object/blob store. Organizations can use Swift to store lots of data efficiently, safely, and cheaply.”

And indeed, this is true for the OpenStack solutions we see implemented, which use Ceph ² ³ as a “storage backend” ⁴. So it pretty much fit the bill for the storage medium we needed.

There is just one catch to consider: it’s an object/blob store, not a file store. So keep in mind that most file attributes are lost when uploading.

For the software app itself, a solution using Linux shell scripting was chosen, having the native database backup tool at its core and using the Python Swift client for storage access.

In this article we will analyze specifically backups for MariaDB databases, using mysqldump.

Design

Following the KISS principle, we designed a backup solution with the following architecture:

Fig 01 — Backup solution architecture
  • a list of databases and their credentials will be kept in a CSV-like file;
  • a core script will read that list, make local DB dumps according to it, and then for each such dump it will call the Swift uploader;
  • the Swift uploader will push each file it receives to all the applicable cloud storages it knows about from a CSV-like file;

While keeping database and cloud info in CSV-like text files might seem like a strange idea at first, I considered that keeping all this information in yet another database engine would just complicate things.

Setting up the environment

A dedicated machine running an Ubuntu 18.04 LTS Cloud Image can be used for backups. Setting up the dependencies is quite easy and self-explanatory:

sudo apt-get update
sudo apt-get -y install python-pip python-swiftclient \
python-keystone zip mariadb-client

The Keystone Python client is needed here for authentication to Swift.

You might want to consider securing your server or installing additional tools, but this is not covered in this article.

Core backup script

The core backup script will take as an input a file containing a list of databases, together with their environment, hosts, and credentials. This config file will have a syntax similar to a CSV file (with fields separated by white spaces), and comments can start with the character “#”, such as:

# dblist.txt
# DB config file, with mock examples
# ENV SERVER PORT DB USER PASS
pp 10.11.12.13 3306 db1 user1 Password1$
pp 10.11.12.14 3306 db2 user2 Password2$
prod 10.66.77.88 3306 db3 user3 Password3$
prod 10.66.77.89 3306 db4 user4 Password4$
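As an aside, a quick format check of this file can save debugging time later. The helper below is an illustrative sketch (check_dblist is an assumed name, not part of the scripts in this article) that verifies every non-comment, non-empty line has exactly the six expected fields:

```shell
# check_dblist: verify that every non-comment, non-empty line of a dblist
# file has exactly 6 whitespace-separated fields (ENV SERVER PORT DB USER PASS).
# Illustrative helper; not part of the backup scripts themselves.
check_dblist() {
    local file="$1" ln=0 bad=0 line
    while IFS= read -r line || [ -n "$line" ]; do
        ln=$((ln + 1))
        # Skip comments and blank lines
        case "$line" in '#'*|'') continue ;; esac
        # Word splitting is intentional here: count the fields on this line
        set -- $line
        if [ "$#" -ne 6 ]; then
            echo "line $ln: expected 6 fields, got $#" >&2
            bad=1
        fi
    done < "$file"
    return "$bad"
}
```

Running it against the dblist file before a backup round catches truncated or malformed rows before mysqldump ever gets called with the wrong credentials.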

Let’s now analyze the backup script, db_backup.sh. If no arguments are given, we print a friendly usage message:

#!/bin/bash
# db_backup.sh
# Print usage if no arguments are given
if [ $# -eq 0 ]; then
echo 'MySQL backup script for multiple database sources.'
echo ' No argument(s) supplied.'
echo ' Usage: db_backup.sh -d <dblist> [ -v ]'
echo ' Options:'
echo ' -d <dblist> Text file that contains the list of databases to be backed-up,'
echo ' having a row format:'
echo ' ENV SERVER PORT DB USER PASS'
echo ' -v Activate verbose (debug) mode.'
echo ''
exit
fi

We set the debug flag to 0 and then we parse the input arguments using getopts:

DBG=0
# Get input arguments
while getopts ":d:v" option; do
case ${option} in
d) DB_FILE=${OPTARG}
;;
v) DBG=1
esac
done

We define two functions, log() and verbose(). The first will always print the supplied argument, prefixed with a timestamp, while the latter will only print it if DBG=1:

# Functions to print logs, verbose or not
log() {
echo "[`date +\"%Y-%m-%d %H:%M:%S\"`]: ${1}"
}
verbose() {
if [ "${DBG}" = 1 ]; then
echo "[`date +\"%Y-%m-%d %H:%M:%S\"`]: ${1}"
fi
}

We then check that the db list file argument was provided and that the file actually exists and is readable:

# Check args
if [ -z "${DB_FILE}" ]; then
log "[ERROR] List of databases not provided. Use -d option."
exit 1
fi
# Check DB list file
if [ ! -r "${DB_FILE}" ]; then
log "[ERROR] List of databases \"${DB_FILE}\" does not exist or is not readable."
exit 11
fi

We create some variables needed internally by the script and then attempt to create the temporary directory:

BASE_DIR=`dirname "${DB_FILE}"`
TMP_DIR='/tmp/backups'
SUFFIX=`date +"%Y_%m_%d"`
OS_FILE="${BASE_DIR}/oslist.txt"
ZIP_PASS='ChangeMe' # password protecting the zip archives; set your own value
mkdir -p "${TMP_DIR}"
# Check existence and writeability of dest dir
if [ ! -w "${TMP_DIR}" ] ; then
log "[ERROR] Destination directory \"${TMP_DIR}\" does not exist or is not writeable."
exit 20
fi

We then start parsing the db list file. First we use grep to filter out the comment lines (those starting with #), then we read each line with the -r flag (so backslashes are not treated as escapes); the extra [ -n "$LINE" ] test also handles a last line that lacks a trailing newline. Then we read the columns into variables in the given order (PRE, SRV, PORT, …):

grep -v '^#' "${DB_FILE}" | while read -r LINE || [ -n "$LINE" ]; do
read -r PRE SRV PORT DB USER PASS REST <<< "${LINE}"

We test that our line contained proper variables (not nulls) and nothing else, and print a debug message (if debug is enabled, of course):

if [ ! -z "${PRE}" ] && [ ! -z "${SRV}" ] && [ ! -z "${PORT}" ] \
&& [ ! -z "${DB}" ] && [ ! -z "${USER}" ] && [ ! -z "${PASS}" ] \
&& [ -z "${REST}" ]; then

verbose "[DEBUG] Will backup with env prefix \"${PRE}\", from server \"${SRV}\":${PORT}, database \"${DB}\", with user \"${USER}\" and password that was provided..."

Then we make the local db dump using MariaDB’s native mysqldump ⁵ client:

FNAME="${PRE}_${DB}_${SUFFIX}.sql"
mysqldump -h "${SRV}" -P "${PORT}" -u "${USER}" -p"${PASS}" "${DB}" --compatible=postgresql > "${TMP_DIR}/${FNAME}"
EC=$?

We check whether mysqldump returned an error. If it did, we continue with the other databases; we won’t abort the backup process completely, because it might be just a temporary error and we don’t want to skip other backups that might run just fine.

If there are no errors, we compress the dump into a password-protected zip archive, again catching possible errors, and if everything is OK, we clean up the uncompressed file:

if [ "$EC" != "0" ]; then
log "[ERROR] MySQLDump error <${EC}> on file \"${FNAME}\". Will continue with the others."
echo '----------'
continue
else
ANAME="${FNAME}.zip"
zip -9 -j -q -P "${ZIP_PASS}" "${TMP_DIR}/${ANAME}" "${TMP_DIR}/${FNAME}"
EC=$?
if [ "$EC" != "0" ]; then
log "[ERROR] Zip error <${EC}> on compressing \"${FNAME}\". Will continue with the others."
echo '----------'
continue
else
rm -f "${TMP_DIR}/${FNAME}"

Then we upload the compressed file to our Swift destination(s), using the script swift_upload_mul.sh, which will be discussed in a later section:

# Multiple upload to Swift containers
VERB=''
if [ "${DBG}" = 1 ]; then
VERB='-v'
fi
# ${VERB} is left unquoted on purpose: when empty it must expand to nothing
"${BASE_DIR}/"swift_upload_mul.sh -e "${PRE}" -f "${TMP_DIR}/${ANAME}" -o "${OS_FILE}" -p "${DB}" ${VERB}
EC=$?
if [ "$EC" != "0" ]; then
log "[ERROR] Swift error <${EC}> on uploading \"${ANAME}\". Will continue with the others."
echo '----------'
continue
else
rm -f "${TMP_DIR}/${ANAME}"
log "[INFO] Successful backup and upload for database \"${DB}\" from env \"${PRE}\", server \"${SRV}:${PORT}\"."
fi
echo '----------'
fi
fi
fi
done
exit 0

Upload to OpenStack Swift

The upload script, swift_upload_mul.sh, gets as an input argument a file containing a list of environments (which must match the ones defined earlier), container names and OpenStack RC files (needed for authentication and environment setup):

# oslist.txt
# List of available OpenStack containers
# ENV CONTAINER RCFILE
pp app_backup_pp os_openrc.d/pp_cloud01.sh
prod app_backup_prod os_openrc.d/prod_cloud01.sh
pp app_backup_pp os_openrc.d/pp_cloud02.sh
prod app_backup_prod os_openrc.d/prod_cloud02.sh

In the above case, the location of the RC files (os_openrc.d/pp_cloud01.sh) is relative to the location of the oslist.txt file itself. Paths can also be absolute; the script was designed to handle both cases, as you will see below.
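The resolution rule can be sketched as a small function (resolve_rc is an illustrative name; the actual uploader inlines this logic):

```shell
# resolve_rc: mirror the uploader's path rule. An RCFILE value that does not
# start with "/" is resolved relative to the directory holding oslist.txt;
# absolute paths are passed through unchanged. (Illustrative sketch only.)
resolve_rc() {
    local base_dir="$1" rcfile="$2"
    if [[ "${rcfile}" != /* ]]; then
        echo "${base_dir}/${rcfile}"
    else
        echo "${rcfile}"
    fi
}
```

This keeps the oslist.txt file portable: moving the whole backup directory elsewhere does not break the relative RC file references.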

To get the RC files, you must log in to OpenStack and download the proper files from Access & Security → API Access → Download OpenStack RC File v3. If desired, you can remove the password prompt, so you will not be asked for the password every time, by replacing the following lines:

export OS_USERNAME="username"
read -sr OS_PASSWORD_INPUT
export OS_PASSWORD=$OS_PASSWORD_INPUT

with these ones:

export OS_USERNAME='username'
export OS_PASSWORD='passwordhere'
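Putting it together, a minimal non-interactive RC file might look like the following sketch. All values are placeholders (assumed endpoint, project and region names); start from the RC file downloaded from your own cloud:

```shell
# os_openrc.d/pp_cloud01.sh -- minimal non-interactive OpenStack RC file (v3).
# Every value below is a placeholder, not a real endpoint or account.
export OS_AUTH_URL='https://keystone.example.com:5000/v3'
export OS_IDENTITY_API_VERSION=3
export OS_PROJECT_NAME='app-project-pp'
export OS_PROJECT_DOMAIN_NAME='Default'
export OS_USER_DOMAIN_NAME='Default'
export OS_USERNAME='username'
export OS_PASSWORD='passwordhere'
export OS_REGION_NAME='RegionOne'
```

Since these files now contain a plaintext password, keep them readable only by the backup user (e.g. chmod 600).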

Just like the previous script, swift_upload_mul.sh starts by printing a friendly usage message; it then parses and checks the input arguments, defines the needed functions and creates internal variables:

#!/bin/bash
# swift_upload_mul.sh
# Print usage if no arguments are given
if [ $# -eq 0 ]; then
echo 'Swift upload script for multiple destinations.'
echo ' No argument(s) supplied.'
echo ' Usage: swift_upload_mul.sh -e <env> -f <file> -o <oslist> -p <path> [ -v ]'
echo ' Options:'
echo ' -e <env> Environment, usually pp or prod.'
echo ' -f <file> File to be uploaded.'
echo ' -o <oslist> Text file that contains the definitions for OpenStack container targets,'
echo ' having a row format:'
echo ' ENV CONTAINER RCFILE'
echo ' -p <path> Optionally upload under this <path> inside the container.'
echo ' -v Activate verbose (debug) mode.'
echo ''
exit
fi
DBG=0
# Get input arguments
while getopts ":e:f:o:p:v" option; do
case ${option} in
e) ENV=${OPTARG}
;;
f) FILE=${OPTARG}
;;
o) OS_FILE=${OPTARG}
;;
p) C_PATH=${OPTARG}
;;
v) DBG=1
esac
done
# Functions to print logs, verbose or not
log() {
echo "[`date +\"%Y-%m-%d %H:%M:%S\"`]: ${1}"
}
verbose() {
if [ "${DBG}" = 1 ]; then
echo "[`date +\"%Y-%m-%d %H:%M:%S\"`]: ${1}"
fi
}
# Check args
if [ -z "${ENV}" ]; then
log "[ERROR] Environment not provided. Use -e option."
exit 1
fi
if [ -z "${FILE}" ]; then
log "[ERROR] File to be uploaded not provided. Use -f option."
exit 2
fi
if [ -z "${OS_FILE}" ]; then
log "[ERROR] OpenStack container file not provided. Use -o option."
exit 3
fi
# Check file to upload
if [ ! -r "${FILE}" ]; then
log "[ERROR] File to be uploaded \"${FILE}\" does not exist or is not readable."
exit 10
fi
# Check OS container list file
if [ ! -r "${OS_FILE}" ]; then
log "[ERROR] List of containers \"${OS_FILE}\" does not exist or is not readable."
exit 11
fi
BASE_DIR=`dirname "${OS_FILE}"`

We parse each line, checking that it is not a comment (starting with #) or empty, verify that it has the proper fields, and then make sure the corresponding RC file is readable:

# Read OS container list file
grep -v '^#' "${OS_FILE}" | while read -r LINE || [ -n "$LINE" ]; do
read -r FENV FCONT FRC REST <<< "${LINE}"
if [ ! -z "${FENV}" ] && [ ! -z "${FCONT}" ] && [ ! -z "${FRC}" ] && [ -z "${REST}" ]; then
if [ "${ENV}" = "${FENV}" ]; then
# Determine if RC files have relative or absolute paths (start with / or not)
# If relative, then they will be relative to the OS_FILE
if [[ "${FRC}" != /* ]]; then
ARC="${BASE_DIR}/${FRC}"
else
ARC=${FRC}
fi
verbose "[DEBUG] Found container \"${FCONT}\" on env \"${FENV}\" using rc file \"${FRC}\" -> \"${ARC}\"."
if [ ! -r "${ARC}" ]; then
log "[ERROR] OpenStack RC file \"${ARC}\" does not exist or is not readable. Will continue with the others."
continue
else

If everything is OK up to this point, we source the OpenStack RC file needed to access the current tenant:

source "${ARC}"
EC=$?
if [ "$EC" != "0" ]; then
log "[ERROR] Could not source RC file \"${ARC}\". Will continue with the others."
continue

Then we upload the file to Swift storage:

else
OBJ=`basename "${FILE}"`
# Add subpath if requested
if [ ! -z "${C_PATH}" ]; then
OBJ="${C_PATH}/${OBJ}"
fi
verbose "[DEBUG] Will now upload file \"${FILE}\" using object name \"${OBJ}\"."
swift -q upload "${FCONT}" "${FILE}" --object-name "${OBJ}"
EC=$?
if [ "$EC" != "0" ]; then
log "[ERROR] Could not upload file \"${OBJ}\" in container \"${FCONT}\", using RC file \"${ARC}\". Will continue with the others."
continue
fi
fi
fi
fi
else
log "[ERROR] Container file list \"${OS_FILE}\" has incorrect format, please check it."
exit 22
fi
done
exit 0

We can then run the core script, detailed previously:
./db_backup.sh -d /some/path/dblist.txt
— or add it to a crontab.
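For unattended runs, a crontab entry along these lines (the paths here are assumptions, adapt them to your setup) will trigger the backup nightly and keep a log:

```
# m h dom mon dow  command -- run the backup every night at 02:30
30 2 * * * /opt/backup/db_backup.sh -d /opt/backup/dblist.txt >> /var/log/db_backup.log 2>&1
```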

If running in verbose mode, output will be similar to:

# ./db_backup.sh -v -d dblist.txt
[2018-11-22 09:25:05]: [DEBUG] Will backup with env prefix "pp", from server "10.11.12.13":3306, database "db1", with user "user1" and password that was provided.
[2018-11-22 09:25:07]: [DEBUG] Found container "app_backup_pp" on env "pp" using rc file "os_openrc.d/pp_cloud01.sh" -> "./os_openrc.d/pp_cloud01.sh".
[2018-11-22 09:25:07]: [DEBUG] Will now upload file "/tmp/backups/pp_db1_2018_11_22.sql.zip" using object name "db1/pp_db1_2018_11_22.sql.zip".
[2018-11-22 09:25:11]: [DEBUG] Found container "app_backup_pp" on env "pp" using rc file "os_openrc.d/pp_cloud02.sh" -> "./os_openrc.d/pp_cloud02.sh".
[2018-11-22 09:25:11]: [DEBUG] Will now upload file "/tmp/backups/pp_db1_2018_11_22.sql.zip" using object name "db1/pp_db1_2018_11_22.sql.zip".
[2018-11-22 09:25:15]: [INFO] Successful backup and upload for database "db1" from env "pp", server "10.11.12.13:3306".
----------

Looking at the OpenStack Swift containers, we can see the sub-paths we defined.

Fig 02 — List of sub-paths under the container

Entering one of these sub-paths will display the list of DB backups:

Fig 03 — List of objects (DB backups) under a sub-path

Conclusions

We were able to create a backup solution, complete with storage medium and scripts, that fulfills all our criteria.

The scripts can easily be extended to back up other types of databases, using their native tools or other apps, and multiple uploaders can be implemented, with different types of cloud targets.

This backup solution already serves multiple applications, without any incident so far.

If you want to try it out, the demo code is available as a GitHub Gist:

https://gist.github.com/bogdane/dda9f630867cd04053c7f2227ed556b9

Also, please note that you need to create the os_openrc.d subdirectory with the proper RC files.

In a future article I will show a configurable cleanup solution for Swift containers, able to perform different actions on objects that are obsolete.

Bibliography

¹: OpenStack Swift Documentation
²: Ceph Storage
³: Ceph on Wikipedia
⁴: How to Integrate Ceph with OpenStack
⁵: MariaDB mysqldump Documentation
