Backup of a large-scale, distributed system on AWS

Sanket Bengali
4 min read · Jun 1, 2019


Image by rawpixel from Pixabay

A large-scale distributed system deployed on a public cloud may include multiple services (and resources).

On AWS, for example, these might be EC2 and EBS (VMs and block storage), RDS (database), EFS (file system), Elasticsearch (log analysis), and so on.

Each of these services has several backup options, as listed here: AWS services (EBS, RDS, EFS, Elasticsearch) backup solutions

Although AWS has launched a dedicated Backup service that centrally manages backups across multiple services, backing up multiple AWS services with a “custom flow” still requires some scripting.

The backup workflow may require scripting in the following scenarios:

  1. The backup flow may need to execute pre-scripts (before the backups start) and post-scripts (after the backups complete).
  • Examples of pre-scripts and post-scripts include freezing/thawing a node or stopping/starting a service on a node.
  • This could also include executing application-specific scripts or commands on EC2 instances.

  2. Some application backups (for example, a Neo4j database backup) must be taken by running specific commands or scripts on particular EC2 instances.

There are several options for implementing the backup workflow:

  1. AWS Systems Manager
  • Pros: It can schedule the execution of scripts (Python, Shell) stored on S3 or GitHub on EC2 instances. It also supports several common maintenance tasks, such as stopping/starting an EC2 instance or creating a snapshot.
  • Cons: It is tied to EC2 instances, so backing up other AWS services requires a lot of scripting. Managing a complex backup workflow would also be difficult.

  2. AWS Step Functions and Lambda
  • Pros: Any complex workflow (with support for conditions, loops, waits, parallelization, etc.) can be implemented by executing several Lambda functions.
  • Cons: It incurs the additional costs of the Step Functions and Lambda services. It is also tied to AWS.

  3. Ansible playbooks (explained below)
  • Pros: They are easy to develop and give full flexibility to implement complex workflows (with support for parallel execution, branches, loops, etc.) using the supported modules. Playbook execution can be scheduled easily from AWS using Systems Manager or from any external tool. They are also easy to extend and maintain across multiple clouds, including Azure and GCP.

Below is a sample backup workflow (implemented using Ansible playbooks):

Sample Backup workflow for multiple AWS services

Here is the GitHub link to download the sample Ansible playbook solution:

This Ansible playbook takes backups of AWS RDS, EFS (using AWS Backup), EC2 (as snapshots), and an application (e.g. a Neo4j DB on an Ubuntu EC2 instance) in parallel.

The main playbook executes the other playbooks (roles) in the following sequence:

  1. Copy the SSH key from an S3 bucket to the bastion host (using the AWS S3 module)
  2. Run the pre-script (e.g. stop the “httpd” service)
  3. Execute 4 backups in parallel (EC2 using the EC2 module, RDS using the RDS module, EFS using the AWS CLI, Neo4j using the Neo4j CLI)
  4. Run the post-script (e.g. start the “httpd” service)
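The ordering of steps 2 to 4 above can be illustrated outside Ansible with a short Python sketch. This is not the playbook itself: the function name `run_backup_workflow` and the placeholder jobs are hypothetical, and running the post-script even when a backup fails is a design assumption made here, not something the sample playbook is stated to do.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict


def run_backup_workflow(
    pre_script: Callable[[], None],
    backups: Dict[str, Callable[[], str]],
    post_script: Callable[[], None],
) -> Dict[str, str]:
    """Run the pre-script, then all backups in parallel, then the post-script."""
    pre_script()
    try:
        # One worker per backup so that all jobs start at nearly the same time.
        with ThreadPoolExecutor(max_workers=max(1, len(backups))) as pool:
            futures = {name: pool.submit(job) for name, job in backups.items()}
            # result() re-raises any exception raised inside the job.
            return {name: future.result() for name, future in futures.items()}
    finally:
        # Assumption: the post-script (e.g. restarting a service) must run
        # even if one of the backups failed.
        post_script()


if __name__ == "__main__":
    # Placeholder jobs; real ones would call the AWS APIs or the Neo4j CLI.
    results = run_backup_workflow(
        pre_script=lambda: print("pre-script: stop httpd"),
        backups={
            "ec2": lambda: "ec2 snapshot requested",
            "rds": lambda: "rds snapshot requested",
            "efs": lambda: "efs backup job started",
            "neo4j": lambda: "neo4j backup finished",
        },
        post_script=lambda: print("post-script: start httpd"),
    )
    print(results)
```

Wrapping the parallel step in `try`/`finally` keeps the pre-script and post-script paired, so a stopped service is not left down after a failed backup.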

AWS backup screenshots

Note the “start time” of each backup: all of them start at nearly the same time.

EC2 snapshot
RDS snapshot
EFS snapshot (using AWS Backup)
Neo4j DB backup (using Neo4j CLI)

The following concepts are used in the Ansible playbooks:

  1. Executing multiple playbooks (roles) in parallel on different hosts using the “strategy: free”, “include_role” and “when” options.
  2. Executing Linux commands (e.g. stopping and starting the httpd service) as the “sudo” user.
  3. Using Ansible AWS modules such as S3, EC2, RDS, etc.
  4. Using AWS CLI commands such as “aws backup” to start a job and wait for it to complete (using “until”, “delay” and “retries”).
  5. Executing the Neo4j DB backup command “neo4j-admin backup” as the “sudo” user.
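The wait loop in item 4 (“until”, “delay” and “retries”) is a plain polling pattern. The Python sketch below loosely mirrors it and is only an illustration: `wait_for_job` and `get_state` are hypothetical names, and the state strings are assumed to match those reported for AWS Backup jobs. In practice `get_state` would wrap whatever call reports the job status.

```python
import time
from typing import Callable, Iterable


def wait_for_job(
    get_state: Callable[[], str],
    retries: int = 30,
    delay: float = 10.0,
    done_states: Iterable[str] = ("COMPLETED",),
    failed_states: Iterable[str] = ("FAILED", "ABORTED", "EXPIRED"),
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_state() until the job finishes, fails, or the checks run out."""
    done, failed = set(done_states), set(failed_states)
    for attempt in range(1, retries + 1):
        state = get_state()
        if state in done:
            return state
        if state in failed:
            raise RuntimeError(f"backup job ended in state {state}")
        if attempt < retries:
            # Wait before the next check, like "delay" between Ansible retries.
            sleep(delay)
    raise TimeoutError(f"backup job not finished after {retries} checks")


if __name__ == "__main__":
    # Simulated job that completes on the third status check.
    states = iter(["RUNNING", "RUNNING", "COMPLETED"])
    print(wait_for_job(lambda: next(states), retries=5, delay=0.1))
```

Passing `sleep` as a parameter makes the loop easy to exercise without real waiting.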

Working with Ansible — Tips and Tricks with examples : https://spacelift.io/blog/ansible-playbooks
