SQL Server Backup Concepts in AWS (SQL Server backup part 1 of 3)

Clumio
7 min readJul 10, 2023

--

Welcome to part one of a three-part blog series on backing up SQL Server databases running on AWS. This series will cover SQL Server backup concepts, implementation options and methods in depth. Part one focuses on foundational concepts of SQL Server backup that should be understood before diving into implementation options and methods. If you find this content helpful, you can download the complete ebook.

Formulating a robust backup strategy for SQL Server on AWS relies on grasping core concepts like the VSS agent, snapshots, transaction logs (T-logs), RPO, RTO, and point-in-time recovery (PITR). This chapter distills these concepts, key to selecting backup strategies and tools such as AWS’s built-in solutions or third-party offerings like Clumio. With this foundation, you can craft a backup strategy that optimizes resources, mitigates data loss risk, and ensures application resilience on AWS.

Full, differential, and incremental backups

In SQL Server, the first backup is always a full backup — a complete backup of all your data and any additional components required to restore the database. It serves as the base for both differential and incremental backups and typically takes the most time and storage space. Differential backups contain all the changes made since the last full backup. The Differential Change Map, or DCM, keeps track of these changes. While these backups typically take less time and storage space than full backups, they may become larger over time as they always include all changes since the last full backup. This can result in duplicate data across multiple differential backups.

While not natively supported by SQL Server, incremental backups involve backing up only the changes since the last successful backup. This results in small and fast change capture. For SQL Server instances running in AWS, incremental backups can be implemented in the form of incremental snapshots, as explained in the section below.

Snapshots

In the realm of data storage, backups, and disaster recovery, ‘snapshots’ form a key concept. A snapshot is essentially a copy of the state of a storage volume at a particular point in time. It serves as a detailed table of contents, pointing to where data blocks are stored at that specific time. Snapshots occupy disk space, which is why most database backup strategies employ taking snapshots on a daily, weekly, or monthly basis and retaining them for several weeks or months. In general, AWS snapshots are incremental in nature, that is, after a first full snapshot, changes are captured incrementally to save on storage space, and ‘rewound’ during a restore.

Snapshots are an effective component of a backup strategy for SQL Server databases because they enable you to quickly save the state of your database at any given moment, and then return to that state if needed. This is especially useful in scenarios where a faulty update or a cyber-attack causes data loss or corruption.

On AWS, snapshots can be created for Amazon RDS instances, Amazon EBS volumes attached to Amazon EC2 instances, or the VMware cluster in which the SQL servers are running. Creating a snapshot of an Amazon RDS instance or an Amazon EBS volume results in a copy of the data at that point in time, allowing the user to return to that point if required. This may tempt you to think that snapshots can also be used for point in time recovery, but that is not the case. Point in time recovery (covered in more detail below) lets you recover to any point in time, down to the minute or second. Snapshots, however, let you return to only that point when the snapshot was taken.

Transaction logs

In the context of SQL Server, transaction logs (T-logs) are vital for maintaining data integrity and enabling point-in-time recovery. Every database has a transaction log that records all modifications to the database. In addition to user transactions, it records the details of modifications made by system procedures and functions to ensure data integrity.

T-logs contain a sequential record of all actions that have occurred in the database. When a user or application issues an INSERT, DELETE, UPDATE, or any other SQL command that modifies data, the change is recorded in the log before it is committed to the database. This ensures that no data modifications are lost in the event of a system failure.

Regular backups of the transaction log enable point-in-time recovery for the database, down to the exact second, rather than periodic intervals as with snapshots. This can be especially helpful in scenarios where data corruption or loss has occurred in between snapshots and you need to revert to a state right before the incident. SQL Server can use the transaction log to recover the database by rolling forward committed transactions and rolling back incomplete transactions.

In an AWS environment, Amazon RDS for SQL Server automatically manages the transaction log backups if you’ve enabled automated backups. However, for SQL Server on Amazon EC2 or VMC on AWS instances, the DBA or IT team will need to handle transaction log backups separately, or automate it with Clumio.

Understanding the role of transaction logs is essential for a comprehensive SQL Server backup strategy that leaves no gaps.

Recovery Point Objective and Recovery Time Objective

In the realm of data backup and recovery, perhaps the most important metrics are Recovery Point Objective (RPO) and Recovery Time Objective (RTO). They describe the tolerance of an application, SLA, or business towards data loss (RPO) and downtime (RTO).

RPO describes the maximum amount of data loss a business can tolerate in the event of a disaster. RPO is measured in time and refers to the age of the last backup that will be used to recover the data. For example, if your RPO is 15 minutes, then in the event of a disruption, you should be prepared to lose no more than the last 15 minutes’ worth of data.

RTO, on the other hand, is the duration within which systems and applications must be restored after an outage to avoid unacceptable consequences associated with a break in business continuity.

In the context of SQL Server on AWS, understanding and setting the appropriate RPO and RTO is crucial. For instance, if you’re running SQL Server on Amazon RDS, AWS provides the option for automated backups that allow for point-in-time recovery to within seconds. However, the RTO can vary based on the size of the database and the nature of the incident. In contrast, for SQL Server on Amazon EC2 or VMC on AWS instances, the RTO and RPO will depend on how you’ve configured your backups, and the capabilities of the tools you are using. In most cases, Clumio delivers near-0 RPO and RTO, in addition to capabilities like SQL-queried record retrieval. More on that later.

Point in time recovery

As discussed in the section on transaction logs, Point-In-Time Recovery (PITR) refers to the recovery of data to a specific point in time. In other words, it is the process of restoring the data from a backup to a specific desired moment. PITR is essential in busy databases when data corruption or loss has occurred and there’s a need to revert the data to its state to an exact point in time to debug or recover from the underlying issue.

When a database is restored using a full database backup (usually a snapshot) and subsequent transaction log backups, a SQL Server instance can be restored to a specific point in time. The combination of these two provides a much more complete strategy than using snapshots alone.

In AWS, Amazon RDS supports PITR for SQL Server databases. If automated backups are enabled, you can recover your database to any point in time within the backup retention period — typically between 1 and 35 days. However, for SQL Server on Amazon EC2 instances, PITR would need to be handled manually by the DBA or IT team, which includes restoring the full database backup and then applying the transaction log backups in the correct sequence up to the desired point in time. Clumio can automate this process for SQL Server on Amazon EC2, SQL Server on Amazon RDS as well as SQL Server on VMC on AWS.

By understanding the importance of PITR and the mechanisms involved, you can better protect your SQL Server data and minimize the impact of data corruption or loss.

Volume Shadow Copy Service

Microsoft Volume Shadow Copy Service, commonly known as VSS, is a crucial component of a comprehensive backup strategy for SQL Server on AWS.

VSS is a set of Component Object Model (COM) APIs that provide a framework to enable storage volume backups to be performed while applications on a system continue to write to the volumes. VSS provides a mechanism for creating ‘shadow copies’ — consistent point-in-time copies of data in which all transactions are complete and no data is left in an intermediate state.

Since SQL Server is a live, transactional system, creating backups during ongoing operations can be challenging. VSS helps manage this complexity by working with SQL Server to quiesce, or temporarily pause, database operations, ensuring that all database transactions are at a known, consistent state before the backup is taken.

VSS’s role is fundamental regardless of your SQL Server deployment option. Whether you’re running SQL Server on Amazon RDS, on Amazon EC2 instances, or on VMC on AWS, understanding how VSS works aids in formulating a backup strategy that ensures minimal data loss in the event of an operational disruption.

The next blog in this series will delve into the backup strategies for SQL Server on AWS.

For a more in-depth experience, download the complete ebook including diagrams & content not included here.

--

--