SQL Server AlwaysOn Availability Group Cookbook

A “short” guide for ops engineers who are not database administrators but are working on migration MS SQL servers to AWS and Azure. TL;DR

Published in

DevOops World … and the Universe

8 min readFeb 2, 2017

Foreword: Do not reinvent the wheel!

I’m serious! Do not reinvent something that already works and works very well. I’m relatively new to the world of MS SQL Server DBA but even I was able to google Brent Ozar. Of course, he is referencing to points that you should do wherever you need to achieve your goals and ignore even “best practices” but, believe me, in 90% of projects it’s not your case. So, please, use his guides till the moment when you understand that you and your product is special (interpretation for TPM, it means “never”)

Problem

Some engineers (DevOops or who already moved from mom basement) use to reference to “best practices” that blindly so when it is the time to move further they aren’t able to. Transition from MS SQL Server 2008 Cluster clustering to Use Availability Groups in SQL Server 2012 is a very good example of this problem. In short, those solutions can be used in the same cluster without any temptations.

Even though you’re enabling the Failover Cluster feature, you are NOT required to have shared storage to use Availability Groups. You have the option to use a Failover Cluster in an Availability Group, but you can also run your Availability Groups with entirely independent storage subsystems if you desire. The feature is required because no matter what, Availability Groups will use parts of the Failover Clustering feature to manage a virtual network name and IP Address.

Graphical representation for a TPM is:

As a result, theoretically you can use shared disk approach for new SQL Servers 2012/2014/2016 with Standard Edition subscription and not pay extra for Enterprise Edition that is required for AlwaysOn Availability Group.

Win-Win! Actually, it is not so. From DBA perspective Standard Editions lack some nice features that are helpful for them (MS SQL 2016 version can be exception). And IMHO, even Microsoft supports this version because “The First Hit’s Free, Baby”.

From DevOops perspective everything become very crappy because AWS and Azure don’t support shared disks. Bummer! So you are faced with options to use Availability Group or something like Data Kepeer. People are using second option because there are “fear, doubts and uncertainty” and other nice habits that stop human kind to evolve.

Spoiler! We are not discussing AWS RDS here because of some limitations related to HIPAA and other compliances.

We had seen that problem so often so we even created a list of common issues related to SQL cluster migration to AWS. Still, don’t listen to me, use “expert opinion”.

Technical introduction

I will skip long introduction into Availability Group because there are much better documentations and introduction. Second reason, as a DevOops engineer I’m to bored to justify “DevOps Culture” and pretend that I’m “full stack developer”.

Still, I should provide the basis that can be skipped as “a boring part from official documentation” till next comics picture, the so called diagram.

Always On Availability Groups maximises the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases, that fail over together.

Main points:

An availability group supports a set of read-write primary databases and one to eight sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.
Supports an availability group listener for each availability group. An availability group listener is a server name to which clients can connect in order to access a database in a primary or secondary replica of an Always On availability group. Availability group listeners direct incoming connections to the primary replica or to a read-only secondary replica. The listener provides fast application failover after an availability group fails over. Spoiler alert! The listener is not a load balancer.
On a given computer, you can enable as many server instances for Always On Availability Groups as your SQL Server installation supports.
The primary replica makes the primary databases available for read-write connections from clients. Also, in a process known as data synchronisation, which occurs at the database level. The primary replica sends transaction log records of each primary database to every secondary database. Every secondary replica caches the transaction log records (hardens the log) and then applies them to its corresponding secondary database. Data synchronisation occurs between the primary database and each connected secondary database, independently of the other databases. Therefore, a secondary database can be suspended or fail without affecting other secondary databases, and a primary database can be suspended or fail without affecting other primary databases.
Each availability group defines a set of two or more failover partners known as availability replicas. Availability replicas are components of the availability group. Each availability replica hosts a copy of the availability databases in the availability group. For a given availability group, the availability replicas must be hosted by separate instances of SQL Server residing on different nodes of a WSFC cluster. Each of these server instances must be enabled for AlwaysOn.
Always On Availability Groups supports two availability modes — asynchronous-commit mode and synchronous-commit mode.

Limitations

An availability group fails over at the level of an availability replica.
Fail-overs are not caused by database issues such as a database becoming suspect due to a loss of a data file, deletion of a database, or corruption of a transaction log.
Each server instance requires a database mirroring endpoint.
Note! that this endpoint is shared by all the availability replicas and database mirroring partners and witnesses on the server instance.
All the server instances that host availability replicas for an availability group must use the same SQL Server collation.
If any databases that use FILESTREAM will be added to an availability group, ensure that FILESTREAM is enabled on every server instance that will host an availability replica for the availability group.
If any contained databases will be added to an availability group, ensure that the contained database authentication server option is set to 1 on every server instance that will host an availability replica for the availability group.

This technical part is short and comprehensive enough to be able to be discussed on a meeting where a lot of “typical managers” will convince you that “We really need it. If we don’t we can’t make the customer happy. Wouldn’t it be easy if we just did it like that? Can you try it real fast?”. Here is a visual representation for them.

This basic picture interesting as core or reference for few next recipes.

Recipe “Business Continuity and Disaster Recovery”

Works with: SQL Server 2012/2014/2016 Enterprise and AlwaysOn Availability Group

For those who are not aware of differences between “fault tolerance” and “high availability” characteristic of system this solution will say nothing. For people who deal with SLA and disaster recovery will appreciate it. As being previously mentioned in the boring technical part, that you probably skipped, Always On Availability Groups supports two availability modes — asynchronous-commit mode and synchronous-commit mode. The difference in how data is being synchronised between cluster nodes create some limitations.

Synchronous-commit mode emphasises high availability over performance, at the cost of increased transaction latency

Where

Asynchronous-commit mode is a disaster-recovery solution that works well when the availability replicas are distributed over considerable distances.

Here is some caveat of it. Despite the fact that an Availability Group supports one to eight sets of corresponding secondary databases in reality only two of them can work in Synchronous-commit mode. All others will be forced to work in Asynchronous-commit mode. This can be a bad thing if you are obsessed with “cool features” but it is not critical if you aren’t a salesman from some “NoSQL” database vendor.

In reality you can have up to six cluster nodes in Asynchronous-commit mode which allows us to build a pretty nice warm or hot site for disaster recovery.

The same approach works for Azure

Recipe “Multi tenant environment”

Works with: SQL Server 2012/2014/2016 Enterprise and AlwaysOn Availability Group

Sometimes the same cluster can be used by many consumers. These can be separated databases per applications with different load or simple multi tenant environment. In short, a cluster should handle many “virtual clusters” or even minor databases. The pitfall here is that the given instance can host only one availability replica per availability group. However, each instance can be used for many availability groups.

As a result you can easily build such weird looking but working solution with many AlwaysON Availability Group across the same cluster nodes.

Recipe “Let’s replace old failover cluster”

Works with: SQL Server 2016 Standard with Basic Availability Group

At last, a recipe for those who are not interested in “fancy features” of the Enterprise Edition and have no money to spend on licences but need a decent fault tolerant MS SQL server at AWS/Azure.

SQL server 2016 Standard with basic AG for failover only. The master instance will operate all the data and the slave instance will be in stand-by mode until a failover situation occurs.

A Basic Availability Group supports a failover environment for a single database. It is created and managed much like the traditional (advanced) Always On Availability Groups (SQL Server) with Enterprise Edition. Basic availability groups enable a primary database to maintain a single replica. This replica can use either the synchronous-commit mode or the asynchronous-commit mode. The secondary replica remains inactive unless there is a need to failover. This failover reverses the primary and secondary role assignments, causing the secondary replica to become the primary active database.

This recipe is short in description but very long in limitations

Limit of two replicas (primary and secondary).
No read access on secondary replica.
No backups on secondary replica.
No support for replicas hosted on servers running a version of SQL Server prior to SQL Server 2016 Community Technology Preview 3 (CTP3).
No support for adding or removing a replica to an existing basic availability group.
Support for one availability database.
Basic availability groups cannot be upgraded to advanced availability groups. The group must be dropped and re-added to a group that contains servers running only SQL Server 2016 Enterprise Edition.
Basic availability groups are only supported for Standard Edition servers.

Summary

I’m not even trying to behave as the expert in MS SQL clusters here. I assembled this guide to avoid long explanation to those people who are too busy with their production environments but still have time to have long meetings with many “why” and “need”. Probably it will save your time and money too.