Data Protection technologies comparison

4 min read · Jul 21, 2023


There are multiple data protection techniques. First of all, there is High Availability (HA), which comes in two flavors: local HA (or simply “HA”) and Metro-HA. There is also Disaster Recovery (DR).

  • DR and HA (or Metro-HA) protect against different kinds of failures, so they are designed, behave, and work quite differently, even though both are data protection technologies. A Metro-HA solution is HA stretched between two sites (typically up to 700 km apart); it is not a DR solution. DR also comes in a few flavors: Strict Synchronous, Relaxed Synchronous (both often referred to simply as “Sync” replicas), and Asynchronous (or simply Async).
  • Metro-HA is similar to Strict Synchronous replication but, being HA, it has RPO = 0 and an RTO of nearly zero (it is typically designed so that clients do not notice the failure of one node).

Async replication, on the other hand, is used for Disaster Recovery, not Metro-HA. DR means you store point-in-time data (typically snapshots) for cases such as logical data corruption, so you can choose which snapshot to restore from. That ability also brings responsibility, because you or another human must decide which snapshot to select and restore. So, unlike Metro-HA, Async replication has no “automatic, out of the box” switchover to the DR site. Having many snapshots means having many options, which makes it hard for a program or a system to decide which one to switch or fall back to. Sync and Async replication for DR also provide many options for backup and restore:

  • DR does not require the destination recovery site to have the same amount of dedicated resources, such as CPU and RAM, as the primary site (consider increasing CPU and RAM on the DR site to match the primary site only after you have switched to the recovery site following a disaster).
  • Metro-HA, by contrast, requires the same configuration to be mirrored across both sites.
  • With HA and Metro-HA you might want Fan-Out replicas for DR, since these technologies complement each other rather than compete.
  • Synchronous and Metro-HA replication are much more sensitive to network latency and generate much more network traffic, so they limit the physical distance between sites and may impose strict requirements on your network. Async replicas are much less demanding in terms of latency, throughput, and physical distance.
  • You can combine HA/Metro-HA, local snapshots, and DR replication, and ideally you want all of them.
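
To give a feel for why distance matters so much for synchronous replication, here is a rough back-of-the-envelope sketch. It assumes that light travels through optical fiber at roughly 200,000 km/s and that every synchronous write has to wait for at least one round trip to the remote site. The constant and the function name are illustrative, and real links add equipment delay and rarely follow a straight line, so treat the numbers as a lower bound.

```python
# Assumption: signal speed in optical fiber is roughly 200,000 km/s,
# which is 200 km per millisecond.
FIBER_KM_PER_MS = 200.0


def min_round_trip_ms(distance_km):
    """Lower bound on the round-trip time between two sites, in milliseconds.

    Ignores switches, routers, and non-straight cable paths, all of which
    make the real figure higher.
    """
    return 2 * distance_km / FIBER_KM_PER_MS


for km in (10, 100, 700):
    print(f"{km} km: at least {min_round_trip_ms(km):.1f} ms added per synchronous write")
```

An async replica does not pay this cost on every write, which is why it tolerates longer distances and slower links.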

All these options give you a lot of flexibility with an async mirror, but they also mean a system would need very complex logic to switch between sites automatically. Long story short, no single solution can provide switchover logic that satisfies every customer, every possible configuration, and every application. In other words, with a solution as flexible as async replication, switchover is in many cases done manually.
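
To illustrate why choosing among many snapshots is left to a human, here is a minimal sketch. The `Snapshot` class, the snapshot names, and the timestamps are all hypothetical. The code can narrow the list down to snapshots taken before the corruption was detected, but it cannot know when the corruption really started, so the final choice stays with the operator.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Snapshot:
    name: str            # hypothetical snapshot label
    taken_at: datetime   # when the point-in-time copy was created


def restore_candidates(snapshots, corruption_detected_at):
    """Return snapshots taken before the detection time, newest first.

    This only narrows the list. The corruption may have started earlier
    than it was detected, so a human still has to pick the snapshot.
    """
    older = [s for s in snapshots if s.taken_at < corruption_detected_at]
    return sorted(older, key=lambda s: s.taken_at, reverse=True)


snapshots = [
    Snapshot("hourly.0", datetime(2023, 7, 21, 12, 0)),
    Snapshot("hourly.1", datetime(2023, 7, 21, 11, 0)),
    Snapshot("daily.0", datetime(2023, 7, 21, 0, 0)),
]
candidates = restore_candidates(snapshots, datetime(2023, 7, 21, 11, 30))
for snap in candidates:
    print(snap.name, snap.taken_at.isoformat())
```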

At the end of the day, an automatic or semi-automatic switchover is possible with DR

Automatic or semi-automatic switchover is possible, but it must be done very carefully, with knowledge of the environment and an understanding of the customer’s precise situation, and it must be customized for:

  • environments
  • protocols
  • applications.

Metro-HA, on the other hand, can automatically switch over between sites when one site fails, but it operates only on the active file system and solves only the data availability problem, not data corruption. If your data has been (logically) corrupted by, say, a ransomware infection, a Metro-HA switchover is not going to help, but snapshots, backups, and DR will. Unlike DR mirroring, Metro-HA has strict, deterministic environmental requirements: there are only a couple of sites between which the system can switch, and only the active file system (no snapshots) is involved. In such a deterministic environment, it is possible to determine which site survived and to switch to it automatically with a tiebreaker. A tiebreaker is a mechanism that makes the site switchover decision.
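
The tiebreaker decision can be sketched in a few lines. This is a simplified illustration, not the logic of any particular product: it assumes a hypothetical tiebreaker running at a third location that only knows whether it can reach site A and site B. Real implementations also have to handle cases such as a broken link between two healthy sites.

```python
def choose_surviving_site(site_a_reachable, site_b_reachable):
    """Return the site to switch to, or None when no switchover should happen.

    Assumption: the caller is a tiebreaker at a third location that has
    already probed both sites.
    """
    if site_a_reachable and site_b_reachable:
        return None  # both sites are up, nothing to do
    if site_a_reachable:
        return "A"   # only A answers, so B is considered failed
    if site_b_reachable:
        return "B"   # only B answers, so A is considered failed
    return None      # nothing answers: the tiebreaker itself may be isolated


print(choose_surviving_site(True, False))
print(choose_surviving_site(False, False))
```

With only two sites and no snapshots to choose from, the decision table stays this small, which is what makes automatic switchover feasible for Metro-HA.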

DR Replication and recovery

To speed up recovery after a disaster, manual steps and scripts must be prepared in advance and, preferably, tested from time to time.

Do not confuse Metro-HA (geo-distributed, as opposed to local HA) with DR. They are two separate data protection technologies that are not mutually exclusive: you can have both. Big companies usually run both Metro-HA and DR replication because they have the budgets, business requirements, and approval for it.

The solution

Set up and configure your own script, with logic suited to your environment, protocols, and applications, that switches to the DR site automatically or semi-automatically after a disaster event on the primary site. Test it from time to time to make sure you are prepared for a disaster.
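
As a starting point, a semi-automatic switchover script might look like the skeleton below. Everything in it is hypothetical: the step descriptions are generic placeholders, and each action just prints a message where your own storage, network, and application commands would go. The point of the sketch is the shape: ordered steps, with a confirmation before each one so that a human stays in control.

```python
def run_switchover(steps, confirm):
    """Run steps in order and stop at the first one that is not confirmed.

    steps   -- list of (description, action) pairs; action is a callable
    confirm -- callable that takes a description and returns True or False
    Returns the descriptions of the steps that were executed.
    """
    done = []
    for description, action in steps:
        if not confirm(description):
            print(f"Stopped before: {description}")
            break
        action()
        done.append(description)
    return done


# Hypothetical steps: replace each lambda with the real commands
# for your environment, protocols, and applications.
steps = [
    ("Verify the primary site is really down", lambda: print("checked primary")),
    ("Make the DR copy writable", lambda: print("DR copy writable")),
    ("Redirect clients to the DR site", lambda: print("clients redirected")),
    ("Start applications on the DR site", lambda: print("applications started")),
]

# Dry run that approves every step. A semi-automatic run could instead pass
# confirm=lambda d: input(f"{d}? [y/N] ").strip().lower() == "y"
completed = run_switchover(steps, confirm=lambda description: True)
print(f"{len(completed)} of {len(steps)} steps completed")
```

Running the dry-run variant regularly is one inexpensive way to check that the procedure has not gone stale.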

See also:

Kinds of Data Protection
Data in the cloud are not invincible
