reCap: AWS Multi-Region Architecture
Many AWS services have features to help you build and manage a multi-Region architecture, but identifying those capabilities across 200+ services can be overwhelming.
Many Well-Architected workloads span only multiple Availability Zones (AZs) within a single Region, limited by their core offerings. However, to achieve greater fault tolerance and availability goals, these workloads need to extend seamlessly across AWS Regions. Let's recap the typical use cases that drive multi-Region architectures.
Use cases
- Expansion to a global audience: as an application grows and its user base becomes more geographically dispersed, there may be a need to reduce latency for different parts of the world.
- Reducing Recovery Point Objectives (RPO) and Recovery Time Objectives (RTO) as part of a multi-Region disaster recovery (DR) plan.
- Local laws and regulations may have strict data residency and privacy requirements that must be followed.
- Achieving five nines (99.999%) availability.
- Location affinity (an anti-pattern here) is required given tight coupling with other ERP or on-premises systems.
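Here is a small Python sketch of standard availability arithmetic that puts the five nines target in numbers. It assumes component failures are independent, which is rarely fully true in practice, and the helper names are hypothetical. Serial components multiply their availabilities. Redundant (parallel) components multiply their unavailabilities, which is why a second Region can lift a 99.9% stack past 99.99% on paper.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def downtime_minutes_per_year(availability: float) -> float:
    """Allowed downtime per year for a target such as 0.99999."""
    return (1 - availability) * MINUTES_PER_YEAR


def serial(*availabilities: float) -> float:
    """Every component must be up (e.g. app tier AND database)."""
    result = 1.0
    for a in availabilities:
        result *= a
    return result


def parallel(*availabilities: float) -> float:
    """At least one component must be up (e.g. two active Regions)."""
    all_down = 1.0
    for a in availabilities:
        all_down *= 1 - a
    return 1 - all_down


print(round(downtime_minutes_per_year(0.99999), 2))  # 5.26
print(round(parallel(0.999, 0.999), 6))              # 0.999999
```

The parallel figure is an upper bound. It ignores the failover mechanism itself (DNS, health checks, replication lag), which acts as a serial component in a real design.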
Which AWS services are distributed, and to what level?
- Route 53, IAM, CloudFront, and WAF are global (AWS WAF is global when attached to CloudFront, and Regional for resources such as an ALB).
- Subnets, EC2 instances, and EBS volumes are Availability Zone (AZ) scoped deployments.
- Everything else, such as VPCs and RDS, is scoped to its respective Region.
Hence it's critical to understand, on a per-service basis, which capabilities need to be brought in to achieve a globally distributed architecture. The diagram below depicts some of them at a high level; let's recap them one by one.
Compute and Storage tier
EC2 and container workload clusters are deployed across Availability Zones (AZs). With CloudFormation StackSets and GitOps tools, you can deploy them from a single source of truth across multiple Regions. These compute clusters might operate in a multi-master or single-master mode depending on the underlying deployment architecture. Clusters in multiple AWS Regions are connected over a private network, as reviewed in my other recap series on global networking.
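Here is a minimal sketch of deploying one stack set to several Regions with boto3's `create_stack_instances`. The stack set name, account ID, and Regions are placeholders. The rollout preferences (one Region at a time, stop on the first failure) are one reasonable choice, not a recommendation from the source. The request builder is pure Python, so you can inspect it without AWS credentials.

```python
def stack_instances_request(stack_set_name, accounts, regions):
    """Build kwargs for CloudFormation CreateStackInstances."""
    return {
        "StackSetName": stack_set_name,
        "Accounts": list(accounts),
        "Regions": list(regions),
        "OperationPreferences": {
            "RegionConcurrencyType": "SEQUENTIAL",  # one Region at a time
            "RegionOrder": list(regions),           # e.g. primary Region first
            "FailureToleranceCount": 0,             # stop at the first failure
            "MaxConcurrentCount": 1,                # one account at a time per Region
        },
    }


def deploy_stack_instances(cfn_client, stack_set_name, accounts, regions):
    # cfn_client is assumed to be boto3.client("cloudformation") in the
    # stack set's administrator Region.
    request = stack_instances_request(stack_set_name, accounts, regions)
    return cfn_client.create_stack_instances(**request)["OperationId"]


req = stack_instances_request("web-tier", ["111111111111"], ["us-east-1", "eu-west-1"])
print(req["OperationPreferences"]["RegionOrder"])  # ['us-east-1', 'eu-west-1']
```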
For the underlying storage or volumes:
- EBS volumes can't be automatically replicated to another Region. Instead, volumes are snapshotted, and the encrypted snapshots are copied across Regions. You might consider sharing a KMS key for this.
AWS KMS supports multi-Region keys, which are KMS keys in different AWS Regions that can be used interchangeably, as though you had the same key in multiple Regions.
- EFS block-based replication can be used to create a read-only replica file system in another Region.
- AWS Secrets Manager replicates secrets across multiple AWS Regions and keeps the replicas in sync with the primary secret.
- AMIs are copied to other Regions as required.
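You can sketch the EBS snapshot flow with boto3's `copy_snapshot`, which is called from a client in the destination (DR) Region. Passing a replica of a KMS multi-Region key as `KmsKeyId` keeps the same key material on both sides. The snapshot ID and key ARN are hypothetical placeholders.

```python
def snapshot_copy_request(source_region, snapshot_id, dest_kms_key_id, description=""):
    """Kwargs for EC2 CopySnapshot; the call is made in the DESTINATION Region."""
    return {
        "SourceRegion": source_region,
        "SourceSnapshotId": snapshot_id,
        "Encrypted": True,
        # A KMS key in the destination Region, e.g. a replica of a multi-Region key
        "KmsKeyId": dest_kms_key_id,
        "Description": description or f"DR copy of {snapshot_id} from {source_region}",
    }


def copy_to_dr(ec2_dest_client, source_region, snapshot_id, dest_kms_key_id):
    # ec2_dest_client is assumed to be boto3.client("ec2", region_name=<DR Region>)
    request = snapshot_copy_request(source_region, snapshot_id, dest_kms_key_id)
    return ec2_dest_client.copy_snapshot(**request)["SnapshotId"]


req = snapshot_copy_request(
    "us-east-1",
    "snap-0123456789abcdef0",
    "arn:aws:kms:eu-west-1:111111111111:key/mrk-1234abcd",
)
```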
Refer to AWS Backup for storage copy and replication; some of its capabilities are listed in the Appendix.
For Amazon Simple Storage Service (Amazon S3), objects in all storage classes except S3 One Zone-IA are distributed across multiple AZs by default.
You can use Amazon S3 Cross-Region Replication (CRR) to continuously and asynchronously copy objects to an S3 bucket in the DR Region. Within a Region, Amazon S3 offers strong read-after-write consistency. Replicated objects, however, are eventually consistent in the destination Region. S3 replication rules support one-way or two-way continuous replication. Versioning must be enabled on both the source and destination buckets for CRR to work.
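Here is a sketch of enabling one-way CRR with boto3. It assumes an existing IAM role that S3 can assume for replication, and the role ARN and bucket names are hypothetical. The helper turns versioning on for both buckets first, because the replication configuration is rejected without it.

```python
def replication_config(role_arn, dest_bucket_arn, prefix="", storage_class="STANDARD"):
    """ReplicationConfiguration for s3.put_bucket_replication (one rule, one destination)."""
    return {
        "Role": role_arn,
        "Rules": [{
            "ID": "crr-to-dr",
            "Priority": 1,
            "Status": "Enabled",
            "Filter": {"Prefix": prefix},                          # "" = whole bucket
            "DeleteMarkerReplication": {"Status": "Disabled"},     # required with Filter
            "Destination": {"Bucket": dest_bucket_arn, "StorageClass": storage_class},
        }],
    }


def enable_crr(src_s3, dst_s3, src_bucket, dst_bucket, role_arn):
    # src_s3 / dst_s3: assumed boto3 S3 clients in the source and DR Regions
    versioning = {"Status": "Enabled"}
    dst_s3.put_bucket_versioning(Bucket=dst_bucket, VersioningConfiguration=versioning)
    src_s3.put_bucket_versioning(Bucket=src_bucket, VersioningConfiguration=versioning)
    src_s3.put_bucket_replication(
        Bucket=src_bucket,
        ReplicationConfiguration=replication_config(role_arn, f"arn:aws:s3:::{dst_bucket}"),
    )


cfg = replication_config("arn:aws:iam::111111111111:role/s3-crr", "arn:aws:s3:::app-dr")
```

For two-way replication, apply the mirror-image configuration on the DR bucket so that it points back at the source.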
Another feature, Amazon S3 Multi-Region Access Points, provides a single global endpoint for S3 objects. Requests are routed through AWS Global Accelerator to the bucket in the Region with the lowest latency.
Routing or Networking Tier
AWS Transit Gateway comes in handy for VPC connectivity. It overcomes VPC peering difficulties (the n(n-1)/2 full-mesh complexity and the lack of transitive routing) by acting as a network transit hub that connects your VPCs within a Region and/or your on-premises networks.
A Transit Gateway can expand to additional Regions with Transit Gateway inter-Region peering to create a globally distributed, private network for your resources.
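The peering arithmetic fits in a few lines of Python. The Transit Gateway count assumes one TGW per Region and a full mesh of inter-Region TGW peering attachments between them. That is a common layout, not the only one, and the function names are hypothetical.

```python
def full_mesh_peerings(n_vpcs: int) -> int:
    """VPC peering is not transitive, so every pair of VPCs needs its own connection."""
    return n_vpcs * (n_vpcs - 1) // 2


def tgw_attachments(n_vpcs: int, n_regions: int = 1) -> int:
    """One attachment per VPC, plus (assumed) a full mesh of inter-Region TGW peerings."""
    return n_vpcs + full_mesh_peerings(n_regions)


for n in (5, 10, 50):
    print(n, full_mesh_peerings(n), tgw_attachments(n))
```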
Route 53 is a global service whose routing policies can route traffic to the appropriate Region. For example, you can route a request to the record with the lowest network latency, split traffic by weight, fail over, or send a request to a specific geolocation that has a localized application endpoint. Amazon Route 53 health checks monitor these endpoints, which can be in any Region.
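Here is a sketch of latency-based records for Regional endpoints, built as the `ChangeBatch` argument of boto3's `route53.change_resource_record_sets`. The domain, IP addresses (taken from documentation ranges), and health check IDs are placeholders, and the health checks are assumed to exist already.

```python
def latency_record_changes(name, endpoints, ttl=60):
    """endpoints: {region: (ip_address, health_check_id)} -> Route 53 ChangeBatch."""
    changes = []
    for region, (ip, health_check_id) in sorted(endpoints.items()):
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": name,
                "Type": "A",
                "SetIdentifier": f"{name}-{region}",  # unique per record in the set
                "Region": region,                     # makes this a latency record
                "TTL": ttl,
                "ResourceRecords": [{"Value": ip}],
                "HealthCheckId": health_check_id,     # unhealthy endpoints are skipped
            },
        })
    return {"Comment": "latency-based routing", "Changes": changes}


def apply_records(route53_client, hosted_zone_id, name, endpoints):
    # route53_client is assumed to be boto3.client("route53")
    return route53_client.change_resource_record_sets(
        HostedZoneId=hosted_zone_id, ChangeBatch=latency_record_changes(name, endpoints)
    )


batch = latency_record_changes("app.example.com", {
    "us-east-1": ("192.0.2.10", "hc-use1-placeholder"),
    "eu-west-1": ("198.51.100.10", "hc-euw1-placeholder"),
})
```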
Route 53 Resolver helps resolve DNS queries between VPCs, across Regions, and from on-premises networks (note that Resolver endpoints and rules are Regional resources). The resolution hierarchy is defined using forwarding rules, with the following targets:
- When you create a VPC, the Route 53 Resolver created by default maps to the DNS server listening on the base of the VPC's primary CIDR range plus two (for example, 10.0.0.2 for 10.0.0.0/16).
- Plus any peered VPCs.
- Resolver endpoints that you configure to answer DNS queries from the on-premises environment, or to resolve public DNS.
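Here is a simplified Python model of the resolution order. Forwarding rules are matched on the most specific domain name, and anything unmatched falls through to the VPC's own resolver at the primary CIDR base plus two. The model ignores Resolver endpoints, system rules, and private hosted zones, and the rule targets are hypothetical on-premises DNS IPs.

```python
import ipaddress


def vpc_resolver_ip(primary_cidr: str) -> str:
    """Amazon-provided DNS sits at the VPC primary CIDR base + 2."""
    return str(ipaddress.ip_network(primary_cidr).network_address + 2)


def resolve_target(query_name: str, forwarding_rules: dict, primary_cidr: str) -> str:
    """Most specific matching forwarding rule wins; otherwise use the VPC resolver."""
    rules = {d.rstrip(".").lower(): t for d, t in forwarding_rules.items()}
    labels = query_name.rstrip(".").lower().split(".")
    # Longest suffix first: db.corp.example.com, corp.example.com, example.com, com
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in rules:
            return rules[suffix]
    return vpc_resolver_ip(primary_cidr)


rules = {"corp.example.com": "10.10.0.10", "example.com": "10.20.0.10"}
print(resolve_target("db.corp.example.com", rules, "10.0.0.0/16"))  # 10.10.0.10
print(resolve_target("aws.amazon.com", rules, "10.0.0.0/16"))       # 10.0.0.2
```

Matching on whole labels rather than raw string suffixes keeps a name like `notexample.com` from accidentally matching the `example.com` rule.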
Route 53 Application Recovery Controller (Route 53 ARC) offers a comprehensive failover solution. Route 53 ARC routing controls, safety rules, and readiness checks help you fail over reliably across Regions, AZs, and on-premises environments.
When CloudFront is used, Route 53 directs DNS requests to the CloudFront distribution.
Amazon CloudFront's content delivery network is global, so there is not much to discuss beyond highlighting its features:
- Serve cached responses from edge locations (the request goes to the origin server on a cache miss, such as the first request).
- Use CloudFront origin failover to automatically fail over to a secondary origin, on a per-request basis, when the primary is not available.
- Configure custom error pages or generate redirects with Lambda@Edge if your origin is unavailable.
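Origin failover is configured through an origin group. The sketch below builds the `OriginGroups` fragment of a CloudFront `DistributionConfig`, as passed to boto3's `create_distribution` or `update_distribution`. It assumes two origins with the hypothetical IDs shown are already defined in the distribution's `Origins` list. A cache behavior's `TargetOriginId` would then reference the group ID.

```python
def origin_group(group_id, primary_origin_id, secondary_origin_id,
                 failover_status_codes=(500, 502, 503, 504)):
    """OriginGroups fragment: retry on the secondary when the primary returns these codes."""
    codes = list(failover_status_codes)
    return {
        "Quantity": 1,
        "Items": [{
            "Id": group_id,
            "FailoverCriteria": {"StatusCodes": {"Quantity": len(codes), "Items": codes}},
            "Members": {
                "Quantity": 2,
                # Order matters: the first member is the primary origin
                "Items": [{"OriginId": primary_origin_id}, {"OriginId": secondary_origin_id}],
            },
        }],
    }


og = origin_group("app-failover", "alb-us-east-1", "alb-eu-west-1")
```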
AWS Global Accelerator (running on the AWS edge network) provides two static anycast IP addresses. You can associate multiple endpoints in one or more AWS Regions with the same static public IP address or addresses.
Global Accelerator health checks monitor endpoints. You can seamlessly add or remove origins while traffic continues to be redirected automatically to a healthy endpoint within seconds, with no changes to the static IPs. Global Accelerator also avoids the client-side caching issues that can occur with DNS-based systems such as Route 53.
For EC2 instance or Elastic IP address endpoints, configure firewall rules to allow inbound traffic from the IP addresses associated with Amazon Route 53 health checkers so that health checks can complete. [This applies even to Global Accelerator, which suggests the health-checking infrastructure is reused. For ELB endpoints, Global Accelerator uses the load balancer's own health check data.]
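Here is a toy model of how Global Accelerator picks a Region behind its fixed anycast IPs: traffic goes to the closest healthy endpoint group. It is a simplification. Latencies are given as inputs, and the traffic dial is treated as on/off, whereas the real dial shifts a percentage of traffic. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EndpointGroup:
    region: str
    latency_ms: float        # client-to-Region latency, assumed known for this sketch
    healthy: bool
    traffic_dial: int = 100  # percent; 0 removes the group from rotation


def pick_region(groups) -> Optional[str]:
    """Lowest-latency healthy group. The client's anycast IPs never change;
    only this server-side choice does."""
    candidates = [g for g in groups if g.healthy and g.traffic_dial > 0]
    if not candidates:
        return None
    return min(candidates, key=lambda g: g.latency_ms).region


groups = [EndpointGroup("us-east-1", 20, True), EndpointGroup("eu-west-1", 90, True)]
print(pick_region(groups))  # us-east-1
groups[0].healthy = False   # health checks mark us-east-1 unhealthy
print(pick_region(groups))  # eu-west-1
```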
Data Tier
DynamoDB stores tables across multiple Availability Zones by default.
Amazon DynamoDB global tables provide multi-Region and multi-writer capabilities:
- Application writes are always "local" and synchronous, while replication to other Regions is asynchronous. As a result, write latency stays normal (a few milliseconds), and replication latency is roughly 0.5 to 1.5 seconds.
- If applications update the same item in different Regions at about the same time, conflicts can arise. To help ensure eventual consistency, DynamoDB global tables use last-writer-wins reconciliation and make a best effort to determine the last writer.
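Here is a toy model of last-writer-wins reconciliation that shows why concurrent writes in two Regions converge. It is not DynamoDB's implementation. The millisecond timestamps, the Region-name tie-breaker, and all class names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    timestamp_ms: int
    region: str


def lww_merge(current: Version, incoming: Version) -> Version:
    # Later timestamp wins; equal timestamps fall back to Region name (hypothetical tie-breaker)
    return max(current, incoming, key=lambda v: (v.timestamp_ms, v.region))


class RegionReplica:
    def __init__(self, region: str):
        self.region = region
        self.items = {}

    def local_write(self, key, value, timestamp_ms):
        """Synchronous local write; the returned Version is replicated later."""
        version = Version(value, timestamp_ms, self.region)
        self.apply(key, version)
        return version

    def apply(self, key, incoming):
        """Apply a local or replicated write using last-writer-wins."""
        current = self.items.get(key)
        self.items[key] = incoming if current is None else lww_merge(current, incoming)


us, eu = RegionReplica("us-east-1"), RegionReplica("eu-west-1")
w_us = us.local_write("cart#42", "A", timestamp_ms=1_000)
w_eu = eu.local_write("cart#42", "B", timestamp_ms=1_005)
# Asynchronous replication later delivers each write to the other Region
eu.apply("cart#42", w_us)
us.apply("cart#42", w_eu)
print(us.items["cart#42"].value, eu.items["cart#42"].value)  # B B
```

Note that the write of "A" is silently discarded. That is the trade-off of last-writer-wins, so items that must not lose updates are usually written from a single Region.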
Amazon Aurora Global Database scales database reads across Regions. Designing the right data consistency strategy (read local, write global) lets you achieve globally distributed database goals.
- Replicate to up to five secondary Regions with a typical latency of under a second. These are read-only replicas.
- Writes occur only in the primary Region. A secondary Region can be promoted to take on read/write responsibilities in less than one minute.
- Aurora MySQL supports write forwarding, which lets secondary clusters forward write statements to the primary.
- Use Aurora Global Database managed planned failover for DR.
While Amazon Redshift doesn't have cross-Region replication features, you can set it up to automatically copy snapshots of your data warehouse to another Region.
Other options:
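Cross-Region snapshot copy for Redshift is enabled per cluster. Here is a sketch of the arguments for boto3's `redshift.enable_snapshot_copy`, with a hypothetical cluster name and retention period. KMS-encrypted clusters also need a snapshot copy grant created in the destination Region.

```python
def snapshot_copy_settings(cluster_id, dest_region, retention_days=7, grant_name=None):
    """Kwargs for Redshift EnableSnapshotCopy, called in the cluster's own Region."""
    params = {
        "ClusterIdentifier": cluster_id,
        "DestinationRegion": dest_region,
        "RetentionPeriod": retention_days,  # days to keep copied automated snapshots
    }
    if grant_name:
        # Required for KMS-encrypted clusters: a grant in the destination Region
        params["SnapshotCopyGrantName"] = grant_name
    return params


def enable_copy(redshift_client, **kwargs):
    # redshift_client is assumed to be boto3.client("redshift", region_name=<source Region>)
    return redshift_client.enable_snapshot_copy(**snapshot_copy_settings(**kwargs))


plain = snapshot_copy_settings("analytics", "us-west-2")
encrypted = snapshot_copy_settings("analytics", "us-west-2", grant_name="dr-copy-grant")
```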
- Amazon RDS read replicas
- Amazon DocumentDB global clusters
- Global Datastore for Amazon ElastiCache for Redis
In scenarios with a primary read/write cluster in one Region and a read-only secondary in another Region, you can use an Application Load Balancer as the front end. It routes write traffic (POST, PUT, DELETE, etc.) to the primary cluster's VPC endpoint and read traffic to both Regions. This achieves:
- Global reads with local latency.
- No impact on write latency, as writes always go to the primary Region.
- In the event of a Regional failover, you must manually promote a secondary (except in RDS).
Bear in mind that events on the primary cluster, such as a restart, loss of connectivity, or failover, can affect the availability status of the secondary cluster (it is not always active), depending on the storage engine or database used. Refer to the documentation for your specific database and version for the exact behavior.
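You can sketch the ALB split as a listener rule that matches write methods with an `http-request-method` condition, using the arguments for boto3's `elbv2.create_rule`. The listener's default action keeps forwarding reads to the local target group. The ARNs are placeholders. The sketch also assumes the primary Region's endpoint is reachable from the ALB as an IP target group over private connectivity.

```python
WRITE_METHODS = ["POST", "PUT", "PATCH", "DELETE"]  # values are case-sensitive


def write_routing_rule(listener_arn, primary_tg_arn, priority=10):
    """Kwargs for ELBv2 CreateRule: forward write methods to the primary Region's targets."""
    return {
        "ListenerArn": listener_arn,
        "Priority": priority,
        "Conditions": [{
            "Field": "http-request-method",
            "HttpRequestMethodConfig": {"Values": list(WRITE_METHODS)},
        }],
        "Actions": [{"Type": "forward", "TargetGroupArn": primary_tg_arn}],
    }


def add_write_rule(elbv2_client, listener_arn, primary_tg_arn):
    # elbv2_client is assumed to be boto3.client("elbv2") in the ALB's Region
    return elbv2_client.create_rule(**write_routing_rule(listener_arn, primary_tg_arn))


rule = write_routing_rule("arn:placeholder:listener", "arn:placeholder:tg-primary")
```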
Operational Tier
To manage Windows users, devices, and applications on a multi-Region network, you can set up AWS Directory Service for Microsoft Active Directory Enterprise Edition to automatically replicate directory data across Regions.
- AWS Managed Microsoft AD is built on actual Microsoft AD and does not require you to synchronize or replicate data from your existing Active Directory to the cloud.
Applications that need to securely store, rotate, and audit secrets, such as database passwords, should use AWS Secrets Manager. This service encrypts secrets with AWS Key Management Service (AWS KMS) keys and can replicate secrets to secondary Regions to ensure applications are able to quickly retrieve a secret in the closest Region.
- Set up AWS KMS multi-Region keys to encrypt the replicated secrets in each Region.
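Here is a sketch of the replication calls with boto3. `kms.replicate_key` copies a multi-Region primary key into another Region, and `secretsmanager.replicate_secret_to_regions` adds replica secrets encrypted with a key in each Region. A multi-Region replica shares the primary's `mrk-` key ID, so the same ID can be passed for each Region. The key and secret names are hypothetical.

```python
def replicate_key_request(mrk_key_id, replica_region):
    """KMS ReplicateKey kwargs; the key must be multi-Region (create_key(MultiRegion=True))."""
    return {"KeyId": mrk_key_id, "ReplicaRegion": replica_region}


def replicate_secret_request(secret_id, replica_keys):
    """replica_keys: {region: kms_key_id} -> ReplicateSecretToRegions kwargs."""
    return {
        "SecretId": secret_id,
        "AddReplicaRegions": [
            {"Region": region, "KmsKeyId": key_id}
            for region, key_id in sorted(replica_keys.items())
        ],
        "ForceOverwriteReplicaSecret": False,  # don't clobber an existing secret
    }


def replicate(kms_client, sm_client, mrk_key_id, secret_id, regions):
    # Both clients are assumed to be boto3 clients in the PRIMARY Region
    for region in regions:
        kms_client.replicate_key(**replicate_key_request(mrk_key_id, region))
    sm_client.replicate_secret_to_regions(
        **replicate_secret_request(secret_id, {r: mrk_key_id for r in regions})
    )


req = replicate_secret_request("prod/db/password", {"eu-west-1": "mrk-1234abcd"})
```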
CloudTrail logs should be aggregated into a single Amazon S3 bucket for easier analysis. As a best practice, create a trail (a custom trail) that applies to all AWS Regions. CloudTrail then uses the trail you create in one Region to create trails with identical configurations in all other Regions in your account.
Turning on a trail means that you create a trail and start delivery of CloudTrail event log files to an Amazon S3 bucket.
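Here is a sketch of creating an all-Regions trail with boto3. The trail and bucket names are placeholders. The bucket is assumed to already have a bucket policy that lets CloudTrail write to it. When a trail is created through the API, log delivery starts only after `start_logging` is called, which corresponds to turning the trail on.

```python
def multi_region_trail_request(trail_name, bucket_name):
    """Kwargs for CloudTrail CreateTrail."""
    return {
        "Name": trail_name,
        "S3BucketName": bucket_name,         # single central bucket for all Regions
        "IsMultiRegionTrail": True,          # apply the trail to all Regions
        "IncludeGlobalServiceEvents": True,  # e.g. IAM and STS events
        "EnableLogFileValidation": True,
    }


def create_and_start(cloudtrail_client, trail_name, bucket_name):
    # cloudtrail_client is assumed to be boto3.client("cloudtrail") in the home Region
    cloudtrail_client.create_trail(**multi_region_trail_request(trail_name, bucket_name))
    cloudtrail_client.start_logging(Name=trail_name)  # begins log file delivery


req = multi_region_trail_request("org-trail", "central-cloudtrail-logs")
```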
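Choose the right AWS partition for your account (for example, aws, aws-cn, or aws-us-gov), since Regions in different partitions are isolated from each other.
Ensure that service quotas in your DR Region are set high enough that they don't limit you from scaling up to production capacity.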
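Here is a small sketch for checking the DR Region's quotas with boto3's `service-quotas` client. The quota code is only an example: `L-1216C47A` is assumed here to be the EC2 On-Demand Standard instances vCPU quota, so verify it in the Service Quotas console. The required capacity figure is hypothetical.

```python
def fetch_quota(sq_client, service_code="ec2", quota_code="L-1216C47A"):
    # sq_client is assumed to be boto3.client("service-quotas", region_name=<Region>)
    response = sq_client.get_service_quota(ServiceCode=service_code, QuotaCode=quota_code)
    return response["Quota"]["Value"]


def quota_shortfalls(quota_values, required):
    """quota_values: {region: current quota}; returns {region: missing capacity}."""
    return {region: required - value
            for region, value in quota_values.items() if value < required}


# Hypothetical values, e.g. gathered with fetch_quota() per Region
gaps = quota_shortfalls({"us-east-1": 1024.0, "us-west-2": 256.0}, required=1024.0)
```

Any Region that appears in the result needs a quota increase request before failover, since increases are not instantaneous.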
Appendix
AWS Backup provides a centralized location to configure, schedule, and monitor AWS backup capabilities for the following services and resources:
- Amazon Elastic Block Store (Amazon EBS) volumes
- Amazon EC2 instances
- Amazon Relational Database Service (Amazon RDS) databases (including Amazon Aurora databases)
- Amazon DynamoDB tables
- Amazon Elastic File System (Amazon EFS) file systems
- AWS Storage Gateway volumes
- Amazon FSx for Windows File Server and Amazon FSx for Lustre
AWS Backup supports copying backups across Regions, such as to a disaster recovery Region.
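Here is a sketch of an AWS Backup plan rule that copies each recovery point to a vault in the DR Region, built as the `BackupPlan` argument of boto3's `backup.create_backup_plan`. The vault names, ARN, schedule, and 35-day retention are placeholders. Resources would then be assigned to the plan with a backup selection.

```python
def backup_plan_with_dr_copy(plan_name, vault_name, dr_vault_arn, retain_days=35):
    """BackupPlan document with one daily rule and a cross-Region copy action."""
    return {
        "BackupPlanName": plan_name,
        "Rules": [{
            "RuleName": "daily-with-dr-copy",
            "TargetBackupVaultName": vault_name,          # vault in the primary Region
            "ScheduleExpression": "cron(0 5 ? * * *)",    # daily at 05:00 UTC
            "Lifecycle": {"DeleteAfterDays": retain_days},
            "CopyActions": [{
                "DestinationBackupVaultArn": dr_vault_arn,  # vault in the DR Region
                "Lifecycle": {"DeleteAfterDays": retain_days},
            }],
        }],
    }


def create_plan(backup_client, **kwargs):
    # backup_client is assumed to be boto3.client("backup") in the primary Region
    return backup_client.create_backup_plan(BackupPlan=backup_plan_with_dr_copy(**kwargs))


plan = backup_plan_with_dr_copy(
    "prod-dr", "prod-vault", "arn:aws:backup:us-west-2:111111111111:backup-vault:dr-vault"
)
```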
Recap of RPO and RTO objectives:
For files stored outside of Amazon S3 and Amazon EFS, you can use AWS DataSync, which simplifies, automates, and accelerates moving data across Regions and accounts. DataSync can also sync on-premises files stored on NFS, SMB, or HDFS to AWS.
Further References
From re:Invent sessions:
DNS Endpoint hell (DNS resolution hierarchy with endpoints): https://aws.amazon.com/blogs/architecture/using-route-53-private-hosted-zones-for-cross-account-multi-region-architectures/
Mere Mortals: Good pictorial explanation of system availability calculations.