ElastiCache Redis Advanced Features

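ElastiCache for Redis offers several features that a self-managed Redis cluster does not. Global Datastore automatically replicates data between Redis clusters across Regions (cross-Region), providing a disaster recovery (DR) solution. MemoryDB durably stores writes in a distributed Multi-AZ transaction log, which preserves data and avoids the case where a primary node fails before its replicas have received the most recent writes. Auto Scaling for ElastiCache for Redis clusters scales a cluster automatically based on user-defined conditions to match business demand. Data Tiering is a cost-effective option that automatically moves data to the node's local SSD (instance store). Log Delivery continuously streams the slow log and engine logs to Amazon CloudWatch Logs or Amazon Kinesis Data Firehose, so that you can check whether anything abnormal happened at a specific point in time.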

Jerry’s Notes
What’s next?
7 min read · Mar 26, 2022


Global Datastore

Q: What is Global Datastore for Redis?

Global Datastore in Amazon ElastiCache for Redis provides fully managed, fast, reliable and secure cross-region replication. With Global Datastore, you can write to your ElastiCache for Redis cluster in one region and have the data available to be read from two other cross-region replica clusters, thereby enabling low-latency reads and disaster recovery across regions. Designed for real-time applications with a global footprint, Global Datastore for Redis supports cross-region replication latency of typically under 1 second, increasing the responsiveness of your applications by providing geo-local reads closer to end users. In the unlikely event of regional degradation, one of the healthy cross-region replica clusters can be promoted to become the primary cluster with full read/write capabilities. Once initiated, the promotion typically completes in less than 1 minute, allowing your applications to remain available. To secure cross-region data transfer traffic, Global Datastore uses encryption in-transit.

Key points:

■ Primary (active) cluster: A primary cluster accepts writes that are replicated to all clusters within the global datastore. A primary cluster also accepts read requests.

■ Secondary (passive) cluster: A secondary cluster only accepts read requests and replicates data updates from a primary cluster. A secondary cluster needs to be in a different AWS Region than the primary cluster. (A secondary in the same Region is currently not supported, and the limit is 1 primary + 2 secondary Regions. If you need more than two secondary Redis clusters, you can request them through AWS Support.)

■ Secondary Redis cluster: can be used to read data (read-only); writes can only go to the primary nodes of the primary Redis cluster.

■ Available versions and instance types: Redis engine 5.0.6 or later, on R5 or M5 node types or newer.

■ Global Datastore supports encryption at rest, encryption in transit, and Redis AUTH.

In-transit: the primary and secondary clusters use the same AUTH settings.

At-rest: supported.

■ Global Datastore supports AWS KMS customer master keys.

■ Primary cluster: only the primary cluster accepts both writes and reads. The primary and secondary clusters must be in different AWS Regions.

■ Secondary cluster: read-only. At most two secondary clusters are allowed; more can be requested through AWS Support.

■ Security for cross-Region communication is provided through VPC peering.

■ Cluster failover: performed manually by the customer. There is no automatic failover across Regions; automatic failover happens only inside a single Redis cluster. To change which cluster acts as the primary, the customer must decide which secondary cluster to promote and trigger the promotion manually.

■ Number of shards: a secondary cluster has the same number of shards as the primary, and the same node type and size.

The primary and secondary clusters must have the same number of primary nodes, node type, engine version, and number of shards (with cluster mode enabled). Only the number of read replicas may differ.

New CloudWatch metric: GlobalDatastoreReplicationLag. It measures the replication lag between Redis clusters, that is, how far the primary nodes of a secondary cluster lag behind the primary nodes of the primary cluster.
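
As a rough sketch of the setup described in the key points above, the following Python (boto3) code creates a global datastore from an existing primary cluster and attaches a secondary cluster in another Region. The cluster IDs, Regions, descriptions, and helper names are hypothetical, and a real script would wait for the global datastore to become available before adding the secondary.

```python
def check_regions(primary_region, secondary_region):
    """A secondary cluster must live in a different Region than the primary."""
    if primary_region == secondary_region:
        raise ValueError("secondary cluster must be in a different AWS Region")


def global_datastore_request(suffix, primary_group_id):
    """Parameters for create_global_replication_group (primary Region)."""
    return {
        "GlobalReplicationGroupIdSuffix": suffix,
        "GlobalReplicationGroupDescription": "cross-Region DR copy",
        "PrimaryReplicationGroupId": primary_group_id,
    }


def secondary_cluster_request(global_group_id, secondary_group_id):
    """Parameters for create_replication_group (secondary Region)."""
    return {
        "ReplicationGroupId": secondary_group_id,
        "ReplicationGroupDescription": "read-only secondary cluster",
        "GlobalReplicationGroupId": global_group_id,
    }


def create_global_datastore(primary_region, secondary_region,
                            suffix, primary_group_id, secondary_group_id):
    """Create the global datastore, then attach one secondary cluster."""
    import boto3  # imported lazily so the helpers above work without AWS access

    check_regions(primary_region, secondary_region)
    primary = boto3.client("elasticache", region_name=primary_region)
    response = primary.create_global_replication_group(
        **global_datastore_request(suffix, primary_group_id))
    global_id = response["GlobalReplicationGroup"]["GlobalReplicationGroupId"]

    # Assumption: the global datastore is already available at this point.
    # Production code should poll describe_global_replication_groups first.
    secondary = boto3.client("elasticache", region_name=secondary_region)
    secondary.create_replication_group(
        **secondary_cluster_request(global_id, secondary_group_id))
    return global_id
```

A call might look like `create_global_datastore("us-east-1", "ap-northeast-1", "dr", "primary-redis", "secondary-redis")`, with all of those values being placeholders.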

Q: Does Amazon ElastiCache automatically fail over a Global Datastore for Redis and promote a secondary cluster when the primary cluster (Region) is degraded?

No, Amazon ElastiCache doesn't automatically promote a secondary cluster when the primary cluster (Region) is degraded. You can manually initiate the failover by promoting a secondary cluster to become the primary. The failover and promotion of a secondary cluster typically complete in less than one minute. (You choose which specific secondary cluster to promote to primary.)
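
A manual promotion could be scripted roughly as below with boto3's `failover_global_replication_group`. The IDs and Region are hypothetical, and sending the request from the Region of the cluster being promoted is an assumption of this sketch, not something the text above states.

```python
def failover_request(global_group_id, new_primary_region, new_primary_group_id):
    """Parameters naming the secondary cluster that should become primary."""
    return {
        "GlobalReplicationGroupId": global_group_id,
        "PrimaryRegion": new_primary_region,
        "PrimaryReplicationGroupId": new_primary_group_id,
    }


def promote_secondary(global_group_id, new_primary_region, new_primary_group_id):
    """Manually promote one secondary cluster; nothing here is automatic."""
    import boto3  # lazy import: only needed when the call is actually made

    # Assumption: the request is sent from the Region being promoted,
    # since the old primary Region may be degraded.
    client = boto3.client("elasticache", region_name=new_primary_region)
    return client.failover_global_replication_group(
        **failover_request(global_group_id, new_primary_region,
                           new_primary_group_id))
```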

Q: How is my data secured when using Global Datastore for Redis?

Global Datastore for Redis uses encryption in-transit for cross-region traffic to keep your data secure. Additionally, you can also encrypt your primary and secondary clusters using encryption at-rest to keep your end-to-end data secure. Each primary and secondary cluster can have a separate customer managed Customer Master Key (CMK) in AWS Key Management Service (KMS) for encryption at rest.

MemoryDB

Q: What is Amazon MemoryDB for Redis?

Amazon MemoryDB for Redis is a Redis-compatible, durable, in-memory database service that delivers ultra-fast performance. MemoryDB enables you to achieve microsecond read latency, single-digit millisecond write latency, high throughput, and Multi-AZ durability for modern applications, like those built with microservices architectures. These applications require low latency, high scalability, and use Redis’ flexible data structures and APIs to make development agile and easy. MemoryDB stores your entire dataset in memory and leverages a distributed transactional log to provide both in-memory speed and data durability, consistency, and recoverability. You can use MemoryDB as a fully managed, primary database, enabling you to build high-performance applications without having to separately manage a cache, durable database, or the required underlying infrastructure.

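Key points:

■ In ElastiCache for Redis, replication from the primary node to the read replicas is asynchronous, so a failover can lose the writes that had not yet been replicated. In MemoryDB, a write is durably stored in a distributed Multi-AZ transaction log before it is acknowledged to the client. Reads on the primary always return the latest data reflecting all previous writes, strong consistency is preserved across primary failovers, and read replicas are eventually consistent.

■ Because MemoryDB synchronously commits each write to the transaction log before acknowledging it, write latency can be slightly higher. MemoryDB currently offers microsecond read latency and single-digit millisecond write latency.

■ In general, Redis write performance is lower than read performance, because a write involves more in-memory operations than a read.

■ MemoryDB uses an AWS journal service to provide customers with data durability guarantees.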

■ The leader/master would replicate the Redis replication commands to the journal service, from where the replicas would read the journal and consume the replication stream.

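■ MemoryDB maintains compatibility with open-source Redis.

■ Redis version 6.2.4+: only these engine versions are currently supported.

■ Cluster mode enabled only, so the client must also support cluster mode.

■ db.r6g.x node types only: the available instance types are currently limited.

■ EC2-to-AWS online migration is not supported. AWS online migration does not support multi-shard Redis clusters in the first place, so it cannot be used here.

!!! Because writes on MemoryDB are committed to the transaction log before they are acknowledged, write latency can be slightly higher: reads take microseconds, writes take single-digit milliseconds. Test your application on MemoryDB to decide whether it meets your business requirements.

!!! Frankly, if your business cannot tolerate any data loss at all, then besides writing to an in-memory database such as MemoryDB or ElastiCache for Redis, you should also write a copy of the data to RDS or another service, and account for this in your business logic.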
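
The difference between asynchronous replication and a log-first write path can be illustrated with a toy model. This is only a sketch of the idea: the class names are made up, a plain Python list stands in for the Multi-AZ transaction log, and nothing here reflects how either service is actually implemented.

```python
class AsyncReplicatedStore:
    """Toy model: a write is acknowledged before the replica has it."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self.pending = []  # acknowledged writes not yet on the replica

    def write(self, key, value):
        self.primary[key] = value
        self.pending.append((key, value))
        return "OK"  # acknowledged immediately

    def replicate(self):
        """Background replication catches up with everything pending."""
        for key, value in self.pending:
            self.replica[key] = value
        self.pending.clear()

    def failover(self):
        """The primary dies; the replica is promoted with what it has."""
        self.pending.clear()
        self.primary = dict(self.replica)
        return self.primary


class LogCommittedStore:
    """Toy model: a write is appended to a durable log before the ack."""

    def __init__(self):
        self.log = []  # stands in for the Multi-AZ transaction log
        self.primary = {}

    def write(self, key, value):
        self.log.append((key, value))  # durable commit happens first
        self.primary[key] = value
        return "OK"

    def failover(self):
        """The new primary is rebuilt by replaying the log."""
        self.primary = {}
        for key, value in self.log:
            self.primary[key] = value
        return self.primary
```

In the first model, any write made after the last `replicate()` call is gone after `failover()`; in the second, every acknowledged write survives, at the cost of the extra commit step on each write.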

AutoScaling ElastiCache for Redis clusters

Amazon ElastiCache for Redis now supports auto scaling to automatically adjust capacity to maintain steady, predictable performance at the lowest possible cost. You can automatically scale your cluster horizontally by adding or removing shards or replica nodes. ElastiCache for Redis uses AWS Application Auto Scaling to manage scaling and Amazon CloudWatch metrics to determine when it is time to scale up or down.
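
As a hedged sketch of how such a policy might be set up, the code below builds the two Application Auto Scaling requests for scaling shards with target tracking on primary engine CPU. The replication group ID, policy name, capacity limits, and target value are hypothetical placeholders.

```python
def shard_scaling_requests(replication_group_id, min_shards, max_shards,
                           target_cpu_percent):
    """Build register_scalable_target and put_scaling_policy parameters."""
    if min_shards > max_shards:
        raise ValueError("min_shards must not exceed max_shards")
    common = {
        "ServiceNamespace": "elasticache",
        "ResourceId": f"replication-group/{replication_group_id}",
        "ScalableDimension": "elasticache:replication-group:NodeGroups",
    }
    target = dict(common, MinCapacity=min_shards, MaxCapacity=max_shards)
    policy = dict(
        common,
        PolicyName=f"{replication_group_id}-cpu-target",  # hypothetical name
        PolicyType="TargetTrackingScaling",
        TargetTrackingScalingPolicyConfiguration={
            "TargetValue": float(target_cpu_percent),
            "PredefinedMetricSpecification": {
                "PredefinedMetricType":
                    "ElastiCachePrimaryEngineCPUUtilization",
            },
        },
    )
    return target, policy


def enable_shard_auto_scaling(replication_group_id, min_shards, max_shards,
                              target_cpu_percent):
    """Register the cluster as a scalable target and attach the policy."""
    import boto3  # lazy import: the builder above needs no AWS access

    target, policy = shard_scaling_requests(
        replication_group_id, min_shards, max_shards, target_cpu_percent)
    client = boto3.client("application-autoscaling")
    client.register_scalable_target(**target)
    client.put_scaling_policy(**policy)
```

Scaling replicas instead of shards would use the `elasticache:replication-group:Replicas` dimension in the same way.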

Key points:

Users can define their own conditions to automatically scale in/out or up/down and to add or remove replica nodes, to match business demand.

■ Redis (cluster mode enabled) clusters running Redis engine version 6.x onwards. Only cluster mode enabled clusters on engine 6.x or later are supported.

■ Instance type families: R5, R6g, M5, M6g. Only specific families are supported (the list keeps changing).

■ Instance sizes: Large, XLarge, 2XLarge. Only specific sizes are supported (the list keeps changing).

■ Auto Scaling in ElastiCache for Redis is not supported for clusters running in Global datastores, Outposts or Local Zones.

■ Manually modifying the instance type to an unsupported type is not allowed.

■ Associating the replication group with a Global datastore is not supported.

■ Changing the ReservedMemoryPercent parameter is not allowed.

■ Manually increasing or decreasing shards or replicas beyond the Min/Max capacity configured during policy creation also causes problems.

!!! Note: unless you are very familiar with how Redis works and can predict nearly all (99%) of your business logic and the timing of your busy periods, using Auto Scaling for ElastiCache for Redis clusters is strongly discouraged. Using this feature incorrectly is very dangerous.

Data Tiering

Data Tiering gives users a cost-effective option: it automatically moves data to the node's local SSD (instance store).

Q: How does data tiering work?

On clusters with data tiering, ElastiCache monitors the last access time of every item it stores. When available memory (DRAM) is fully consumed, ElastiCache uses a least-recently used (LRU) algorithm to automatically move infrequently accessed items from memory to SSD. When data on SSD is subsequently accessed, ElastiCache automatically and asynchronously moves it back to memory before processing the request. If you have a workload that accesses only a subset of its data regularly, data tiering is an optimal way to scale your capacity cost-effectively.

ElastiCache for Redis stores data on NVMe SSDs using a purpose-built tiering engine, which is fine-tuned for high throughput and low latency. Security and data integrity were key areas of focus in the design of the tiering engine. Like all Graviton2-based hardware, ElastiCache R6gd nodes offer always-on 256-bit encrypted DRAM. Additionally, all items stored on NVMe SSDs are encrypted by default (even for clusters that didn’t configure encryption of data at rest) using an XTS-AES-256 block cipher implemented in a hardware module on the node. We perform data integrity validation using a crc32c checksum on each item read from NVMe SSDs.
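
The LRU behavior described above can be pictured with a small toy model. It is a sketch under simplifying assumptions: capacity is counted in items rather than bytes, moves are synchronous, and the class and method names are invented for illustration; the real tiering engine is not public.

```python
from collections import OrderedDict


class TieredStore:
    """Toy two-tier store: a small 'DRAM' tier backed by an 'SSD' tier."""

    def __init__(self, dram_capacity):
        self.dram_capacity = dram_capacity
        self.dram = OrderedDict()  # ordered from least to most recently used
        self.ssd = {}

    def _spill_if_full(self):
        # Move the least recently used items to SSD until DRAM fits again.
        while len(self.dram) > self.dram_capacity:
            key, value = self.dram.popitem(last=False)
            self.ssd[key] = value

    def set(self, key, value):
        self.ssd.pop(key, None)  # a rewritten key lives in DRAM again
        self.dram[key] = value
        self.dram.move_to_end(key)
        self._spill_if_full()

    def get(self, key):
        """Return (value, tier), where tier is where the item was found."""
        if key in self.dram:
            self.dram.move_to_end(key)
            return self.dram[key], "dram"
        if key in self.ssd:
            value = self.ssd.pop(key)  # bring the item back into memory
            self.dram[key] = value
            self._spill_if_full()
            return value, "ssd"
        return None, "miss"
```

With a capacity of two items, writing `a`, `b`, `c` pushes `a` to the SSD tier; reading `a` then brings it back and pushes `b` out instead.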

Q: What performance can I expect when using clusters with data tiering?

Data tiering is designed to have minimal impact on application performance. Assuming 500-byte String values, you can expect an additional 300µs latency on average for requests to data stored on SSD compared to requests to data in memory.
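
To get a feel for what the 300µs figure means for a whole workload, a simple weighted average can help. The function below is a back-of-the-envelope sketch; the in-memory baseline latency and the fraction of requests served from SSD are inputs you would have to measure yourself.

```python
def expected_latency_us(in_memory_latency_us, ssd_fraction,
                        ssd_penalty_us=300.0):
    """Average latency when a fraction of requests hit data stored on SSD.

    ssd_penalty_us defaults to the ~300 microseconds quoted above for
    500-byte String values; other value sizes will differ.
    """
    if not 0.0 <= ssd_fraction <= 1.0:
        raise ValueError("ssd_fraction must be between 0 and 1")
    return in_memory_latency_us + ssd_fraction * ssd_penalty_us
```

For example, with an assumed 200µs in-memory latency and a quarter of requests touching SSD-resident data, `expected_latency_us(200.0, 0.25)` gives 275.0µs on average.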

Key points:

■ You must use the Redis 6.2 or later engine.

■ You cannot restore a backup of an r6gd cluster into another cluster unless it also uses r6gd. Data tiering currently runs only on r6gd node types, so a backup can only be restored to the same node family.

■ You cannot export a backup to Amazon S3 for data-tiering clusters.

■ Online migration is not supported for clusters running on the r6gd node type. Switching between node types that support data tiering and those that do not is also not possible: you cannot move to a non-r6gd node type, because other node types have no local disk (instance store).

■ Scaling is not supported from a data tiering cluster to a cluster that does not use data tiering.

■ Auto scaling is not supported for clusters using data tiering.

■ Data tiering only supports the volatile-lru, allkeys-lru and noeviction maxmemory policies. !!! Note that only these maxmemory-policy values are supported, not all of them.

■ Forkless save is not supported.

■ Items larger than 128 MiB are not moved to SSD. Big keys stay in memory.

!!! Note that this feature has a trade-off. When a key has gone unused long enough for the least-recently used (LRU) algorithm to move it to the local SSD, the next access to that key sees higher latency, because the data tiering engine must first move the data from the local SSD back into memory before replying. Redis also executes commands on a single thread, so commands queued behind that request wait for it to finish and overall performance drops. Weigh this before enabling the feature.
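
For reference, creating a cluster with data tiering through boto3 might look roughly like the sketch below. The group ID, description, and sizing are hypothetical, and other settings a real cluster needs (subnet group, security groups, parameter group) are left out.

```python
def data_tiering_cluster_request(group_id, node_type, shards,
                                 replicas_per_shard):
    """Build create_replication_group parameters with data tiering enabled."""
    if not node_type.startswith("cache.r6gd."):
        raise ValueError("data tiering requires an r6gd node type")
    return {
        "ReplicationGroupId": group_id,
        "ReplicationGroupDescription": "cluster with data tiering",
        "Engine": "redis",
        "EngineVersion": "6.2",  # data tiering needs engine 6.2 or later
        "CacheNodeType": node_type,
        "NumNodeGroups": shards,
        "ReplicasPerNodeGroup": replicas_per_shard,
        "DataTieringEnabled": True,
    }


def create_data_tiering_cluster(group_id, node_type, shards,
                                replicas_per_shard):
    import boto3  # lazy import: the builder above needs no AWS access

    client = boto3.client("elasticache")
    return client.create_replication_group(
        **data_tiering_cluster_request(group_id, node_type, shards,
                                       replicas_per_shard))
```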

Log Delivery

Log Delivery continuously streams the slow log and engine logs to Amazon CloudWatch Logs or Amazon Kinesis Data Firehose, so that you can analyze whether anything abnormal happened at a specific point in time.

You enable and configure log delivery when you create or modify a cluster using ElastiCache APIs. Each log entry will be delivered to the specified destination in one of two formats: JSON or TEXT.

Log delivery lets you stream the Redis slow log and engine logs to one of two destinations:

■ Amazon Kinesis Data Firehose

■ Amazon CloudWatch Logs

IAM permissions

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogDelivery",
                "logs:UpdateLogDelivery",
                "logs:DeleteLogDelivery",
                "logs:GetLogDelivery",
                "logs:CreateLogGroup",
                "logs:DescribeLogGroups",
                "logs:DescribeResourcePolicies",
                "logs:PutResourcePolicy",
                "logs:ListLogDeliveries"
            ],
            "Resource": "*"
        }
    ]
}
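
With those permissions in place, enabling slow log delivery to CloudWatch Logs on an existing cluster could be sketched as follows. The replication group ID and log group name are hypothetical; for a Kinesis Data Firehose destination, the destination type and details would change accordingly.

```python
def cloudwatch_log_delivery(log_type, log_group, log_format="json"):
    """Build one LogDeliveryConfigurations entry for CloudWatch Logs."""
    if log_type not in ("slow-log", "engine-log"):
        raise ValueError("log_type must be 'slow-log' or 'engine-log'")
    if log_format not in ("json", "text"):
        raise ValueError("log_format must be 'json' or 'text'")
    return {
        "LogType": log_type,
        "DestinationType": "cloudwatch-logs",
        "DestinationDetails": {
            "CloudWatchLogsDetails": {"LogGroup": log_group},
        },
        "LogFormat": log_format,
        "Enabled": True,
    }


def enable_slow_log_delivery(replication_group_id, log_group):
    """Turn on slow log streaming for an existing replication group."""
    import boto3  # lazy import: the builder above needs no AWS access

    client = boto3.client("elasticache")
    return client.modify_replication_group(
        ReplicationGroupId=replication_group_id,
        LogDeliveryConfigurations=[
            cloudwatch_log_delivery("slow-log", log_group),
        ],
        ApplyImmediately=True,
    )
```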

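Q: How can I capture packets on the ElastiCache server side?

You can use VPC Flow Logs to record information about the IP traffic on network interfaces and publish the records to Amazon CloudWatch Logs or Amazon S3 for later analysis. See the reference below for details.

[+] VPC Flow Logs:
https://docs.aws.amazon.com/zh_cn/vpc/latest/userguide/flow-logs.html

With VPC Flow Logs, you can capture information about the IP traffic going to and from network interfaces in your VPC. Flow log data can be published to Amazon CloudWatch Logs or Amazon S3. After you create a flow log, you can retrieve and view its data in the chosen destination.

VPC Flow Logs can help with a number of tasks, such as:
1) Diagnosing overly restrictive security group rules.
2) Monitoring the traffic that reaches your instances.
3) Determining the direction of traffic to and from network interfaces.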
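
As an example of the first task, the sketch below parses flow log records in the default format and picks out rejected connections to the Redis port. The sample records are made up for illustration, and port 6379 is assumed to be the cluster's port.

```python
# Field order of the default VPC flow log record format.
FIELDS = (
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
)

# Made-up sample records, not real traffic.
SAMPLE = [
    "2 123456789012 eni-0a1b2c3d 10.0.1.15 10.0.2.40 49152 6379 6 5 300 "
    "1648252800 1648252860 REJECT OK",
    "2 123456789012 eni-0a1b2c3d 10.0.1.16 10.0.2.40 49153 6379 6 12 900 "
    "1648252800 1648252860 ACCEPT OK",
]


def parse_flow_log_record(line):
    """Split one space-separated record into a dict keyed by field name."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    return dict(zip(FIELDS, parts))


def rejected_redis_traffic(lines, port=6379):
    """Return records for connections to the Redis port that were rejected."""
    hits = []
    for line in lines:
        record = parse_flow_log_record(line)
        if record["dstport"] == str(port) and record["action"] == "REJECT":
            hits.append(record)
    return hits
```

A rejected record whose destination is the cluster's address and port usually points at a security group or network ACL that does not allow the client.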


A cloud support engineer focused on troubleshooting customer-reported issues and on cloud solution architecture.