Things to Watch Out for with Amazon DynamoDB Consumed Units

fcamel

Published in

fcamel的程式開發心得

6 min read · Nov 7, 2020

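DynamoDB charges by the resources that reads and writes consume, and there are many subtleties to watch for. This post records the ones I consider most important.

TL;DR

An extreme example shows how large the "surprise" cost can be:

The table has four attributes, A, B, C, and D. A is the partition key and B is the sort key.
A through C take 4 bytes each, and D takes 10 KB.
An LSI is built on A and C, projecting all attributes.
A GSI is built on B and C, projecting all attributes.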

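How many WCU does it take to update C on a single item? Intuition says 1 WCU, since only 4 bytes change. The actual cost is:

table: 11 (the item is slightly over 10 KB, which rounds up to 11 KB, at 1 KB per WCU)
LSI: 22 (delete + write)
GSI: 22 (delete + write)

That is 55 WCU in total, 55 times what intuition suggests.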
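The arithmetic behind those numbers can be reproduced with a few lines of Python. This is only a sketch: the helper name `write_units` is mine, and attribute-name overhead is ignored for simplicity.

```python
import math

KB = 1024


def write_units(item_size_bytes: int) -> int:
    """WCU for one standard write: item size rounded up to the next 1 KB."""
    return max(1, math.ceil(item_size_bytes / KB))


# Item from the example: A, B, C take 4 bytes each, D takes 10 KB.
# (Assumption: attribute-name overhead is ignored.)
item_size = 3 * 4 + 10 * KB

table_wcu = write_units(item_size)    # rewrite of the item in the base table
lsi_wcu = 2 * write_units(item_size)  # index key changed: delete + write
gsi_wcu = 2 * write_units(item_size)  # index key changed: delete + write

total_wcu = table_wcu + lsi_wcu + gsi_wcu
print(table_wcu, lsi_wcu, gsi_wcu, total_wcu)  # prints: 11 22 22 55
```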

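The reasons:

All billing is based on item size (the total size of the item). Partial updates, conditional writes, reads with a filter, and so on are all billed on the total data "touched", so none of them is cheaper.
Reads are billed in 4 KB increments, writes in 1 KB increments.
Updating an index key attribute (the partition or sort key of a GSI/LSI) is billed twice on that index (delete + write).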

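Other notes:

With Provisioned mode plus auto scaling, and as long as you avoid spikes (short, sharp rises in usage) lasting more than two minutes, the cost is only about 1/5 of On-Demand mode.
A write costs about five times as much as a read.
A strongly consistent read costs twice as much as an eventually consistent read.
Contributor Insights can find hot keys, but it costs extra.
In the long run, computing usage yourself from the returned consumed units is necessary. The web console cannot tell you how many RCU/WCU each operation in your own application code uses, and it does not show RCU/WCU for LSIs (the table and its LSIs are merged into the table metrics).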

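Basic Units

The basic units are the Read Capacity Unit (RCU) and the Write Capacity Unit (WCU). Prices differ between AWS regions, but a WCU costs roughly five times as much as an RCU. Detailed usage appears under metrics in the AWS web console, and the lower and upper limits for RCU/WCU can be adjusted under capacity.

Roughly speaking, the per-second RCU/WCU limit can be read as the number of reads/writes that can be handled per second, for items up to 4 KB (reads) or 1 KB (writes). Note that an eventually consistent read costs only 0.5 RCU, while a strongly consistent read costs 1 RCU.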

On-Demand vs. Provisioned Mode

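There are two modes for consuming RCU/WCU:

On-Demand mode: DynamoDB adjusts capacity automatically throughout. A new table can take 4,000 WCU without being throttled, and it can always handle up to twice the previous peak. The maximum is 40,000 WCU.
Provisioned mode: you specify fixed numbers (for example 1000 RCU, 500 WCU). In this mode you can additionally enable auto scaling by setting a capacity range and a target utilization (20% ~ 90%). When actual utilization falls well below the target, provisioned RCU/WCU is lowered; when it rises above, it is raised.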

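The main cost-related difference between the two is that Provisioned mode can be scaled up without limit (manually or by auto scaling), but the number of decreases per day is limited (for example, four). Once that quota is used up, capacity can only be decreased once per hour. This makes things awkward when usage often has spikes (short, sharp rises in usage).

Without auto scaling, a spike may be throttled (described later), causing operations to fail.
With auto scaling, capacity is raised automatically after a spike has been observed for a while (for example, utilization above 70% for two minutes). But once the spike is gone, it can take an hour to scale back down to a suitable value. If the spike added 1000 provisioned WCU, an hour of 1000 WCU is wasted.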

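Provisioned mode looks like a hassle, so why not just use On-Demand mode? Because On-Demand mode costs about 6.94x as much as Provisioned mode per unit. Even with the target utilization set to 70%, On-Demand is still nearly five times as expensive (6.94 × 0.7 ≈ 4.86). If usage varies by less than a factor of five within an hour, Provisioned mode is cheaper. For example, if RCU usage ranges over 100 ~ 400 within an hour, setting RCU=400 is still cheaper than on-demand.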
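To make the comparison concrete, here is a rough sketch that works in relative cost units rather than dollars. The 6.94 ratio is the figure quoted above and varies by region and over time; the function names are mine.

```python
# Ratio of on-demand to provisioned price per unit, as quoted in the text.
# (Assumption: real prices differ by region and change over time.)
ON_DEMAND_RATIO = 6.94


def provisioned_cost(provisioned_units: float) -> float:
    """Relative hourly cost of holding this many provisioned units."""
    return provisioned_units


def on_demand_cost(average_units_used: float, ratio: float = ON_DEMAND_RATIO) -> float:
    """Relative hourly cost of consuming this many units per second on demand."""
    return average_units_used * ratio


# With a 70% target utilization, provisioned capacity is usage / 0.7,
# so on-demand is effectively 6.94 * 0.7 times as expensive.
effective_ratio = ON_DEMAND_RATIO * 0.70

# Usage swings between 100 and 400 RCU within the hour.
# Worst case for provisioned mode: the average stays at the low end (100).
fixed = provisioned_cost(400)
demand = on_demand_cost(100)
print(round(effective_ratio, 2), fixed, round(demand, 1))
```

Even when the average sits at the bottom of the range, holding 400 provisioned units comes out cheaper than paying on demand, which matches the example above.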

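Conversely, if there is little traffic most of the time with an occasional spike, On-Demand mode is the better fit.

Throttled read/write requests/events are visible under metrics in the DynamoDB web console. A small number is nothing to worry about, because AWS SDK v2 retries automatically. After too many retries, however, it throws a ProvisionedThroughputExceededException (the web console does not count these), and if the app does not handle it, the operation fails.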
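One way an application might handle that final exception is another layer of retry with exponential backoff and jitter. The sketch below is illustrative only: `ThroughputExceeded` is a hypothetical stand-in for the SDK's throttling exception, and `call_with_backoff` is a name I made up.

```python
import random
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the SDK's throttling exception."""


def call_with_backoff(operation, max_attempts=5, base_delay=0.05, sleep=time.sleep):
    """Run operation(), retrying on throttling with exponential backoff.

    The last failure is re-raised so the caller can log it or give up.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_attempts - 1:
                raise
            # Full jitter: wait a random time up to base_delay * 2^attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
```

Passing `sleep` as a parameter keeps the helper easy to test without real delays.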

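Switching between On-Demand and Provisioned mode does not cause an immediate capacity shortfall. The one thing to watch is that when Provisioned mode was set below 4000 WCU, it takes a while after switching to On-Demand mode before 4000 WCU is ready.

See the sections “Initial Throughput for On-Demand Capacity Mode” and “Table Behavior while Switching Read/Write Capacity Mode” under On-Demand Mode in the official documentation.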

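Time to Get Capacity Ready

When you know heavy reads and writes are coming, the approach that worked best in my tests is to use Provisioned mode with auto scaling turned off, set the numbers by hand, and wait for DynamoDB to get enough machines ready. The metrics in the console show whether the provisioned capacity units are ready, though be aware that metrics update with a delay when usage is low. Once the heavy traffic is flowing steadily, turn auto scaling back on so that DynamoDB lowers RCU/WCU automatically and saves money when usage drops.

A new On-Demand table initially allows WCU=4,000 and RCU=12,000. Switching between the two modes should not hurt throughput:

Provisioned → On-demand: half of the maximum capacity provisioned in the table's history becomes the on-demand baseline (the "previous peak").
On-demand → Provisioned: the console suggests the current RCU/WCU as a reference.

Sources: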

Newly created table with on-demand capacity mode: The previous peak is 2,000 write request units or 6,000 read request units. You can drive up to double the previous peak immediately, which enables newly created on-demand tables to serve up to 4,000 write request units or 12,000 read request units, or any linear combination of the two.
Existing table switched to on-demand capacity mode: The previous peak is half the maximum write capacity units and read capacity units provisioned since the table was created, or the settings for a newly created table with on-demand capacity mode, whichever is higher. In other words, your table will deliver at least as much throughput as it did prior to switching to on-demand capacity mode.
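The two quoted rules boil down to a `max` and a doubling. A small sketch, with function names of my own choosing:

```python
# "Previous peak" assigned to a newly created on-demand table, per the quote.
NEW_TABLE_PEAK_WRITE = 2_000
NEW_TABLE_PEAK_READ = 6_000


def previous_peak_after_switch(max_provisioned: float, new_table_peak: float) -> float:
    """Previous peak after switching an existing table to on-demand:
    half the maximum ever provisioned, or the new-table value, whichever is higher."""
    return max(max_provisioned / 2, new_table_peak)


def instant_throughput(previous_peak: float) -> float:
    """On-demand can immediately serve up to double the previous peak."""
    return 2 * previous_peak


# A table that never had more than 1,000 WCU provisioned still gets 4,000.
small = instant_throughput(previous_peak_after_switch(1_000, NEW_TABLE_PEAK_WRITE))
# A table that once had 10,000 WCU provisioned keeps 10,000.
large = instant_throughput(previous_peak_after_switch(10_000, NEW_TABLE_PEAK_WRITE))
print(small, large)
```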

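With Provisioned mode plus auto scaling, it takes roughly five minutes for capacity to be ready. When usage climbs too fast, requests keep getting throttled and the provisioned capacity units only rise after a few minutes each time. To get through a sudden demand as quickly as possible, it is better to step in manually, turn off auto scaling, and raise RCU/WCU to a sufficient level in one go.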

Adaptive Capacity

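Occasional bursts of reads and writes are not a problem. DynamoDB uses a token bucket to account for capacity units, and Best Practices for Designing and Using Partition Keys Effectively mentions that up to five minutes of unused capacity is retained for future bursts: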

DynamoDB provides some flexibility in your per-partition throughput provisioning by providing burst capacity. Whenever you’re not fully using a partition’s throughput, DynamoDB reserves a portion of that unused capacity for later bursts of throughput to handle usage spikes.
DynamoDB currently retains up to 5 minutes (300 seconds) of unused read and write capacity. During an occasional burst of read or write activity, these extra capacity units can be consumed quickly — even faster than the per-second provisioned throughput capacity that you’ve defined for your table.
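A toy model helps picture the burst behavior. The class below is a simplified sketch, not DynamoDB's actual accounting: it assumes one bucket with a fixed per-second rate, saves every unused unit, and caps the savings at 300 seconds' worth.

```python
class BurstBucket:
    """Toy token bucket: unused capacity is saved, up to retain_seconds' worth."""

    def __init__(self, provisioned_per_second: float, retain_seconds: int = 300):
        self.rate = provisioned_per_second
        self.capacity = provisioned_per_second * retain_seconds
        self.saved = 0.0  # unused units carried over from earlier seconds

    def tick(self, requested: float) -> float:
        """Advance one second and return the units actually served."""
        available = self.rate + self.saved
        served = min(requested, available)
        # Whatever is left over is saved, capped at the retention limit.
        self.saved = min(self.capacity, available - served)
        return served


bucket = BurstBucket(provisioned_per_second=100)
for _ in range(600):          # ten idle minutes; only five minutes are kept
    bucket.tick(0)
print(bucket.saved)           # 300 seconds * 100 units
print(bucket.tick(5_000))     # a burst far above 100/s is served in full
```

Once the saved units run out, the bucket falls back to serving only the per-second rate, which is when throttling would start.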

With the total unchanged, capacity is also reallocated dynamically across partitions so that a hot partition can use more:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/bp-partition-key-design.html#bp-partition-key-throughput-bursting

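If usage keeps exceeding the limit, the partition is split dynamically into separate partitions. Eventually a single key owns a whole partition, which is still subject to the 3000 RCU / 1000 WCU limit: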

If your application drives consistently high traffic to a single item, adaptive capacity might rebalance your data such that a partition contains only that single, frequently accessed item. In this case, DynamoDB can deliver throughput up to the partition maximum of 3,000 RCUs or 1,000 WCUs to that single item’s primary key.

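Global Secondary Index (GSI) and Local Secondary Index (LSI)

GSI usage is accounted for separately. With auto scaling, you can choose whether to enable it for the GSIs in addition to the table. LSI usage is merged with the table's, so it cannot be seen in the web console. To learn the actual LSI usage, you have to compute it yourself through the API (described later).

Updating index keys costs double on both GSIs and LSIs (delete + write). Since a table's own partition key and sort key cannot be updated, this "extra" cost is unlikely to be incurred by accident on the table itself. But when updating an attribute that serves as an index key, it is easy not to realize that the index cost is doubled.

For example, a table has an attribute T that stores a timestamp, with one GSI and one LSI that both use T as the sort key and project all attributes. Updating T costs five times as much as it would on a table with no indexes (table update + GSI delete + GSI write + LSI delete + LSI write).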

Consumed Units Are Computed from Item Size

The official documentation notes that data size affects RCU (the units for reads and writes are 4 KB and 1 KB, respectively):

Read capacity unit (RCU): Each API call to read data from your table is a read request. Read requests can be strongly consistent, eventually consistent, or transactional. For items up to 4 KB in size, one RCU can perform one strongly consistent read request per second. Items larger than 4 KB require additional RCUs. For items up to 4 KB in size, one RCU can perform two eventually consistent read requests per second. Transactional read requests require two RCUs to perform one read per second for items up to 4 KB. For example, a strongly consistent read of an 8 KB item would require two RCUs, an eventually consistent read of an 8 KB item would require one RCU, and a transactional read of an 8 KB item would require four RCUs. See Read Consistency for more details.

The official documentation also notes that fetching some attributes is charged the same as fetching all of them:

DynamoDB calculates the number of read capacity units consumed based on item size, not on the amount of data that is returned to an application. For this reason, the number of capacity units consumed is the same whether you request all of the attributes (the default behavior) or just some of them (using a projection expression). The number is also the same whether or not you use a filter expression.

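RCU is computed from the item size, not from the size of what is returned. So using a filter, or having one attribute with a very large value in the item, leads to unexpected cost.

Say an item has attributes A, B, C, D, and E, where A ~ D are small, under 1 KB in total, while E takes 399 KB, and usually only A ~ D are fetched. Every read is still charged RCU for the full size of A ~ E (400 KB).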
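The read-side rules quoted above fit in one small function. This is a sketch with names of my own (`read_units`, the mode strings); it covers a single-item read only.

```python
import math


def read_units(item_size_bytes: int, mode: str = "strong") -> float:
    """RCU for reading one item: size rounded up to 4 KB blocks,
    then scaled by the consistency mode."""
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    multiplier = {"eventual": 0.5, "strong": 1, "transactional": 2}[mode]
    return blocks * multiplier


KB = 1024
# The 400 KB item from the example, even when only A ~ D are requested.
print(read_units(400 * KB, "strong"))    # 100 blocks of 4 KB
print(read_units(400 * KB, "eventual"))  # half of the strong cost
# For comparison, an item holding only A ~ D (under 1 KB).
print(read_units(1 * KB, "strong"))
```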

A partial update of a single attribute is charged on the size of the whole item (the larger of the sizes before and after the change):

UpdateItem—Modifies a single item in the table. DynamoDB considers the size of the item as it appears before and after the update. The provisioned throughput consumed reflects the larger of these item sizes. Even if you update just a subset of the item's attributes, UpdateItem will still consume the full amount of provisioned throughput (the larger of the "before" and "after" item sizes).

In the same example, updating A costs 400 WCU (400 KB / 1 KB).
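The UpdateItem rule can be sketched the same way. The function name is mine, and indexes are left out of the picture:

```python
import math


def update_write_units(size_before_bytes: int, size_after_bytes: int) -> int:
    """WCU for UpdateItem on the base table: the larger of the item's
    before/after sizes, rounded up to the next 1 KB."""
    larger = max(size_before_bytes, size_after_bytes)
    return max(1, math.ceil(larger / 1024))


KB = 1024
# Changing the small attribute A leaves the item at 400 KB.
print(update_write_units(400 * KB, 400 * KB))
# Removing the big attribute E is still charged at the "before" size.
print(update_write_units(400 * KB, 1 * KB))
```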

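The conclusion: do not store very large and very small attributes in the same item, or reads and writes of the small attributes will cost unexpectedly much.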

Contributor Insights (Hot Key Issues)

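People online often point out that using DynamoDB does not automatically mean you can scale out. With a poor design it is easy to hammer a single partition with reads and writes, and in the past there were no tools for developers to find hot partitions, which led to expensive bills.

The DynamoDB console now offers a new paid feature, Contributor Insights. It analyzes hot keys and shows developers where the problem lies. I found it quite helpful when I tried it. Enabling it on a table adds four charts: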

Most accessed items (partition key) (rule: DynamoDBContributorInsights-PKT…)
Most throttled keys (partition key)
Most accessed items (partition key and sort keys)
Most throttled keys (partition key and sort keys)

An example from the official documentation:

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/contributorinsights_HowItWorks.html

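One annoyance is that read and write capacity units are counted together, with one write counted as three, which makes it hard to tell a hot read from a hot write.

Pricing is per event. Events are defined as follows; note that a sort key causes each item to be counted one more time: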

For tables and global secondary indexes with CloudWatch Contributor Insights for DynamoDB enabled, each item that is written or read via a data plane operation represents one event.
If a table or global secondary index includes a sort key, each item that is read or written represents two events. This is because DynamoDB is identifying top contributors from separate time series: one for partitions keys only, and one for partition and sort key pairs.

The price is not high but not exactly cheap either. In us-west-2, for example, it is USD $0.03 per million events. Remember to turn it off after use.
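A quick estimate shows how the event rule adds up. This sketch uses the $0.03 figure quoted above as a default and a function name of my own; check current pricing before relying on it.

```python
# USD per million events, the us-west-2 figure quoted in the text.
PRICE_PER_MILLION_EVENTS = 0.03


def contributor_insights_cost(items_touched: int, has_sort_key: bool,
                              price: float = PRICE_PER_MILLION_EVENTS) -> float:
    """Estimated charge: one event per item read or written,
    two events per item when the table or GSI has a sort key."""
    events = items_touched * (2 if has_sort_key else 1)
    return events / 1_000_000 * price


# One billion items read or written in a day on a table with a sort key.
daily = contributor_insights_cost(1_000_000_000, has_sort_key=True)
print(round(daily, 2))
```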

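Recording Consumed Units Yourself

The metrics in the DynamoDB web console are already quite detailed, but in the end you still need to record usage yourself to answer the following:

How many RCU/WCU does each operation in your own application code use?
How much do the LSIs use?
Seeing abnormal usage for all tables in all AWS regions on a single page.

This is not hard. With the AWS SDK, set ReturnConsumedCapacity: INDEXES to ask every operation to return its consumed units (Query is one example), and the response will carry ConsumedCapacity. Subtracting the GSI and LSI costs from CapacityUnits gives the table's cost.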

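Taking Java as an example, the concrete steps are:

Define a custom class that wraps the DynamoDbAsyncClient provided by the SDK.
Define the methods you use, changed to accept a “label” and the builder for each kind of request.
Call returnConsumedCapacity(ReturnConsumedCapacity.INDEXES) on the builder, then build the request.
Get the response, extract the consumed units, and record them with Prometheus under the label.

Having a single wrapper for all DynamoDB API calls also makes it easy to record exceptions and execution time along the way, which helps uncover other problems.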
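The same idea can be sketched in Python instead of Java. This is a minimal illustration under several assumptions: `MeteredDynamo` is a hypothetical class name, the wrapped client is expected to expose a boto3-style `query(**kwargs)` method, only `query` is wrapped, and plain in-memory dictionaries stand in for Prometheus metrics.

```python
import time
from collections import defaultdict


class MeteredDynamo:
    """Wraps a boto3-style DynamoDB client and records usage per label."""

    def __init__(self, client):
        self._client = client
        self.units = defaultdict(float)    # (label, source) -> consumed units
        self.errors = defaultdict(int)     # label -> number of exceptions
        self.seconds = defaultdict(float)  # label -> total time spent

    def query(self, label, **request):
        # Ask DynamoDB to break consumed capacity down by table and index.
        request["ReturnConsumedCapacity"] = "INDEXES"
        started = time.monotonic()
        try:
            response = self._client.query(**request)
        except Exception:
            self.errors[label] += 1
            raise
        finally:
            self.seconds[label] += time.monotonic() - started
        self._record(label, response.get("ConsumedCapacity", {}))
        return response

    def _record(self, label, consumed):
        total = consumed.get("CapacityUnits", 0.0)
        index_total = 0.0
        for kind, field in (("gsi", "GlobalSecondaryIndexes"),
                            ("lsi", "LocalSecondaryIndexes")):
            for name, detail in consumed.get(field, {}).items():
                units = detail.get("CapacityUnits", 0.0)
                self.units[(label, f"{kind}:{name}")] += units
                index_total += units
        # Table cost = total minus everything attributed to indexes.
        self.units[(label, "table")] += total - index_total


# Usage, assuming boto3 is installed and credentials are configured:
#   db = MeteredDynamo(boto3.client("dynamodb"))
#   db.query("list_orders", TableName="orders", KeyConditionExpression=...)
#   print(dict(db.units))
```

Other operations (GetItem, UpdateItem, and so on) would follow the same pattern, and the dictionaries could be swapped for real metric counters.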