How to Calculate a DynamoDB Item’s Size and Consumed Capacity
Read and Write Capacity Units are one of DynamoDB’s defining features. For new and experienced users alike, there is sometimes uncertainty around what capacity units are, how they’re consumed, and how to determine an item’s size. In this post, I’ll answer those questions and give you an item size calculator to add to your toolkit.
What are Read and Write Capacity Units?
Often abbreviated to RCUs and WCUs, capacity units are the primary measurement on which DynamoDB is priced. Read requests like GetItem are measured in RCUs, while write requests like PutItem are measured in WCUs.
DynamoDB offers two capacity modes. The first is Provisioned Capacity where you configure how many units you want to have available each second. If you use more, your excess requests will be throttled and fail. You’re billed for both used and unused units.
Alternatively, as of re:Invent 2018, you can use On-Demand Capacity to pay for only the RCUs and WCUs you actually use. This mode can reduce your bill even though each request costs more.
How many units will each request consume?
This changes based on the size of the item(s) being read or written. You can calculate an item’s size using the rules below, or you can see how many units were consumed by setting the ReturnConsumedCapacity property on your requests.
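As a minimal sketch of the second option, the helper below adds ReturnConsumedCapacity to a request before it is sent. The function names are my own, and the second function assumes boto3 is installed and AWS credentials are configured; the table name and key you pass in are whatever your own table uses.

```python
def with_consumed_capacity(request: dict, level: str = "TOTAL") -> dict:
    """Return a copy of a DynamoDB request with ReturnConsumedCapacity set."""
    if level not in ("INDEXES", "TOTAL", "NONE"):
        raise ValueError(f"unsupported ReturnConsumedCapacity value: {level}")
    return {**request, "ReturnConsumedCapacity": level}


def get_item_consumed_units(table_name: str, key: dict) -> float:
    """Call GetItem and return the capacity units DynamoDB reports for it."""
    # Imported lazily so the helper above can be used without boto3 installed.
    import boto3

    client = boto3.client("dynamodb")
    request = with_consumed_capacity({"TableName": table_name, "Key": key})
    response = client.get_item(**request)
    return response["ConsumedCapacity"]["CapacityUnits"]
```

Comparing the reported number against your own calculation is a quick way to sanity-check the rules that follow.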
Note that I’ll be using the KB suffix to denote 1,024 bytes.
An eventually-consistent read (the default type) will use 0.5 RCUs for every 4,096 bytes (4 KB) or part thereof. The only thing that changes for strongly-consistent reads is that they use 1 RCU per 4 KB (twice as much). Items can be up to 400 KB, so reads can range from 0.5 to 100 RCUs.
When requesting items that don’t exist, GetItem will still use the minimum 0.5 or 1 RCU (depending on the consistency model being used).
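The read rules above can be sketched as a small function. The name read_capacity_units is my own; it simply rounds the item size up to whole 4 KB blocks and applies the minimum charge for missing items.

```python
import math


def read_capacity_units(item_size_bytes: int, strongly_consistent: bool = False) -> float:
    """RCUs for reading a single item, following the rules described above."""
    # Round up to whole 4 KB blocks; a missing item still costs one block.
    blocks = max(1, math.ceil(item_size_bytes / 4096))
    return blocks * (1.0 if strongly_consistent else 0.5)


print(read_capacity_units(4097))               # one byte over 4 KB needs two blocks
print(read_capacity_units(400 * 1024, True))   # the largest possible item
```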
Writes use 1 WCU for every 1,024 bytes (1 KB) or part thereof. Again, items can be up to 400 KB, so writes can range from 1 to 400 WCUs.
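The equivalent sketch for writes, again with a function name of my own choosing, rounds up to whole 1 KB blocks.

```python
import math


def write_capacity_units(item_size_bytes: int) -> int:
    """WCUs for writing a single item: 1 per 1,024 bytes or part thereof."""
    return max(1, math.ceil(item_size_bytes / 1024))


print(write_capacity_units(1025))        # one byte over 1 KB needs a second unit
print(write_capacity_units(400 * 1024))  # the largest possible item
```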
There are a handful of operation-specific behaviours below.
When overwriting items (PutItem), the size will be the larger of the new and old versions. For example, replacing a 2 KB item with a 1 KB one will consume 2 WCUs. Subsequent requests will only use 1 WCU.
When modifying items (UpdateItem), the size includes all of the item’s pre-existing attributes, not just the ones being added or updated.
When deleting items (DeleteItem), the size is that of the item being deleted. If the item doesn’t exist, the request will use 1 WCU. Deletes via Time To Live don’t consume any WCUs.
Any request with a conditional expression will consume the same number of WCUs, following the above rules, regardless of whether the condition evaluates to true or false.
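To make the PutItem and DeleteItem rules concrete, here is a rough sketch. The function names are hypothetical, and sizes are passed in as plain byte counts rather than read from a real table.

```python
import math
from typing import Optional


def write_units(size_bytes: int) -> int:
    """1 WCU per 1,024 bytes or part thereof, with a minimum of 1."""
    return max(1, math.ceil(size_bytes / 1024))


def put_item_wcus(new_size: int, old_size: Optional[int] = None) -> int:
    """PutItem is charged on the larger of the old and new versions."""
    return write_units(max(new_size, old_size or 0))


def delete_item_wcus(existing_size: Optional[int]) -> int:
    """DeleteItem is charged on the deleted item's size, or 1 WCU if it is missing."""
    return write_units(existing_size or 0)


# Replacing a 2 KB item with a 1 KB one, then overwriting the 1 KB item again.
print(put_item_wcus(new_size=1024, old_size=2048))
print(put_item_wcus(new_size=1024, old_size=1024))
```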
The total RCUs or WCUs consumed by a batched request is simply the sum of those used by each individual request.
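Because the total is a plain sum, each item is rounded up on its own before adding. A small sketch with hypothetical function names:

```python
import math


def batch_get_rcus(item_sizes, strongly_consistent: bool = False) -> float:
    """Sum the RCUs of each read in a batch; each item is rounded up separately."""
    per_block = 1.0 if strongly_consistent else 0.5
    return sum(max(1, math.ceil(size / 4096)) * per_block for size in item_sizes)


def batch_write_wcus(item_sizes) -> int:
    """Sum the WCUs of each write in a batch; each item is rounded up separately."""
    return sum(max(1, math.ceil(size / 1024)) for size in item_sizes)


# A 1 KB item and a 5 KB item read together, eventually consistent.
print(batch_get_rcus([1024, 5 * 1024]))
```

Note that two small items don't share a block: they are each charged the minimum.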
A BatchGetItem operation can contain up to 100 individual GetItem requests and can retrieve up to 16 MB of data. In addition, a BatchGetItem operation can retrieve items from multiple tables.
A BatchWriteItem operation can contain up to 25 individual PutItem and DeleteItem requests and can write up to 16 MB of data. (The maximum size of an individual item is 400 KB.) In addition, a BatchWriteItem operation can put or delete items in multiple tables.
BatchWriteItem does not support
When using transactions, DynamoDB performs two underlying reads or writes of every item in the transaction: one to prepare the transaction and one to commit it.
Transactional reads use 2 RCUs per 4 KB or part thereof, which is double a normal strongly-consistent read. Writes use 2 WCUs per 1 KB or part thereof, which is double a normal write. The total units consumed by a transactional request is the sum of those used by each individual request. It’s essentially double the cost of a batch request (with strongly-consistent reads).
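The doubling is easy to see in code. This is only a sketch of the arithmetic described above, with function names I've made up for illustration.

```python
import math


def transact_get_rcus(item_sizes) -> int:
    """2 RCUs per 4 KB block (or part thereof) for every item read."""
    return sum(2 * max(1, math.ceil(size / 4096)) for size in item_sizes)


def transact_write_wcus(item_sizes) -> int:
    """2 WCUs per 1 KB block (or part thereof) for every item written."""
    return sum(2 * max(1, math.ceil(size / 1024)) for size in item_sizes)


print(transact_get_rcus([5000]))    # two 4 KB blocks, doubled
print(transact_write_wcus([1500]))  # two 1 KB blocks, doubled
```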
DynamoDB Accelerator (DAX)
DAX is an API-compatible, in-memory cache for DynamoDB. The number of RCUs and WCUs consumed when using DAX is the same as above with some exceptions.
Non-transactional reads served from DAX’s cache don’t consume RCUs. When an item is not in the cache, DAX will perform a strongly-consistent read, consuming 1 RCU per 4 KB or part thereof. Transactional reads done with TransactGetItems are always passed through to DynamoDB and consume RCUs as if you called DynamoDB directly.
Non-transactional writes are always passed through to DynamoDB and consume WCUs as if you called DynamoDB directly. Transactional writes, however, will also consume RCUs because DAX calls TransactGetItems in the background for each item in the TransactWriteItems operation. As an example, a TransactWriteItems request containing three 200-byte items will consume 6 WCUs and 6 RCUs.
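The DAX rules above can be sketched like this. The function names are hypothetical, and the cache-miss behaviour follows the description in this post rather than anything measured here.

```python
import math


def dax_read_rcus(item_size_bytes: int, cache_hit: bool) -> int:
    """Non-transactional read through DAX: free on a hit, 1 RCU per 4 KB on a miss."""
    if cache_hit:
        return 0
    return max(1, math.ceil(item_size_bytes / 4096))


def dax_transact_write_units(item_sizes):
    """Return (WCUs, RCUs) for a TransactWriteItems call made through DAX."""
    wcus = sum(2 * max(1, math.ceil(size / 1024)) for size in item_sizes)
    # DAX also issues a transactional read for every item in the write.
    rcus = sum(2 * max(1, math.ceil(size / 4096)) for size in item_sizes)
    return wcus, rcus


# The three 200-byte items from the example above.
print(dax_transact_write_units([200, 200, 200]))
```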
How do you determine a DynamoDB item’s size?
As you know, items are made up of attributes. An item’s size is the sum of all its attributes’ sizes, including the hash and range key attributes.
Attributes themselves have a name and a value. Both the name and value contribute to an attribute’s size. Attribute names are strings and are sized in the same way as string values (see String and StringSet). Below is a list of all the data types and the way their sizes are calculated.
String and StringSet (and attribute names)
In DynamoDB, Strings are Unicode with UTF-8 binary encoding. This means that each character uses 1 to 4 bytes. Note that strings can’t be empty.
The English alphabet, numbers, punctuation and common symbols (%, etc.) are all 1 byte each. However, the pound sign (£) is 2 bytes!
Accented letters such as German umlauts, and Cyrillic letters, are also 2 bytes each, while Japanese characters are 3 bytes. On the top end, emojis are a whopping 4 bytes each 😲!
A StringSet is a collection of strings. To get the total size you simply sum up the sizes of each string in the set. Sets can’t be empty.
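In most languages you can get these sizes by encoding the string as UTF-8 and counting the bytes. A minimal sketch, with function names of my own:

```python
def string_size(value: str) -> int:
    """Size of a String value (or an attribute name) in bytes."""
    return len(value.encode("utf-8"))


def string_set_size(values) -> int:
    """Size of a StringSet: the sum of its members' sizes."""
    return sum(string_size(v) for v in values)


for sample in ["%", "£", "ж", "日", "😲"]:
    print(sample, string_size(sample))
```

Counting characters instead of bytes (for example with len on the string itself) will under-report anything outside plain ASCII.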
Number and NumberSet
This is easily the most complicated type. AWS does not publicly document how to determine how many bytes are in a number. They say this is so they can change the internal implementation without anyone being tied to it. What they do say, however, sounds simple but is more complicated in practice.
The size of a number is approximately (length of attribute name) + (1 byte per two significant digits) + (1 byte).
Numbers can have up to 38 significant digits and use between 2 and 21 bytes. All the code I’ve seen on GitHub under AWS Labs simply assumes all numbers are 21 bytes. I, however, just spent a week painstakingly reverse engineering and testing an algorithm that gives the correct size. The calculator at the end of this post uses that algorithm.
Very roughly, though, the formula is something like 1 byte for every 2 significant digits, plus 1 extra byte for positive numbers or 2 for negative numbers. Therefore, 27 is 2 bytes and -27 is 3 bytes. DynamoDB will round up if there’s an odd number of digits, so 461 will use 3 bytes (including the extra byte). Leading and trailing zeros are trimmed before calculating the size.
A NumberSet is a collection of numbers. To get the total size you simply sum up the sizes of each number in the set. Sets can’t be empty.
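Here is the rough formula above as code. To be clear, this is only the approximation described in this post, not the exact algorithm the calculator uses, and the function name is my own. It assumes the number is given as a plain decimal string without exponent notation.

```python
import math


def rough_number_size(number: str) -> int:
    """Approximate size of a Number value in bytes (rough formula only)."""
    text = number.strip()
    negative = text.startswith("-")
    # Drop the sign and decimal point, then trim leading and trailing zeros.
    digits = text.lstrip("+-").replace(".", "")
    significant = digits.strip("0")
    # 1 byte per two significant digits (rounded up), plus 1 or 2 extra bytes.
    return math.ceil(len(significant) / 2) + (2 if negative else 1)


for sample in ["27", "-27", "461"]:
    print(sample, rough_number_size(sample))
```

Expect the real size to differ from this for some inputs, which is exactly why the rough formula isn't enough on its own.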
Binary and BinarySet
The Binary type is just an array of unsigned bytes. That makes things very easy because 1 byte uses 1 byte. Therefore, the size of the value is just the number of bytes in the array. Of course, you must Base64-encode the bytes before calling the API.
A BinarySet is a collection of binary values. To get the total size you simply sum up the sizes of each binary value in the set. Sets can’t be empty.
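Since the API deals in Base64 text, the thing to measure is the decoded byte count, not the length of the encoded string. A short sketch with hypothetical function names:

```python
import base64


def binary_size(b64_value: str) -> int:
    """Size of a Binary value: the number of raw bytes after Base64 decoding."""
    return len(base64.b64decode(b64_value))


def binary_set_size(b64_values) -> int:
    """Size of a BinarySet: the sum of its members' decoded sizes."""
    return sum(binary_size(v) for v in b64_values)


encoded = base64.b64encode(b"\x01\x02\x03").decode("ascii")
print(encoded, binary_size(encoded))
```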
The Boolean type can be true or false and uses 1 byte either way.
Even though Null indicates the absence of data, it still uses 1 byte and displays as true in the console.
A Map is similar to what some programming languages call a hash, a dictionary, or indeed a map. It’s an unordered collection of key-value pairs with unique keys. All maps use 3 bytes, plus the size of each key-value pair. This means empty maps still use 3 bytes.
A key-value pair has three parts that make up its size. Firstly, the key is sized the same as other strings, as it’s just a string. Secondly, the value is sized based on its data type. Map values don’t have to all be the same type. Finally, each key-value pair adds an extra 1 byte.
A List is an ordered collection of values similar to an array. The elements can be any data type and don’t need to be of the same type.
All lists use 3 bytes, plus the size of each element (based on the element’s type). This means empty lists still use 3 bytes. There is also an extra 1 byte used for each element in the list.
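Putting the per-type rules together, an item’s size can be estimated by walking its DynamoDB JSON recursively. This is a sketch under the rules described in this post, not the calculator’s actual code: the function names are my own, and numbers use only the rough formula, so results for Number attributes are approximate.

```python
import base64
import math


def utf8_len(text: str) -> int:
    return len(text.encode("utf-8"))


def rough_number_size(number: str) -> int:
    # Rough formula only: 1 byte per two significant digits plus 1 or 2 bytes.
    significant = number.lstrip("+-").replace(".", "").strip("0")
    return math.ceil(len(significant) / 2) + (2 if number.startswith("-") else 1)


def value_size(attribute: dict) -> int:
    """Size of one typed value, e.g. {"S": "abc"} or {"L": [...]}."""
    type_tag, value = next(iter(attribute.items()))
    if type_tag == "S":
        return utf8_len(value)
    if type_tag == "N":
        return rough_number_size(value)
    if type_tag == "B":
        return len(base64.b64decode(value))
    if type_tag in ("BOOL", "NULL"):
        return 1
    if type_tag == "SS":
        return sum(utf8_len(v) for v in value)
    if type_tag == "NS":
        return sum(rough_number_size(v) for v in value)
    if type_tag == "BS":
        return sum(len(base64.b64decode(v)) for v in value)
    if type_tag == "M":
        # 3 bytes for the map, plus key + value + 1 byte per pair.
        return 3 + sum(utf8_len(k) + value_size(v) + 1 for k, v in value.items())
    if type_tag == "L":
        # 3 bytes for the list, plus value + 1 byte per element.
        return 3 + sum(value_size(v) + 1 for v in value)
    raise ValueError(f"unknown type: {type_tag}")


def item_size(item: dict) -> int:
    """Sum of every top-level attribute's name and value sizes."""
    return sum(utf8_len(name) + value_size(attr) for name, attr in item.items())


example = {
    "id": {"S": "abc"},
    "ok": {"BOOL": True},
    "tags": {"L": [{"S": "x"}, {"N": "27"}]},
    "meta": {"M": {}},
}
print(item_size(example))
```

The result can then be divided into 4 KB or 1 KB blocks, as described earlier, to estimate the capacity units a request will consume.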
Using this knowledge for good
There are two main things I think you should take away from this post.
- Being aware of item sizes can help you avoid consuming extra capacity units just because an item is a few bytes over a limit.
- Understanding how different requests consume capacity units can help avoid using more than expected.
Making items smaller
To make items smaller, the easiest thing to do is give your attributes shorter names. For example, use
dob instead of
loc instead of
Item Size and Consumed Capacity Calculator
This tool takes the JSON representation of a DynamoDB item and tells you its size in bytes and how many capacity units it’ll consume when reading and writing.
The tool is on GitHub: https://zaccharles.github.io/dynamodb-calculator/
To use the tool, paste an item’s DynamoDB JSON into its text area and click Calculate. This is a client-side tool, so your data stays in the browser.
You can get the DynamoDB JSON of an item using the AWS console by clicking the item’s key, switching to Text view, and ensuring DynamoDB JSON is checked.