As the buzz around DynamoDB grows louder, more teams are taking a closer look at it. This post is the first in a series exploring DynamoDB in more detail.
But before moving on to more involved topics, it is important to get familiar with the terminology used in the DynamoDB world. Below is a graphical representation of a DynamoDB table.
As can be seen in the above image:
- Every DynamoDB Table is divided into one or more Partitions
- Each Partition contains a subset of the table data, along with any Local Secondary Indexes* built over that Partition’s data
- Global Secondary Indexes are stored and maintained separately from the Partitions, and they index the entire table (not any one Partition)
* DynamoDB internals are not visible to the public, so the internal partition structure shown above is only an illustration intended to explain the observed behaviour.
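To make the split between Partitions, Local Secondary Indexes and Global Secondary Indexes more concrete, here is a toy in-memory model in Python. It is only an illustration built on assumptions: the partition count, the MD5-based `partition_for` helper and the other function names are hypothetical, and DynamoDB's real hashing and partition management are internal. The model shows three things. Items with the same Partition Key always land in the same partition. An LSI only reorders data that already sits in one partition. A GSI has to gather items from every partition.

```python
import hashlib
from collections import defaultdict

# Hypothetical partition count; DynamoDB manages the real number internally.
NUM_PARTITIONS = 3


def partition_for(partition_key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Toy stand-in for DynamoDB's internal hash: map a key value to a partition."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def build_table(items, pk_name):
    """Place each item in the partition chosen by its partition key value."""
    partitions = defaultdict(list)
    for item in items:
        partitions[partition_for(item[pk_name])].append(item)
    return partitions


def build_lsi(partition_items, pk_name, alt_sort_attr):
    """Toy LSI: same partition key, alternate sort attribute, local to one partition."""
    indexed = [i for i in partition_items if alt_sort_attr in i]
    return sorted(indexed, key=lambda i: (i[pk_name], i[alt_sort_attr]))


def build_gsi(partitions, gsi_key_attr):
    """Toy GSI: keyed on a different attribute and gathered from every partition."""
    index = defaultdict(list)
    for partition_items in partitions.values():
        for item in partition_items:
            if gsi_key_attr in item:
                index[item[gsi_key_attr]].append(item)
    return index


orders = [
    {"customer_id": "c1", "order_id": "o1", "status": "SHIPPED", "total": 30},
    {"customer_id": "c1", "order_id": "o2", "status": "PENDING", "total": 12},
    {"customer_id": "c2", "order_id": "o3", "status": "SHIPPED", "total": 45},
]
table = build_table(orders, "customer_id")
lsis = {p: build_lsi(items, "customer_id", "total") for p, items in table.items()}
gsi = build_gsi(table, "status")
print(len(gsi["SHIPPED"]))  # 2: collected across partitions
```

In this model each LSI is derived from a single partition's items, so it can never see another customer's partition. The GSI has no such limit, which mirrors why DynamoDB stores and maintains GSIs separately.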
A more detailed image, showing the view within a Partition, can be found below.
As the image above shows:
- Each Table consists of a number of Items
- Each Item can contain one or more Attributes
- Every Item must contain at least one Attribute: its Partition Key
- Every single-item operation (get, put, update, delete) must specify the Item’s Partition Key, plus its Sort Key if the table defines one
- In addition to the Partition Key, a Table definition can designate another scalar (String, Number or Binary) Attribute as its Sort Key
- Read operations (Queries) can add conditions on the Sort Key for more advanced lookups (“begins_with”, “between”, comparisons such as “<” and “>=”, etc.)
- A group of Items that share the same Partition Key value is called an Item Collection
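The item-level rules in this list can be sketched with a small in-memory stand-in for a table that has both a Partition Key and a Sort Key. `ToyTable` and its methods are hypothetical names chosen for illustration and are not the DynamoDB API. The comments show the equivalent Query key condition syntax. Key conditions always require equality on the Partition Key. On the Sort Key they are limited to comparisons, `BETWEEN` and `begins_with`.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional, Tuple

Item = Dict[str, Any]


@dataclass
class ToyTable:
    """Hypothetical in-memory table with a Partition Key and a Sort Key."""

    pk_name: str
    sk_name: str
    items: Dict[Tuple[Any, Any], Item] = field(default_factory=dict)

    def put_item(self, item: Item) -> None:
        # Every item must carry its full primary key; a KeyError is raised otherwise.
        key = (item[self.pk_name], item[self.sk_name])
        self.items[key] = dict(item)

    def get_item(self, pk: Any, sk: Any) -> Optional[Item]:
        # Single-item reads name the full primary key (Partition Key + Sort Key).
        return self.items.get((pk, sk))

    def item_collection(self, pk: Any) -> List[Item]:
        # All items sharing one Partition Key value, ordered by Sort Key.
        members = [item for (p, _), item in self.items.items() if p == pk]
        return sorted(members, key=lambda item: item[self.sk_name])

    def query_begins_with(self, pk: Any, prefix: str) -> List[Item]:
        # Like the key condition: pk = :pk AND begins_with(sk, :prefix)
        return [i for i in self.item_collection(pk) if str(i[self.sk_name]).startswith(prefix)]

    def query_between(self, pk: Any, low: Any, high: Any) -> List[Item]:
        # Like the key condition: pk = :pk AND sk BETWEEN :low AND :high (inclusive)
        return [i for i in self.item_collection(pk) if low <= i[self.sk_name] <= high]


orders_table = ToyTable(pk_name="customer_id", sk_name="order_date")
orders_table.put_item({"customer_id": "c1", "order_date": "2019-01-05", "total": 30})
orders_table.put_item({"customer_id": "c1", "order_date": "2019-02-11", "total": 12})
orders_table.put_item({"customer_id": "c1", "order_date": "2020-03-02", "total": 7})
orders_table.put_item({"customer_id": "c2", "order_date": "2019-01-20", "total": 45})

print(len(orders_table.item_collection("c1")))  # 3 items in c1's Item Collection
print(len(orders_table.query_begins_with("c1", "2019")))  # 2
```

Using ISO-formatted dates as the Sort Key works well here because string order matches chronological order. That is why `begins_with` and `BETWEEN` can select date ranges within one Item Collection.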
DynamoDB is a fully managed, hosted service. Amazon runs all the infrastructure required to operate DynamoDB on its customers’ behalf. In return, customers pay for the following:
- Every read request
- Every write request
- Every byte of storage used
The currency for these requests is RCUs (Read Capacity Units) and WCUs (Write Capacity Units). In general, a DynamoDB table needs a certain number of RCUs/WCUs provisioned* when it is created (this can be changed later), and the table can serve reads and writes as long as the request rate stays within the provisioned RCU/WCU.
* This applies only when using “Provisioned Capacity Mode”. In Nov 2018, Amazon introduced “On-Demand Capacity Mode” for unpredictable workloads. Comparing the two is out of the scope of this story; please refer to post #3 of this series if you are interested in the differences between the “Provisioned” and “On-Demand” capacity modes.
Thanks to Kirk Kirkconnell for pointing this out
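What one unit actually buys is worth spelling out. According to AWS documentation for provisioned mode, one RCU covers one strongly consistent read per second, or two eventually consistent reads per second, of an item up to 4 KB. One WCU covers one write per second of an item up to 1 KB. Item sizes are rounded up to the next block. The Python sketch below applies that arithmetic to a steady request rate. The function names are hypothetical, and it leaves out transactional requests (which cost twice as much) and on-demand pricing.

```python
import math

# Documented provisioned-capacity unit sizes.
RCU_BLOCK_KB = 4  # 1 RCU: one strongly consistent read/s of an item up to 4 KB
WCU_BLOCK_KB = 1  # 1 WCU: one standard write/s of an item up to 1 KB


def read_capacity_units(item_size_kb: float, reads_per_second: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed for a steady read rate; item size is rounded up to 4 KB blocks."""
    units_per_read = max(1, math.ceil(item_size_kb / RCU_BLOCK_KB))
    total = units_per_read * reads_per_second
    if not strongly_consistent:
        # Eventually consistent reads cost half as much (rounded up here).
        total = math.ceil(total / 2)
    return total


def write_capacity_units(item_size_kb: float, writes_per_second: int) -> int:
    """WCUs needed for a steady write rate; item size is rounded up to 1 KB blocks."""
    units_per_write = max(1, math.ceil(item_size_kb / WCU_BLOCK_KB))
    return units_per_write * writes_per_second


print(read_capacity_units(6, 100))                             # 200
print(read_capacity_units(6, 100, strongly_consistent=False))  # 100
print(write_capacity_units(2.5, 50))                           # 150
```

The rounding has real cost implications. A 6 KB item uses two 4 KB read blocks, so it costs as much to read as an 8 KB item. A 2.5 KB item is billed as three 1 KB writes.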
Now, many of these terms, and the overall table structure itself, may seem very familiar to Cassandra users. Teams that already use Cassandra might wonder, “Why DynamoDB, then?”
This is explained in greater detail in Part 2 of this series: “Why migrate to DynamoDB from Cassandra?”