Introduction to AWS DynamoDB
DynamoDB is a low-latency NoSQL database service offered by AWS.
What is DynamoDB
DynamoDB is a fast, configurable NoSQL database aimed at applications that need low latency, high throughput and storage for unstructured data.
DynamoDB supports both document and key-value data models. Its flexible data modeling, configurable read consistency and predictable performance make it a good fit for IoT applications, mobile gaming and similar high-throughput workloads.
DynamoDB supports ACID transactions (Atomic, Consistent, Isolated and Durable).
Behind the scenes, DynamoDB is serverless, which means there are no servers or hardware for us to manage. Its self-maintained distributed system does the heavy lifting for us.
Every table in DynamoDB is schema-less, which means we need not define a schema beforehand at table creation; only the primary key attributes must be declared. Documents such as JSON, XML or HTML can be stored as item data.
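To make the schema-less point concrete, here is a minimal sketch of two items that could live in the same table. The table, attribute names and values are assumptions made up for illustration; the items are written in DynamoDB's attribute-value notation, and the helper only checks that the key attribute is present.

```python
# Two items for the same hypothetical "Cars" table, written in DynamoDB's
# attribute-value notation ("S" = string, "N" = number, "L" = list).
# Only the key attribute (carId, an assumed name) appears in both.
item_a = {
    "carId": {"S": "car-001"},
    "carName": {"S": "Roadster"},
    "company": {"S": "Acme Motors"},
}
item_b = {
    "carId": {"S": "car-002"},
    "topSpeedKmh": {"N": "210"},  # numbers are sent as strings
    "features": {"L": [{"S": "sunroof"}, {"S": "abs"}]},
}

KEY_ATTRIBUTES = {"carId"}  # the only attributes the table itself declares


def has_required_keys(item: dict) -> bool:
    """Return True if the item carries every declared key attribute."""
    return KEY_ATTRIBUTES.issubset(item)
```

Both items pass the check even though they share no attribute other than the key.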
Indexing
There are two types of indexing in DynamoDB: the primary key and secondary indexes.
Primary key:
A primary key takes one of two forms:
- Partition key (pk): A key attribute defined when the table is created. The partition key value is the input to an internal hash function, and the output determines the partition (the physical storage) in which the item is placed. This is what lets an item be located directly by its key, without scanning the table. When the partition key alone is the primary key, its values must be unique in the table.
- Composite key: A combination of a partition key (pk) and a sort key. The sort key is defined at table creation. With a composite key the partition key need not be unique, but each combination of partition key and sort key must be. The partition key is again passed to the hash function to decide where the item is stored; items that share a partition key are stored together, and the sort key determines their order.
In the tabular data above, the physical address of the first item comes first.
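The hashing and ordering described above can be sketched in a few lines. This is only a toy model: the partition count, the use of SHA-256 and the `pk`/`sk` field names are assumptions for illustration, not DynamoDB's actual internals.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # assumed for illustration; DynamoDB manages this itself


def partition_for(pk: str) -> int:
    """Map a partition key to a partition number via a stable hash."""
    digest = hashlib.sha256(pk.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_PARTITIONS


def store(items):
    """Group items by partition, then order each partition by (pk, sk)."""
    partitions = defaultdict(list)
    for item in items:
        partitions[partition_for(item["pk"])].append(item)
    for bucket in partitions.values():
        bucket.sort(key=lambda it: (it["pk"], it["sk"]))
    return partitions


def get_item(partitions, pk, sk):
    """Look up one item: hash the pk to find its partition, then match the sk."""
    for item in partitions.get(partition_for(pk), []):
        if item["pk"] == pk and item["sk"] == sk:
            return item
    return None
```

Items with the same partition key always land in the same partition, and within it they are kept in sort key order, which is the behaviour the composite key relies on.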
Secondary Indexes:
In addition to querying on the primary key, secondary indexes let us query the table on an alternate key for more precisely filtered results.
These indexes are of two types:
1. Global Secondary Indexes (default limit of 20 per table):
  - Can be created or deleted at any point of time.
  - Can use a partition key and sort key different from those of the main table.
  - Queries on the new partition key and sort key combination run far more efficiently than scanning the table on those attributes.
2. Local Secondary Indexes (max limit 5):
  - Can only be created along with the table. No modifications are allowed later.
  - Same partition key as the table, but a different sort key.
  - Queries on this alternate sort key are efficient, unlike filtering on an attribute that is not part of the table's key.
Let's understand this with an example:
Suppose we have a Cars table with carId as the partition key and carName as the sort key, and we want to query by carName and company.
We create a new index named CarNameCompany, with carName as its partition key and company as its sort key, and query that index as required.
Each index maps to exactly one base table, and any item modification in the base table is automatically synchronized to its indexes (asynchronously, in the case of global secondary indexes).
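As a rough sketch, the Cars example could be expressed as request parameters for the low-level `create_table` and `query` operations. The string attribute types, the on-demand billing mode, the `ALL` projection and the sample query values are assumptions; only the table, index and key names come from the example.

```python
# Request parameters in the shape expected by the low-level DynamoDB
# create_table call. Attribute types (all strings) are an assumption.
create_table_params = {
    "TableName": "Cars",
    "AttributeDefinitions": [
        {"AttributeName": "carId", "AttributeType": "S"},
        {"AttributeName": "carName", "AttributeType": "S"},
        {"AttributeName": "company", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "carId", "KeyType": "HASH"},     # partition key
        {"AttributeName": "carName", "KeyType": "RANGE"},  # sort key
    ],
    "GlobalSecondaryIndexes": [
        {
            "IndexName": "CarNameCompany",
            "KeySchema": [
                {"AttributeName": "carName", "KeyType": "HASH"},
                {"AttributeName": "company", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    "BillingMode": "PAY_PER_REQUEST",  # assumption: on-demand, so no capacity units
}

# Query against the index instead of the base table.
query_params = {
    "TableName": "Cars",
    "IndexName": "CarNameCompany",
    "KeyConditionExpression": "carName = :name AND company = :company",
    "ExpressionAttributeValues": {
        ":name": {"S": "Roadster"},        # sample values, not from the article
        ":company": {"S": "Acme Motors"},
    },
}
```

With boto3 installed and credentials configured, these dictionaries would be passed as `boto3.client("dynamodb").create_table(**create_table_params)` and `.query(**query_params)`.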
Streams
Streams capture the changes made to a DynamoDB table and work like a subscription: once a stream is enabled on a table, we can capture the events related to that table. Captured events include:
- Addition of a new item.
- Deletion of an existing item.
- Update of an item.
Streams let us trigger other AWS services, such as a Lambda function, based on the kind of event that occurred.
For example, pushing a welcome notification to a new user on successful signup.
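The welcome-notification case might look like the following Lambda-style handler. It is a sketch under assumptions: the `userId` attribute and `send_welcome_notification` helper are hypothetical, and the stream is assumed to be configured to include new item images.

```python
def send_welcome_notification(user_id: str) -> str:
    """Hypothetical stand-in for a real notification call (e.g. push or email)."""
    return f"Welcome, {user_id}!"


def handler(event, context=None):
    """Process a batch of DynamoDB stream records, as delivered to Lambda."""
    notifications = []
    for record in event.get("Records", []):
        event_name = record.get("eventName")  # INSERT, MODIFY or REMOVE
        if event_name != "INSERT":
            continue  # only brand-new items count as signups here
        new_image = record["dynamodb"].get("NewImage", {})
        # Attribute values arrive in DynamoDB JSON, e.g. {"S": "u-42"}.
        user_id = new_image.get("userId", {}).get("S")
        if user_id is not None:
            notifications.append(send_welcome_notification(user_id))
    return notifications
```

Updates and deletions arrive in the same batch with `eventName` set to `MODIFY` or `REMOVE`, so the same handler could branch on those to drive other workflows.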
Throughput
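DynamoDB provisioned throughput is measured in Read Capacity Units and Write Capacity Units. It is set during table creation and can be changed later.
Before going further we should understand the difference between Eventually Consistent Reads and Strongly Consistent Reads: a strongly consistent read reflects every write acknowledged before it, while an eventually consistent read may briefly return stale data but costs half as much.
- 1 Read Capacity Unit = 1 x 4KB Strongly Consistent Read per Second, or 2 x 4KB Eventually Consistent Reads per Second
- 1 Write Capacity Unit = 1 x 1KB Write per Second
- Specifying capacity is not necessary in the On-Demand capacity pricing model. It scales automatically with incoming traffic.
- The On-Demand capacity model, which bills per request, is often preferred over the Provisioned capacity model for serverless applications.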
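The capacity arithmetic can be worked through with a small helper. It assumes the documented sizing rules: one read unit covers a strongly consistent read of up to 4 KB per second (or two eventually consistent reads), one write unit covers a write of up to 1 KB per second, and item sizes round up to the next unit. The function names are made up for this sketch.

```python
import math

READ_UNIT_KB = 4   # one read unit covers up to 4 KB
WRITE_UNIT_KB = 1  # one write unit covers up to 1 KB


def read_capacity_units(item_kb: float, reads_per_sec: int,
                        strongly_consistent: bool = True) -> int:
    """RCUs needed to read items of item_kb at reads_per_sec."""
    units_per_read = math.ceil(item_kb / READ_UNIT_KB)
    total = units_per_read * reads_per_sec
    # Eventually consistent reads cost half as much.
    return total if strongly_consistent else math.ceil(total / 2)


def write_capacity_units(item_kb: float, writes_per_sec: int) -> int:
    """WCUs needed to write items of item_kb at writes_per_sec."""
    return math.ceil(item_kb / WRITE_UNIT_KB) * writes_per_sec
```

For example, 10 strongly consistent reads per second of 6 KB items need 20 read units, or 10 if eventual consistency is acceptable, while 10 writes per second of 2.5 KB items need 30 write units.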
DynamoDB Accelerator (DAX)
DAX is a managed in-memory cache for DynamoDB. It can bring significant performance gains, with response times for Eventually Consistent Reads dropping from milliseconds to microseconds. It is easy to deploy and best suited to read-heavy and bursty workloads.
DAX replicates cached data across the nodes of its cluster (a primary node and read replicas) and supports encryption at rest. The DAX cluster sits between the application and DynamoDB: on a cache hit the read is served from memory, and on a cache miss the request is passed through to DynamoDB.
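The hit-or-miss flow can be illustrated with a toy read-through, write-through cache. This is not the DAX client API; the class, its counters and the dictionary standing in for the table are all assumptions for illustration.

```python
class ReadThroughCache:
    """Toy read-through, write-through cache in front of a slower store."""

    def __init__(self, backing_store: dict):
        self.backing_store = backing_store  # stands in for the DynamoDB table
        self.cache = {}
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.cache:
            self.hits += 1           # cache hit: served from memory
            return self.cache[key]
        self.misses += 1             # cache miss: fall through to the store
        value = self.backing_store.get(key)
        if value is not None:
            self.cache[key] = value  # populate the cache for later reads
        return value

    def put(self, key, value):
        # Write-through: update the store first, then the cached copy.
        self.backing_store[key] = value
        self.cache[key] = value
```

The first read of a key is a miss that goes to the backing store; repeated reads of the same key are hits, which is why read-heavy workloads benefit most.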
Important Chunks
- The DynamoDB TTL (Time To Live) feature, if activated, deletes stale items automatically. Once the current time passes an item's TTL attribute, the item is marked for deletion and removed in the background, typically within about 48 hours (the exact delay is not guaranteed).
- The Query operation should be preferred over Scan for fetching data, since a Scan reads every item in the table. The impact of a Scan on other traffic can be reduced by setting a smaller page size for each API call that the SDK executes behind the scenes.
- ProvisionedThroughputExceededException is raised if the incoming rate of requests exceeds the provisioned throughput.
- Exponential backoff: Mainly used to handle ProvisionedThroughputExceededException. If a call fails, the wait before the next retry increases exponentially, so there are progressively longer waits between consecutive retries. For example: 50ms, 100ms, 200ms, and so on, up to 1 minute.
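The backoff schedule from the last point can be sketched as follows. The `ThroughputExceeded` exception is a hypothetical stand-in for the SDK's throttling error, and the retry count is an assumption; the 50ms base and 1 minute cap follow the example above.

```python
import time


class ThroughputExceeded(Exception):
    """Hypothetical stand-in for the SDK's throttling exception."""


def backoff_delays_ms(max_retries: int, base_ms: int = 50,
                      cap_ms: int = 60_000) -> list:
    """Delays before each retry: 50, 100, 200, ... capped at one minute."""
    return [min(base_ms * 2 ** attempt, cap_ms) for attempt in range(max_retries)]


def call_with_backoff(operation, max_retries: int = 5, sleep=time.sleep):
    """Run operation, retrying with exponential backoff on throttling."""
    delays = backoff_delays_ms(max_retries)
    for attempt in range(max_retries + 1):
        try:
            return operation()
        except ThroughputExceeded:
            if attempt == max_retries:
                raise  # retries exhausted; surface the error to the caller
            sleep(delays[attempt] / 1000)  # time.sleep takes seconds
```

The AWS SDKs already retry throttled requests with backoff of their own, so a wrapper like this mainly matters when the built-in retries are exhausted or disabled.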
Summary
- DynamoDB supports the key-value data model as well as documents such as JSON, XML and HTML.
- Types of Primary Keys: Partition Key(pk) and Composite key (Partition key and Sort Key)
- Provisioned capacity units should be configured carefully to avoid ProvisionedThroughputExceededException.
- DAX can be configured for In-Memory caching for read intensive applications.