AWS (Amazon Web Services) DynamoDB: Roadmap

DynamoDB is a NoSQL database provided by Amazon Web Services. It has been used within Amazon for years. Among the NoSQL database types, it is a key-value store: in one large table, each value is stored and looked up by the hash of its key.

Schemaless
All NoSQL databases of the key-value type are designed without a schema. Since DynamoDB is schemaless, items with different attributes are kept in the same table.

Consistency
Most NoSQL databases do not support ACID (Atomicity, Consistency, Isolation, Durability). DynamoDB offers two consistency models for reads: Eventually Consistent and Strongly Consistent.

An Eventually Consistent read may not reflect a recent update. In the DynamoDB documentation, "recent" means updates made within roughly the last second. A Strongly Consistent read returns the latest version of the data on every read. Strongly Consistent reads cost more than Eventually Consistent reads.

While a relational database (for example on RDS) cannot be expanded beyond a certain point, DynamoDB can scale up and down virtually without limit and without downtime by adjusting its Provisioned Capacity.

DynamoDB keeps replicas in three different Availability Zones (AZs). One of them is the primary replica. Reading from the primary replica gives Strong Consistency; reading from any other replica gives Eventual Consistency. Because writes go to the primary, its data is always current, while the other replicas may be slightly stale because they may not have received the latest update yet. Compared with Eventual Consistency, Strong Consistency needs twice the read capacity. Therefore, if queries do not need the very latest data and delays of a few milliseconds are not a problem, choosing Eventual Consistency saves cost. Default: Eventual Consistency.
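
The choice between the two consistency models is made per read request. Below is a minimal sketch, assuming `client` is a boto3 DynamoDB client (`boto3.client("dynamodb")`); the helper name `read_item` and the table and key names in the usage comment are hypothetical.

```python
def read_item(client, table_name, key, strongly_consistent=False):
    """Read one item by primary key.

    ConsistentRead=False (the default) is an eventually consistent read;
    ConsistentRead=True asks for a strongly consistent read.
    """
    response = client.get_item(
        TableName=table_name,
        Key=key,
        ConsistentRead=strongly_consistent,
    )
    # "Item" is missing from the response when no item matches the key.
    return response.get("Item")


# Hypothetical usage (needs AWS credentials and an existing table):
#   import boto3
#   client = boto3.client("dynamodb")
#   read_item(client, "Orders", {"order_id": {"S": "42"}}, strongly_consistent=True)
```

Leaving the flag at its default keeps the cheaper eventually consistent behaviour, so the more expensive read has to be requested explicitly.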

Scalability
To scale, DynamoDB asks up front for the expected number of reads and writes per second. Based on this information, it decides how many resources to use and how to distribute the table across servers, and you are charged accordingly.

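If more reads or writes per second are performed than the predicted number, the request is throttled and an error is returned (ThrottlingException; for reads and writes on a provisioned table the error is typically ProvisionedThroughputExceededException).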
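
A throttled request is usually retried after a short wait. The AWS SDKs generally do this on their own, so the following is only a sketch of the idea: capped exponential backoff with random jitter. The function name and the default values are assumptions chosen for illustration.

```python
import random


def backoff_delays(attempts, base=0.05, cap=2.0, rng=random.random):
    """Return one sleep time in seconds for each retry attempt.

    The upper bound doubles on every attempt up to `cap`, and the actual
    delay is a random fraction of that bound ("full jitter").
    """
    delays = []
    for attempt in range(attempts):
        ceiling = min(cap, base * (2 ** attempt))
        delays.append(rng() * ceiling)
    return delays
```

A caller would sleep for each delay in turn before retrying the throttled read or write, and give up once the list is exhausted.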

Partition Key
Although DynamoDB is called a key-value store, it also has document-store-like features because it is schemaless.
The key is called the Partition Key because it also determines how the data is distributed. DynamoDB scales by sharding, and during sharding this key decides which data goes to which server.
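
The idea of a key deciding where data lives can be sketched as follows. DynamoDB's internal hash function is not public, so MD5 and the simple modulo used here are assumptions for illustration only, as is the function name.

```python
import hashlib


def partition_for(partition_key, partition_count):
    """Map a partition key to a partition number in [0, partition_count)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % partition_count
```

The same key always maps to the same partition, which is why a key that is requested very often concentrates its load on one partition.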

DynamoDB keeps data in partitions. A partition whose data is accessed very frequently is called a Hot Partition. The table's total capacity is divided equally among the partitions. However, some partitions are requested very often and others only rarely. As a result, some partitions exhaust their capacity while others leave capacity unused. To solve this problem, Adaptive Capacity rebalances capacity between partitions.

With the TTL (Time to Live) feature, items can be expired and deleted automatically after a specified period of time. DynamoDB divides all capacity among partitions, so the more partitions there are, the less capacity each one gets. At this point TTL is a real help: old items drop out of the busy table, and rarely used data can be kept in a different table. A high-capacity table can serve the frequently accessed data and a low-capacity table the rarely accessed data. This avoids the cost of increasing capacity without end.
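
TTL works on a Number attribute that holds the expiry time as a Unix timestamp in seconds; the attribute name is chosen when TTL is enabled on the table. A small sketch, in which the attribute name `expires_at`, the `session_id` key and the fixed `now` value are assumptions for illustration:

```python
import time


def expires_at(days, now=None):
    """Return the epoch-seconds timestamp `days` days after `now`."""
    if now is None:
        now = time.time()
    return int(now) + days * 24 * 60 * 60


# An item in the low-level attribute-value format, set to expire after 30 days.
item = {
    "session_id": {"S": "abc123"},
    "expires_at": {"N": str(expires_at(30, now=1_700_000_000))},
}
```

Passing `now` explicitly keeps the example reproducible; in real code it would be left out so that the current time is used.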

Sort Key
Although it is possible to search over every field of a table, a quick search needs a Sort Key in addition to the Hash Key (the Partition Key).

Primary Key
The Primary Key (PK) uniquely identifies an item, as in an RDB (Relational Database). In DynamoDB a PK is not declared directly; it is built from the table's key attributes. The PK of a DynamoDB table can be one of two types:
Partition Key
Partition Key + Sort Key
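
The two primary key types correspond to one or two entries in the table's key schema. The sketch below builds the arguments that a boto3 `create_table` call expects; the helper name, the string-only key types and the capacity values of 5 are assumptions for illustration.

```python
def table_definition(table_name, partition_key, sort_key=None):
    """Build create_table arguments for a simple or composite primary key."""
    key_schema = [{"AttributeName": partition_key, "KeyType": "HASH"}]
    attribute_definitions = [{"AttributeName": partition_key, "AttributeType": "S"}]
    if sort_key is not None:
        # A composite key adds the sort key as a RANGE entry.
        key_schema.append({"AttributeName": sort_key, "KeyType": "RANGE"})
        attribute_definitions.append({"AttributeName": sort_key, "AttributeType": "S"})
    return {
        "TableName": table_name,
        "KeySchema": key_schema,
        "AttributeDefinitions": attribute_definitions,
        "ProvisionedThroughput": {"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
    }


# Hypothetical usage: boto3.client("dynamodb").create_table(**table_definition(...))
```

Only key attributes are declared here; all other attributes stay schemaless and can differ from item to item.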

Secondary Indexes
It is possible to index additional fields for quick searches. But be careful while doing this: depending on the index type you define, additional tables are created for the index, and this is reflected in the cost.
When defining DynamoDB tables, the index structure should be thought through. Since a Local Secondary Index can only be created together with the table, it needs special consideration. A Global Secondary Index needs its own capacity, separate from the basic capacity of the table. For this reason, it should not be defined unless it is needed, because it increases the cost.

Provisioned Throughput / Capacity Units
Provisioned Throughput, the projected load, matters for both scaling and pricing. If it is set too low, the load will frequently exceed it and requests will be throttled, so you cannot take full advantage of DynamoDB. If it is set higher than necessary, the cost will increase.

Capacity Units are calculated from the size of the items you read and write in DynamoDB. The unit sizes differ for reading and writing.

If the capacity is set incorrectly or not reviewed regularly, the cost will increase. To avoid this problem, the DynamoDB auto scaling feature scales capacity up or down automatically when usage stays above or below a defined value for a certain period of time.

The Query operation should be used instead of the Scan operation wherever possible. Scan reads across all partitions, whereas Query operates within a single partition. Using Query saves both time and Capacity Units.
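
A Query always names one partition key value. Below is a minimal sketch, assuming `client` is a boto3 DynamoDB client and the partition key is a string attribute; the helper name is hypothetical, and pagination through `LastEvaluatedKey` is left out for brevity.

```python
def query_by_partition_key(client, table_name, key_name, key_value):
    """Return the items stored under one partition key value (first page only)."""
    response = client.query(
        TableName=table_name,
        KeyConditionExpression="#pk = :pk",
        # A placeholder name avoids clashes with DynamoDB reserved words.
        ExpressionAttributeNames={"#pk": key_name},
        ExpressionAttributeValues={":pk": {"S": key_value}},
    )
    return response["Items"]
```

Because the key condition pins the request to one partition key value, only the items under that key are read and billed.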

Because of DynamoDB's NoSQL structure, complex queries are better handled in Redshift. Especially for reporting, the DynamoDB data as of a certain moment can be copied into Redshift. SQL queries can then be written on that data and reports can be generated.

With Redshift, SQL runs only on the copied data, whereas with AWS EMR SQL can run on real-time DynamoDB data. AWS EMR is a managed Apache Hadoop cluster. It uses Apache Hive, a data warehouse running on Hadoop. In this way, it is possible to query real-time data with SQL.

Read
Item to be read up to 4 KB: 1 RCU for one strongly consistent read per second, or for two eventually consistent reads per second
Item to be read larger than 4 KB: the size is rounded up to the next multiple of 4 KB, so an 8.1 KB item costs 3 RCUs per strongly consistent read.

Write
Data to be written up to 1 KB: one write per second costs 1 WCU
Data to be written larger than 1 KB: the size is rounded up to the next multiple of 1 KB, so an 8.1 KB item costs 9 WCUs.
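
The read and write rules above can be written as a small calculation. This is a sketch with hypothetical function names; it returns the units consumed by a single read or write and treats an eventually consistent read as half of a strongly consistent one.

```python
import math


def read_capacity_units(item_size_kb, strongly_consistent=True):
    """RCUs for one read: item size rounded up to 4 KB blocks."""
    units = max(1, math.ceil(item_size_kb / 4))
    # An eventually consistent read costs half as much.
    return units if strongly_consistent else units / 2


def write_capacity_units(item_size_kb):
    """WCUs for one write: item size rounded up to 1 KB blocks."""
    return max(1, math.ceil(item_size_kb))
```

For the 8.1 KB example, `read_capacity_units(8.1)` gives 3 and `write_capacity_units(8.1)` gives 9, matching the figures in the text.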

One of the DynamoDB constraints is that an item, and therefore any attribute kept in it, can be at most 400 KB. Data larger than 400 KB can be handled in three ways:

1. Reduce the size with GZIP compression
2. Put the data in an S3 bucket and keep the link in DynamoDB
3. Split the attribute across several items, using the sort key to order the parts
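
The first option can be sketched with the standard library alone. The limit is taken here as 400 * 1024 bytes, and the helper names and the `other_bytes` parameter (the size of the rest of the item) are assumptions for illustration.

```python
import gzip

MAX_ITEM_BYTES = 400 * 1024  # item size limit, assumed here to mean 400 KiB


def compress_attribute(text):
    """Compress a text value so it can be stored as a binary attribute."""
    return gzip.compress(text.encode("utf-8"))


def decompress_attribute(blob):
    """Reverse compress_attribute after reading the item back."""
    return gzip.decompress(blob).decode("utf-8")


def fits_in_item(blob, other_bytes=0):
    """Check whether the value plus the rest of the item stays within the limit."""
    return len(blob) + other_bytes <= MAX_ITEM_BYTES
```

Compression only helps when the data is compressible; if `fits_in_item` is still false afterwards, one of the other two options is needed.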

With DAX (DynamoDB Accelerator), which works with Eventual Consistency, DynamoDB solves the Hot Partition problem by putting its own cache system in front of the table. In other words, data can be served from DAX without consuming the partition's own capacity.

The design must place the DAX cluster and the application that uses it in the same VPC. When the DAX cluster is created, the default cache retention time is 5 minutes; unless it is changed, cached data expires after 5 minutes and is deleted.

With DynamoDB Streams, Lambda can be triggered automatically on any change.
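
A Lambda function triggered by a stream receives the changes as a batch of records, each with an `eventName` of INSERT, MODIFY or REMOVE. Below is a minimal handler sketch; counting the changes and collecting the keys is only an illustrative choice of what to do with them.

```python
def handler(event, context):
    """Summarise one batch of DynamoDB Streams records."""
    counts = {"INSERT": 0, "MODIFY": 0, "REMOVE": 0}
    changed_keys = []
    for record in event.get("Records", []):
        event_name = record["eventName"]
        if event_name in counts:
            counts[event_name] += 1
        # "Keys" holds the primary key of the changed item.
        changed_keys.append(record["dynamodb"]["Keys"])
    return {"counts": counts, "keys": changed_keys}
```

Real handlers would typically act on each record, for example by updating a search index or sending a notification, rather than just counting.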

Apart from regular backups, which can be taken manually or with Lambda, the feature called Point in Time Recovery can restore a DynamoDB table to any desired time within the last 35 days. It only needs to be enabled and is then managed automatically.

Using the Global Table feature, it is possible to work in multiple regions simultaneously. If the Europe (Frankfurt) region goes down, the table continues to be served from another replica located in Asia Pacific (Tokyo).

DynamoDB dump data can be exported to an Amazon S3 bucket using AWS Data Pipeline. Similarly, dump data can be imported from Amazon S3 to create a new table easily.
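
Point in Time Recovery, mentioned above, is switched on per table. Below is a minimal sketch, assuming `client` is a boto3 DynamoDB client; the helper name is hypothetical.

```python
def enable_point_in_time_recovery(client, table_name):
    """Turn on Point in Time Recovery for one table."""
    return client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
```

Once enabled, no further scheduling is needed; restores are requested separately when they are required.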

Pricing
Free Tier
The Amazon DynamoDB Free Tier offers 25 GB of storage, 25 RCUs, and 25 WCUs. This offer does not end after the first year.

Paid Tier

See you in my next article …

Burak Tahtacıoğlu

Computer Engineer — Burak Tahtacıoğlu
