DynamoDB is the non-relational database service provided by Amazon Web Services (AWS): a fast, flexible, and eventually consistent database solution for modern large-scale applications. As an eventually consistent system, DynamoDB relaxes consistency in favor of availability and partition tolerance.
Even though DynamoDB is a NoSQL database, it differs significantly from other NoSQL systems such as MongoDB or Apache CouchDB. And unlike with relational databases, migrating data from DynamoDB to another NoSQL or SQL database after your application has gone into production is no easy task. The fact that DynamoDB is the only NoSQL database solution provided by AWS should not by itself induce you to adopt it for your application. Let’s dive into some important factors to consider before choosing DynamoDB.
Composite Primary Keys Can Only Contain a Maximum of 2 Attributes
Unlike in relational databases, a composite primary key in DynamoDB is defined as a combination of a partition key and a sort key. As a result, you cannot define a composite primary key with more than two attributes. If you don’t think about how your data is going to be stored during the design phase, this can lead to significant problems.
Say you have a table of data for a group of students. To uniquely identify a student, you need all three attributes: grade, class, and name. But DynamoDB doesn’t allow you to store data in this format because of the primary key limitation. You’ll either need to concatenate grade and class into one field or introduce a unique ID as an additional field for each student.
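The concatenation workaround can be sketched as a small helper. This is a minimal illustration, not an official pattern; the attribute names `grade_class` and `name` and the `#` separator are assumptions for the example:

```python
def student_key(grade: int, class_name: str, name: str) -> dict:
    """Build a DynamoDB item key for a student, working around the
    two-attribute composite key limit by concatenating grade and class
    into a single partition key (attribute names are illustrative)."""
    return {
        "grade_class": f"{grade}#{class_name}",  # partition key
        "name": name,                            # sort key
    }

key = student_key(10, "B", "John")
assert key == {"grade_class": "10#B", "name": "John"}
```

The delimiter just needs to be a character that cannot appear in the grade or class values, so the two parts can be split back apart later if needed.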
Data Can be Queried Only with Keys and Indexes
In SQL databases you can select data using any column, and indexes make the selection even faster. In DynamoDB, however, you can only query data using the partition key, the partition key combined with the sort key (the sort key cannot be used alone), or an index. If you want to search by an attribute that is not a key or an index, you will need to scan through all the records of the table while applying a conditional check. This scan is performed at the database level rather than the application level, but it will still take a significant amount of time depending on the table size.
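The cost difference can be illustrated with a toy in-memory model (plain Python, not the real DynamoDB API): a query jumps straight to one partition, while a scan with a filter still reads every item in the table before filtering.

```python
from collections import defaultdict

# Toy model: a "table" is a map from partition key to the items
# stored in that partition.
table = defaultdict(list)
items = [
    {"grade_class": "10#B", "name": "John", "address": "12 Main St"},
    {"grade_class": "10#B", "name": "Jane", "address": "3 Oak Ave"},
    {"grade_class": "11#A", "name": "Ann", "address": "9 Elm Rd"},
]
for item in items:
    table[item["grade_class"]].append(item)

def query(pk):
    # Reads only the matching partition -- cost is bounded by partition size.
    return table[pk]

def scan(predicate):
    # Reads every item in every partition, then filters --
    # cost grows with the total table size.
    return [i for part in table.values() for i in part if predicate(i)]

assert len(query("10#B")) == 2
assert scan(lambda i: i["address"] == "9 Elm Rd")[0]["name"] == "Ann"
```

The filter in a real DynamoDB scan behaves the same way: it is applied after the items are read, so you are billed for reading the whole table either way.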
To retrieve data in DynamoDB, you can execute two types of operations: scan and query. The query operation is similar to the select operation in SQL, except that it can only use keys and indexes in its condition. But there are some concerns worth mentioning here. In SQL you can use indexed columns for selection while using non-indexed columns for projection.
Let’s say ‘name’ is an indexed column of the students table and ‘address’ is not. In this scenario, the following query doesn’t have any performance limitations:
SELECT name, address FROM students WHERE name = 'John'
The index on name is used to find a pointer to the actual record, and the value of address is then read from that record.
This works because in SQL, the index is used to find a direct-access pointer to the actual record or bucket of records. In DynamoDB, however, creating an index results in creating a new table, and you must define which attributes are projected into that index table. You can set the index to project all attributes from the parent table, but that increases cost, since you pay for the extra storage in DynamoDB. If you instead project only a selected set of attributes, you can retrieve only those attributes through that index. If you try to retrieve an attribute that is not projected into the index, it will perform a scan in the parent table, making your index useless. Therefore, you must carefully decide which attributes to index and which attributes to project with each index.
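A projection is declared when the index is defined. The sketch below shows the shape of the `GlobalSecondaryIndexes` parameter you might pass to a boto3 `create_table` call; the index name, attribute names, and throughput figures are illustrative assumptions:

```python
# Sketch of a secondary index definition with a restricted projection.
# ProjectionType INCLUDE copies only the listed non-key attributes into
# the index table; any other attribute cannot be served from this index.
name_index = {
    "IndexName": "name-index",                 # hypothetical index name
    "KeySchema": [
        {"AttributeName": "name", "KeyType": "HASH"},
    ],
    "Projection": {
        "ProjectionType": "INCLUDE",           # KEYS_ONLY | INCLUDE | ALL
        "NonKeyAttributes": ["address"],       # only 'address' is projected
    },
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 5,
        "WriteCapacityUnits": 5,
    },
}

assert name_index["Projection"]["ProjectionType"] == "INCLUDE"
```

With this definition, a query against `name-index` can return `name` and `address` cheaply; asking it for any other attribute forces DynamoDB back to the base table.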
Querying Capabilities from the AWS Web Console are Limited
The AWS web console can be used to view the data of DynamoDB tables, but this console has very limited querying capabilities, making the development and testing tasks cumbersome. Some of the limitations are as follows:
1. Only 100 items are displayed at once. If you want to check the 1000th item of a table, you must press the ‘next’ button 10 times until items 901 to 1000 are displayed.
2. You cannot insert or update multiple data items in one operation. You need to insert or update them individually.
3. You can only delete multiple items by ticking them with the checkbox. You cannot use queries or conditions to delete data in the console.
4. You can export existing data of the table to a CSV file, but there is no option to import that data again. The export is also possible only in batches of 100 items.
AWS provides a local client for DynamoDB. Unfortunately, this client cannot connect to a table hosted in AWS; it works only with a locally running DynamoDB server. However, it is a good option for building and testing your queries while still in development.
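Getting the local server running is a one-liner once the DynamoDB Local archive from AWS is extracted; the paths below assume you run the commands from the extracted directory:

```shell
# Start DynamoDB Local (ships as a jar in the AWS-provided archive).
# -sharedDb keeps all clients reading/writing a single local database file.
java -Djava.library.path=./DynamoDBLocal_lib -jar DynamoDBLocal.jar -sharedDb

# In another terminal, point the AWS CLI (or any SDK client) at the
# local endpoint instead of the real AWS service:
aws dynamodb list-tables --endpoint-url http://localhost:8000
```

The same `--endpoint-url`/endpoint override works in the SDKs, so the queries you build locally can later be pointed at the hosted table unchanged.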
Auto Scaling Doesn’t Scale Well
The performance of a DynamoDB table is decided by the provisioned Read Capacity Units (RCU) and Write Capacity Units (WCU). If you want a higher IOPS rate, you need to increase these values at a higher cost. However, keeping RCU and WCU values high is not practical in the long run given budget considerations. The solution AWS provides for this concern is auto scaling. With auto scaling, you can configure DynamoDB to vary the provisioned read and write capacities dynamically depending on the load, so that you can start with a lower provisioned capacity and let auto scaling raise it to cater to peak loads.
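The provisioning math behind those values can be sketched from DynamoDB’s documented unit sizes: one RCU covers one strongly consistent read per second of an item up to 4 KB (two reads if eventually consistent), and one WCU covers one write per second of an item up to 1 KB. The workload figures below are illustrative:

```python
import math

def required_rcu(reads_per_sec: int, item_kb: float,
                 strongly_consistent: bool = True) -> int:
    """RCUs needed: each read consumes ceil(item_kb / 4) units;
    eventually consistent reads cost half as much."""
    rcu = reads_per_sec * math.ceil(item_kb / 4)
    return rcu if strongly_consistent else math.ceil(rcu / 2)

def required_wcu(writes_per_sec: int, item_kb: float) -> int:
    """WCUs needed: each write consumes ceil(item_kb / 1) units."""
    return writes_per_sec * math.ceil(item_kb / 1)

# 50 strongly consistent reads/sec of 6 KB items -> 50 * 2 = 100 RCU,
# far above a table provisioned with only 25 RCU.
assert required_rcu(50, 6) == 100
assert required_rcu(50, 6, strongly_consistent=False) == 50
assert required_wcu(10, 2.5) == 30
```

Running this arithmetic against your expected peak load, rather than your average load, shows how quickly a small provisioned capacity gets overwhelmed before auto scaling reacts.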
With DynamoDB, however, auto scaling tends to be somewhat problematic. Even though you set scale-up and scale-down alarms, it won’t scale immediately to cater to peak or burst loads. According to our investigations, it takes nearly 15 minutes to scale up. To clarify, let’s say that the provisioned read capacity has been set to 25 and that a load test is started against an application which reads data from a DynamoDB table. Let’s also assume the concurrency of the load test is 50 and the load is continuous. You will notice that requests start to time out. If you check the current provisioned capacity of the DynamoDB table in the AWS console, you will see that it’s still at 25 and auto scaling has not yet been triggered. This behavior has been explained in detail here.
This is not actually a DynamoDB auto-scaling bug but expected behavior according to the AWS documentation.
DynamoDB Auto-Scaling is designed to accommodate request rates that vary in a somewhat predictable, generally periodic fashion. If you need to accommodate unpredictable bursts of read activity, you should use Auto-Scaling in combination with DAX (read Amazon DynamoDB Accelerator (DAX) — In-Memory Caching for Read-Intensive Workloads to learn more).
If your application needs to respond to burst loads, you need to configure DAX for your DynamoDB tables as recommended, but there are some limitations associated with DAX as well. You can find a complete list here under usage notes. One annoying limitation is worth mentioning: a VPC must be assigned to the DAX cluster, and DAX can only be accessed from an EC2 instance running inside the same VPC as the cluster. This means you cannot access DAX from your development machine even if your VPC has public internet connectivity, which makes development tasks with DAX almost impossible. DAX is also not included in the AWS Java SDK out of the box and is not available as a Maven dependency either, so you have to install it as a separately downloaded jar.
I’ve covered the most salient DynamoDB limitations that need to be considered before adoption. This doesn’t mean you should avoid using DynamoDB entirely, but it does mean you need to think carefully and design ahead if you are planning to use DynamoDB as the database for your next application.
Article by: Dulaj Atapattu — Software Engineer with passion, courage and a curiosity for knowledge.