DynamoDB mistakes we made (so you don’t have to)
The NoSQL approach comes with a learning curve, and there are many things to remember to get it right. Here’s a list of the mistakes and problems we faced when introducing the single-table DynamoDB approach to our microservices.
Deletion is an access pattern
We were very thorough while working on our DynamoDB data model design. We noted down the read and write access patterns, as suggested, and verified how to achieve them with the proposed model. Whenever we caught a limitation or problem, we revised the model and started again. In the end, we decided to put the 7th version live.
After a while, we looked at the AWS bill only to realise we hadn’t paid enough attention to our data storage costs: they were growing way faster than the usage of our application.
What went wrong? Our application did not expose any way for its clients to delete entities, so we skipped deletion during our access-pattern discussions. But that doesn’t mean we needed to keep all the data forever! For instance, we introduced a transactional-outbox-like pattern, saving an event payload together with an entity within a single DynamoDB transaction. Once the event was published or expired, we didn’t need it anymore: an access pattern we had totally missed.
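To make the outbox idea concrete, here is a minimal sketch of such a write using boto3; the table name, key schema, and attribute names are assumptions for illustration, not our actual model:

```python
import json
import time
import uuid

import boto3

dynamodb = boto3.client("dynamodb")
order_id = str(uuid.uuid4())

# Save the entity and its outbox event atomically in a single transaction.
# "app-table", the PK/SK scheme, and the attribute names are hypothetical.
dynamodb.transact_write_items(
    TransactItems=[
        {
            "Put": {
                "TableName": "app-table",
                "Item": {
                    "PK": {"S": f"ORDER#{order_id}"},
                    "SK": {"S": "DETAILS"},
                    "status": {"S": "CREATED"},
                },
            }
        },
        {
            "Put": {
                "TableName": "app-table",
                "Item": {
                    "PK": {"S": f"ORDER#{order_id}"},
                    "SK": {"S": "OUTBOX#ORDER_CREATED"},
                    "payload": {"S": json.dumps({"orderId": order_id})},
                    # Without an expiry (or an explicit DeleteItem once the
                    # event is published), these items pile up forever.
                    "expiresAt": {"N": str(int(time.time()) + 7 * 24 * 3600)},
                },
            }
        },
    ]
)
```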
What did we do about it? We revised the data we stored in the DynamoDB table from the perspective of:
- an expiration date: is this still useful or needed after a week? A month? A year?
- an environment: do we need to keep the data in development or test environments for the same amount of time as in production?
- expiration triggers: is there any business process that would notify us that the retention period for our data is over? Think of removing the detailed order data after the order was successfully delivered, or a history of product price changes once the product is no longer offered to the customers.
Lessons learned
- Define time-to-live (TTL) whenever you know it upfront to limit the amount of unused data stored in your table.
- If your domain allows it, prefer TTL over externally-triggered removal processes. You have to pay for `DeleteItem`, which is a write action, whereas TTL-based removal comes for free!
- If some of your data has to be kept for a long time but is accessed only sporadically (e.g. auditing details), consider moving it to a dedicated table using the Standard-Infrequent Access (Standard-IA) table class.
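Enabling TTL itself is a one-time table setting. A minimal boto3 sketch, assuming the expiry lives in an epoch-seconds attribute named `expiresAt` (both names here are illustrative):

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Point DynamoDB at the attribute holding the expiry as a Unix timestamp in
# seconds; items past that time are deleted in the background at no write cost.
dynamodb.update_time_to_live(
    TableName="app-table",
    TimeToLiveSpecification={
        "Enabled": True,
        "AttributeName": "expiresAt",
    },
)
```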
Keep GSI names generic
DynamoDB comes with a hard limit of 20 Global Secondary Indexes (GSI) per table. It seems quite limiting at first, but knowing the GSI overloading technique, you can fit plenty of access patterns into a single table.
We didn’t quite get that while working on our first microservice backed by a single-table DynamoDB model. We defined our first GSI, `orderContext`, with a hash key named `orderContext` and a range key named `contextTimestamp`.
Of course, the `orderContext` GSI’s structure is readable and self-explanatory to developers, but it does not make sense if you want to reuse (overload) it with a different data set, and we quickly realised that we needed to add order item prices to the same table.
If we were to introduce a reverse order-to-item mapping, the existing `orderContext` GSI would no longer make sense: its timestamp-named range key would end up containing an item id.
It may look like a minor problem, but we’d effectively wasted one of the 20 available GSIs. A generic name does not suggest any characteristics of the data collection underneath; it allows future additions and leaves the complexity on the application side.
Lessons learned
- Use generic GSI hash key and range key names, e.g. `GSI_1_PK` and `GSI_1_SK`, to make them reusable (overloadable) in the future.
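To show what overloading a generically named index looks like in practice, here are two differently shaped items feeding the same GSI; the entities and key formats are made up for the example:

```python
# An order indexed by its context, sorted by timestamp.
order_item = {
    "PK": "ORDER#42",
    "SK": "DETAILS",
    "GSI_1_PK": "CONTEXT#checkout",
    "GSI_1_SK": "2024-05-01T12:00:00Z",
}

# A price entry reusing the very same index for a reverse order-to-item lookup.
price_item = {
    "PK": "PRODUCT#7",
    "SK": "PRICE#2024-05-01",
    "GSI_1_PK": "ORDER#42",
    "GSI_1_SK": "ITEM#7",
}
```

Because the key names say nothing about the data, both collections fit the same index; the application decides what `GSI_1_PK` means for each entity type.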
GSI updates in baby steps
We use CloudFormation to define the structure of our DynamoDB tables. It’s just convenient to keep the “schema” details (or, I should say, the infrastructure) defined as code (IaC).
At times we had to update a GSI’s structure, for example by introducing more projected attributes or adding a range key. Unfortunately, in DynamoDB terms, that means dropping the existing GSI, recreating it from scratch, and waiting for the data to be backfilled. Then, voilà, you’re done.
Remember that you can only create or delete one GSI at a time — a thing we forgot about multiple times… Otherwise, you’ll see the following error during the CloudFormation stack update:
UPDATE_FAILED - AWS::DynamoDB::Table - someTable - Cannot perform more than one GSI creation or deletion in a single update
Lessons learned
- Always think through the structure of your GSIs, including the projected attributes. In case you missed anything, remember to change only one GSI at a time.
- When using CloudFormation, it’s important to define only the attributes that are used in a table or GSI key schema (either as PK or SK) in the `AttributeDefinitions` section. Any extra attribute may cause an exception.
- To avoid any “downtime” issues because of a missing GSI, try creating a new index with a different name and the desired structure first, deploy code changes so your application uses it, and only then trigger the removal of the old one.
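The one-change-per-update rule applies outside CloudFormation as well. A boto3 sketch of the replace-then-remove dance described above, assuming an on-demand table and hypothetical index names:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Step 1: create the replacement index. Only one GSI creation or deletion
# is allowed per UpdateTable call.
dynamodb.update_table(
    TableName="someTable",
    AttributeDefinitions=[
        {"AttributeName": "GSI_2_PK", "AttributeType": "S"},
        {"AttributeName": "GSI_2_SK", "AttributeType": "S"},
    ],
    GlobalSecondaryIndexUpdates=[
        {
            "Create": {
                "IndexName": "GSI_2",
                "KeySchema": [
                    {"AttributeName": "GSI_2_PK", "KeyType": "HASH"},
                    {"AttributeName": "GSI_2_SK", "KeyType": "RANGE"},
                ],
                "Projection": {"ProjectionType": "KEYS_ONLY"},
            }
        }
    ],
)

# Step 2: only after the backfill finishes and the application reads GSI_2,
# drop the old index in a separate update.
dynamodb.update_table(
    TableName="someTable",
    GlobalSecondaryIndexUpdates=[{"Delete": {"IndexName": "GSI_1"}}],
)
```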
GSI storage amplification
DynamoDB comes with 3 types of attribute projection for Global Secondary Indexes: `ALL`, `INCLUDE`, and `KEYS_ONLY`. Whenever you define a new GSI, think carefully about what data it needs to store. The thing to remember is that every attribute projected into a GSI is a copy of the data, which effectively multiplies the cost of its storage.
Imagine a table of orders, including an order id, a customer id, some timestamp, and a heavy JSON payload describing all the details of products, prices, and promotions.
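The original showed this as a figure; in code form, such an item might look roughly like the following, with every attribute name assumed for illustration:

```python
# An illustrative order item: the keys are small, the payload is not.
order = {
    "orderId": "1f9e1b7c-...",     # PK: uniformly distributed UUID
    "customerId": "8a2d44f0-...",  # a natural GSI hash key candidate
    "createdAt": "2024-05-01T12:00:00Z",
    "payload": "{ ...kilobytes of products, prices, and promotions... }",
}
```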
If we define a GSI to get the orders of a given customer with the projection type set to `ALL`, every attribute, including the heavy `payload`, is copied into the index. That means you’ll pay for the payload storage twice: once to store it in the main table, and again to keep it in the GSI.
If we used the `KEYS_ONLY` projection type instead, the GSI would hold only the table and index keys. It is still quite useful, but the heavy JSON payload is stored only once, limiting the storage costs.
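A sketch of what the `KEYS_ONLY` variant looks like at table-creation time with boto3; the table and index names are assumptions:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="orders",
    BillingMode="PAY_PER_REQUEST",
    AttributeDefinitions=[
        {"AttributeName": "orderId", "AttributeType": "S"},
        {"AttributeName": "customerId", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "orderId", "KeyType": "HASH"}],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "byCustomer",
            "KeySchema": [{"AttributeName": "customerId", "KeyType": "HASH"}],
            # KEYS_ONLY copies only the table and index keys into the GSI,
            # so the heavy payload attribute is stored (and billed) once.
            "Projection": {"ProjectionType": "KEYS_ONLY"},
        }
    ],
)
```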
Lessons learned
- Do your math (e.g. using the AWS Pricing Calculator) and verify whether it’s worth storing the same data multiple times, or whether it’s better to use two separate reads: one to get the main entity’s key from the GSI, and another to get the full item from the main table.
- Refrain from using the `ALL` attribute projection, and avoid putting data-heavy attributes into an `INCLUDE` projection, to cut down storage costs.
- The smaller the set of attributes projected into a GSI, the faster it is to populate and the lower the latencies it provides.
- The smaller the index, the lower the costs of storage and writes.
GSIs have their limits too
The AWS docs recommend using uniformly distributed keys in your DynamoDB tables for optimal performance — think of UUIDs rather than a limited set of order statuses.
If you don’t spread your workload evenly, you may run into the “hot” partition problem, where certain data is read or written so frequently that requests exceed the per-partition throughput limits of 3000 RCU/s or 1000 WCU/s and get throttled.
We knew this when we built a write-heavy application in my team. We decided to use a compound key consisting of two UUIDs, the product id and the tenant id, as the PK of our main table to spread the data evenly.
What we overlooked, though, is that GSIs have their own throughput limits! And we did not distribute the partition keys there as evenly as in the main table. Whenever a single tenant started to define a lot of product prices at the same time, the main table worked well, but we faced write request throttling on the GSI.
Bear in mind that whereas read request throttling in a GSI does not affect the main table, write request throttling does: it fails the writes on the main table too!
Lessons learned
- Try distributing all your partition keys evenly, including the main table and all its GSIs.
- While designing a data model, always verify your GSIs for “hot” partition problems. Consider the throughput limits for both reads (3000 RCU/s) and writes (1000 WCU/s).
- If your key space is small, consider adding artificial suffixes to expand the possible set of values (see the write-sharding sketch after this list).
- Insufficient write capacity in a GSI (throttled writes) causes failures on writes in the main table and any of its GSIs.
- Insufficient read capacity in a GSI (throttled reads) does not affect the main table or other GSIs.
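For the artificial-suffix idea from the lessons above, a common write-sharding sketch; the shard count and key format are assumptions, not our actual setup:

```python
import random

SHARDS = 10  # spreads one tenant's writes across ten GSI partitions

def sharded_gsi_pk(tenant_id: str) -> str:
    # Writers scatter items across TENANT#<id>#0..9; readers must query all
    # ten shard values and merge the results on the application side.
    return f"TENANT#{tenant_id}#{random.randrange(SHARDS)}"
```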