
DynamoDB mistakes we made (so you don’t have to)


The NoSQL approach has a learning curve, and there are many things to remember to get it right. Here’s a list of mistakes and problems we faced when introducing the single-table DynamoDB approach in our microservices.

Deletion is an access pattern

We’ve been very thorough while working on our DynamoDB data model design. We noted down the read and write access patterns as suggested and verified how to achieve them with the proposed model. Whenever we caught a limitation or a problem, we revised the model and started again. In the end, we decided to put the 7th version live.

After a while, we looked at the AWS billing only to realise we hadn’t paid enough attention to our data storage costs — they were growing way faster than the usage of our application.

What went wrong? Our application did not expose any way for its clients to delete entities, so we skipped deletion during our access pattern discussions. But that doesn’t mean we needed to keep all the data forever! For instance, we introduced a transactional-outbox-like pattern, saving an event payload together with an entity within a single DynamoDB transaction. Once the event was published or expired we didn’t need it anymore — an access pattern we had totally missed.
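
To make the outbox example concrete, here is a minimal sketch of what such a transactional write could look like with the AWS SDK for JavaScript v3 document client; the table name, key layout, and TTL period are assumptions for illustration, not our actual schema.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, TransactWriteCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Hypothetical outbox-style write: the entity and its event are persisted
// atomically in one DynamoDB transaction.
async function saveOrderWithOutboxEvent(orderId: string, payload: Record<string, unknown>) {
  const now = Math.floor(Date.now() / 1000);
  await ddb.send(new TransactWriteCommand({
    TransactItems: [
      {
        Put: {
          TableName: "orders", // illustrative table name
          Item: { PK: `ORDER#${orderId}`, SK: "DETAILS", payload },
        },
      },
      {
        Put: {
          TableName: "orders",
          Item: {
            PK: `ORDER#${orderId}`,
            SK: `EVENT#${now}`,
            eventType: "OrderCreated",
            payload,
            // The event is only needed until it is published, so a TTL
            // attribute (epoch seconds) caps how long it is stored.
            ttl: now + 7 * 24 * 60 * 60,
          },
        },
      },
    ],
  }));
}
```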

What did we do about it? We revised the data we stored in the DynamoDB table from the perspective of:

  • an expiration date — is this still useful or needed after a week? A month? A year?
  • an environment — do we need to keep the data in development or test environments for the same amount of time as in production?
  • expiration triggers — is there any business process that would notify us that the retention period for our data is over? Think of removing the detailed order data after the order was successfully delivered, or a history of product price changes once the product is no longer offered to the customers.

Lessons learned

  • Define time-to-live (TTL) whenever you know it upfront to limit the amount of unused data stored in your table.
  • If your domain allows it, prefer using TTL over externally-triggered removal processes. You have to pay for the DeleteItem call, which is a write action, whereas TTL-based removal comes for free!
  • If some of your data has to be kept for a long time, but it’d be accessed rather sporadically (e.g. auditing details), consider moving it to a dedicated table using the Standard-Infrequent Access (Standard-IA) table class (a sketch of both options follows this list).
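
As a rough illustration of both lessons, here is one way the table definitions could be declared with the AWS CDK v2; we actually used plain CloudFormation, so treat this as an equivalent sketch with made-up construct and attribute names.

```typescript
import { Stack } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";

declare const stack: Stack; // provided by the surrounding CDK app

// Main table: items carry a numeric "ttl" attribute (epoch seconds) and
// DynamoDB deletes them for free once that time has passed.
const ordersTable = new dynamodb.Table(stack, "OrdersTable", {
  partitionKey: { name: "PK", type: dynamodb.AttributeType.STRING },
  sortKey: { name: "SK", type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  timeToLiveAttribute: "ttl",
});

// Long-lived but rarely read data (e.g. audit details) can live in a separate
// table that uses the Standard-IA table class to lower storage costs.
const auditTable = new dynamodb.Table(stack, "AuditTable", {
  partitionKey: { name: "PK", type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
  tableClass: dynamodb.TableClass.STANDARD_INFREQUENT_ACCESS,
});
```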

Keep GSI names generic

DynamoDB comes with a hard limit of 20 Global Secondary Indexes (GSI) per table. It seems quite limiting at first, but knowing the GSI overloading technique, you can fit plenty of access patterns into a single table.

We didn’t quite get that while working on our first microservice backed by a single-table DynamoDB model. We defined our first GSI — orderContext — with a hash key named orderContext and a range key named contextTimestamp.

Single table holding order prices (ids simplified for readability)
Order-context GSI representing an order-to-prices view

Of course, the order-context GSI’s structure is readable and self-explanatory to developers, but it does not make sense if you want to reuse it (overload it) with a different data set. And we quickly realised that we needed to add order item prices to the same table.

Single table holding order and item prices

If we were to introduce an order-to-item reverse mapping, the existing order-context GSI would no longer make sense — see the orderContextTimestamp column containing an item id?

Order-context GSI with order and item prices

It may look like a minor problem, but we effectively wasted one of the 20 available GSIs. A GSI named in a more generic way does not suggest any characteristics of the data collection underneath: it allows future additions and leaves that complexity on the application side.

Lessons learned

  • Use generic GSI hash key and range key names, e.g. GSI_1_PK and GSI_1_SK, to make them reusable (overloadable) in the future (see the sketch below).
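
A minimal sketch of what the overloading looks like at the item level, assuming the AWS SDK for JavaScript v3 document client; the table name, key formats, and helper functions are hypothetical.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Two different entity types share the same generically named GSI_1_PK /
// GSI_1_SK attributes, so a single index serves both views.
async function saveOrderPrice(orderId: string, timestamp: string, amount: number) {
  await ddb.send(new PutCommand({
    TableName: "prices",
    Item: {
      PK: `ORDER#${orderId}`, SK: `PRICE#${timestamp}`,
      GSI_1_PK: `ORDER#${orderId}`, GSI_1_SK: timestamp, // order-to-prices view
      amount,
    },
  }));
}

async function saveItemPrice(orderId: string, itemId: string, timestamp: string, amount: number) {
  await ddb.send(new PutCommand({
    TableName: "prices",
    Item: {
      PK: `ITEM#${itemId}`, SK: `PRICE#${timestamp}`,
      GSI_1_PK: `ORDER#${orderId}`, GSI_1_SK: `ITEM#${itemId}`, // order-to-items view overloads the same index
      amount,
    },
  }));
}
```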

GSI updates in baby steps

We use CloudFormation to define the structure of our DynamoDB tables. It’s just convenient to keep the “schema” details — or, I should say, the infrastructure — defined as code (IaC).

At times we had to update a GSI’s structure, for example by introducing more projected attributes or adding a range key. Unfortunately, in DynamoDB terms, that means dropping the existing GSI, recreating it from scratch, waiting for the data to be backfilled, and voilà, you’re done.

Remember that you can only create or delete one GSI at a time — a thing we forgot about multiple times… Otherwise, you’ll see the following error during the CloudFormation stack update:

UPDATE_FAILED - AWS::DynamoDB::Table - someTable - Cannot perform more than one GSI creation or deletion in a single update

Lessons learned

  • Always think through the structure of your GSIs, including the projected attributes. In case you missed anything, remember to change only one GSI at a time.
  • When using CloudFormation, it’s important to define only the attributes that are used in a table or GSI key schema (either as PK or SK) in the AttributeDefinitions section. Any extra attribute may cause an exception.
  • To avoid any “downtime” caused by a missing GSI, try creating a new one with a different name and the desired structure first, deploy code changes so your application uses it, and finally trigger the removal of the old one (as sketched below).
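
Here is a rough sketch of that two-step migration, expressed with the AWS CDK v2 rather than our raw CloudFormation templates; index and attribute names are illustrative. (With the CDK, the AttributeDefinitions entries are derived from the key schemas automatically; in a hand-written template only key attributes should be listed.)

```typescript
import { Stack } from "aws-cdk-lib";
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";

declare const stack: Stack; // provided by the surrounding CDK app

const table = new dynamodb.Table(stack, "SomeTable", {
  partitionKey: { name: "PK", type: dynamodb.AttributeType.STRING },
  sortKey: { name: "SK", type: dynamodb.AttributeType.STRING },
  billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});

// Deployment 1: add the replacement index under a new name while the old
// GSI_1 definition stays in place.
table.addGlobalSecondaryIndex({
  indexName: "GSI_2",
  partitionKey: { name: "GSI_2_PK", type: dynamodb.AttributeType.STRING },
  sortKey: { name: "GSI_2_SK", type: dynamodb.AttributeType.STRING },
  projectionType: dynamodb.ProjectionType.KEYS_ONLY,
});

// Deployment 2 (after the application has switched to GSI_2): delete the
// old addGlobalSecondaryIndex(...) call for GSI_1. Doing both changes in one
// deployment fails with the "Cannot perform more than one GSI creation or
// deletion in a single update" error quoted above.
```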

GSI storage amplification

DynamoDB comes with 3 types of attribute projection in Global Secondary Indexes: ALL, INCLUDE, and KEYS_ONLY. Whenever you define a new GSI think carefully about what data it needs to store. The thing to remember is that data for each projected field in each GSI you define will be copied, which effectively multiplies the cost of its storage.

Imagine a table of orders, including an order id, customer id, some timestamp, and a heavy JSON payload describing all the details of products, prices, and promotions:

Orders main table (ids simplified for readability)

If we define a GSI to get orders from a given customer with the projection type set to ALL, we will have a table like this:

Customer orders GSI with ALL attributes projected to it

Notice that it contains the payload attribute, which means you’ll pay for the heavy payload storage twice — once to store it in the main table, and another one to keep it in the GSI.

If we used the projection type set to KEYS_ONLY, it’d look as follows:

Customer orders GSI with KEYS_ONLY projected to it

The GSI is still quite useful, but the heavy JSON payload is stored only once, limiting the storage costs.
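
For reference, this is roughly how the two projection choices could be expressed with the AWS CDK v2 for the customer-orders index; the attribute names follow the example above, everything else is an assumption.

```typescript
import * as dynamodb from "aws-cdk-lib/aws-dynamodb";

declare const ordersTable: dynamodb.Table; // defined elsewhere in the stack

// Option 1: ALL projection copies every attribute, including the heavy
// payload, into the index, so the payload is stored twice.
// ordersTable.addGlobalSecondaryIndex({
//   indexName: "customerOrders",
//   partitionKey: { name: "customerId", type: dynamodb.AttributeType.STRING },
//   sortKey: { name: "timestamp", type: dynamodb.AttributeType.STRING },
//   projectionType: dynamodb.ProjectionType.ALL,
// });

// Option 2: KEYS_ONLY stores just the table and index keys, so the payload
// is kept once, in the main table.
ordersTable.addGlobalSecondaryIndex({
  indexName: "customerOrders",
  partitionKey: { name: "customerId", type: dynamodb.AttributeType.STRING },
  sortKey: { name: "timestamp", type: dynamodb.AttributeType.STRING },
  projectionType: dynamodb.ProjectionType.KEYS_ONLY,
});
```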

Lessons learned

  • Do your math (e.g. using the AWS Pricing Calculator) and verify whether it’s worth storing the same data multiple times, or whether it’s better to use 2 separate reads — one to get the main entity’s key from the GSI, and another to get the full payload (see the sketch after this list).
  • Refrain from using the ALL attributes projection, or from including data-heavy attributes in the INCLUDE projection, to cut down storage costs.
  • The smaller the set of attributes in a GSI, the faster it is to populate and the lower its latencies.
  • The smaller the index, the lower the costs of storage and writes.
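
A minimal sketch of the two-read approach, assuming the AWS SDK for JavaScript v3 document client and the KEYS_ONLY customer-orders index from the example above; names and key formats are illustrative, and orderId is assumed to be the main table’s partition key.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand, GetCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// First read: the KEYS_ONLY index query returns only the keys.
// Second read: the heavy payload is fetched from the main table on demand.
async function getOrdersWithPayload(customerId: string) {
  const keys = await ddb.send(new QueryCommand({
    TableName: "orders", // illustrative table and index names
    IndexName: "customerOrders",
    KeyConditionExpression: "customerId = :c",
    ExpressionAttributeValues: { ":c": customerId },
  }));

  return Promise.all((keys.Items ?? []).map((item) =>
    ddb.send(new GetCommand({
      TableName: "orders",
      Key: { orderId: item.orderId },
    })),
  ));
}
```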

GSIs have their limits too

The AWS docs recommend using uniformly distributed keys in your DynamoDB tables for optimal performance — think of UUIDs rather than a limited set of order statuses.

If you don’t spread your workload evenly, you may run into the “hot” partitions problem, where certain data is read/written so frequently that the requests exceed the per-partition throughput limits of 3000 RCU/s or 1000 WCU/s and get throttled.

We knew this when we built a write-heavy application in my team. We decided to use a compound key consisting of two UUIDs — the product id and the tenant id — as the PK of our main table to spread data evenly.

Product table (ids simplified for readability)

What we overlooked, though, is that GSIs have their own throughput limits too! And we did not distribute the GSI partitions as well as in the main table:

Tenant’s products GSI

Whenever a single tenant started to define a lot of product prices at the same time, the main table worked well, but we faced write request throttling for the GSI.

The main table without write throttled requests
The GSI with write throttled requests

Bear in mind that whereas read request throttling in a GSI does not affect the main table, write request throttling does, as it fails the writes on the main table too!
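
To make the key shapes concrete, here is a hypothetical sketch of such an item, assuming the AWS SDK for JavaScript v3 document client; the table name, key formats, and function are illustrative rather than our real code.

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// The main-table key combines two UUIDs, so writes spread across many
// partitions. The GSI key is derived from the tenant id alone, which is a
// much smaller key space and is what made its partitions run "hot".
async function saveProductPrice(productId: string, tenantId: string, price: number) {
  await ddb.send(new PutCommand({
    TableName: "products",
    Item: {
      PK: `PRODUCT#${productId}#TENANT#${tenantId}`, // evenly distributed
      SK: "PRICE",
      GSI_1_PK: `TENANT#${tenantId}`, // one value per tenant: prone to throttling
      GSI_1_SK: `PRODUCT#${productId}`,
      price,
    },
  }));
}
```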

Lessons learned

  • Try distributing all your partition keys evenly, including the main table and all its GSIs.
  • While designing a data model, always verify your GSIs for “hot” partition problems. Consider throughput limits for both reads (3000 RCU/s) and writes (1000 WCU/s).
  • If your key space is small, consider adding artificial suffixes (write sharding) to expand the possible set of values (see the sketch after this list).
  • Insufficient write capacity in a GSI (throttled writes) causes failures on writes in the main table and any of its GSIs.
  • Insufficient read capacity in a GSI (throttled reads) does not affect the main table or other GSIs.
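
Here is a sketch of such suffixing, often called write sharding; the shard count and key format are assumptions, and readers of the index then have to query every shard and merge the results.

```typescript
// Write sharding sketch: appending a bounded random suffix turns one logical
// tenant partition into SHARD_COUNT physical ones, spreading the write load.
const SHARD_COUNT = 10;

function shardedTenantKey(tenantId: string): string {
  const shard = Math.floor(Math.random() * SHARD_COUNT);
  return `TENANT#${tenantId}#${shard}`;
}

// Reads must fan out over TENANT#<id>#0 ... TENANT#<id>#9 and merge the
// results, trading extra read work for write throughput on the GSI.
```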
