Safe List updates with DynamoDB

Robert Zhu
May 21, 2019

Amazon DynamoDB is one of the most versatile and popular services on AWS. In seconds, we can deploy a highly available, dynamically scaling key-document store with global replication, transactions, and more! However, if we modify a list attribute on a document, we need to take extra steps to achieve correctness and concurrency. Below, I’ll describe the problem and offer several solutions.

Are your list updates safe?

Full code if you want to follow along.

Let’s clarify the problem. Suppose we insert the following document, using the JavaScript AWS-SDK and the DynamoDB DocumentClient:
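As a minimal sketch of such an insert, assume a hypothetical table named people with a string partition key id. The table name, the item values other than “frylock”, and the buildPutParams helper are illustrative assumptions:

```javascript
// Hypothetical table name and item shape, chosen for illustration only.
const TABLE_NAME = 'people';

// Build the parameters for DocumentClient.put.
function buildPutParams() {
  return {
    TableName: TABLE_NAME,
    Item: {
      id: 'meatwad',
      // A plain JavaScript array is marshalled as a DynamoDB List.
      friends: ['shake', 'frylock', 'carl'],
    },
  };
}

// Usage with aws-sdk v2 (needs credentials and an existing table):
//   const AWS = require('aws-sdk');
//   const docClient = new AWS.DynamoDB.DocumentClient();
//   await docClient.put(buildPutParams()).promise();
```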

In the DynamoDB console, here’s what the document looks like:

By default, the DocumentClient has marshalled the JavaScript array as a DynamoDB List type. How would we remove the value “frylock” from the “friends” List attribute? Here’s the doc on the List’s remove operation. Since we need to specify the index of the element to remove, we need to read the document and find the index:
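A rough sketch of this read-then-remove approach might look like the following. It assumes the hypothetical people table keyed by id, and it takes the DocumentClient as a parameter so the function can be exercised with a stub; the function name is an assumption too:

```javascript
// docClient: anything exposing the aws-sdk v2 DocumentClient interface
// (get/update returning objects with a .promise() method).
async function removeFriendUnsafe(docClient, id, friend) {
  // Step 1: read the document.
  const { Item } = await docClient
    .get({ TableName: 'people', Key: { id } })
    .promise();

  // Step 2: find the index of the value to remove.
  const index = Item.friends.indexOf(friend);
  if (index === -1) return false;

  // Step 3: remove by index. The index must be a literal in the
  // expression; it cannot be passed as an expression attribute value.
  await docClient
    .update({
      TableName: 'people',
      Key: { id },
      UpdateExpression: `REMOVE friends[${index}]`,
    })
    .promise();
  return true;
}
```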

But this implementation has a race condition. There is a small window of time between reading the document, finding the index, and sending the update request. If another writer updates the document during that window, the operation “remove element at index X” can remove the wrong element. This is the classic read-modify-write (lost update) problem, the kind of problem that techniques such as transactional memory were designed to address. Luckily, there are several solutions.

Condition Expression on the list contents

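A Condition Expression is a predicate that prevents the write from executing if it evaluates to false. Here, the predicate asserts that the element at the index we found still holds the value we want to remove.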

We also need to handle the error case where the condition expression is not met. Here’s the updated function:
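One way this could look, as a sketch rather than a definitive implementation: it assumes the hypothetical people table keyed by id, an injected aws-sdk v2 DocumentClient, and made-up return values ('removed', 'not-found', 'conflict') to signal the outcome.

```javascript
async function removeFriendByValue(docClient, id, friend) {
  const { Item } = await docClient
    .get({ TableName: 'people', Key: { id } })
    .promise();

  const index = Item.friends.indexOf(friend);
  if (index === -1) return 'not-found';

  try {
    await docClient
      .update({
        TableName: 'people',
        Key: { id },
        UpdateExpression: `REMOVE friends[${index}]`,
        // Only remove if that slot still holds the value we looked up.
        ConditionExpression: `friends[${index}] = :friend`,
        ExpressionAttributeValues: { ':friend': friend },
      })
      .promise();
    return 'removed';
  } catch (err) {
    // aws-sdk v2 reports a failed condition through err.code.
    if (err.code === 'ConditionalCheckFailedException') return 'conflict';
    throw err;
  }
}
```

On 'conflict', the caller can re-read the document and try again.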

This technique only ensures that updates to the list attribute are safe. How can we ensure we only apply updates when the document has not changed?

Condition Expression on a version attribute

Let’s update the condition expression and add error handling:
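A sketch of the version-based variant, under the assumption that every item carries a numeric version attribute; the people table, the id key, the function name, and the return values are hypothetical:

```javascript
async function removeFriendVersioned(docClient, id, friend) {
  const { Item } = await docClient
    .get({ TableName: 'people', Key: { id } })
    .promise();

  const index = Item.friends.indexOf(friend);
  if (index === -1) return 'not-found';

  try {
    await docClient
      .update({
        TableName: 'people',
        Key: { id },
        // Remove the element and bump the version in the same write.
        UpdateExpression: `REMOVE friends[${index}] SET #v = #v + :one`,
        // Refuse the write if anyone changed the document since we read it.
        ConditionExpression: '#v = :expected',
        ExpressionAttributeNames: { '#v': 'version' },
        ExpressionAttributeValues: { ':one': 1, ':expected': Item.version },
      })
      .promise();
    return 'removed';
  } catch (err) {
    if (err.code === 'ConditionalCheckFailedException') return 'conflict';
    throw err;
  }
}
```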

Notice that the Update Expression also increments the version attribute. The two drawbacks to this approach:

  1. We need to add a version attribute to every document/table for which we want to enforce this pattern.
  2. We need to create a wrapper layer that ensures all updates respect the version attribute and educate the team that direct update operations are prohibited.

Use the Set data type

DynamoDB also offers a Set data type, with the following properties:

  1. All values must be of the same type (string, number, or binary)
  2. All values must be unique
  3. To remove element(s) from a set, use the DELETE operation, specifying a set of values
  4. A Set cannot be empty

Sounds perfect for storing a list of related document keys. However, we saw that the DocumentClient serializes JavaScript arrays as Lists, so we need to override that behavior with a custom marshaller.

Note: the example in the docs uses a “DynamoDBSet” class, but this does not appear to be available as an import from the aws-sdk JS npm module. Instead, we’ll use the DocumentClient’s createSet function, which accomplishes the same thing:
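A minimal sketch of inserting a document with a set attribute, assuming the hypothetical people table and an injected aws-sdk v2 DocumentClient (buildSetItemParams is an illustrative helper name):

```javascript
// docClient.createSet wraps an array so the DocumentClient marshals
// it as a DynamoDB Set instead of a List.
function buildSetItemParams(docClient, id, friends) {
  return {
    TableName: 'people',
    Item: {
      id,
      friends: docClient.createSet(friends),
    },
  };
}

// Usage with aws-sdk v2:
//   const docClient = new AWS.DynamoDB.DocumentClient();
//   const params = buildSetItemParams(docClient, 'meatwad', ['shake', 'frylock']);
//   await docClient.put(params).promise();
```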

In the console, our new document looks almost identical, except for the “StringSet” type on the friends attribute.

Now to specify the DELETE operation:
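A sketch of what the update parameters could look like, with the same hypothetical table and key. Note that the value passed to DELETE is itself a set, and that no prior read or index lookup is needed:

```javascript
function buildDeleteFriendParams(docClient, id, friend) {
  return {
    TableName: 'people',
    Key: { id },
    // DELETE removes the given set of values from the set attribute.
    UpdateExpression: 'DELETE friends :toRemove',
    ExpressionAttributeValues: {
      ':toRemove': docClient.createSet([friend]),
    },
  };
}

// Usage with aws-sdk v2:
//   await docClient.update(buildDeleteFriendParams(docClient, 'meatwad', 'frylock')).promise();
```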

Working with Sets from JavaScript has two gotchas. First: a set attribute on a document does not deserialize into a JavaScript array. Let’s see what it actually returns:

Aha! A DynamoDB Set deserializes into an object with its array of items stored under the values property. If we want a Set to deserialize into an array, we’ll need to add an unmarshalling step where we assign the values property instead of the deserialized set object itself.

Second: remember how sets cannot be empty? If we try to remove all elements from a set, the console will stop us:

The console prevents us from deleting the last element from an existing set, but the SDK does not.

However, if we remove the last element from a set in code, the attribute will be deleted from the document. This means the unmarshalling step we mentioned in gotcha #1 will need to account for the case where the property is undefined. Here’s a helper function that covers both cases:
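One possible shape for such a helper, assuming the deserialized set exposes its items as an array under values (setToArray is a hypothetical name):

```javascript
// Convert a deserialized DynamoDB Set (or a missing attribute) to an array.
function setToArray(set) {
  // A missing attribute means the set was emptied and then deleted.
  if (!set || !Array.isArray(set.values)) return [];
  // Copy so callers cannot mutate the deserialized object.
  return set.values.slice();
}
```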

You’ll still get an error if you try to store an empty array as a set, so here’s the helper function going the other way:
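A sketch of the reverse direction, assuming an injected DocumentClient. The helper name is hypothetical, and returning undefined for an empty array is a design choice that lets the caller leave the attribute out of the item:

```javascript
// Convert an array to a DynamoDB Set, or undefined when there is nothing to store.
function arrayToSet(docClient, values) {
  if (!Array.isArray(values) || values.length === 0) return undefined;
  return docClient.createSet(values);
}

// Build an item, adding the friends attribute only when the set is non-empty.
function buildItem(docClient, id, friends) {
  const set = arrayToSet(docClient, friends);
  return set === undefined ? { id } : { id, friends: set };
}
```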

Global Write Lock

Since there are many implementations of the global-write-lock pattern, I’ll omit sample code and directly discuss the tradeoffs.

This technique has two significant drawbacks. First, a distributed lock service adds extra complexity and latency. Second, a global write lock reduces write throughput. If you’re already using a distributed lock service and you don’t need high write throughput, this solution is worth considering.

What about Transactions?

Items are not locked during a transaction. DynamoDB transactions provide serializable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is canceled and an exception is thrown with details about which item or items caused the exception.

For this use case, transactions will essentially act like a slower version of condition expressions.
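To illustrate the similarity, here is a sketch of the parameters for the DocumentClient’s transactWrite call, wrapping the same conditional removal in a single-item transaction. The people table, the id key, and the helper name are assumptions:

```javascript
function buildTransactRemoveParams(id, index, friend) {
  return {
    TransactItems: [
      {
        Update: {
          TableName: 'people',
          Key: { id },
          UpdateExpression: `REMOVE friends[${index}]`,
          // The same guard as the plain conditional update.
          ConditionExpression: `friends[${index}] = :friend`,
          ExpressionAttributeValues: { ':friend': friend },
        },
      },
    ],
  };
}

// Usage with aws-sdk v2:
//   await docClient.transactWrite(buildTransactRemoveParams('meatwad', 1, 'frylock')).promise();
```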

That’s all, folks
