Safe List updates with DynamoDB
Amazon DynamoDB is one of the most versatile and popular services on AWS. In seconds, we can deploy a highly available, dynamically scaling key-document store with global replication, transactions, and more! However, if we modify a list attribute on a document, we need to take extra steps to achieve correctness and concurrency. Below, I’ll describe the problem and offer several solutions.
Full code if you want to follow along.
In the DynamoDB console, here’s what the document looks like:
But this implementation has a race condition; there is a small window of time between reading the document, finding the index, and sending the update request, during which the document could be updated by another source, thus causing the operation “remove element at index X” to produce an undesired result. This problem is also referred to as Transactional Memory. Luckily, there are several solutions.
Condition Expression on the list contents
DynamoDB supports a handy feature called a Condition Expressions, which lets us specify a condition that must be met in order for the operation to execute. In this case we want to build a rule that says, “only execute this operation if the target value is in the list”:
We also need to handle the error case where the condition expression is not met. Here’s the updated function:
This technique only ensures that updates to the list attribute are safe. How can we ensure we only apply updates when the document has not changed?
Condition Expression on a version attribute
Borrowing from databases that employ Multiversion concurrency control, we can introduce a “version” attribute at the root of our document. We can use the version field to set a condition expression that aborts the update when any other update has occurred. During the put operation, we can include an initial version property like so:
Let’s update the condition expression and add error handling:
Notice that the Update Expression also increments the version attribute. The two drawbacks to this approach:
- We need to add a version attribute to every document/table for which we want to enforce this pattern.
- We need to create a wrapper layer that ensures all updates respect the version attribute and educate the team that direct update operations are prohibited.
Use the Set data type
In practice, a friends list would store of a list of unique foreign keys. If we know the entries are unique, we can marshal the friends field as the DynamoDB Set data type instead of a List. Compared to lists, sets have a few differences:
- All values must be of the same type (string, bool, number)
- All values must be unique
- To remove element(s) from a set, use the DELETE operation, specifying a set of values
- A Set cannot be empty
Note: the example in the docs uses a “DynamoDBSet” class, but this does not appear to be available as an import from the aws-sdk JS npm module. Instead, we’ll use the DynamoDB.createSet function, which accomplishes the same thing:
In the console, our new document looks almost identical, except for the “StringSet” type on the friends attribute.
Now to specify the DELETE operation:
Aha! A DynamoDB Set deserializes into an object with its array of items stored under the values property. If we want a Set to deserialize into an array, we’ll need to add an unmarshalling step where we assign the values property instead of the deserialized set object itself.
Second: remember how sets cannot be empty? If we try to remove all elements from a set, the console will stop us:
However, if we remove the last element from a set in code, the attribute will be deleted from the document. This means the unmarshalling step we mentioned in gotcha #1 will need to account for the case where the property is undefined. Here’s a helper function that covers both cases:
You’ll still get an error if you try to store an empty array as a set, so here’s the helper function going the other way:
Global Write Lock
Let’s not forget the time-honored tradition of preventing problems instead of solving them. Transactional Memory is only a problem when we allow concurrent writes. We can avoid concurrent writes by requiring any writer to obtain a distributed write lock (using a distributed lock service, such as etcd or zookeeper).
Since there are many implementations of the global-write-lock pattern, I’ll omit sample code and directly discuss the tradeoffs.
This technique has two significant drawbacks: 1) a distributed lock service adds extra complexity and latency. 2) A global write lock reduces write throughout. If you’re already using a distributed lock service and you don’t need high write throughput, this solution is worth considering.
What about Transactions?
DynamoDB recently added support for multi-document transactions, and this sounds like a promising solution. But, as my colleague Danilo puts it:
Items are not locked during a transaction. DynamoDB transactions provide serializable isolation. If an item is modified outside of a transaction while the transaction is in progress, the transaction is canceled and an exception is thrown with details about which item or items caused the exception.
For this use case, transactions will essentially act like a slower version of condition expressions.
That’s all, folks
Here’s a gist of all the code we’ve written so far. Do you have a better way to achieve safe list updates? Please share them in the comments. I hope this helps you make the most of DynamoDB and happy hacking!