Using DynamoDB transactions to model relational data

Peter Hodgkinson
Signiant Engineering
Feb 25, 2019

At Signiant we use AWS’s DynamoDB extensively for storing our data. We like it because it provides scalability and performance while being almost completely hands-off from an operational perspective.

AWS recently announced support for transactions in DynamoDB. Transactions guarantee that a set of database operations are performed as an atomic unit, and are ACID compliant. In the case of an error during any part of the transaction, all operations will be rolled back, helping to guarantee data integrity and consistency across many rows or tables.

This is particularly useful for modelling relational data, where relationships between entities are written across multiple records and all need to be present to maintain integrity.

We’re going to run through an example of how one might leverage transactions when writing relational data in DynamoDB.

Sample Relational Data Scenario

Let’s take a simplified example of flights between airports. In this scenario we have three related entities:

  1. Airports
  2. The routes or connections between airports
  3. Airline companies that operate routes

Sample routes between three airports

When modelling relational data in a NoSQL database like DynamoDB, it is important to think about how you will want to access the data before you start. In our case, we want to organize the data in a way to answer the following questions:

  1. What routes does a given company operate?
  2. What routes enter or leave airport A, or connect airport A and airport B?

Using techniques described in the Best Practices for Modeling Relational Data with DynamoDB, we will start with the following rows in our table:

Sample Data

We can then use standard and efficient DynamoDB queries to retrieve the data for the access patterns we defined.

What routes does Company-A operate?

Use the Global Secondary Index partition key (GSI-PK) to query on Company-A. Extract the route from the PK attribute to get “Route-1” and “Route-2”.
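As a sketch, the GSI query might look like the following. The index name ('GSI-1') is an assumption here; use whatever name your table actually defines for the index whose partition key is the GSI-PK attribute.

```javascript
// Query the GSI for every row belonging to Company-A.
const params = {
  TableName: 'Table-1',
  IndexName: 'GSI-1', // assumed index name
  KeyConditionExpression: '#gsipk = :company',
  // 'GSI-PK' contains a hyphen, so it must go through an expression attribute name.
  ExpressionAttributeNames: { '#gsipk': 'GSI-PK' },
  ExpressionAttributeValues: { ':company': 'Company-A' },
};

// Each matching item carries its route in the PK attribute.
const routesFromItems = (items) => items.map((item) => item.PK);
```

Passing these params to documentClient.query and mapping routesFromItems over data.Items would yield the routes ("Route-1" and "Route-2" for the sample data).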

What routes connect LAX and JFK?

Use the primary key to query where PK = Airport-LAX and SK begins with "Airport-JFK". Extract the route from the SK attribute to get "Route-1".
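A sketch of that base-table query, using DynamoDB's begins_with key condition:

```javascript
// Find routes between LAX and JFK by querying the LAX partition
// for sort keys that start with the destination airport.
const params = {
  TableName: 'Table-1',
  KeyConditionExpression: 'PK = :pk AND begins_with(SK, :prefix)',
  ExpressionAttributeValues: {
    ':pk': 'Airport-LAX',
    ':prefix': 'Airport-JFK',
  },
};

// SK values look like 'Airport-JFK|Route-1'; the route follows the '|'.
const routeFromSK = (sk) => sk.split('|')[1];
```

Passing these params to documentClient.query and applying routeFromSK to each item's SK would produce "Route-1" for the sample data.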

Adding a new route into the system

Let’s say that “Undercutter Air” is introducing a new direct flight from LAX to LHR.

Newly added route

To add this route into our system we need to insert three new rows to our table:

Newly Added Rows

Historically, your best option to add these three rows as an atomic unit was to perform a batch write. Using the AWS JavaScript SDK, the code would look something like this:

const AWS = require('aws-sdk');

const params = {
  RequestItems: {
    'Table-1': [{
      PutRequest: {
        Item: {
          PK: 'Route-3',
          SK: 'Company-B',
          From: 'Airport-LAX',
          To: 'Airport-LHR',
        },
      },
    }, {
      PutRequest: {
        Item: {
          PK: 'Airport-LAX',
          SK: 'Airport-LHR|Route-3',
        },
      },
    }, {
      PutRequest: {
        Item: {
          PK: 'Airport-LHR',
          SK: 'Airport-LAX|Route-3',
        },
      },
    }],
  },
};

const documentClient = new AWS.DynamoDB.DocumentClient();

documentClient.batchWrite(params, function (err, data) {
  if (err) console.log(err);
  else console.log(data);
});

While this would work a vast majority of the time, there are cases where a portion of the batch write could fail, leaving your data in an inconsistent state. You can, and should, follow the recommendations of retrying any unprocessed keys until they succeed. But for how long do you retry? What happens if your system shuts down while you are retrying? What happens if somebody tries to query the data when only a portion of the keys are written? There are no simple solutions to these problems.
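The recommended retry loop can be sketched as below. This is illustrative only: the maxAttempts default and backoff schedule are arbitrary choices, and the function accepts any callable that resolves to a batchWrite-style result carrying UnprocessedItems, so it can be driven by a mock or by the real SDK.

```javascript
// Retry a batch write until nothing is left unprocessed, or give up.
// batchWrite: a function taking batchWrite params and resolving to a
// result object that may contain UnprocessedItems.
async function batchWriteWithRetry(batchWrite, params, maxAttempts = 5) {
  let request = params;
  for (let attempt = 1; attempt <= maxAttempts; attempt += 1) {
    const { UnprocessedItems } = await batchWrite(request);
    if (!UnprocessedItems || Object.keys(UnprocessedItems).length === 0) {
      return attempt; // everything was written
    }
    // Retry only the leftovers, backing off exponentially.
    request = { RequestItems: UnprocessedItems };
    await new Promise((resolve) => setTimeout(resolve, 50 * 2 ** attempt));
  }
  throw new Error('Batch write still incomplete after max retries');
}
```

With the v2 SDK you could pass in `(p) => documentClient.batchWrite(p).promise()` — but even with this in place, the questions above (how long to retry, crashes mid-retry, readers seeing partial writes) remain unanswered.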

Using DynamoDB Transactions

To write the same data as an atomic transaction, we can instead use the TransactWrite call in the JavaScript SDK. The code would look something like this:

const AWS = require('aws-sdk');

const params = {
  TransactItems: [{
    Put: {
      TableName: 'Table-1',
      Item: {
        PK: 'Route-3',
        SK: 'Company-B',
        From: 'Airport-LAX',
        To: 'Airport-LHR',
      },
    },
  }, {
    Put: {
      TableName: 'Table-1',
      Item: {
        PK: 'Airport-LAX',
        SK: 'Airport-LHR|Route-3',
      },
    },
  }, {
    Put: {
      TableName: 'Table-1',
      Item: {
        PK: 'Airport-LHR',
        SK: 'Airport-LAX|Route-3',
      },
    },
  }],
};

const documentClient = new AWS.DynamoDB.DocumentClient();

documentClient.transactWrite(params, function (err, data) {
  if (err) console.log(err);
  else console.log(data);
});

That’s it! It will succeed or fail as an atomic unit. No need for the added complexity of retries, or to worry about partial failures causing data integrity issues.

You can also perform transactions across tables, or perform deletes and updates in the same transaction, if your relational model requires it.
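For example, a single transaction could mix operation types and span tables. This is a hypothetical sketch: 'Audit-Table' and the RouteCount attribute are made up for illustration, and only the TransactItems shape (Put/Update/Delete entries, each with its own TableName) comes from the SDK.

```javascript
// One atomic transaction: add a route row, delete a stale route row,
// and bump a counter in a second (hypothetical) table.
const params = {
  TransactItems: [{
    Put: {
      TableName: 'Table-1',
      Item: { PK: 'Route-3', SK: 'Company-B' },
    },
  }, {
    Delete: {
      TableName: 'Table-1',
      Key: { PK: 'Airport-LAX', SK: 'Airport-JFK|Route-1' },
    },
  }, {
    Update: {
      TableName: 'Audit-Table', // hypothetical second table
      Key: { PK: 'Company-B' },
      UpdateExpression: 'SET RouteCount = RouteCount + :one',
      ExpressionAttributeValues: { ':one': 1 },
    },
  }],
};
```

These params go to documentClient.transactWrite exactly as in the earlier example; all three operations commit or roll back together.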

Update: as of June 2019, the limit has been increased to 25 operations per transaction. If you need to write more than 25 rows, you may need to revert to batch writing or split the work across multiple transactions.
