Advanced Design Patterns for Amazon DynamoDB

Part one

National Australia Bank
6 min read · Feb 14, 2019

DynamoDB is a managed ‘serverless’ NoSQL database service from AWS.

Part One of this article provides a recap on some basic DynamoDB concepts, and then Part Two describes some best practice advanced design patterns for complex use cases, based on Amazon re:Invent 2018 sessions DAT404 and DAT401.

Why NoSQL?

NoSQL databases such as DynamoDB are optimized for performance at Internet scale, in terms of data size, and also in terms of query volume. They excel at scaling horizontally to provide high performance queries on extremely large datasets. This is done via a partitioning model, and requires that the data modelling is built with this in mind.

To meet traffic and data-volume demands that exceed what relational databases handle comfortably, relational structures can be re-engineered into NoSQL patterns, provided time is taken to understand and enumerate the required access patterns.

SQL databases were designed in an era when storage was expensive, so normalized models optimized for storage. Now that storage is cheap and compute is the scarce resource, denormalized NoSQL models are a better fit for systems optimized for maximum possible performance.

OLTP = Online Transaction Processing (usage patterns known in advance)

OLAP = Online Analytical Processing (ad-hoc unpredictable queries)

Why DynamoDB?

Use Cases

DynamoDB is recommended for the following example use cases:

  1. Operational datasets that are updated continually.
  2. Metadata of files stored in S3.
  3. Data storage that triggers other cloud native actions on creation.
  4. Storage of items that have inconsistent schemas.
  5. Persisting event stream data (eg. selective data from Kinesis).
  6. Maintaining aggregated / enriched data.
  7. Data requiring item or field level access control rules.

If designing a new application with known access patterns, it is possible to use NoSQL design patterns to meet most use cases. This requires careful planning to ensure that the required keys and index strategies are put in place. You should not assume that because you can sling JSON at DynamoDB, a best practice NoSQL data model is simpler than a relational data model.

If you have data to store that has unknown and/or ad-hoc query requirements, you should consider using a relational datastore.

Key concepts

This is a recap of the characteristics of the DynamoDB service. You will need to be familiar with these concepts in order to make the most of the subsequent advanced design patterns in Part Two.

The single most important aspect of designing a DynamoDB data model is understanding and enumerating the access patterns of the application that will be interacting with the table — before designing the data model.

The DynamoDB mission is to always deliver single digit millisecond response times at any scale of table size or load volume. While the service is capable of this, it is up to engineers to design DynamoDB deployments that allow this to be realised.

Scalability

DynamoDB is a NoSQL database service that also backs many other Amazon services. As such, it is proven at Internet scale, and recommended as a managed serverless option for high-volume use cases. In addition, DynamoDB Accelerator (DAX, https://aws.amazon.com/dynamodb/dax/) can be used as an in-memory cache in front of a table, further improving read performance.

Structure

In DynamoDB, there is no notion of a ‘database’; instead, ‘tables’ are the deployable unit. It is recommended that only one table per application is used, making use of some of the strategies in this article to model data to suit the types of queries and other operations that will take place.
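The single-table approach typically works by encoding the entity type into the key attributes, so that related items share a partition and can be fetched with one query. A minimal sketch of such a key scheme (the `CUSTOMER#`/`ORDER#` prefixes and attribute names here are illustrative, not prescribed by DynamoDB):

```python
# One table holds multiple entity types; the partition key (PK) and sort key
# (SK) encode the entity type and its relationships.

def customer_key(customer_id: str) -> dict:
    """Key for a customer profile item."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": "PROFILE"}

def order_key(customer_id: str, order_id: str) -> dict:
    """Orders live in the same partition as their customer, so a single
    Query on PK returns the customer profile and all of their orders."""
    return {"PK": f"CUSTOMER#{customer_id}", "SK": f"ORDER#{order_id}"}
```

Because both item types share the customer’s partition key, one query on `PK = CUSTOMER#42` retrieves the profile and every order together, replacing what would be a join in a relational model.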

Partition keys and sort keys

Each ‘Item’ in a DynamoDB table is effectively a key-value or document record, where the value is a JSON-like structure of attributes. The primary key consists of a partition key, optionally combined with a sort key to form a composite key. The partition key determines how items are distributed across DynamoDB’s storage partitions, which is what enables consistent query performance at scale.
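A table with a composite primary key can be sketched as the parameter dict that boto3’s `create_table` accepts (the table and attribute names here are illustrative):

```python
# Table definition with a composite primary key, in the shape expected by
# boto3's client.create_table. HASH = partition key, RANGE = sort key.
table_definition = {
    "TableName": "Orders",
    "AttributeDefinitions": [
        {"AttributeName": "PK", "AttributeType": "S"},  # S = string
        {"AttributeName": "SK", "AttributeType": "S"},
    ],
    "KeySchema": [
        {"AttributeName": "PK", "KeyType": "HASH"},   # partition key
        {"AttributeName": "SK", "KeyType": "RANGE"},  # sort key
    ],
    "BillingMode": "PAY_PER_REQUEST",
}

# To actually create the table:
#   import boto3
#   boto3.client("dynamodb").create_table(**table_definition)
```

Note that only key attributes are declared up front; all other item attributes are schemaless and can vary from item to item.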

Secondary indexes

It is critical when designing your database table that you understand the types of queries that will be performed by your application, and the approximate frequency of each. This will help you to decide which secondary indexes you should create to optimise query performance.

DynamoDB allows you to query tables via:

  • The primary key (or composite)
  • A secondary index
  • A full table scan

DynamoDB supports two types of secondary indexes.

Global secondary indexes

An index with a partition key and a sort key that can be different from those on the base table. A global secondary index is considered “global” because queries on the index can span all of the data in the base table, across all partitions. Reads from a global secondary index are always eventually consistent.
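A query against a global secondary index looks like a normal query with an `IndexName` added, here expressed as the parameter dict boto3’s `query` accepts (the index name `GSI1` and its key attributes are illustrative):

```python
# Query parameters targeting a global secondary index rather than the
# base table, in the shape expected by boto3's client.query.
gsi_query = {
    "TableName": "Orders",
    "IndexName": "GSI1",                       # route the query to the index
    "KeyConditionExpression": "GSI1PK = :pk",  # the index's partition key
    "ExpressionAttributeValues": {
        ":pk": {"S": "STATUS#SHIPPED"},        # low-level attribute value format
    },
}

# To actually run it:
#   import boto3
#   response = boto3.client("dynamodb").query(**gsi_query)
```

This is how a single table supports query patterns on attributes other than the base table’s primary key.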

Local secondary indexes

An index that has the same partition key as the base table, but a different sort key. A local secondary index is “local” in the sense that every partition of a local secondary index is scoped to a base table partition that has the same partition key value.

Index count limits

Global secondary indexes are the most efficient way of supporting multiple query use cases on a single DDB table, however there is a default limit on the number of GSIs per table (20 at the time of writing, raised from the long-standing limit of 5 in late 2018).

This article will describe best practice approaches for working with this limit, and also ways of denormalizing a set of relational tables (eg. Oracle) into best practice NoSQL structures.

Please also review other limits that apply to DynamoDB such as the 400KB item size.

Capacity planning

Each table is allocated Read Capacity Units (RCU) and Write Capacity Units (WCU). These are used to ensure that each table has the appropriate resources provisioned in order to cope with expected load profiles. Note that this is tunable, and also directly affects the cost of each provisioned DDB table. In addition, each global secondary index has its own RCU and WCU configuration.
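Capacity units map to item size and request rate in a fixed way: one RCU covers one strongly consistent read per second of an item up to 4 KB (an eventually consistent read costs half), and one WCU covers one write per second of an item up to 1 KB. A small calculator, as a sketch of the arithmetic:

```python
import math

def read_capacity_units(item_size_bytes: int, reads_per_second: int,
                        eventually_consistent: bool = False) -> int:
    """RCUs needed: 1 RCU = one strongly consistent read/sec of up to 4 KB.
    Eventually consistent reads cost half as much."""
    units_per_read = math.ceil(item_size_bytes / 4096)
    total = units_per_read * reads_per_second
    if eventually_consistent:
        total = math.ceil(total / 2)
    return total

def write_capacity_units(item_size_bytes: int, writes_per_second: int) -> int:
    """WCUs needed: 1 WCU = one write/sec of an item up to 1 KB."""
    return math.ceil(item_size_bytes / 1024) * writes_per_second
```

For example, 10 strongly consistent reads per second of a 6 KB item needs 20 RCU (each read consumes ceil(6/4) = 2 units), while 5 writes per second of the same item needs 30 WCU.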

If GSIs do not have enough write capacity, ALL table writes will be throttled.

Also, the new On-Demand capacity mode removes the need to provision RCU and WCU up front: the table scales read and write throughput automatically with access activity, and you pay per request.

Next

Now that you have a picture of the basics of DynamoDB, Part Two will explore some advanced data modeling patterns and new capabilities.

Here’s a link to Advanced Design Patterns for Amazon DynamoDB Part two.

If you’re interested in learning more or thinking about working in technology at NAB, click here.

About the author: Andrew Vaughan is a Senior Manager Distinguished Engineer/Arch at NAB. Prior to this, he’s held senior developer roles at CBA, AMP and MLC.
