DynamoDB: Efficient Indexes

This is post #5 in a series aimed at exploring DynamoDB in detail. If you haven’t read the earlier posts, you can find them here.

This post presents some use cases for DynamoDB secondary indexes and some key considerations for creating and using indexes efficiently. If you are not interested in reading through the entire post, the summary is at the end.

Why Indexes:

Consider the “Landmarks” table shown below.
DynamoDB: Sample Table for Illustration

The data model for this table was designed around the access pattern shown in the figure above. Now assume you are expanding the functionality of your service and need to support the following access pattern as well:

Given the name of a city, get all the landmarks listed in the city

To support this access pattern, the attribute “City” would ideally be the Partition Key in your data model. Unfortunately, making that change would break the existing access pattern.

What can you do to support both access patterns? There are a couple of approaches.

  • Create another table, say “Landmarks_1”, with the attribute “City” as the Partition Key. With this option, you need to:
      • Backfill the new table with the existing data, and
      • Keep “Landmarks_1” in sync with “Landmarks”, i.e., replicate every update and delete on the “Landmarks” table to the “Landmarks_1” table, and vice versa

(or)

  • Use an index. You can:
      • Create a GSI (Global Secondary Index) named “Landmarks_1” with “City” as its Partition Key, and
      • Query the new index to support the second access pattern, for example:
aws dynamodb query --table-name "Landmarks" --index-name "Landmarks_1" --key-condition-expression "City = :city" --expression-attribute-values '{":city": {"S": "London"}}'

DynamoDB takes care of backfilling the GSI and keeps it up to date as the base table changes (the propagation happens asynchronously, so reads from a GSI are eventually consistent). This relieves you of the burden of maintaining a second table yourself.
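To make that maintenance work concrete, here is a minimal in-memory sketch of what a GSI does conceptually: it backfills from the existing items, then mirrors every put and delete on the base table. The class `TableWithGsi`, its methods, the assumption that the table is keyed by a “Name” attribute, and the sample landmarks are all hypothetical illustrations, not DynamoDB APIs or the table from the figure. Unlike this synchronous sketch, real GSI propagation is asynchronous.

```python
from collections import defaultdict


class TableWithGsi:
    """Hypothetical model: a base table keyed by 'Name' plus a GSI keyed by 'City'."""

    def __init__(self, items):
        self.table = {item["Name"]: item for item in items}
        self.gsi = defaultdict(dict)  # City -> {Name: item}
        # Backfill: index every existing item that has the GSI key attribute.
        for item in self.table.values():
            self._index(item)

    def _index(self, item):
        if "City" in item:
            self.gsi[item["City"]][item["Name"]] = item

    def _unindex(self, item):
        if "City" in item:
            self.gsi[item["City"]].pop(item["Name"], None)

    def put_item(self, item):
        old = self.table.get(item["Name"])
        if old is not None:
            self._unindex(old)  # drop the stale index entry before re-indexing
        self.table[item["Name"]] = item
        self._index(item)

    def delete_item(self, name):
        old = self.table.pop(name, None)
        if old is not None:
            self._unindex(old)

    def query_gsi(self, city):
        # Return the names of all landmarks indexed under this city, sorted.
        return sorted(self.gsi.get(city, {}))


landmarks = TableWithGsi([
    {"Name": "Big Ben", "City": "London"},
    {"Name": "Eiffel Tower", "City": "Paris"},
])
landmarks.put_item({"Name": "Tower Bridge", "City": "London"})
print(landmarks.query_gsi("London"))  # ['Big Ben', 'Tower Bridge']
```

The point of the sketch is the bookkeeping in `put_item` and `delete_item`: this is exactly the work you would otherwise have to write and keep correct yourself with a second “Landmarks_1” table.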

DynamoDB: Global Secondary Index
Indexes can help extend support to access patterns that are not supported by the table data model

When to avoid indexes?

As mentioned in the section above, every change to the base table has to be propagated to its indexes to keep them current. DynamoDB does this in the background, but it is not free: each write that touches an indexed item also consumes write capacity on the index. GSIs have their own provisioned RCU/WCU, separate from the table’s, while LSIs (Local Secondary Indexes, discussed later) consume the table’s provisioned RCU/WCU. Either way, index updates cost money, and on a table with a write-heavy workload the indexes are updated more often and therefore cost more.
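The write amplification can be sketched as simple counting. The AWS documentation states that a table write whose item lacks an index’s key attributes causes no write to that index, and that changing the value of an indexed key attribute costs two index writes (a delete of the old entry and a put of the new one). The helpers `index_writes` and `total_writes` below are hypothetical and apply those rules per index. As simplifying assumptions, they ignore item size (real charges are per 1 KB write unit) and conservatively count every other write to an indexed item as one index write.

```python
def index_writes(old_item, new_item, index_key_attrs):
    """Count index writes caused by one base-table write (hypothetical model).

    old_item / new_item are dicts, or None for an insert / delete.
    index_key_attrs lists the index's partition (and optional sort) key names.
    """
    def key_of(item):
        # An item appears in the index only if it has every index key attribute.
        if item is None or any(a not in item for a in index_key_attrs):
            return None
        return tuple(item[a] for a in index_key_attrs)

    old_key, new_key = key_of(old_item), key_of(new_item)
    if old_key is None and new_key is None:
        return 0  # the item never appears in this index
    if old_key is None or new_key is None:
        return 1  # entry added to, or removed from, the index
    if old_key != new_key:
        return 2  # delete the old entry, put the new one
    return 1  # assumption: same key, entry updated in place


def total_writes(old_item, new_item, indexes):
    """One base-table write plus the writes it triggers on each index."""
    return 1 + sum(index_writes(old_item, new_item, keys) for keys in indexes)


# Hypothetical table with two indexes: one on City, one on (Country, Rating).
indexes = [["City"], ["Country", "Rating"]]
old = {"Name": "Big Ben", "City": "London", "Country": "UK", "Rating": 4}
new = {"Name": "Big Ben", "City": "London", "Country": "UK", "Rating": 5}
print(total_writes(old, new, indexes))  # 4 (1 table + 1 City index + 2 Rating index)
```

A single rating change here costs four writes instead of one, which is why the multiplier matters most on write-heavy tables.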

Avoid indexes on tables with write-heavy workloads

Guidelines for using DynamoDB Indexes:

Keep Indexes Small:

Apart from the cost of reads and writes, indexes also carry a storage cost: the larger an index, the more it costs to store in DynamoDB. So it helps to keep indexes small. There are a couple of ways to achieve this:

  • Use Sparse Indexes
  • Project only necessary attributes into the index

Sparse Indexes:

DynamoDB creates an entry in an index only if the corresponding item has values for the index’s key attributes (its partition key and, if the index has one, its sort key). You can use this to create an index that flags, for example, all hotels in a given city that are no longer available for booking. See the example below.

DynamoDB: Sparse Indexes

As the example shows, your table can hold thousands of items, but if only one of them has values for both the partition key and the sort key of the index, the index will hold just one item. This can save a lot of money that would otherwise be spent storing index entries that serve no purpose.
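A sparse index behaves like a filter applied at write time rather than at read time: only items that carry the index’s key attributes are copied in. In this minimal sketch, `build_sparse_index` is a hypothetical helper (not a DynamoDB call), and the attribute names `Hotel_ID`, `City`, and `Unavailable_Since` are made-up stand-ins for the hotel example, since the figure’s actual attributes are not reproduced here.

```python
def build_sparse_index(items, partition_key, sort_key=None):
    """Return the entries a (hypothetical) index on these keys would hold."""
    keys = [partition_key] + ([sort_key] if sort_key else [])
    # Items missing any index key attribute are simply not indexed.
    return [item for item in items if all(k in item for k in keys)]


hotels = [{"Hotel_ID": f"H{n}", "City": "London"} for n in range(1000)]
# Only one hotel has been flagged as no longer available for booking.
hotels[42]["Unavailable_Since"] = "2021-06-01"

index = build_sparse_index(hotels, "City", "Unavailable_Since")
print(len(hotels), len(index))  # 1000 1
```

Every hotel has a `City`, but only the flagged one has the sort key, so the index stores a single entry.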

Project only necessary attributes:

Another way to keep an index small is to avoid projecting every attribute of the base table into it. Say, for example, you have a table like the one shown below.

DynamoDB: Example Customer Table

Assume the table serves the following access pattern:

aws dynamodb get-item --table-name Customer --key file://key.json
//key.json
{
"Customer_ID": {"S": "12555-1234"}
}

Now you need to support the following access pattern as well:

aws dynamodb query --table-name Customer --index-name Customer_idx --key-condition-expression "Postal_Code = :pc" --expression-attribute-values file://values.json --projection-expression "Postal_Code, Customer_ID, First_Name, Last_Name"
//values.json
{
":pc": {"N": "10021"}
}

You can see from the above query that you need to retrieve only the following attributes:

  • Postal_Code
  • Customer_ID
  • First_Name
  • Last_Name

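You don’t need the “Other_Details” attribute for this access pattern, so avoid projecting it into the index. Instead, create the index with an INCLUDE projection that lists only the non-key attributes you need; the table key (Customer_ID) and the index key (Postal_Code) are always projected. For example, assuming the table uses on-demand capacity (a provisioned table would also need a ProvisionedThroughput setting for the new index):

aws dynamodb update-table --table-name Customer --attribute-definitions AttributeName=Postal_Code,AttributeType=N --global-secondary-index-updates '[{"Create": {"IndexName": "Customer_idx", "KeySchema": [{"AttributeName": "Postal_Code", "KeyType": "HASH"}], "Projection": {"ProjectionType": "INCLUDE", "NonKeyAttributes": ["First_Name", "Last_Name"]}}}]'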

This reduces the footprint of your index: you pay less for storage, and because each index entry is smaller, writes propagated to the index and reads from it consume fewer capacity units.
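The saving can be approximated in a few lines. DynamoDB sizes an item as the UTF-8 length of its attribute names plus the size of their values; this sketch handles only string values (so `Postal_Code` is stored as a string here), and the customer record, including its large `Other_Details` value, is made-up data. `project` is a hypothetical helper mirroring the rule that table and index key attributes are always projected, plus whatever an INCLUDE projection lists.

```python
def approx_size(item):
    """Approximate DynamoDB item size in bytes for string-only items:
    UTF-8 length of each attribute name plus its string value."""
    return sum(len(k.encode("utf-8")) + len(v.encode("utf-8")) for k, v in item.items())


def project(item, table_keys, index_keys, non_key_attrs):
    """Keep table keys, index keys, and INCLUDE'd attributes (hypothetical helper)."""
    wanted = set(table_keys) | set(index_keys) | set(non_key_attrs)
    return {k: v for k, v in item.items() if k in wanted}


customer = {
    "Customer_ID": "12555-1234",
    "Postal_Code": "10021",  # a string here to keep the size rule simple
    "First_Name": "Jane",
    "Last_Name": "Doe",
    "Other_Details": "x" * 2000,  # stand-in for a large free-form attribute
}

entry = project(customer, ["Customer_ID"], ["Postal_Code"], ["First_Name", "Last_Name"])
print(approx_size(customer), approx_size(entry))  # 2076 63
```

Since write units are billed per 1 KB, a propagated write of the full ~2 KB item would take roughly three write units, versus one for the projected entry (ignoring DynamoDB’s own per-entry index overhead).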

Prefer Indexes to Filters:

Filter expressions narrow down the result set of Query and Scan operations. However, a filter is applied only after the Query or Scan has read the items, and you are charged for everything that was read, not just for what is returned. So, to get a result set of two items (for example), you might have to read every item in the table (in the case of Scan) or every item matching the key condition (in the case of Query). This is expensive in terms of both cost and performance.

If you rely on filters to support an access pattern, consider using an index instead. As an example, consider the table and the access patterns shown below.

DynamoDB: Scan Operation with Filter Expressions
Avoid Filters. Replace them with Indexes, wherever possible.

To support access pattern 2, a Scan operation with a filter expression is used here. This is highly inefficient, because the entire table has to be scanned every time the operation runs. A better approach is to use an index, as shown below.

DynamoDB: GSI vs Scans

With an index, the Query operation should take a fraction of the time (and cost) consumed by the Scan. This is a typical use case for indexes.
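The difference in work can be illustrated by counting items read. In this sketch, a Scan with a filter reads every item and discards most of them afterwards, while a Query on an index keyed by the filtered attribute reads only the matching entries. Because DynamoDB charges for data read rather than data returned, the read counts are a rough proxy for cost. The orders table, the `Order_Status` attribute, and both helper functions are hypothetical, not the table from the figure.

```python
from collections import defaultdict

# Hypothetical table of 10,000 orders; only two have failed.
orders = [{"Order_ID": i, "Order_Status": "SHIPPED"} for i in range(10_000)]
for i in (7, 9_001):
    orders[i]["Order_Status"] = "FAILED"


def scan_with_filter(items, status):
    """A Scan reads every item; the filter only trims what is returned."""
    result = [it for it in items if it["Order_Status"] == status]
    return result, len(items)  # (items returned, items read)


# Index built at write time, keyed by Order_Status.
status_index = defaultdict(list)
for it in orders:
    status_index[it["Order_Status"]].append(it)


def query_index(index, status):
    """A Query on the index reads only the entries stored under that key."""
    result = index.get(status, [])
    return result, len(result)  # (items returned, items read)


failed_scan, scanned = scan_with_filter(orders, "FAILED")
failed_query, queried = query_index(status_index, "FAILED")
print(scanned, queried)  # 10000 2
```

Both paths return the same two orders; only the amount of data read differs.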

Global Secondary Index or Local Secondary Index:

DynamoDB offers two types of Indexes:

  • GSI (Global Secondary Index)
  • LSI (Local Secondary Index)

The key differences between the two are shown in the table below.

DynamoDB: GSI vs LSI (Credit: AWS DynamoDB Documentation)

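Unless you need strongly consistent reads, always choose GSIs. They are more flexible: you can create and delete them at any time, whereas LSIs can only be created together with the table, and GSIs are not subject to the 10 GB limit per partition key value (item collection) that applies to tables with LSIs.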

Unless you have a need for strongly consistent reads, always choose GSIs.

Note that LSIs let you request attributes that are not projected into the index; DynamoDB fetches them from the base table automatically (a GSI query, by contrast, can return only projected attributes). While this may sound like a good way to save on storage, it carries a considerable cost: each read of non-projected attributes triggers an additional fetch from the base table, increasing both latency and read cost.

If a majority of your reads request non-projected attributes, it is worth projecting them into the index (LSI only).

Whether you should project all attributes into the index or read non-projected attributes from the base table depends on your workload. If a majority of your reads request non-projected attributes, it is worth projecting them into the index. Even though this increases the storage cost, it saves the additional RCUs that the base-table fetches would otherwise consume. If you rarely request non-projected attributes and can tolerate the extra latency on those few requests, it makes more sense not to project them into the index and to save on storage.
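The trade-off can be framed as a rough break-even calculation. This is a hypothetical cost model, not AWS pricing: the caller supplies the extra read units each base-table fetch costs, the price per read unit, the extra storage that projecting would add, and the storage price, all as assumptions to be replaced with figures from your own workload and region. It ignores latency and the extra write units a larger projection would cost.

```python
def cheaper_to_project(
    reads_per_month,
    fraction_needing_extra_attrs,
    extra_rcu_per_fetch,
    price_per_rcu,
    extra_gb_if_projected,
    price_per_gb_month,
):
    """Compare monthly fetch cost with monthly extra storage cost (hypothetical model).

    Returns (project_is_cheaper, fetch_cost, storage_cost).
    """
    fetches = reads_per_month * fraction_needing_extra_attrs
    fetch_cost = fetches * extra_rcu_per_fetch * price_per_rcu
    storage_cost = extra_gb_if_projected * price_per_gb_month
    return fetch_cost > storage_cost, fetch_cost, storage_cost


# Made-up inputs purely for illustration; substitute your own numbers.
frequent = cheaper_to_project(50_000_000, 0.8, 1, 1e-6, 20, 0.25)
rare = cheaper_to_project(50_000_000, 0.01, 1, 1e-6, 20, 0.25)
print(frequent[0], rare[0])  # True False
```

With the same storage overhead, the answer flips purely on how often reads need the non-projected attributes, which is the workload question the paragraph above describes.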

To Summarize:

  • Indexes can help extend support to access patterns that are not supported by the table data model
  • Avoid indexes on tables with write-heavy workloads
  • Keep indexes small
  • Use sparse indexes wherever possible
  • Project only attributes that are necessary
  • Avoid Filters. Replace them with Indexes, wherever possible
  • Unless you have a need for strongly consistent reads, always choose GSIs
  • If a majority of your reads request non-projected attributes, it is worth projecting them into the index (LSI only), even though this adds storage cost

I hope this post gave you some useful insight into designing and using indexes efficiently on DynamoDB tables.