Querying DynamoDB by Date Range

Efficiently store and query chronological data using ISO format and global secondary indexes

Kathy Daniels
Sep 12 · 5 min read

Data is chronologically ordered practically anywhere and everywhere you look. Amazon allows you to search your order history by month. Spotify sorts your playlists by date. Gmail organizes your emails by time. But when it comes to building your own time-driven applications, it can be more confusing than you think.

I’ve recently been working on a Go web app that heavily relies on evaluating metrics stored in DynamoDB over the course of hours, days, months, and years. If you’ve ever worked with the AWS Go SDK, you probably know how difficult it can be to navigate the documentation. So after hours of scouring the internet for answers, I’m here to shed some light on how you can efficiently query DynamoDB by any given time range (in Go!).

Store dates in ISO format

While Go offers a variety of standard date and time formats, only one will help eliminate ambiguity and achieve custom levels of granularity needed for your application.

If you’re a Spotify user, this probably looks familiar

ISO 8601 is an international standard for representing dates and times. The general-to-specific model that separates it from other standard time layouts makes it extremely easy to specify any time range for queries.

  • Want data from 2019? Just look for strings starting with 2019
  • Want data from Feb 2019? Just look for strings starting with 2019–02
  • Want data from Feb 1, 2019? Just look for strings with 2019–02–01

You can continue extending granularity even further by adding timestamps after each date!

Identify key time intervals

Assuming you’ve already created a data table for your application, it’s important to recognize which time intervals you‘d want to organize your data by. An example you might consider for a clothing business:

  • How many shirts do we sell on Tuesdays?
  • Which styles are more popular than others?
  • How should inventory change based on each month of the year?

If you were interested in these metrics, your table would probably have attributes like weekdays, dates, shirt style, and number of shirts sold.

Note that the more fields you have, the more ways you’ll be able to organize your data and query your tables based on varying attributes.

Create global secondary indexes

Your data table now contains the metrics you’re interested in with all necessary time metadata. Great! But… you might not always care about every stored attribute. For instance, if you wanted to examine the total number of T-shirts sold in August, you wouldn’t need to know each individual weekday. This is where global secondary indexes come in handy.

A global secondary index (GSI) can have a partition key or sort key different from the one in your base table. It allows you to refine your queries by only searching for primary key attribute values that you specify. And for this use case, it makes date range queries simple if you sort your index by time.

With the clothing business example, I might create the following GSI:

This partitions the base table by each unique shirt style and sorts that data by date. Notice that I’m only interested in projecting the number of shirts sold and not the weekdays. I could then use this to evaluate the sales of Unisex T-shirts in August!

Some extra guidelines to keep in mind:

  • Keep the size of your index as small as possible to improve performance. This means choosing your projections carefully!
  • Limit your indexes to attributes you know you will query often. This will help you avoid wasting storage space and I/O costs.

Query your global secondary index

So you’re finally ready to make some data queries for your application. And thanks to your index sort key, it’s easy to specify any time range.

Here’s what you need to know:

  • You must specify the name of your base table and index
  • You must specify the primary key attribute value you’re looking for
  • You can optionally use a comparison operator to refine your search on a sort key attribute
  • Queries support EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN

To demonstrate the process, I’ll refer to my previous example. Start by establishing a DynamoDB session using your access key and secret key:

sess, _ := session.NewSession(&aws.Config{
Region: aws.String("us-west-2"),
Credentials: credentials.NewStaticCredentials(
dbAccess,
dbSecret,
"",
),
})
dbClient := dynamodb.New(sess)

Construct your query input with the desired parameters. Since we want to see how many Unisex T-shirts were sold in August of 2019, we would specify the shirt style as “ Unisex T-shirt ” and look for dates that start with “ 2019–08 ”.

var queryInput = &dynamodb.QueryInput{
TableName: aws.String("shirts-sold"),
IndexName: aws.String("shirt_style-date-index"),
KeyConditions: map[string]*dynamodb.Condition{
"shirt_style": {
ComparisonOperator: aws.String("EQ"),
AttributeValueList: []*dynamodb.AttributeValue {
{
S: aws.String("Unisex T-shirt"),
},
},
},
"date": {
ComparisonOperator: aws.String("BEGINS_WITH"),
AttributeValueList: []*dynamodb.AttributeValue {
{
S: aws.String("2019-08"),
},
},
},
},
}

Alternatively, if we wanted to see how many Unisex T-shirts were sold in the last week of August, we could use the BETWEEN comparison operator.

"date": {
ComparisonOperator: aws.String("BETWEEN"),
AttributeValueList: []*dynamodb.AttributeValue{
{
S: aws.String("2019-08-25"),
},
{
S: aws.String("2019-08-31"),
},
},
},

Use this input to query DynamoDB:

var resp, err = dbClient.Query(queryInput)
if err != nil {
// handle your error case
}

The response should come back looking something like this:

{
Count: 7,
Items: [
{
shirt_style: {
S: "Unisex T-shirt"
},
shirts_sold {
N: 46
},
uuid: {
S: "9"
},
date: {
S: "2019-08-25"
}
},
... ],
ScannedCount: 7
}

Lastly, we can loop through each item, unmarshal the data, and add up the number of shirts sold:

type shirtSales struct {
Style string `json:"shirt_style"`
AmountSold int `json:"shirts_sold"`
UUID string `json:"uuid"`
Date string `json:"date"`
}
...totalAmountSold := 0
for _, shirtItem := range resp {

dailySales := shirtSales{}
err := dynamodbattribute.UnmarshalMap(shirtItem, &dailySales)
if err != nil {
return nil, errors.Wrapf(err, "Error unmarshaling resp item.")
}
totalAmountSold += dailySales.AmountSold

}

This process can be used to gauge the sales of any shirt style from any time frame. But more importantly, you can now apply this to your own application to search for time-specific data!

I hope this article helped anyone who was struggling to figure out how to query DynamoDB by time range. If you have any other tips to add, please feel free to share your thoughts in the comments!

cloud native: the gathering

A gathering place for all things cloud native, microservices, api development

Thanks to Tim Jones

Kathy Daniels

Written by

Computer Science and Engineering | UCLA 2021

cloud native: the gathering

A gathering place for all things cloud native, microservices, api development

Welcome to a place where words matter. On Medium, smart voices and original ideas take center stage - with no ads in sight. Watch
Follow all the topics you care about, and we’ll deliver the best stories for you to your homepage and inbox. Explore
Get unlimited access to the best stories on Medium — and support writers while you’re at it. Just $5/month. Upgrade