Modify Data in your DynamoDB Table

Kathy Daniels
cloud native: the gathering
5 min readSep 5, 2020

You’re working on a long standing project when you realize a non-trivial schema change will deem all of your historical data useless or invalid. You wouldn’t want to delete all of the data and start collecting from scratch, so how can you alter your entire DynamoDB table without wasting too much time combing through documentation and devising an entirely new program to do this?

I encountered this issue when developing a large-scale project that was dependent on millions of time-specific data entries stored in DynamoDB. After using a standard date format (MM/dd/yyyy) for some time, I discovered that the ISO date format (yyyy-MM-dd) was better suited for flexible range queries (You can read more on this in my previous article), so I wanted my data to reflect that instead.

The project was written in Go, but even with the AWS Go SDK, I found it incredibly difficult to find straightforward answers explaining how to update specific fields in my tables. After doing some lengthy research, I was finally able to write a program that successfully modified millions of table entries so that I could continue using them in my application. Here’s how I did it.

Preface: Use a Global Secondary Index

The Query operation finds items based on specific primary key values whereas the Scan operation examines every single item. So rather than performing a Scan on your entire table, AWS suggests using Query for faster response times. Plus, the larger the table or index, the slower Scan performs. Therefore, creating global secondary indexes organized by primary keys was crucial for this program. Using indexes, you can efficiently query outdated items based on attributes of interest and modify their appropriate fields.

For my application, I created an index called weekday-date-index with the primary key specified as weekday and the sort key specified as date. This allowed me to query items under each individual weekday, isolate items with the incorrect date format, and initiate an update.

You can read more on scan and query best practices here.

1. Set up

I started by defining some constants to store the old format I was targeting, oldLayout, and the new format I wanted, newLayout. I also defined a schema for my table items so that I could easily inspect or edit any necessary attributes.

const oldLayout = "1/2/2006"
const newLayout = "2006-01-02"
type indexedItem struct {
UUID string `json:"uuid"`
Date string `json:"date"`
Metric string `json:"metric"`
Value float64 `json:"value"`
Weekday string `json:"weekday"`
}

Next was to establish a new DynamoDB session using the appropriate access key and secret key.

sess, _ := session.NewSession(&aws.Config{
Region: aws.String("us-west-2"),
Credentials: credentials.NewStaticCredentials(
dbAccess,
dbSecret,
"",
),
})
dbClient := dynamodb.New(sess)

Note: For this example we’re using static credentials, but the AWS SDK provides other ways as well. You can read more about it here.

2. Create a query

Since my approach was to target one weekday at a time using the primary key of my index, I created an array of weekday strings to loop through.

weekdays := []string{"Monday", "Tuesday", "Wednesday", "Thursday",     
"Friday", "Saturday", "Sunday"}
for _, weekday := range weekdays { var queryInput = &dynamodb.QueryInput{
TableName: aws.String("table-example"),
IndexName: aws.String("weekday-date-index"),
KeyConditions: map[string]*dynamodb.Condition{
"weekday": {
ComparisonOperator: aws.String("EQ"),
AttributeValueList: []*dynamodb.AttributeValue{
{
S: aws.String(weekday),
},
},
},
},
}
...

For the query input I wrote above, you can see that I specified a table name, index name, and a key condition on the weekday attribute. Key conditions provide the selection criteria for a query operation. For a query on a table, key conditions can only be on primary key attributes, a crucial detail that amplifies the significance of having an effective primary key in your index. Additionally, the key name and value may only be provided as an “EQ” condition, despite the availability of other options (EQ | LE | LT | GE | GT | BEGINS_WITH | BETWEEN).

We can now perform the query and store its response:

var resp, err = dbClient.Query(queryInput)
if err != nil {
// handle error case
}

3. Create an update

The query response should return as an array of items with attributes that match the criteria we specified as input. These items can be unmarshaled to Go types with the help of the dynamoattribute sub package for more straightforward access to each attribute.

for _, i := range resp.Items {   item := indexedItem{}
err = dynamodbattribute.UnmarshalMap(i, &item)
if err != nil {
// handle error case
}
...

This is where you can check for inconsistencies in the data and make any changes you need to make. In my case, I was looking for date entries under the format MM/dd/yyyy that I could reformat to ISO yyyy-MM-dd.

   // check if date is formatted incorrectly
if strings.Contains(item.Date, "/") {
incorrectDate, err := time.Parse(oldLayout, item.Date)
if err != nil {
// handle error case
}
// reformat time to be the correct format
correctDate := incorrectDate.Format(newLayout)
...

You should now have everything you need to create the input for your update.

   ...

input := &dynamodb.UpdateItemInput{
TableName: aws.String("table-example"),
Key: map[string]*dynamodb.AttributeValue{
"uuid": {
S: aws.String(item.UUID),
},
},
ExpressionAttributeNames: map[string]*string {
"#date": aws.String("date"),
},
ExpressionAttributeValues:
map[string]*dynamodb.AttributeValue{
":d": {
S: aws.String(correctDate),
},
},
ReturnValues: aws.String("UPDATED_NEW"),
UpdateExpression: aws.String("set #date=:d"),
}
...

Here’s a breakdown of the input fields I included:

  • TableName (required field): Your table name
  • Key (required field): The query response should return each item with its primary key (a UUID in my case), so now this information can be easily referenced to update each item
  • ExpressionAttributeNames: These are substitution tokens for attribute names to prevent conflicts with DynamoDB reserved words or to prevent special characters in an attribute name from being misinterpreted. The # character is actually used to dereference an attribute name, so specifying #date as the string "date" (the actual name of the field) allows us to use it in theUpdateExpression since DATE is a DynamoDB reserved word
  • ExpressionAttributeValues: A similar idea to ExpressionAttributeNames, except these are the actual substituted values. The : character is used to dereference an attribute value, so here I used :d to specify a string containing the correct format that I reference in the UpdateExpression
  • ReturnValues: A sanity check! Use this if you want to get item attributes before or after they are updated. I used the key string UPDATED_NEW to return only updated attributes as they appear after the update takes place
  • UpdateExpression: A string expression that should define which attributes to update, which action to perform, and what values to use. Here, I use set to replace the already existing date attribute with a new value

The last step is to actually update the item using your input.

   ...   _, err = dbClient.UpdateItem(input)    
if err != nil {
// handle error case
}

You should now have all the working parts to update your DynamoDB table!

Thanks for reading! This was my first time working with DynamoDB, but knowing that there are limited AWS developer guides for Go, I hope this helps others facing any similar issues.

If you’re interested in the source code, you can access it here!

--

--