AWS DynamoDB In Action

10 min read · Jan 21, 2023

In this practical article I’ll explain the core concepts and fundamentals of AWS DynamoDB and the different types of queries along with their cost, and then we will put it all into action with an interesting case study.

The topics explained in this article are essential for anyone who works with DynamoDB or is about to start.

DynamoDB is one of the best choices for serverless applications, but you should be careful about how you design your tables and which types of queries you use; otherwise DynamoDB can get expensive. In this article you’ll learn the basics of avoiding bad DynamoDB designs.

As a case study, we will design a table that keeps the historical data of football players: which team each player played for over a given time window.

Even Ronaldo is happy with this case study 😜

When working with any kind of database, it’s important to think about the best possible design for our use cases. To do that, we first need to know the key components of the database very well, so let’s start with the most important key components of DynamoDB.

Primary key

Each item in a table is uniquely identified by its primary key, which DynamoDB also uses to partition and retrieve data. Every DynamoDB table requires a primary key, and there are two types of primary keys:

Partition key: Also known as a hash key, this is a single attribute that is used as the unique identifier for an item. No two items can have the same partition key value.
Partition key and sort key: Also known as a composite primary key or a hash-and-range key, this consists of two attributes. The first attribute is the partition key and the second is the sort key. The combination of the two must be unique, so several items can share the same partition key as long as their sort keys differ.

Keep in mind that DynamoDB uses the sort key to order the items that share the same partition key. When you query the data, you can choose whether the results are returned in ascending or descending order of the sort key.

The partition key and the sort key can each be of type “String”, “Number” or “Binary”.

It is important to choose the primary key carefully as it will determine the data distribution and access patterns in the table. It also affects the performance, scalability and cost of your table.

As an example, we have a table called “PlayersHistory” that keeps the historical data of football players. The partition key is the player’s username and the sort key is the date the player started playing for a certain team. In other words, each item is made unique by the date a given player joined a team.
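If you want to follow along, here is a minimal sketch of creating this table with the AWS CLI. It assumes on-demand capacity (`PAY_PER_REQUEST`), which the article doesn’t specify, and String (“S”) types for both key attributes, matching the outputs shown later.

```shell
# Sketch: create PlayersHistory with a composite primary key.
# Assumption: on-demand billing; both key attributes are Strings ("S").
aws dynamodb create-table \
  --table-name PlayersHistory \
  --attribute-definitions \
      AttributeName=Username,AttributeType=S \
      AttributeName=StartDate,AttributeType=S \
  --key-schema \
      AttributeName=Username,KeyType=HASH \
      AttributeName=StartDate,KeyType=RANGE \
  --billing-mode PAY_PER_REQUEST

# Block until the table is ACTIVE before writing to it.
aws dynamodb wait table-exists --table-name PlayersHistory
```

Only the key attributes are declared in `--attribute-definitions`; attributes such as Team, EndDate and Goals need no schema and are simply written with each item.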

For this table we use the partition key and sort key strategy (a composite primary key). Let’s put some data in it.
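A sketch of writing the sample items with `put-item`. The cristiano values are taken from the command outputs shown later in this article; for messi the text only gives his username, start date and team, so only those attributes are written here.

```shell
# Sketch: write the sample items. Numbers ("N") are passed as strings in DynamoDB JSON.
aws dynamodb put-item --table-name PlayersHistory --item '{
  "Username":  {"S": "cristiano"},
  "StartDate": {"S": "2003-08-12"},
  "EndDate":   {"S": "2009-07-01"},
  "Team":      {"S": "Manchester United"},
  "Goals":     {"N": "118"}
}'

aws dynamodb put-item --table-name PlayersHistory --item '{
  "Username":  {"S": "cristiano"},
  "StartDate": {"S": "2009-07-01"},
  "EndDate":   {"S": "2018-07-10"},
  "Team":      {"S": "Real Madrid"},
  "Goals":     {"N": "451"}
}'

# Only the key attributes and the team are given for messi in the text.
aws dynamodb put-item --table-name PlayersHistory --item '{
  "Username":  {"S": "messi"},
  "StartDate": {"S": "2000-09-01"},
  "Team":      {"S": "Barcelona"}
}'
```

Note that `put-item` replaces any existing item with the same primary key, so re-running these commands overwrites the items rather than duplicating them.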

As you can see, we have two items for cristiano, since he played for two different teams over time, and only one item for messi. The combination of Username (partition key) and StartDate (sort key) is our primary key. cristiano’s primary keys are (“cristiano”, “2003-08-12”), the date he joined Manchester United, and (“cristiano”, “2009-07-01”), the date he joined Real Madrid; messi’s primary key is (“messi”, “2000-09-01”), the date he joined Barcelona.

Secondary Index

A secondary index allows you to query a table using non-primary key attributes. Secondary indices provide additional querying flexibility and can improve the performance of certain types of queries.

There are two types of secondary indices in DynamoDB:

Local secondary index: An index that has the same partition key as the table but a different sort key. It lets you query by the partition key together with the alternate sort key, but only within a single partition key value. (Note that local secondary indices can only be created together with the table; once the table exists, you can’t add a local secondary index.)
Global secondary index: An index whose partition key (and optional sort key) can be different from the table’s primary key. It lets you query on those keys across all partitions of the table.
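Because a local secondary index can only be defined when the table is created, it has to go into the `create-table` call itself. A minimal sketch, using a hypothetical table `PlayersHistoryWithLsi` and a hypothetical index `EndDate-index` (neither is used elsewhere in this article) that keeps the table’s `Username` partition key but sorts by `EndDate`:

```shell
# Sketch: hypothetical table and index names; on-demand billing assumed.
# The LSI reuses the table's partition key (Username) with a different sort key (EndDate).
# Every attribute used in a key schema (table or index) must appear in --attribute-definitions.
aws dynamodb create-table \
  --table-name PlayersHistoryWithLsi \
  --attribute-definitions \
      AttributeName=Username,AttributeType=S \
      AttributeName=StartDate,AttributeType=S \
      AttributeName=EndDate,AttributeType=S \
  --key-schema \
      AttributeName=Username,KeyType=HASH \
      AttributeName=StartDate,KeyType=RANGE \
  --local-secondary-indexes '[
    {
      "IndexName": "EndDate-index",
      "KeySchema": [
        {"AttributeName": "Username", "KeyType": "HASH"},
        {"AttributeName": "EndDate", "KeyType": "RANGE"}
      ],
      "Projection": {"ProjectionType": "ALL"}
    }
  ]' \
  --billing-mode PAY_PER_REQUEST
```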

When you create a secondary index, you specify the attributes to be used as the index key, and the index key schema must be different from the primary key schema.

In provisioned capacity mode, each global secondary index has its own read and write capacity units, which are provisioned separately from the table’s capacity units (local secondary indices share the table’s capacity). This lets you size capacity based on the number of queries you expect for each index.

Keep in mind that secondary indices have some trade-offs such as increased storage costs, increased write costs and limits in terms of provisioned throughput.

As an example, we will create a global secondary index on the “Team” attribute. This time we use the partition key strategy, with “Team” as the index’s only key. Later we will use this index to write efficient queries by team.
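A sketch of adding this index to the existing table with `update-table`. The index name `Team-index` is the one queried later in the article; the `ALL` projection is an assumption, consistent with that later query returning every attribute. If the table used provisioned capacity instead of on-demand, the `Create` block would also need a `ProvisionedThroughput` entry.

```shell
# Sketch: add a GSI named Team-index with Team as its partition key.
# Assumption: the table uses on-demand billing, so no ProvisionedThroughput is given.
aws dynamodb update-table \
  --table-name PlayersHistory \
  --attribute-definitions AttributeName=Team,AttributeType=S \
  --global-secondary-index-updates '[
    {
      "Create": {
        "IndexName": "Team-index",
        "KeySchema": [{"AttributeName": "Team", "KeyType": "HASH"}],
        "Projection": {"ProjectionType": "ALL"}
      }
    }
  ]'

# The index is built in the background; check its status until it reports ACTIVE.
aws dynamodb describe-table --table-name PlayersHistory \
  --query 'Table.GlobalSecondaryIndexes[].IndexStatus'
```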

Operations (Queries)

First, let’s categorize the different operations by performance:

BEST: Get-item, update-item, delete-item (require the full primary key)
These are the best because you provide the full primary key, so DynamoDB can go straight to a single item. They are the most efficient operations you can write.
GOOD: Query (requires the partition key of the table or of a secondary index; supports conditions on the sort key and filtering on other attributes)
BAD: Scan (try not to use it in your application; in this article you will learn how a correct design lets you avoid it)

As an example, let’s try each type of read operation (Get-item, Query, Scan) on our PlayersHistory table.

Get-item

This operation returns at most one item: the one with the specified primary key, if it exists. In our case the primary key is the combination of the partition key and the sort key. Run the command below in a terminal, with “cristiano” as the partition key and “2003-08-12” as the sort key, to return his data from when he played for Manchester United. (If you don’t have AWS credentials and config set up on your local machine, you can use AWS CloudShell for simplicity.)
Note that the DynamoDB section of the AWS console has no option for running a Get-item operation.

aws dynamodb get-item --table-name PlayersHistory \
--key '{"Username": {"S": "cristiano"}, "StartDate": {"S": "2003-08-12"}}'

The output should look like this:

{
    "Item": {
        "Team": {
            "S": "Manchester United"
        },
        "EndDate": {
            "S": "2009-07-01"
        },
        "Goals": {
            "N": "118"
        },
        "StartDate": {
            "S": "2003-08-12"
        },
        "Username": {
            "S": "cristiano"
        }
    }
}
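By default, `get-item` performs an eventually consistent read and returns every attribute. A sketch of narrowing the result to just Team and Goals and requesting a strongly consistent read instead (which consumes twice the read capacity of an eventually consistent one). The `#g` placeholder for Goals is only a precaution against clashes with DynamoDB’s reserved words.

```shell
# Sketch: return only Team and Goals for cristiano's Manchester United item.
# "#g" is an expression attribute name that stands in for the Goals attribute.
aws dynamodb get-item --table-name PlayersHistory \
  --key '{"Username": {"S": "cristiano"}, "StartDate": {"S": "2003-08-12"}}' \
  --projection-expression "Team, #g" \
  --expression-attribute-names '{"#g": "Goals"}' \
  --consistent-read
```

The returned Item should then contain only the Team and Goals attributes.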

Query

A query returns an array of items that share the given partition key. This is where the cool stuff starts: when the primary key combines a partition key and a sort key, the only mandatory part of a query is the partition key; a sort key condition is optional.
Let’s first look at an example that doesn’t specify the sort key:

aws dynamodb query \
--table-name PlayersHistory \
--key-condition-expression "Username = :pk_value" \
--expression-attribute-values '{":pk_value":{"S":"cristiano"}}'

In the key condition expression you specify the query’s condition: here the partition key, “Username”, must equal a placeholder that I called “:pk_value” (the placeholder name can be anything; instead of “:pk_value” you could call it “:the_goat”). In the expression attribute values you repeat the placeholder name and define its type and value.

The output of the above query should look like this:

{
    "Items": [
        {
            "Team": {
                "S": "Manchester United"
            },
            "EndDate": {
                "S": "2009-07-01"
            },
            "Goals": {
                "N": "118"
            },
            "StartDate": {
                "S": "2003-08-12"
            },
            "Username": {
                "S": "cristiano"
            }
        },
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2018-07-10"
            },
            "Goals": {
                "N": "451"
            },
            "StartDate": {
                "S": "2009-07-01"
            },
            "Username": {
                "S": "cristiano"
            }
        }
    ],
    "Count": 2,
    "ScannedCount": 2,
    "ConsumedCapacity": null
}

As you can see, it returned an array of the teams cristiano played for, along with the count of items that matched our key condition.

You can also run the same query using the AWS console.
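Since items sharing a partition key are kept in sort key order, you can also reverse that order. A sketch of the same query returning cristiano’s most recent team first: `--no-scan-index-forward` asks for descending StartDate order (ascending is the default), and the CLI’s `--max-items 1` keeps only the first item of the output.

```shell
# Sketch: newest StartDate first, keeping only the first item of the output.
aws dynamodb query \
  --table-name PlayersHistory \
  --key-condition-expression "Username = :pk_value" \
  --expression-attribute-values '{":pk_value":{"S":"cristiano"}}' \
  --no-scan-index-forward \
  --max-items 1
```

With the data in this article, that single item should be the Real Madrid one. Because the CLI truncated the output, it also returns a `NextToken` that you could pass to `--starting-token` to continue.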

Now let’s say we want cristiano’s data only for the teams he joined after 2006. This time we need the sort key as well, so the query looks like this:

aws dynamodb query \
--table-name PlayersHistory \
--key-condition-expression "Username = :pk_value and StartDate > :sk_value" \
--expression-attribute-values \
'{":pk_value":{"S":"cristiano"},":sk_value":{"S":"2006-12-31"}}'

Now the output should contain only the Real Madrid item.

{
    "Items": [
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2018-07-10"
            },
            "Goals": {
                "N": "451"
            },
            "StartDate": {
                "S": "2009-07-01"
            },
            "Username": {
                "S": "cristiano"
            }
        }
    ],
    "Count": 1,
    "ScannedCount": 1,
    "ConsumedCapacity": null
}

We can also run the same query in AWS console.

Query using the partition key and the sort key with Greater than operator
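Why does `StartDate > :sk_value` work when StartDate is a String? DynamoDB compares String values by their UTF-8 bytes, and zero-padded `YYYY-MM-DD` dates sort byte-wise in chronological order. The sketch below runs locally without AWS and mimics that comparison in bash with `LC_ALL=C` (byte-wise ordering); it illustrates the ordering only, not how DynamoDB is implemented. The same ordering is what lets sort key conditions such as `StartDate BETWEEN :from AND :to` or `begins_with(StartDate, :year)` work as date ranges.

```shell
#!/usr/bin/env bash
# Sketch (no AWS needed): byte-wise ordering of ISO dates, as used by a String sort key.
export LC_ALL=C   # compare strings byte by byte, like DynamoDB's String ordering

start_dates=("2009-07-01" "2003-08-12")   # cristiano's StartDate values
sk_value="2006-12-31"                     # the :sk_value used in the query above

# Print the dates in the order a String sort key keeps them (ascending).
printf '%s\n' "${start_dates[@]}" | sort

# Mimic the key condition StartDate > :sk_value.
matches=()
for d in "${start_dates[@]}"; do
  if [[ "$d" > "$sk_value" ]]; then
    matches+=("$d")
  fi
done
echo "matches: ${matches[*]}"
```

This only holds while every date uses the same zero-padded format: a value like “2009-7-1” would sort after “2009-10-01”, because “7” is compared with “1” character by character.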

One more thing worth covering about “Query”: do you remember the global secondary index we created on the “Team” attribute? Now we are going to use it.
Consider a scenario where you want to get all the players who played for a certain team. How would you write that query?

One way is to use “Scan”, which you should really avoid in production applications since, as we said before, it is the least efficient operation you can run on DynamoDB.
The other way is to run a “Query” on the table, but that requires a Username (partition key), and in this scenario we don’t have a specific username.

In situations like this the global secondary index is our hero, since it lets us run a “Query” on the “Team” attribute. Let’s see what a “Query” on a global secondary index looks like.
Before running the query, let’s add another player who played for Real Madrid to make the output more meaningful, because right now each team has only one player in our table.

Let’s add ramos, who also played for Real Madrid.
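A sketch of that write with `put-item`, using ramos’s values as they appear in the query output below:

```shell
# Sketch: add ramos's Real Madrid spell (values taken from the Team-index output).
aws dynamodb put-item --table-name PlayersHistory --item '{
  "Username":  {"S": "ramos"},
  "StartDate": {"S": "2005-07-01"},
  "EndDate":   {"S": "2021-07-01"},
  "Team":      {"S": "Real Madrid"},
  "Goals":     {"N": "101"}
}'
```

Because the item has a Team attribute, DynamoDB also adds it to Team-index automatically; no separate write to the index is needed.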

Now let’s run the query on our Global Secondary Index called “Team-index”.

aws dynamodb query \
--table-name PlayersHistory \
--index-name Team-index \
--key-condition-expression "Team = :pk_value" \
--expression-attribute-values '{":pk_value":{"S":"Real Madrid"}}'

The output is an array of the players who played for Real Madrid.

{
    "Items": [
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2021-07-01"
            },
            "StartDate": {
                "S": "2005-07-01"
            },
            "Goals": {
                "N": "101"
            },
            "Username": {
                "S": "ramos"
            }
        },
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2018-07-10"
            },
            "StartDate": {
                "S": "2009-07-01"
            },
            "Goals": {
                "N": "451"
            },
            "Username": {
                "S": "cristiano"
            }
        }
    ],
    "Count": 2,
    "ScannedCount": 2,
    "ConsumedCapacity": null
}

It’s also possible to run the same query on AWS console.

Scan

As mentioned before, you should really avoid using “Scan” in production applications.
A “Scan” reads every item in the table. You can define a filter expression to narrow the results, but the filter is applied only after the items have been read, so it doesn’t reduce the work the Scan does.
The output of a “Scan” is the array of items that pass your filter expression.
Let’s revisit the previous scenario, getting all the players who played for Real Madrid, but this time with the inefficient operation called “Scan”. 😮‍💨

aws dynamodb scan \
--table-name PlayersHistory \
--filter-expression "Team = :team_value" \
--expression-attribute-values '{":team_value":{"S":"Real Madrid"}}'

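The output looks like the “Query” output, except that “ScannedCount” is 4 while “Count” is 2: the Scan had to read every item in the table to find the two matches.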

{
    "Items": [
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2018-07-10"
            },
            "Goals": {
                "N": "451"
            },
            "StartDate": {
                "S": "2009-07-01"
            },
            "Username": {
                "S": "cristiano"
            }
        },
        {
            "Team": {
                "S": "Real Madrid"
            },
            "EndDate": {
                "S": "2021-07-01"
            },
            "Goals": {
                "N": "101"
            },
            "StartDate": {
                "S": "2005-07-01"
            },
            "Username": {
                "S": "ramos"
            }
        }
    ],
    "Count": 2,
    "ScannedCount": 4,
    "ConsumedCapacity": null
}
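To see the cost difference rather than take it on faith, you can ask DynamoDB to report the capacity each request consumes. A sketch comparing the two approaches, using `--return-consumed-capacity TOTAL` (the outputs above show `ConsumedCapacity` as null because it wasn’t requested) and `--select COUNT` to return counts instead of items:

```shell
# Query on the GSI: reads only the items whose Team is "Real Madrid".
aws dynamodb query \
  --table-name PlayersHistory \
  --index-name Team-index \
  --key-condition-expression "Team = :pk_value" \
  --expression-attribute-values '{":pk_value":{"S":"Real Madrid"}}' \
  --return-consumed-capacity TOTAL \
  --select COUNT

# Scan with a filter: reads every item in the table, then filters.
aws dynamodb scan \
  --table-name PlayersHistory \
  --filter-expression "Team = :team_value" \
  --expression-attribute-values '{":team_value":{"S":"Real Madrid"}}' \
  --return-consumed-capacity TOTAL \
  --select COUNT
```

On a four-item table both requests may well report the same minimal capacity, since read capacity is counted in 4 KB units; the ScannedCount already shows the difference, though. As the table grows, the Scan’s cost grows with the size of the whole table, while the Query’s grows only with the number of matching items.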

Now you know the basics and fundamentals of DynamoDB, and you are ready to design your own tables and start enjoying the power of DynamoDB :)

I have more articles about the serverless world, and I also write about life, coding, IT lessons and technology. If you’d like to read more, follow me on Medium :)