Break The Ice With DynamoDB

An overview discussion with some fundamental hands-on tasks

Published in Brain Station 23 · 13 min read · Dec 17, 2019

DynamoDB is a scalable, high-performance non-relational database service that is fully managed by Amazon Web Services (AWS). Among its best features, it takes care of cluster scaling, hardware provisioning, setup and configuration, software patching, and replication. It also encrypts data at rest as built-in security, which reduces the complexity of protecting sensitive data. As a result, the user is freed from many of the complex administrative tasks of operating and maintaining a database.

With DynamoDB, the user can create tables that store and retrieve any amount of data and serve any level of request traffic. DynamoDB is a NoSQL database, so the user doesn’t have to define a schema for the attributes and their data types up front; only the primary key attributes must be defined when creating the table. Data can be stored and retrieved as JSON-like documents.

The service also offers on-demand backups to archive data for long periods, as well as point-in-time recovery, which lets a table be restored to any point in time within the last 35 days, for example after an accidental write or delete. Let’s take a short tour of DynamoDB.

Table of Contents

Components
Supported Data Types
Primary Key
Concept of Indexing
Consistency Model
Pricing
Application & Use Cases
Facts To Remember
Hands-On Stuff
Create Table
Writing Data
Updating Data
Reading Data
Deleting Data
End Note

Components

DynamoDB has three key components: Tables, Items, and Attributes.

Tables — Like tables in other databases, a table is a collection of data, organized as items.

Items — Each row in a table is an Item, which holds a set of attributes.

Attributes — The fields of an Item (comparable to columns), each holding a value of a DynamoDB-supported data type.

Supported Data Types

Scalar Types — Number, String, Binary, Boolean, and Null.
Document Types — Complex structures with nested attributes, as in JSON. The document types are List and Map.
Set Types — Collections of unique scalar values of a single type: String Set, Number Set, and Binary Set.
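To make the three families concrete, here is a rough sketch of how they might look in a Python item handed to Boto3, which maps native Python types to DynamoDB types. The attribute names and the is_valid_set helper are made up for illustration, and the helper is a simplified check rather than DynamoDB's own validation.

```python
from decimal import Decimal

# Hypothetical item; attribute names are for illustration only.
sample_item = {
    "id": 1,                             # Number
    "title": "Sample article",           # String
    "thumbnail": b"\x89PNG",             # Binary
    "is_published": True,                # Boolean
    "editor": None,                      # Null
    "rating": Decimal("4.5"),            # Number; Boto3 expects Decimal, not float
    "tags": {"aws", "nosql"},            # String Set: unique values of one scalar type
    "sections": ["intro", "body"],       # List
    "stats": {"views": 10, "likes": 2},  # Map, which can be nested
}


def is_valid_set(values):
    """Simplified set rule: non-empty and every member of the same Python type."""
    kinds = {type(value) for value in values}
    return len(values) > 0 and len(kinds) == 1
```

Non-integer numbers go in as Decimal because Boto3 rejects Python floats when writing items.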

Primary Key

DynamoDB forms its primary key from a Partition key and an optional Sort key.

Partition key — DynamoDB passes the partition key value through an internal hash function, and the output determines the partition (physical storage) where the item is stored. If the primary key consists of the partition key alone, no two Items can have the same partition key value.
Sort Key — The sort key is optional, but it brings extra advantages when needed. In a table with both a partition key and a sort key, several items can share the same partition key value, provided their sort key values differ. The partition key determines which partition the items are stored in, and the sort key keeps the items that share a partition key in sorted order, which makes query operations very efficient. The sort key also supports range queries with the operators =, <, >, <=, >=, between, and begins_with.

A primary key attribute can hold only a scalar value. The supported data types for primary key attributes are String, Number, and Binary. Non-key attributes are free from this restriction.
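The interplay of the two keys can be pictured with a small in-memory toy model. This is only an illustration of the idea (partition key selects a group, sort key orders items inside it), not how DynamoDB is implemented; the ToyTable class and its method names are invented for this sketch.

```python
from bisect import insort
from collections import defaultdict


class ToyTable:
    """In-memory illustration only; not DynamoDB's real storage engine."""

    def __init__(self):
        # partition key value -> list of (sort key, item), kept sorted by sort key
        self._partitions = defaultdict(list)

    def put(self, pk, sk, item):
        rows = self._partitions[pk]
        # Same partition key and same sort key means same primary key: replace.
        rows[:] = [row for row in rows if row[0] != sk]
        insort(rows, (sk, item))

    def query_between(self, pk, low, high):
        """Return items of one partition whose sort key lies in [low, high]."""
        rows = self._partitions.get(pk, [])
        return [item for sk, item in rows if low <= sk <= high]
```

Because the rows of a partition are already ordered by sort key, a range condition such as between only has to look at one partition instead of the whole table.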

Concept of Indexing

Indexing optimizes database query performance by reducing the amount of disk access needed to locate data, which helps find and retrieve data faster. DynamoDB offers two kinds of secondary indexes:

Local Secondary Index — The index has the same partition key as the table but a different sort key.

Source: AWS DynamoDB Local Secondary Indexes

Global Secondary Index — The index’s partition key and sort key can both differ from the table’s own partition key and sort key, that is, from its primary key.

Source: AWS DynamoDB Global Secondary Index

DynamoDB supports up to 5 Local Secondary Indexes and 20 Global Secondary Indexes per table.

Consistency Model

Eventually Consistent Read — The returned data might not reflect the result of a recently completed write. It suits scenarios where read throughput matters more than seeing the most recent data immediately after a write.
Strongly Consistent Read — Returns the most up-to-date data, reflecting all writes that succeeded before the read. The trade-off is that latency may be higher.

By default, DynamoDB performs eventually consistent reads. A read operation can be made strongly consistent by setting its ConsistentRead parameter to True.
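As a minimal sketch of that switch, the helper below passes ConsistentRead to get_item. It assumes a table whose primary key is id plus published_date (as in the article table used in the hands-on part); get_article is a hypothetical helper name, and table is expected to be a Boto3 Table resource.

```python
def get_article(table, article_id, published_date, strongly_consistent=False):
    """Fetch one item; `table` is assumed to be a Boto3 Table resource,
    e.g. boto3.resource("dynamodb").Table("article")."""
    response = table.get_item(
        Key={"id": article_id, "published_date": published_date},
        # False keeps the default, eventually consistent read.
        ConsistentRead=strongly_consistent,
    )
    # "Item" is absent from the response when nothing matches the key.
    return response.get("Item")
```

Note that strongly consistent reads are not available on Global Secondary Indexes.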

Pricing

The basic charges are for reading, writing, and storing data in tables. Additional charges apply when a user enables optional features offered by DynamoDB. There are two capacity modes, each with its own pricing:

1. On-Demand capacity mode — The user doesn’t need to specify any read or write throughput. DynamoDB charges only for the read and write operations actually performed on the table, and it adapts to the workload automatically as traffic rises or falls.

On-Demand capacity mode might be best in the scenario where:

The user prefers to pay only for actual usage.
The table’s workload is unknown.
The application traffic is unpredictable.

Source: Features and billing overview of On-Demand Capacity

2. Provisioned capacity mode — The user specifies the read and write throughput when creating the table.

Provisioned capacity mode might be best in the scenario where:

The application traffic is predictable and steady.
The user can forecast capacity requirements to control costs.

Source: Features and billing overview of Provisioned Capacity
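In code, the choice between the two modes shows up as the billing arguments passed when a table is created. The sketch below only builds those arguments; billing_settings and its mode labels are hypothetical, while BillingMode and ProvisionedThroughput are the parameter names used by create_table.

```python
def billing_settings(mode, read_units=None, write_units=None):
    """Build the billing-related keyword arguments for create_table."""
    if mode == "on-demand":
        # No throughput to declare; billing follows actual requests.
        return {"BillingMode": "PAY_PER_REQUEST"}
    if mode == "provisioned":
        if not read_units or not write_units:
            raise ValueError("provisioned mode needs read and write capacity units")
        return {
            "BillingMode": "PROVISIONED",
            "ProvisionedThroughput": {
                "ReadCapacityUnits": read_units,
                "WriteCapacityUnits": write_units,
            },
        }
    raise ValueError(f"unknown capacity mode: {mode}")
```

The returned dictionary can be merged into the other create_table arguments, for example with `**billing_settings("provisioned", 5, 5)`.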

Application & Use Cases

Serverless Web Apps, Mobile Backend, Microservices.
Ad-Tech, Gaming, Retail, Banking & Finance, Media and entertainment, Software and internet.

Facts To Remember

Data is stored on SSD storage.
Automatically replicated across 3 Availability Zones in a particular AWS Region.
A GSI can be added during table creation. It can also be added or deleted after the table has been created, but its key schema cannot be modified.
The TTL (time to live) feature helps users reduce cost by cutting the storage taken up by stale and expired data. Each item carries a timestamp attribute, and once that time has passed, the item is automatically deleted from the table.
DynamoDB is schemaless and supports nested attributes up to 32 levels deep.
The initial limit of tables per region is 256.
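For the TTL feature in the list above, the timestamp has to be stored as a Number holding Unix epoch time in seconds. The following sketch adds such an attribute to an item before it is written; with_expiry and the attribute name expires_at are assumptions for illustration, and TTL must separately be enabled on the table with that attribute name.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def with_expiry(item, days_to_live, now=None, ttl_attribute="expires_at"):
    """Return a copy of `item` with a TTL attribute (Unix epoch seconds)."""
    now = int(time.time()) if now is None else int(now)
    expiring_item = dict(item)  # leave the caller's item untouched
    expiring_item[ttl_attribute] = now + days_to_live * SECONDS_PER_DAY
    return expiring_item
```

Expired items are removed by a background process, so they may remain readable for a while after the timestamp has passed.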

Hands-On Stuff

Phew! It took some patience to read this far. Enough theory; now let’s jump into some hands-on work with DynamoDB.

For example purposes, we will work with some sample Article data that holds basic information about an article, as in the picture below:

Create Table

Creating a table in DynamoDB is quite easy with the AWS Management Console. Log in to the AWS Management Console, search for DynamoDB, and click on it.

A screen like this will appear. Click on Create table.

Fill in the table’s basic information. Set up the primary key with a partition key and an optional sort key. Uncheck Use default settings to customize the settings.

By default, the read/write capacity mode is set to Provisioned. As we are only going to perform a limited number of operations for practice, On-demand mode is a good fit. After that, click the Create button.

After the table is created, you will be taken to a page like the one pictured. The Overview page shows all the basic information about the table.

Data can be seen in the Items section. Initially, it’s empty. New data can be inserted by clicking the Create item button.
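The same table could also be created from code rather than the console. The sketch below assumes the table is named article, with id (a number) as the partition key and published_date (a string) as the sort key, which is what the later code in this article uses; the two function names are hypothetical.

```python
def article_table_definition(table_name="article"):
    """Keyword arguments for create_table mirroring the console choices."""
    return {
        "TableName": table_name,
        "KeySchema": [
            {"AttributeName": "id", "KeyType": "HASH"},               # partition key
            {"AttributeName": "published_date", "KeyType": "RANGE"},  # sort key
        ],
        # Only key attributes are declared; other attributes need no schema.
        "AttributeDefinitions": [
            {"AttributeName": "id", "AttributeType": "N"},
            {"AttributeName": "published_date", "AttributeType": "S"},
        ],
        "BillingMode": "PAY_PER_REQUEST",  # on-demand capacity mode
    }


def create_article_table(dynamodb, table_name="article"):
    """`dynamodb` is assumed to be boto3.resource("dynamodb")."""
    table = dynamodb.create_table(**article_table_definition(table_name))
    table.wait_until_exists()  # block until the table is ready to use
    return table
```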

Next, we will see how to perform some fundamental DynamoDB operations programmatically instead of through the AWS Management Console. To do this, we will set up an AWS Lambda function with Python 3 and use Boto3, the AWS SDK for Python, which lets us use AWS services programmatically.

Search for Lambda in the AWS Management Console and click the Create function button. You will be taken to the basic configuration page for creating a Lambda function.

Every Lambda function needs an execution role that authorizes it to access the AWS services defined by the role. To create a new role, go to the IAM console, select Lambda as the service, and proceed to the permissions step.

Search for dynamodb, select AmazonDynamoDBFullAccess, and click Next. The Tags step can be skipped for now.

Name the role (lambdaDynamodbRole in this walkthrough) and click the Create role button. Boom! The role is created.

Let’s get back to the Lambda creation page. Now that we have created a new role, we can select Use an existing role and then pick lambdaDynamodbRole from the dropdown list. Click Create function and, again, boom! We are done!

You will be redirected to the built-in code editor, where we can write and test the code that follows.

Writing Data

We can insert new data objects with the following simple code

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # The test event is stored as-is; it must contain the primary key attributes.
    response = article_table.put_item(Item=event)
    return response

The input object is passed in through the event parameter. To set the input, click Configure test events in the drop-down list next to the Test button, enter the JSON object, and create the event. After that, click the Test button.

On a successful write, the function responds with status code 200 and some other information. After writing a few more entries, the article table will look like this —

Updating Data

Let’s say we want to update one or more attributes of an item, for example the genre and title of the first item. First, create a new test event like this, which will arrive in the event parameter —

{
  "id": 1,
  "published_date": "2019-12-09",
  "title": "La Casa De Papel",
  "genre": "TV Series"
}

Let’s modify the code to perform an update operation

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    response = update_item_with_new_data(event, article_table)
    return response


def update_item_with_new_data(event, article_table):
    response = article_table.update_item(
        # The full primary key identifies the item to update.
        Key={
            "id": event["id"],
            "published_date": event["published_date"],
        },
        UpdateExpression="set genre = :val1, title = :val2",
        ExpressionAttributeValues={
            ":val1": event["genre"],
            ":val2": event["title"],
        },
    )
    return response

Notice that the Key argument represents the primary key of the table. To identify the item to update, Key is set with “id” and “published_date”, which are the partition key and sort key of the article table, respectively.

UpdateExpression defines the attributes to be updated, the action to apply, and the new values. The available actions are SET, ADD, REMOVE, and DELETE, and a single expression can combine several actions. For example: SET a = :val1, b = :val2 REMOVE c, d. Note that REMOVE takes attribute names, not value placeholders.

ExpressionAttributeValues substitutes actual values for the placeholders in UpdateExpression; each placeholder starts with a colon (:).
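When the attributes to change are not known in advance, the expression can be assembled from a dictionary. The helper below is a sketch under that assumption: build_update is a hypothetical name, and it also uses ExpressionAttributeNames (placeholders starting with #) so that attribute names that clash with DynamoDB reserved words do not break the expression.

```python
def build_update(set_values, remove_names=()):
    """Build update_item arguments that SET some attributes and REMOVE others."""
    names, values, set_parts, remove_parts = {}, {}, [], []
    for i, (attr, value) in enumerate(set_values.items()):
        names[f"#s{i}"] = attr
        values[f":v{i}"] = value
        set_parts.append(f"#s{i} = :v{i}")
    for i, attr in enumerate(remove_names):
        names[f"#r{i}"] = attr
        remove_parts.append(f"#r{i}")

    clauses = []
    if set_parts:
        clauses.append("SET " + ", ".join(set_parts))
    if remove_parts:
        clauses.append("REMOVE " + ", ".join(remove_parts))
    if not clauses:
        raise ValueError("nothing to update")

    kwargs = {
        "UpdateExpression": " ".join(clauses),
        "ExpressionAttributeNames": names,
    }
    if values:  # DynamoDB rejects an empty ExpressionAttributeValues map
        kwargs["ExpressionAttributeValues"] = values
    return kwargs
```

The result can be spread into the call, for example `article_table.update_item(Key=key, **build_update({"genre": "TV Series"}, ["image"]))`.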

Reading Data

There are several ways to retrieve data in DynamoDB —

1. Get Single Item

Typically used to get a single, specific item from the table by its primary key. Create a new test event like the following —

{
  "id": 2,
  "published_date": "2019-12-10"
}

Run the following code

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # The event holds exactly the primary key: id and published_date.
    response = article_table.get_item(Key=event)
    return response

It responds with data like this

{
  "Item": {
    "image": "http://lorempixel.com/640/480/food",
    "wordcount": 123,
    "id": 2,
    "url": "https://food.info",
    "genre": "health",
    "published_date": "2019-12-10",
    "author": "Alice",
    "title": "Food for health"
  },
  "ResponseMetadata": {
    "RequestId": "VOU9A7LJS5LKG8SJ4P01MMNRG7VV4KQNSO5AEMVJF66Q9ASUAAJG",
    "HTTPStatusCode": 200,
    "HTTPHeaders": {
      "server": "Server",
      "date": "Sun, 15 Dec 2019 06:45:05 GMT",
      "content-type": "application/x-amz-json-1.0",
      "content-length": "246",
      "connection": "keep-alive",
      "x-amzn-requestid": "VOU9A7LJS5LKG8SJ4P01MMNRG7VV4KQNSO5AEMVJF66Q9ASUAAJG",
      "x-amz-crc32": "966473573"
    },
    "RetryAttempts": 0
  }
}

2. Scan

A scan operation reads every item in the table. A single scan request can retrieve up to 1 MB of data. If more data remains, the response includes an extra key, LastEvaluatedKey, which can be passed to the next scan request to retrieve the next chunk, and so on. Let’s see an example —

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # Returns at most 1 MB of items per call.
    response = article_table.scan()
    return response

Response data (ResponseMetadata omitted)

{
  "Items": [
    {
      "image": "http://lorempixel.com/640/480/movie",
      "wordcount": 432,
      "id": 3,
      "url": "https://movies.info",
      "genre": "movie",
      "published_date": "2019-12-11",
      "author": "Bob",
      "title": "Top 10 Movies"
    },
    {
      "image": "http://lorempixel.com/640/480/food",
      "wordcount": 123,
      "id": 2,
      "url": "https://food.info",
      "genre": "health",
      "published_date": "2019-12-10",
      "author": "Alice",
      "title": "Food for health"
    },
    {
      "image": "http://lorempixel.com/640/480/programming",
      "wordcount": 342,
      "id": 2,
      "url": "https://programming.info",
      "genre": "programming",
      "published_date": "2019-12-12",
      "author": "Alice",
      "title": "C programming"
    },
    {
      "image": "http://lorempixel.com/640/480/people",
      "wordcount": 45,
      "id": 1,
      "url": "https://linda.info",
      "genre": "TV Seriese",
      "published_date": "2019-12-09",
      "author": "Miss Ashley Bernier",
      "title": "La Casa De Papel"
    }
  ],
  "Count": 4,
  "ScannedCount": 4
}

If the table holds more than 1 MB of items, the code can be modified like this to retrieve all items

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    total_items = []
    response = article_table.scan()
    total_items.extend(response["Items"])

    # Keep scanning from where the previous page stopped.
    while "LastEvaluatedKey" in response:
        response = article_table.scan(
            ExclusiveStartKey=response["LastEvaluatedKey"]
        )
        total_items.extend(response["Items"])

    return total_items

3. Query

A query operation retrieves one or more items based on the primary key and is usually more efficient than a scan. It has the same limit of 1 MB of data per query request. Users can query by the partition key alone or by both the partition key and the sort key, but not by the sort key alone. The same applies to Local Secondary Indexes (LSI) and Global Secondary Indexes (GSI).

Before diving into query operations, a few more items were inserted so that the query results are easier to follow.

Query With Partition Key

import json
import boto3
from boto3.dynamodb.conditions import Key


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # Match every item that shares the given partition key.
    response = article_table.query(
        KeyConditionExpression=Key("id").eq(event["id"])
    )
    return response

The below result is for partition key id 2.

{
  "Items": [
    {
      "image": "http://lorempixel.com/640/480/food",
      "wordcount": 123,
      "id": 2,
      "url": "https://food.info",
      "genre": "health",
      "published_date": "2019-12-10",
      "author": "Alice",
      "title": "Food for health"
    },
    {
      "image": "http://lorempixel.com/640/480/programming",
      "wordcount": 342,
      "id": 2,
      "url": "https://programming.info",
      "genre": "programming",
      "published_date": "2019-12-12",
      "author": "Alice",
      "title": "C programming"
    },
    {
      "image": "http://lorempixel.com/640/480/python",
      "wordcount": 604,
      "id": 2,
      "url": "https://python.info",
      "genre": "programming",
      "published_date": "2019-12-17",
      "author": "Alice",
      "title": "Python programming"
    }
  ],
  "Count": 3,
  "ScannedCount": 3
}

Query With Both Partition Key And Sort Key

Let’s assume we want to get the items with id 2 that were published between 2019-12-10 and 2019-12-15. Set up the test event with —

{
  "id": 2,
  "start_date": "2019-12-10",
  "end_date": "2019-12-15"
}

and run with the following code

import json
import boto3
from boto3.dynamodb.conditions import Key


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # Exact match on the partition key plus a range condition on the sort key.
    response = article_table.query(
        KeyConditionExpression=Key("id").eq(event["id"])
        & Key("published_date").between(event["start_date"], event["end_date"])
    )
    return response

Returned response

{
  "Items": [
    {
      "image": "http://lorempixel.com/640/480/food",
      "wordcount": 123,
      "id": 2,
      "url": "https://food.info",
      "genre": "health",
      "published_date": "2019-12-10",
      "author": "Alice",
      "title": "Food for health"
    },
    {
      "image": "http://lorempixel.com/640/480/programming",
      "wordcount": 342,
      "id": 2,
      "url": "https://programming.info",
      "genre": "programming",
      "published_date": "2019-12-12",
      "author": "Alice",
      "title": "C programming"
    }
  ],
  "Count": 2,
  "ScannedCount": 2
}
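Because published_date is stored as a string, the begins_with operator on the sort key can select a whole month with a prefix. The sketch below writes the key condition as a plain expression string with placeholders instead of the Key helper; query_by_month is a hypothetical name, and table is assumed to be a Boto3 Table resource for the article table.

```python
def query_by_month(table, article_id, year_month):
    """Return the items of one partition published in a month such as "2019-12"."""
    response = table.query(
        KeyConditionExpression="#pk = :pk AND begins_with(#sk, :prefix)",
        # Name placeholders keep the expression safe from reserved words.
        ExpressionAttributeNames={"#pk": "id", "#sk": "published_date"},
        ExpressionAttributeValues={":pk": article_id, ":prefix": year_month},
    )
    return response["Items"]
```

With the Key helper used elsewhere in this article, the same condition would read `Key("id").eq(article_id) & Key("published_date").begins_with(year_month)`.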

Query with GSI:

Let’s say we want to get the articles in the programming genre that have 600 words or fewer. In that case, we need to create a GSI whose partition key is the genre attribute and whose sort key is the wordcount attribute. When querying a GSI, the index name must be specified.
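One way the index could be added from code is through the update() method of the Table resource. The sketch below only builds the arguments and assumes an on-demand table; for a provisioned table, the Create entry would also need a ProvisionedThroughput setting. The function name is hypothetical, and the index name matches the one used in the query that follows.

```python
def genre_wordcount_index_update(index_name="genre-wordcount-index"):
    """Keyword arguments for Table.update() that add the GSI."""
    return {
        # Key attributes of the new index must be declared with their types.
        "AttributeDefinitions": [
            {"AttributeName": "genre", "AttributeType": "S"},
            {"AttributeName": "wordcount", "AttributeType": "N"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": index_name,
                    "KeySchema": [
                        {"AttributeName": "genre", "KeyType": "HASH"},
                        {"AttributeName": "wordcount", "KeyType": "RANGE"},
                    ],
                    # Copy every attribute so queries return full items.
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }
```

It would be applied with `article_table.update(**genre_wordcount_index_update())`; the index becomes queryable once DynamoDB has finished building it.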

Edit the test event with

{
  "genre": "programming",
  "wordcount": 600
}

And run the code

import json
import boto3
from boto3.dynamodb.conditions import Key


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # Query the index instead of the table by naming it explicitly.
    response = article_table.query(
        IndexName="genre-wordcount-index",
        KeyConditionExpression=Key("genre").eq(event["genre"])
        & Key("wordcount").lte(event["wordcount"]),
    )
    return response

Returned response

{
  "Items": [
    {
      "image": "http://lorempixel.com/640/480/programming",
      "wordcount": 342,
      "url": "https://programming.info",
      "id": 2,
      "genre": "programming",
      "published_date": "2019-12-12",
      "author": "Alice",
      "title": "C programming"
    },
    {
      "image": "http://lorempixel.com/640/480/java",
      "wordcount": 564,
      "url": "https://java.info",
      "id": 3,
      "genre": "programming",
      "published_date": "2019-12-14",
      "author": "Bob",
      "title": "Java programming"
    }
  ],
  "Count": 2,
  "ScannedCount": 2
}

Items can also be retrieved in batches (BatchGetItem), but this approach has limitations and drawbacks and is more complex to maintain, so it is not encouraged here.

Deleting Data

A deletion can be performed simply with the primary key. Let’s create a new test event

{
  "id": 2,
  "published_date": "2019-12-10"
}

And then run the following code

import json
import boto3


def lambda_handler(event, context):
    dynamodb = boto3.resource("dynamodb")
    table_name = "article"
    article_table = dynamodb.Table(table_name)

    # The event holds the primary key of the item to delete.
    response = article_table.delete_item(Key=event)
    return response

One thing to remember: the response comes back with status code 200 even if no item with that primary key exists in the table.
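If the caller needs to know whether anything was actually deleted, one option is to ask DynamoDB to return the old item. This is a sketch assuming the article table's key attributes; delete_article is a hypothetical helper, and table is assumed to be a Boto3 Table resource.

```python
def delete_article(table, article_id, published_date):
    """Delete one item and report whether it existed beforehand."""
    response = table.delete_item(
        Key={"id": article_id, "published_date": published_date},
        ReturnValues="ALL_OLD",  # ask DynamoDB to return the deleted item
    )
    # "Attributes" is only present when an item was actually deleted.
    return "Attributes" in response
```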

To explore more advanced usage of the operations discussed above please visit the documentation of boto3.

End Note

This article aims to give newcomers an overview of DynamoDB and let them learn basic database operations without running into complexity up front. There is still a lot more to learn, and the references are a good place to continue. Hope to meet you in another article very soon, and don’t forget to share feedback. Thank you.

References:


Written by RC Tushar