Boto 3: DynamoDB Python client library

Rajnish Kumar
4 min read · Mar 7, 2019


In this post, we will use the AWS SDK for Python (Boto 3) to write simple programs that perform common Amazon DynamoDB operations: reading table metadata, getting, putting and deleting items, and querying and scanning a table.

Boto 3 is the Python SDK for interacting with Amazon Web Services. DynamoDB is the NoSQL database service inside AWS, and Boto 3 contains methods and classes to work with it. This post assumes the AWS CLI (the tool used to set up access and authorization to the cloud) has already been configured, which is easily done from a terminal. The post then outlines some operations on DynamoDB tables, run through Boto 3. Getting started with Boto 3 is easy, but requires a few steps.

Installation
Install the latest Boto 3 release via pip:

pip install boto3
# You may also install a specific version:
pip install boto3==1.0.0

Configuration
Before you can begin using Boto 3, you should set up authentication credentials. Credentials for your AWS account can be found in the IAM Console. You can create a new user or use an existing one: go to that user's access key management page and generate a new set of keys.

If you have the AWS CLI installed, then you can use it to configure your credentials file:

aws configure

Alternatively, you can create the credential file yourself. By default, its location is at ~/.aws/credentials:

[default]
aws_access_key_id = YOUR_ACCESS_KEY
aws_secret_access_key = YOUR_SECRET_KEY

You may also want to set a default region. This can be done in the configuration file. By default, its location is at ~/.aws/config:

[default]
region=us-east-1

Alternatively, you can pass a region_name when creating clients and resources.

This sets up credentials for the default profile as well as a default region to use when creating connections.
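The region, and a named profile if you keep several, can also be chosen in code instead of in the config files. Here is a minimal sketch of that idea. The helper name `make_dynamodb_resource` and its default region are my own assumptions for illustration; `boto3.Session` with its `region_name` and `profile_name` arguments is part of Boto 3.

```python
def make_dynamodb_resource(region_name="us-east-1", profile_name=None):
    """Build a DynamoDB resource for an explicit region and, optionally, a profile."""
    # Imported inside the function so this sketch still loads where boto3 is absent.
    import boto3

    # With profile_name=None, boto3 uses its normal credential lookup,
    # which includes the [default] profile in ~/.aws/credentials.
    session = boto3.Session(region_name=region_name, profile_name=profile_name)
    return session.resource("dynamodb")
```

Calling `make_dynamodb_resource("eu-west-1")` would then give you a resource bound to that region, whatever ~/.aws/config says.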

Using Boto 3

To use Boto 3, you must first import it and tell it which service you are going to use. Let us assume you already have a table in DynamoDB. We’ll start by importing what we need and initialising the DynamoDB resource:

from boto3 import resource
from boto3.dynamodb.conditions import Key
# The boto3 dynamoDB resource
dynamodb_resource = resource('dynamodb')

You can then get a handle on your table with

table = dynamodb_resource.Table(table_name)

where table_name is just the string specifying the name of your table in DynamoDB.

Because certain operations on DynamoDB can be expensive, and there isn’t really any way to run aggregations (unlike MongoDB, which has a full aggregation framework), it pays to learn a bit about your table first. Scan operations in particular get slower as the number of items grows, because they have to walk the whole table. It’s typically useful to know the table's size, the number of items, which attribute plays the role of primary key, and so on. So I’ve collected the relevant attributes in a convenient dict:

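def get_table_metadata(table_name):
    """
    Get some metadata about chosen table.
    """
    table = dynamodb_resource.Table(table_name)
    return {
        # item_count and table_size_bytes are approximate
        # (DynamoDB refreshes them about every six hours).
        'num_items': table.item_count,
        # key_schema is a list of dicts; keep only the attribute name of the first key.
        'primary_key_name': table.key_schema[0]['AttributeName'],
        'status': table.table_status,
        'bytes_size': table.table_size_bytes,
        'global_secondary_indices': table.global_secondary_indexes
    }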

Say, for instance, you have hundreds of thousands of items in the table: then a full scan might not be a great idea. At least you know beforehand!
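If your table has a composite primary key, it helps to know both parts of it before you read or delete anything. The sketch below pulls the partition (HASH) and sort (RANGE) key names out of the list that `table.key_schema` returns. The helper name `split_key_schema` and the example attribute names are made up for illustration.

```python
def split_key_schema(key_schema):
    """Return (partition_key_name, sort_key_name) from a table's key_schema.

    sort_key_name is None when the table has a partition key only.
    """
    names = {entry["KeyType"]: entry["AttributeName"] for entry in key_schema}
    return names["HASH"], names.get("RANGE")


# Shape of `table.key_schema` for a hypothetical table keyed on user_id + created_at.
example_schema = [
    {"AttributeName": "user_id", "KeyType": "HASH"},
    {"AttributeName": "created_at", "KeyType": "RANGE"},
]
partition_key, sort_key = split_key_schema(example_schema)
print(partition_key, sort_key)  # prints: user_id created_at
```

With a real table you would pass `dynamodb_resource.Table(table_name).key_schema` instead of the hand-written list.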

Now, a GET, a PUT and a DELETE can be performed as:

def read_table_item(table_name, pk_name, pk_value):
    """
    Return item read by primary key.
    """
    table = dynamodb_resource.Table(table_name)
    response = table.get_item(Key={pk_name: pk_value})
    return response


def add_item(table_name, col_dict):
    """
    Add one item (row) to table. col_dict is a dictionary {col_name: value}.
    """
    table = dynamodb_resource.Table(table_name)
    response = table.put_item(Item=col_dict)
    return response


def delete_item(table_name, pk_name, pk_value):
    """
    Delete an item (row) in table from its primary key.
    """
    table = dynamodb_resource.Table(table_name)
    response = table.delete_item(Key={pk_name: pk_value})
    return response
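One detail worth knowing about reads: when no item matches the key, the `get_item` response simply has no 'Item' entry, so indexing it directly raises a KeyError. A small wrapper can hide that. This is only a sketch; the name `get_item_or_default` is hypothetical, and it takes the table object itself so it can be tried against any object with a `get_item` method.

```python
def get_item_or_default(table, pk_name, pk_value, default=None):
    """Return the stored item, or `default` when no item has that key."""
    response = table.get_item(Key={pk_name: pk_value})
    # get_item omits the 'Item' key entirely when nothing matches.
    return response.get("Item", default)
```

For a table with a composite primary key, the `Key` dictionary would need both the partition key and the sort key, so this single-key version would have to be extended.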

The two main operations you can run to retrieve items from a DynamoDB table are query and scan. The AWS docs explain that a query looks items up by primary key, whereas a scan walks the full table and can optionally apply filters to what it returns. The basic way to do either in boto3 is via the query and scan APIs:

from boto3.dynamodb.conditions import Attr, Key


def scan_table(table_name, filter_key=None, filter_value=None):
    """
    Perform a scan operation on table.
    Can specify filter_key (col name) and its value to be filtered.
    """
    table = dynamodb_resource.Table(table_name)
    if filter_key is not None and filter_value is not None:
        # Attr is the documented builder for filters on non-key attributes.
        filtering_exp = Attr(filter_key).eq(filter_value)
        response = table.scan(FilterExpression=filtering_exp)
    else:
        response = table.scan()
    return response


def query_table(table_name, filter_key, filter_value):
    """
    Perform a query operation on the table.
    filter_key must be the partition key: a query cannot run
    without a key condition, so both arguments are required.
    """
    table = dynamodb_resource.Table(table_name)
    filtering_exp = Key(filter_key).eq(filter_value)
    response = table.query(KeyConditionExpression=filtering_exp)
    return response

The actual items of the table will be in the 'Items' key of the response dictionary.
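Besides 'Items', scan and query responses also carry 'Count' (items returned), 'ScannedCount' (items examined before any filter was applied) and, when more pages remain, 'LastEvaluatedKey'. As a rough sketch, a helper like the hypothetical `summarize_response` below collects them in one place, which makes it easy to see how much a filtered scan actually read.

```python
def summarize_response(response):
    """Collect the useful parts of a scan/query response in one dict."""
    items = response.get("Items", [])
    return {
        "items": items,
        # Number of items that came back after filtering.
        "returned": response.get("Count", len(items)),
        # Number of items DynamoDB examined before filtering.
        "examined": response.get("ScannedCount"),
        # A LastEvaluatedKey means there is at least one more page.
        "has_more": "LastEvaluatedKey" in response,
    }
```

A large gap between "examined" and "returned" is a hint that the scan is doing a lot of work for few results.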

The catch is that DynamoDB paginates results: a single scan or query call returns at most 1 MB of data, so one call is not guaranteed to return every item in the table. This is yet another reason to keep track of how many items the table holds and how many you actually end up with after scanning.

To scan the table page by page, we pass the LastEvaluatedKey of each response as the ExclusiveStartKey of the next call, looping until we have seen the full table:

from boto3.dynamodb.conditions import Attr


def scan_table_allpages(table_name, filter_key=None, filter_value=None):
    """
    Perform a scan operation on table.
    Can specify filter_key (col name) and its value to be filtered.
    This gets all pages of results. Returns list of items.
    """
    table = dynamodb_resource.Table(table_name)
    scan_kwargs = {}
    if filter_key is not None and filter_value is not None:
        scan_kwargs['FilterExpression'] = Attr(filter_key).eq(filter_value)
    response = table.scan(**scan_kwargs)
    items = response['Items']
    # Keep the same filter on every page; only the start key changes.
    while response.get('LastEvaluatedKey'):
        response = table.scan(
            ExclusiveStartKey=response['LastEvaluatedKey'], **scan_kwargs
        )
        items += response['Items']
    return items
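If the table is large, building one big list may not be what you want. A possible variation, sketched here under my own naming (`iter_scan_items` is hypothetical), is a generator that yields items page by page and forwards whatever scan arguments you give it on every call. It takes the table object directly, so it works with `dynamodb_resource.Table(table_name)` or with any stand-in that has a `scan` method.

```python
def iter_scan_items(table, **scan_kwargs):
    """Yield items from every page of a scan, one at a time."""
    kwargs = dict(scan_kwargs)  # copy so the caller's arguments are not modified
    while True:
        response = table.scan(**kwargs)
        yield from response.get("Items", [])
        last_key = response.get("LastEvaluatedKey")
        if not last_key:
            break  # no LastEvaluatedKey means this was the final page
        # Resume the next call right after the last key of this page.
        kwargs["ExclusiveStartKey"] = last_key
```

You would consume it with a plain `for item in iter_scan_items(table): ...` loop, stopping early if you only need the first few matches.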
