A Five-Minute Overview of Amazon SimpleDB
Sometimes a project needs a data store, but the complexity of Relational Database Service (RDS), DynamoDB, DocumentDB, and similar services is more than the project requires. This is where Amazon SimpleDB becomes a valuable resource.
https://open.spotify.com/episode/77BybWgy6VHfCxS2LXrb8V?si=ehEKXoHPTVqhlmYkGoHbyw
SimpleDB is a NoSQL database. NoSQL databases are not new; they have been around since the 1960s. The term NoSQL has several meanings, ranging from “non-SQL,” referring to the lack of relational support in the database, to “Not only SQL,” meaning the database may also support Structured Query Language (SQL) (Wikipedia).
AWS has a number of databases to meet the needs of your project. If you look in the AWS Management Console, the Database section lists:
- Relational Database Service
- DynamoDB
- ElastiCache
- Neptune
- Amazon QLDB
- Amazon DocumentDB
- Amazon Keyspaces
- Amazon Timestream
Did you notice SimpleDB is missing from the list? That is because there is no console interface for SimpleDB. SimpleDB tables, which are called domains, are created programmatically using the CLI, the SDK, or web service requests, and all other operations are performed through those same interfaces.
Why use SimpleDB?
Database management is a science of its own. Schema design, entity-relationship models, query optimization, and day-to-day management all add complexity to a project, and every database engine is unique in its own right. SimpleDB removes much of this complexity by being NoSQL and having little administrative overhead. The AWS documentation states that “Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden” (Amazon SimpleDB).
The SimpleDB architecture is designed to be highly available by automatically creating geographically distributed copies of your data. If one replica fails, another is seamlessly used to access your data.
Because there is no rigid schema, adapting to your project’s changing needs is simply a matter of adding new columns, which SimpleDB calls attributes.
SimpleDB is also secure: it uses HTTPS as the transport and integrates with IAM to provide fine-grained control over operations and data.
Some sample use cases for SimpleDB include logging, online gaming, and S3 object metadata indexing (Amazon SimpleDB).
With that introduction out of the way, let’s look at working with SimpleDB using the Software Development Kit.
Working with SimpleDB using the SDK
The examples in this section use Python, but they are explained so you don’t need to know Python to follow them. In case you are wondering, the Python 3 SDK is called boto3.
Connecting to the SimpleDB Service
Before we can work with SimpleDB, we have to establish a connection to the service.
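import logging

import boto3

region = "us-east-1"

try:
    # a session holds credentials and configuration and can create clients for many services
    session = boto3.session.Session()
except Exception as error:
    logging.error("Error: cannot create service session: %s", error)
    raise error
try:
    # create a client for the SimpleDB ("sdb") service in the chosen region
    client = session.client("sdb", region_name=region)
except Exception as error:
    logging.error("Cannot connect to sdb in %s: %s", region, error)
    raise error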
The first try block creates a session, which can be used to create connections to multiple services if needed, while the second try block creates a connection to the SimpleDB service. If the session or client cannot be established, then an error is raised to the calling function. Once the client connection to the SimpleDB endpoint has been created, we are ready to work with the service.
Creating a SimpleDB Domain
Before we can work with data, we have to create a domain if we don’t already have one. This is done using the create_domain API call.
try:
    # domain holds the name of the domain (table) to create
    client.create_domain(DomainName=domain)
except Exception as error:
    raise error
The single argument to create_domain is the domain, or table, name. Domain names must be unique within the account. Initially, up to 250 domains can be created, and it is the user’s responsibility to decide how to shard or partition the data so that no domain exceeds the 10 GB hard limit. With the domain created, we can now insert some data.
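One simple way to handle that partitioning is to hash each item name to one of a fixed set of domains, so the same item always lands in the same domain. The sketch below assumes a hypothetical naming scheme (Assessments_0, Assessments_1, and so on) and a shard count that never changes; it is one approach among several, and changing the shard count later would mean moving items between domains.

```python
import hashlib


def domain_for_item(item_name: str, base_name: str, shard_count: int) -> str:
    """Pick a shard domain for an item by hashing its name.

    Hypothetical scheme: domains are named f"{base_name}_{n}" for n in
    range(shard_count). The mapping is stable only while shard_count is fixed.
    """
    # use a stable hash; Python's built-in hash() for str is randomized per process
    digest = hashlib.sha256(item_name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"{base_name}_{shard}"


# every lookup for the same item name returns the same domain
print(domain_for_item("20180717230440", "Assessments", 4))
```

The returned name would be passed as DomainName to create_domain, put_attributes, and the other calls; reads for a known item name go straight to its shard, while a select across all items has to query every shard.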
Listing the Available Domains
We will eventually want to see all of the domains we have created. The list_domains API returns that list, and it is best called through a paginator, which retrieves all of the domains without our having to worry about the maximum number of items a single call returns.
import logging


def list_domains(client):
    """Return the names of all SimpleDB domains visible to the client."""
    domain_list = []
    try:
        # create the paginator for the list_domains API
        paginator = client.get_paginator("list_domains")
        # request up to 100 domain names per page; the paginator follows
        # NextToken on its own, so no manual token handling is needed
        page_iterator = paginator.paginate(PaginationConfig={"PageSize": 100})
        # work through the domain names on each page
        for page in page_iterator:
            domain_list.extend(page.get("DomainNames", []))
    except Exception as error:
        logging.error("Cannot list domains: %s", error)
        raise error
    # return the list of domains to the calling function
    return domain_list
Using a paginator is a good idea whatever language you are working in, because you are not limited to the maximum number of items a single API call returns. When this code executes, the result is a list of domains that can then be displayed.
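Under the hood, a paginator runs a loop much like the one sketched here: call the API, collect the results, and call again with the returned NextToken until no token comes back. The fetch_page callable and the extra domain names are hypothetical stand-ins for a real list_domains call, so the loop can run without AWS.

```python
def fetch_all(fetch_page, result_key):
    """Follow NextToken until the service stops returning one.

    fetch_page is a hypothetical callable that takes a token (None for the
    first call) and returns a dict shaped like a SimpleDB response.
    """
    results = []
    token = None
    while True:
        page = fetch_page(token)
        results.extend(page.get(result_key, []))
        token = page.get("NextToken")
        if not token:  # no NextToken means this was the last page
            return results


# a stand-in for the service: two pages of domain names
pages = {
    None: {"DomainNames": ["Assessments", "Logs"], "NextToken": "t1"},
    "t1": {"DomainNames": ["Metadata"]},
}
print(fetch_all(pages.get, "DomainNames"))
# prints ['Assessments', 'Logs', 'Metadata']
```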
Inserting Items into the Domain
If you have a lot of attributes, preparing the data to insert into the domain can be a little tedious. We’ll come back to that in a minute. Inserting items into the domain uses the put_attributes function.
try:
    # insert the named item into the domain (or add attributes to it if it exists)
    response = client.put_attributes(
        DomainName=domain,
        ItemName=item,
        Attributes=attributes
    )
except Exception as error:
    logging.error("Cannot insert %s into %s: %s", item, domain, error)
    raise error
We have to specify the domain we are inserting the item into, the name of the item, and the attributes. Item names are unique within a domain: if the item name already exists, SimpleDB adds the supplied attributes to the existing item instead of creating a new one.
I mentioned that defining the attributes can be a little tedious. This is because attributes are defined as name-value pairs. In Python, this looks like:
attributes = [
    {
        "Name": "attribute1",
        "Value": "value1"
    },
    {
        "Name": "attribute2",
        "Value": "value2"
    },
    {
        "Name": "attributeN",
        "Value": "valueN"
    },
]
The more attributes there are, the more tedious this gets. However, if your data is already stored in a Python dictionary, creating the attributes is simple:
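item_data = {"BirthYear": 1981, "Gender": "male"}  # example data in a dictionary
attributes = []
for key, value in item_data.items():
    # SimpleDB stores only strings, so convert every value
    attributes.append({"Name": key, "Value": str(value)})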
This brings up an important point: SimpleDB doesn’t understand any data type other than a string. If your data includes things like integer and boolean values, they must be represented as strings when stored in SimpleDB.
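Storing numbers as strings has a side effect: SimpleDB compares values lexicographically, so the string 9 sorts after the string 10. A common workaround is to store integers with a fixed width, plus an offset so negative values also sort correctly. In this sketch, WIDTH and OFFSET are assumptions you would size to your own data, and the "true"/"false" convention for booleans is just one choice.

```python
WIDTH = 10       # assumed fixed width; must fit the largest encoded value
OFFSET = 10**9   # assumed offset so negative values become non-negative


def encode_int(value: int) -> str:
    """Encode an int as a zero-padded string that sorts in numeric order."""
    shifted = value + OFFSET
    if not 0 <= shifted < 10**WIDTH:
        raise ValueError(f"{value} does not fit the chosen width and offset")
    return str(shifted).zfill(WIDTH)


def decode_int(text: str) -> int:
    """Reverse encode_int."""
    return int(text) - OFFSET


def encode_bool(value: bool) -> str:
    # one possible convention: store booleans as "true" or "false"
    return "true" if value else "false"


# sorting the encoded strings gives the same order as sorting the numbers
print([decode_int(s) for s in sorted(encode_int(n) for n in (9, 10, -5))])
# prints [-5, 9, 10]
```

The same encoding has to be applied to values in select comparisons, for example comparing against encode_int(1980) rather than the plain string 1980.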
The second point concerns an optional third field in the attribute definition: Replace. When you update an existing item, setting Replace to True causes SimpleDB to replace the attribute’s current value instead of adding the new value alongside it. Without Replace, SimpleDB keeps the old value and adds the new one.
attributes = [
    {
        "Name": "attribute1",
        "Value": "value1",
        "Replace": True
    }
]
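Putting the two ideas together, a small helper can build the attribute list from a dictionary and optionally set Replace on every pair. This is a sketch; the function name to_attributes is hypothetical, and its result would be passed as the Attributes argument to put_attributes.

```python
def to_attributes(data: dict, replace: bool = False) -> list:
    """Convert a dictionary into SimpleDB name-value pairs.

    Every value is converted with str(), since SimpleDB stores only strings.
    When replace is True, each pair carries Replace=True so an existing
    value is overwritten instead of having the new value added alongside it.
    """
    attributes = []
    for name, value in data.items():
        pair = {"Name": name, "Value": str(value)}
        if replace:
            pair["Replace"] = True
        attributes.append(pair)
    return attributes


print(to_attributes({"BirthYear": 1981, "Gender": "male"}, replace=True))
```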
Domain Metadata
Before we look at retrieving data from our SimpleDB domain, let’s look at how to get information about the domain using the domain_metadata function. This function returns the number of items, attribute names, and attribute values, the size in bytes of the item names, attribute names, and attribute values, and a timestamp.
Assuming we already have a client connection to SimpleDB, we can do the following:
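import datetime
import logging

# assumed size constants (binary units); the 10 GB domain limit is treated as 10 GiB
MB = 1024 * 1024
GB = 1024 * MB
HALF = 0.5 * 10 * GB       # 50% of the maximum domain size
THRESHOLD = 0.9 * 10 * GB  # 90% of the maximum domain size

domain = "Assessments"

try:
    response = client.domain_metadata(
        DomainName=domain
    )
except Exception as error:
    logging.error("%s: %s", domain, error)
    raise error
print(f"Domain: {domain}")
# note: the API reference describes Timestamp as the time the metadata was calculated
print(
    f"Domain created on {datetime.datetime.fromtimestamp(response['Timestamp'])}"
)
print(f"Total items: {response['ItemCount']}")
print(f"Total attribute names: {response['AttributeNameCount']}")
print(f"Total attribute values: {response['AttributeValueCount']}")
storage_used = (
    response['ItemNamesSizeBytes']
    + response['AttributeNamesSizeBytes']
    + response['AttributeValuesSizeBytes']
)
print(
    f"Total Domain size: {storage_used} bytes {storage_used/MB:.2f} MB, {storage_used/GB:.2f} GB"
)
# test the higher threshold first; otherwise the 90% warning can never be printed
if storage_used >= THRESHOLD:
    print(
        "The domain size is 90% of the maximum domain size. Inserts into the domain will fail when the maximum size is reached."
    )
elif storage_used >= HALF:
    print("The domain size is 50% of the maximum domain size")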
Running this against the sample SimpleDB domain I am using for a project produces:
Domain: Assessments
Domain created on 2020-10-17 13:05:01
Total items: 31301
Total attribute names: 91
Total attribute values: 2849831
Total Domain size: 5477676 bytes 5.35 MB, 0.01 GB
There are indeed 31,301 items in the domain, with 91 unique attribute names. The attribute value count, 2,849,831, is the total number of name-value pairs stored across all items. It is close to, but not exactly, 91 attribute names multiplied by 31,301 items, because an item does not have to include every attribute and a single attribute can hold more than one value. All of these values are text and together occupy only 5,477,676 bytes (the 5.35 MB in the output), an average of about 175 bytes per item for its name, attribute names, and data.
This is the primary reason for using SimpleDB in this project: it is fast, small, and, as we will see a little later, inexpensive. It also shows why RDS and DynamoDB are not a good fit here; their operational cost is just not reasonable for this amount of data.
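The figures in the metadata output can be checked with a little arithmetic. This sketch uses only the numbers printed above; converting the 10 GB domain limit to bytes as 10 × 1024³ is an assumption.

```python
# figures copied from the domain_metadata output above
item_count = 31301
attribute_name_count = 91
attribute_value_count = 2849831
total_bytes = 5477676

# average storage per item, with the domain-wide attribute-name bytes spread across items
average_item_bytes = total_bytes / item_count

# the value count if every item had every attribute exactly once
dense_value_count = item_count * attribute_name_count

# share of the 10 GB domain limit in use, assuming 10 GB means 10 * 1024**3 bytes
percent_of_limit = 100 * total_bytes / (10 * 1024**3)

print(f"{average_item_bytes:.0f} bytes per item")
print(f"{dense_value_count} if fully dense vs {attribute_value_count} reported")
print(f"{percent_of_limit:.3f}% of the domain limit used")
```

Even at this size the domain uses a small fraction of one percent of its 10 GB limit, which is why sharding is not a concern for this project yet.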
At this point, we can create a SimpleDB domain, insert items, and retrieve the metadata for the domain. Let’s look at retrieving data from the domain.
Retrieving Items from the Domain
There are two methods for retrieving data from your domain: get_attributes and select. If you already know the item name, you can use the get_attributes function to retrieve the attributes for that one item. If you don’t know the item name, or you want to retrieve all of the items meeting specific criteria, use the select function.
The select function works similarly to the SQL SELECT command, allowing you to retrieve the desired attributes (columns) for the items (rows) matching the criteria specified in the select statement. Here are some examples using the AWS CLI:
Find out how many items are in the domain (which can also be accomplished using the domain_metadata function):
aws sdb select --select-expression "select count(*) from Assessments"
{
    "Items": [
        {
            "Name": "Domain",
            "Attributes": [
                {
                    "Name": "Count",
                    "Value": "31301"
                }
            ]
        }
    ]
}
Retrieve a specific attribute:
aws sdb select --select-expression "select BirthYear from Assessments"
Retrieve a group of attributes:
aws sdb select --select-expression "select BirthYear, Gender from Assessments"
If we look at the last example, the response from SimpleDB looks like this:
{
    "Items": [
        {
            "Name": "20180717230440",
            "Attributes": [
                {
                    "Name": "BirthYear",
                    "Value": "1981"
                },
                {
                    "Name": "Gender",
                    "Value": "male"
                }
            ]
        },
        {
            "Name": "20170712184415",
            "Attributes": [
                {
                    "Name": "BirthYear",
                    "Value": "1974"
                },
                {
                    "Name": "Gender",
                    "Value": "male"
                }
            ]
        },
        ...
For each item found in the select statement, you get the item Name and the values for the specified attributes.
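Working with these Name/Value lists directly gets awkward, so it is often handy to convert them into dictionaries. The sketch below does that for the Items list returned by select; get_attributes returns the same Name/Value shape in its Attributes list. The function name is hypothetical, and collecting repeated names into a list is one way to handle attributes that hold more than one value.

```python
from collections import defaultdict


def items_to_dicts(items: list) -> dict:
    """Turn a SimpleDB Items list into {item name: {attribute: value}}.

    An attribute can hold several values, so repeated names are collected
    into a list; single values are left as plain strings.
    """
    result = {}
    for item in items:
        values = defaultdict(list)
        for attribute in item.get("Attributes", []):
            values[attribute["Name"]].append(attribute["Value"])
        result[item["Name"]] = {
            name: vals[0] if len(vals) == 1 else vals for name, vals in values.items()
        }
    return result


# one item from the select response shown above
response_items = [
    {"Name": "20180717230440",
     "Attributes": [{"Name": "BirthYear", "Value": "1981"},
                    {"Name": "Gender", "Value": "male"}]},
]
print(items_to_dicts(response_items))
# prints {'20180717230440': {'BirthYear': '1981', 'Gender': 'male'}}
```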
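SimpleDB indexes attributes automatically, but you cannot create or tune indexes yourself, and a select with no where clause has to return every item in the domain. This means retrieving a large number of items can be slow. For example, the command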
aws sdb select --select-expression "select BirthYear, Gender from Assessments"
takes approximately 10 seconds for the 31,301 items using the CLI. The same request using the SDK takes 1.25 seconds.
If we want to put this into a Python function, we could do this:
import logging


def select_items(client, expression, consistent_read=False):
    """Run a SimpleDB select expression and return every matching item."""
    selected = []
    try:
        paginator = client.get_paginator("select")
        # the paginator follows NextToken automatically until all pages are read;
        # MaxItems is not set, because it caps the total number of items returned
        page_iterator = paginator.paginate(
            SelectExpression=expression,
            ConsistentRead=consistent_read,
        )
        for page in page_iterator:
            # a page with no matches may omit the Items key
            selected.extend(page.get("Items", []))
    except Exception as error:
        logging.error("Cannot retrieve data: %s", error)
        raise error
    return selected


# the select expression is hardcoded for this example
print(select_items(client, "select BirthYear, Gender from Assessments"))
This code fragment creates the paginator for the select function and then runs the select statement, which is hardcoded in the script (not what you would do in practice). The paginator follows NextToken for us, looping until every page has been retrieved, and the selected items are then printed. Be careful with MaxItems in PaginationConfig: in boto3 it caps the total number of items the paginator returns, not the size of each page. Page size is set by SimpleDB: a select returns at most 100 items per page by default (up to 2,500 with a limit clause in the expression) and never more than 1 MB, so a large result is split across multiple pages.
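Rather than hardcoding the expression, you could build it from parameters. The sketch below follows SimpleDB’s select quoting rules as I understand them from the developer guide: values go in single quotes with any embedded single quote doubled, and domain or attribute names can be wrapped in backticks with any embedded backtick doubled. The helper names are hypothetical, and the resulting string would be passed as the SelectExpression.

```python
def quote_name(name: str) -> str:
    """Quote a domain or attribute name with backticks, doubling any backtick."""
    return "`" + name.replace("`", "``") + "`"


def quote_value(value: str) -> str:
    """Quote a value with single quotes, doubling any embedded single quote."""
    return "'" + value.replace("'", "''") + "'"


def build_select(domain: str, attributes: list, where_attr: str, where_value: str) -> str:
    """Build a simple 'select ... from ... where attr = value' expression."""
    columns = ", ".join(quote_name(a) for a in attributes)
    return (
        f"select {columns} from {quote_name(domain)} "
        f"where {quote_name(where_attr)} = {quote_value(where_value)}"
    )


print(build_select("Assessments", ["BirthYear", "Gender"], "Gender", "male"))
# prints select `BirthYear`, `Gender` from `Assessments` where `Gender` = 'male'
```

Quoting every name and value, rather than pasting user input into the string, keeps a stray quote character from breaking or changing the query.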
Pricing
The pricing model makes SimpleDB hard to beat. The Free Tier provides 25 machine hours, 1 GB of storage, unlimited data transfer in, and up to 1 GB of data transfer out each month. That is a significant allocation: the research for this article and the project I am implementing with SimpleDB will not incur any charges for quite a while.
If you exceed 25 machine hours, the cost is $0.14 per machine hour beyond the first 25. Storage is $0.25 per GB beyond the 1 GB of free storage, and data transfer out starts at $0.09 per GB once the free tier is exhausted.
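To see how those rates add up, here is a rough monthly estimate as a Python sketch. It uses only the rates quoted above, treats data transfer out as a flat $0.09 per GB (an assumption, since that rate is only the first tier), and ignores any other charges; the usage figures in the example are hypothetical.

```python
# rates quoted in this article; check current AWS pricing before relying on them
FREE_MACHINE_HOURS = 25
FREE_STORAGE_GB = 1
FREE_TRANSFER_OUT_GB = 1
MACHINE_HOUR_RATE = 0.14   # USD per machine hour beyond the free tier
STORAGE_RATE = 0.25        # USD per GB beyond the free tier
TRANSFER_OUT_RATE = 0.09   # USD per GB; assumes the first tier's rate applies throughout


def monthly_cost(machine_hours: float, storage_gb: float, transfer_out_gb: float) -> float:
    """Rough monthly SimpleDB estimate: only usage above the free tier is billed."""
    compute = max(0.0, machine_hours - FREE_MACHINE_HOURS) * MACHINE_HOUR_RATE
    storage = max(0.0, storage_gb - FREE_STORAGE_GB) * STORAGE_RATE
    transfer = max(0.0, transfer_out_gb - FREE_TRANSFER_OUT_GB) * TRANSFER_OUT_RATE
    return round(compute + storage + transfer, 2)


# hypothetical month: 40 machine hours, 3 GB stored, 5 GB transferred out
print(monthly_cost(40, 3, 5))
```

For that hypothetical month, 15 billable machine hours, 2 GB of billable storage, and 4 GB of billable transfer come to just under three dollars, and the sample domain above (about 5.5 million bytes) stays well inside the free storage allowance.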
If you need a small database, don’t need console access, and don’t need the overhead or capabilities of an RDBMS, then SimpleDB is hard to beat.
Things to Know
Before wrapping up this article, there are some things worth knowing before deciding to use SimpleDB on your next project:
- CloudFormation has no interface to create or manage SimpleDB resources. It has to be done using the CLI or the SDK.
- A domain, or table, has a hard limit of 10 GB that cannot be changed. If you expect a domain to grow beyond 10 GB, consider a data sharding plan or an alternative database.
- SimpleDB has capacity limits, typically under 25 writes/second. If you expect to need higher capacity, then an alternate database may be a wise choice.
- There is a soft limit of 250 domains. You can request to have this increased if needed.
- The maximum size of an attribute is 1024 bytes, which cannot be changed.
- All data must be represented as strings.
- SimpleDB is not available in all regions.
- You cannot create or manage indexes; SimpleDB indexes attributes automatically.
- If you have to retrieve all or a large number of items in the domain to perform an operation, it is best to retrieve all of the attributes you expect to need instead of making repeated calls to the domain. If you are using AWS Lambda, this can also affect the amount of memory needed as you will need to account for the size of the response variable you will receive.
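Some of these limits can be checked before data is sent to SimpleDB. The sketch below flags attribute names or values that are not strings or that exceed 1,024 bytes once UTF-8 encoded; the function name is hypothetical, measuring UTF-8 bytes is an assumption about how the limit is counted, and it does not check every SimpleDB limit.

```python
MAX_ATTRIBUTE_BYTES = 1024  # attribute size limit quoted in the list above


def validate_attributes(attributes: list) -> list:
    """Return a list of problems found in a SimpleDB attribute list.

    Checks that every name and value is a string and fits in 1024 bytes
    when UTF-8 encoded; an empty list means no problems were found.
    """
    problems = []
    for pair in attributes:
        for field in ("Name", "Value"):
            text = pair.get(field)
            if not isinstance(text, str):
                problems.append(f"{field} {text!r} is not a string")
            elif len(text.encode("utf-8")) > MAX_ATTRIBUTE_BYTES:
                problems.append(f"{field} starting {text[:20]!r} exceeds {MAX_ATTRIBUTE_BYTES} bytes")
    return problems


print(validate_attributes([{"Name": "BirthYear", "Value": 1981}]))
# prints ['Value 1981 is not a string']
```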
In Conclusion
SimpleDB offers CLI, SDK, and Web API REST interfaces, making it easy to interact with from many different sources. The SDK is significantly faster than the CLI, meaning it may be better to write small programs to do the work of the CLI. (The CLI examples were done using AWS CLI Version 1. Version 2 may be considerably faster.)
This may very well be a viable database for your next project. Transient data could be written to a domain and deleted when it is no longer needed. Log data could be saved to a SimpleDB domain instead of DynamoDB or RDS, which are expensive solutions for this use case.
References
Integrating Amazon S3 and Amazon SimpleDB
About the Author
Chris is a highly skilled Information Technology, AWS Cloud, Training and Security Professional bringing cloud, security, training, and process engineering leadership to simplify and deliver high-quality products. He is the co-author of seven books and author of more than 70 articles and book chapters in technical, management, and information security publications. His extensive technology, information security, and training experience make him a key resource who can help companies through technical challenges. Chris is a member of the AWS Community Builder Program.
Copyright
This article is Copyright © 2020, Chris Hare.