A Five Minute Overview of Amazon SimpleDB

Chris Hare
Oct 19, 2020 · 10 min read

Sometimes we are working on a project where we need a data store, but the complexities of Relational Database Service (RDS), DynamoDB, DocumentDB, and similar services are more than the project needs. This is where Amazon SimpleDB becomes a valuable resource.

https://open.spotify.com/episode/77BybWgy6VHfCxS2LXrb8V?si=ehEKXoHPTVqhlmYkGoHbyw

SimpleDB is a NoSQL database. NoSQL databases are not new, having been around since the 1960s. The term NoSQL has several meanings, from “non-SQL”, referring to the lack of relational support in the database, to “Not only SQL”, meaning the database may also support Structured Query Language (SQL) (Wikipedia).

AWS has a number of databases to meet the needs of your project. If you look in the AWS Management Console, the Database section lists:

  • Relational Database Service
  • DynamoDB
  • ElastiCache
  • Neptune
  • Amazon QLDB
  • Amazon DocumentDB
  • Amazon Keyspaces
  • Amazon Timestream

Did you notice SimpleDB is missing from the list? This is because there is no interface to SimpleDB through the console. SimpleDB tables, which are called domains, are created programmatically using the CLI, SDK, or web services requests and all operations are performed through those interfaces.

Why use SimpleDB?

Database management is a science of its own. Schema designs, Entity-Relationship models, query optimization, and day-to-day management all add complexity to a project, and every database or database engine is unique in its own right. SimpleDB removes the complexity of database management by being NoSQL and having no administrative overhead. The AWS documentation states “Amazon SimpleDB is optimized to provide high availability and flexibility, with little or no administrative burden” (Amazon SimpleDB).

The SimpleDB architecture is designed to be highly available by automatically creating geographically distributed copies of your data. If one replica fails, another is seamlessly used to access your data.

Because there is no rigid schema to support, changing the attributes needed to support your project is simply a matter of adding the additional columns, which are called attributes in SimpleDB.

And SimpleDB is secure, using HTTPS as the transport and integrating with IAM to provide fine-grained control over the operations and data.

Some of the sample use cases for SimpleDB include logging, online gaming, and S3 object metadata indexing (Amazon SimpleDB).

With that introduction out of the way, let’s look at working with SimpleDB using the Software Development Kit.

Working with SimpleDB using the SDK

The examples in this section use Python but are explained so you don’t need to know Python to follow them. In case you haven’t used it, the AWS SDK for Python is called boto3.

Connecting to the SimpleDB Service

Before we can work with SimpleDB, we have to establish a connection to the service.
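A minimal sketch of the connection step, assuming boto3 is installed and AWS credentials are already configured. The function name connect_to_simpledb, the default region, and the session_factory parameter are illustrative choices rather than part of the original code; session_factory exists only so the function can be exercised without AWS access.

```python
def connect_to_simpledb(region_name="us-east-1", session_factory=None):
    """Return a SimpleDB client, raising RuntimeError if setup fails."""
    if session_factory is None:
        # Imported here so the sketch can be loaded without the SDK installed.
        import boto3
        session_factory = boto3.Session

    # First try block: create a session that other service clients can share.
    try:
        session = session_factory(region_name=region_name)
    except Exception as error:
        raise RuntimeError(f"Unable to create a session: {error}") from error

    # Second try block: create the client for the SimpleDB ("sdb") service.
    try:
        return session.client("sdb")
    except Exception as error:
        raise RuntimeError(f"Unable to create the SimpleDB client: {error}") from error


# Typical use (needs boto3 and credentials):
#     client = connect_to_simpledb("us-east-1")
```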

The first try block creates a session, which can be used to create connections to multiple services if needed, while the second try block creates a connection to the SimpleDB service. If the session or client cannot be established, then an error is raised to the calling function. Once the client connection to the SimpleDB endpoint has been created, we are ready to work with the service.

Creating a SimpleDB Domain

Before we can work with data, we have to create a domain if we don’t already have one. This is done using the create_domain API call.
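A minimal sketch of that call, assuming client is a boto3 SimpleDB client created as described earlier. The wrapper name create_domain and the domain name in the usage comment are illustrative.

```python
def create_domain(client, domain_name):
    """Create a SimpleDB domain and return the raw service response."""
    # DomainName is the only argument the create_domain call takes.
    return client.create_domain(DomainName=domain_name)


# Typical use, with client = boto3.client("sdb"):
#     create_domain(client, "Assessments")
```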

The single argument to create_domain is the domain or table name. Domain names must be unique within the account. Initially, up to 250 domains can be created, and it is the user’s responsibility to determine how to shard or partition the data so that no domain exceeds the 10 GB hard limit. With the domain created, we can now insert some data.

Listing the Available Domains

We will eventually want to see all of the domains we have created. We can use the list_domains API to obtain the list. This is best done using a paginator, which retrieves all of the domains without worrying about the maximum number of items returned by a single call.
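A minimal sketch of listing domains with a paginator, assuming client is a boto3 SimpleDB client. The function name list_all_domains is illustrative.

```python
def list_all_domains(client):
    """Return every domain name in the account, following pagination."""
    paginator = client.get_paginator("list_domains")
    domains = []
    for page in paginator.paginate():
        # Use .get() in case a page carries no DomainNames key.
        domains.extend(page.get("DomainNames", []))
    return domains


# Typical use, with client = boto3.client("sdb"):
#     for name in list_all_domains(client):
#         print(name)
```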

Using a paginator is a good idea regardless of what language you are working with, because you are not limited to the maximum number of items a single API call returns. When this code executes, the result is a list of domains which can then be displayed.

Inserting Items into the Domain

If you have a lot of attributes, preparing the data to insert into the domain can be a little tedious. We’ll come back to that in a minute. Inserting items into the domain uses the put_attributes function.
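A minimal sketch of the insert, assuming client is a boto3 SimpleDB client and attributes is a list of name-value pairs in the form put_attributes expects. The wrapper name put_item and the values in the usage comment are illustrative.

```python
def put_item(client, domain_name, item_name, attributes):
    """Insert an item, or update it if the item name already exists."""
    return client.put_attributes(
        DomainName=domain_name,  # the domain receiving the item
        ItemName=item_name,      # must be unique within the domain
        Attributes=attributes,   # list of {"Name": ..., "Value": ...} pairs
    )


# Typical use, with client = boto3.client("sdb"):
#     put_item(client, "Assessments", "item-0001",
#              [{"Name": "BirthYear", "Value": "1980"}])
```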

We have to specify the domain we are inserting the item into, the name of the item, and the attributes. The item name must be unique in the domain. If the item name already exists, then SimpleDB will attempt to update the existing item with the attributes provided.

I mentioned defining the attributes can be a little tedious. This is because attributes are defined as name-value pairs. In Python, this would look like
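As an illustration only, here is an attribute list in the shape put_attributes accepts. The attribute names BirthYear and Gender come from the select example later in the article; the values are made up.

```python
# Each attribute is one name-value pair; every value is a string.
attributes = [
    {"Name": "BirthYear", "Value": "1980", "Replace": True},
    {"Name": "Gender", "Value": "F", "Replace": True},
]
```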

Therefore, the more attributes, the more tedious it gets. However, if your data is already stored in a Python dictionary, then creating the attributes is simple.
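A minimal sketch of that conversion. The helper name to_attributes and the sample record are illustrative, and values are converted with str() because SimpleDB stores only strings.

```python
def to_attributes(record, replace=True):
    """Convert a dictionary into a SimpleDB attribute list."""
    return [
        {"Name": str(name), "Value": str(value), "Replace": replace}
        for name, value in record.items()
    ]


# Hypothetical record with non-string values.
record = {"BirthYear": 1980, "Active": True}
attributes = to_attributes(record)
```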

This brings up an important point: SimpleDB doesn’t understand any data type other than a string. If your data includes things like integer and boolean values, they must be represented as strings when stored in SimpleDB.

The second point is the third field in the attribute definition: Replace. If you are updating an existing item with put_attributes, setting Replace to true causes SimpleDB to overwrite the attribute’s existing value. Without it, the new value is added alongside the existing one.

Domain Metadata

Before we look at retrieving data from our SimpleDB domain, let’s look at how we can get information about the domain using the domain_metadata function. This function reports the number of items and attributes in the domain, the size of the item names, attribute names, and attribute values, and a timestamp for when the metadata was calculated.

Assuming we already have a client connection to SimpleDB, we can do the following:
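A minimal sketch, assuming client is a boto3 SimpleDB client. The function name describe_domain is illustrative. The response carries keys such as ItemCount, AttributeNameCount, AttributeValueCount, ItemNamesSizeBytes, AttributeNamesSizeBytes, AttributeValuesSizeBytes, and Timestamp.

```python
def describe_domain(client, domain_name):
    """Return the size and count figures SimpleDB reports for a domain."""
    response = client.domain_metadata(DomainName=domain_name)
    # Drop the HTTP bookkeeping boto3 attaches to every response.
    return {
        key: value
        for key, value in response.items()
        if key != "ResponseMetadata"
    }


# Typical use, with client = boto3.client("sdb"):
#     for key, value in describe_domain(client, "Assessments").items():
#         print(f"{key}: {value}")
```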

If we execute this against the sample SimpleDB domain I am using for a project, we see:

There are indeed 31,301 items in the domain with a total of 91 unique attribute names. If every item has a value for every attribute name, the number of attribute values is the number of attribute names multiplied by the number of items: 91 × 31,301 = 2,848,391. These attributes are all text and only use 5.35 MB. On average, each item, with its attribute names and data, takes about 175 bytes.

This is the primary reason for using SimpleDB in this project. It is fast, small, and as we will see a little later, inexpensive. It is also a good example of a workload where RDS and DynamoDB are not a good fit: their operational cost is just not reasonable for the amount of data being stored.

At this point, we can create a SimpleDB domain, insert items, and retrieve the metadata for the domain. Let’s look at retrieving data from the domain.

Retrieving Items from the Domain

There are two methods for retrieving data from your domain: get_attributes and select. If you already know the item name, then you can use the get_attributes function to retrieve the attributes for that one item. However, if you don’t know the item name or want to retrieve all of the items meeting specific criteria, use the select function.
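A minimal sketch of the single-item case, assuming client is a boto3 SimpleDB client. The function name get_item is illustrative, and the sketch assumes each attribute holds a single value.

```python
def get_item(client, domain_name, item_name, attribute_names=None):
    """Return one item's attributes as a dict, or {} if nothing is found."""
    request = {"DomainName": domain_name, "ItemName": item_name}
    if attribute_names:
        # Restrict the response to the named attributes only.
        request["AttributeNames"] = list(attribute_names)
    response = client.get_attributes(**request)
    # Assumes single-valued attributes; a repeated name keeps its last value.
    return {pair["Name"]: pair["Value"] for pair in response.get("Attributes", [])}


# Typical use, with client = boto3.client("sdb"):
#     get_item(client, "Assessments", "item-0001", ["BirthYear", "Gender"])
```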

The select function works similarly to the SQL SELECT command, allowing you to retrieve the desired attributes (columns) for the items (rows) matching the criteria specified in the select statement. Here are some examples using the AWS CLI:

Find out how many items are in the domain (which can also be accomplished using the domain_metadata function):

Retrieve a specific attribute:

Retrieve a group of attributes:
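The original CLI commands are not reproduced here. As a rough stand-in, the sketch below builds select expressions for the three queries described above; the same string can be passed to aws sdb select --select-expression on the CLI or as SelectExpression in the SDK. The helper name build_select is illustrative, the domain and attribute names are taken from the timing example later in the article, and the helper does no quoting, so it assumes plain alphanumeric names.

```python
def build_select(domain_name, attribute_names=None, count=False):
    """Build a SimpleDB select expression covering a whole domain."""
    if count:
        output = "count(*)"
    elif attribute_names:
        output = ", ".join(attribute_names)
    else:
        output = "*"
    return f"select {output} from {domain_name}"


# How many items are in the domain.
count_items = build_select("Assessments", count=True)
# A specific attribute.
one_attribute = build_select("Assessments", ["BirthYear"])
# A group of attributes.
several_attributes = build_select("Assessments", ["BirthYear", "Gender"])
```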

If we look at the last example, the response from SimpleDB looks like
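The original output is not reproduced here, so the following only illustrates the shape boto3 returns for a select call. The item names and values are made up, and the helper rows_from_response is an illustrative name.

```python
# Illustrative response shape; item names and values are invented.
sample_response = {
    "Items": [
        {
            "Name": "item-0001",
            "Attributes": [
                {"Name": "BirthYear", "Value": "1980"},
                {"Name": "Gender", "Value": "F"},
            ],
        },
        {
            "Name": "item-0002",
            "Attributes": [
                {"Name": "BirthYear", "Value": "1975"},
                {"Name": "Gender", "Value": "M"},
            ],
        },
    ]
}


def rows_from_response(response):
    """Turn each returned item into an (item name, attribute dict) pair."""
    rows = []
    for item in response.get("Items", []):
        values = {pair["Name"]: pair["Value"] for pair in item["Attributes"]}
        rows.append((item["Name"], values))
    return rows
```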

For each item found in the select statement, you get the item Name and the values for the specified attributes.

SimpleDB has no indexes for you to define or tune. It indexes attributes automatically, but a select without a where clause still has to return every item. This means retrieving all of the affected rows can be slow. For example, the command aws sdb select --select-expression "select BirthYear, Gender from Assessments" takes approximately 10 seconds for the 31,301 items using the CLI. The same request using the SDK takes 1.25 seconds.

If we want to put this into a Python function, we could do this:
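A minimal sketch of such a function, assuming client is a boto3 SimpleDB client. It differs from the description that follows in one respect: the select expression is passed in as an argument rather than hardcoded. The function name select_items is illustrative. In boto3, the MaxItems setting in PaginationConfig caps the total number of items the paginator hands back.

```python
def select_items(client, expression, max_items=None):
    """Run a select expression and return the matching items."""
    paginator = client.get_paginator("select")
    options = {"SelectExpression": expression}
    if max_items is not None:
        # MaxItems caps the total number of items the paginator hands back.
        options["PaginationConfig"] = {"MaxItems": max_items}
    items = []
    for page in paginator.paginate(**options):
        # The paginator follows NextToken for us; a page may lack an Items key.
        items.extend(page.get("Items", []))
    return items


# Typical use, with client = boto3.client("sdb"):
#     for item in select_items(client, "select BirthYear, Gender from Assessments", max_items=500):
#         print(item["Name"], item["Attributes"])
```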

This code fragment creates the paginator for the select function and then executes the select statement, which is “hardcoded” in the script (not something you would do in real code). We then loop through all of the items returned until there is no NextToken and then print the selected items. This example sets MaxItems to 500, but the maximum returned size is 1 MB. Regardless of what MaxItems is set to, if the size of the response is more than 1 MB, the response will be split into multiple pages.

Pricing

The pricing model makes SimpleDB hard to beat. The Free Tier provides 25 machine-hours, 1 GB of storage, unlimited data in, and up to 1 GB of data out a month. That is a pretty significant allocation. My research and the project I am implementing with SimpleDB will result in no charges for quite a while.

If you exceed the 25 machine hours, the cost is $0.14 per machine hour over 25. Storage is $0.25 per GB over the 1 GB of free storage, and data transfer out starts at $0.09 per GB after the free tier is exhausted.
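To make the arithmetic concrete, here is a rough monthly estimate using only the rates quoted above. The function name and the usage figures are hypothetical, and the sketch applies the $0.09 starting rate to all data transfer out, so treat the result as an approximation.

```python
# Free Tier allowances and rates as quoted in this article.
FREE_MACHINE_HOURS = 25
FREE_STORAGE_GB = 1
FREE_TRANSFER_OUT_GB = 1
RATE_MACHINE_HOUR = 0.14
RATE_STORAGE_GB = 0.25
RATE_TRANSFER_OUT_GB = 0.09  # starting rate only


def estimate_monthly_cost(machine_hours, storage_gb, transfer_out_gb):
    """Estimate a month's SimpleDB charges in US dollars."""
    billable_hours = max(machine_hours - FREE_MACHINE_HOURS, 0)
    billable_storage = max(storage_gb - FREE_STORAGE_GB, 0)
    billable_transfer = max(transfer_out_gb - FREE_TRANSFER_OUT_GB, 0)
    total = (
        billable_hours * RATE_MACHINE_HOUR
        + billable_storage * RATE_STORAGE_GB
        + billable_transfer * RATE_TRANSFER_OUT_GB
    )
    return round(total, 2)


# Hypothetical month: 30 machine hours, 2 GB stored, 3 GB transferred out.
print(estimate_monthly_cost(30, 2, 3))  # 1.13
```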

If you need a small database, don’t need console access, and don’t need the overhead or capabilities of an RDBMS, then SimpleDB is hard to beat.

Things to Know

Before wrapping up this article, here are some things worth knowing when deciding whether to use SimpleDB on your next project:

  • CloudFormation has no interface to create or manage SimpleDB resources. It has to be done using the CLI or the SDK.
  • A domain, or table, has a hard limit of 10 GB in size, which cannot be changed. If you think the domain will grow over 10 GB, a data sharding plan or alternate database should be considered.
  • SimpleDB has capacity limits, typically under 25 writes/second. If you expect to need higher capacity, then an alternate database may be a wise choice.
  • There is a soft limit of 250 domains. You can request to have this increased if needed.
  • The maximum size of an attribute is 1024 bytes, which cannot be changed.
  • All data must be represented as strings.
  • SimpleDB is not available in all regions.
  • There are no user-defined indexes; SimpleDB indexes attributes automatically.
  • If you have to retrieve all or a large number of items in the domain to perform an operation, it is best to retrieve all of the attributes you expect to need instead of making repeated calls to the domain. If you are using AWS Lambda, this can also affect the amount of memory needed as you will need to account for the size of the response variable you will receive.
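For the 10 GB limit, one possible sharding plan is to spread items across several domains by hashing the item name. This is only a sketch of one approach: the function name shard_domain, the naming pattern, and the shard count are all assumptions, and the number of shards has to stay within your domain limit.

```python
import hashlib


def shard_domain(base_name, item_name, shard_count):
    """Pick the domain that should hold an item, based on its name."""
    # A stable hash keeps the same item in the same domain on every run.
    digest = hashlib.sha256(item_name.encode("utf-8")).digest()
    shard = int.from_bytes(digest[:8], "big") % shard_count
    return f"{base_name}_{shard:02d}"


# Hypothetical use: route an item to one of four domains.
domain = shard_domain("Assessments", "item-0001", 4)
```

Every reader and writer has to use the same function and shard count, and a select across all of the data has to be run once per domain.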

In Conclusion

SimpleDB offers CLI, SDK, and Web API REST interfaces, making it easy to interact with from many different sources. The SDK is significantly faster than the CLI, meaning it may be better to write small programs to do the work of the CLI. (The CLI examples were done using AWS CLI Version 1. Version 2 may be considerably faster.)

This may very well be a viable database for your next project. Short-lived, transient data could be written to a domain and deleted when it is no longer needed. Log data could be saved to a SimpleDB domain instead of going to DynamoDB or RDS, which are expensive solutions for this use case.

References

Amazon SimpleDB

Amazon SimpleDB API Usage

Amazon SimpleDB FAQ

Amazon SimpleDB Pricing

Integrating Amazon S3 and Amazon SimpleDB

Running Databases on AWS

Wikipedia — NoSQL

About the Author

Chris is a highly-skilled Information Technology, AWS Cloud, Training and Security Professional bringing cloud, security, training, and process engineering leadership to simplify and deliver high-quality products. He is the co-author of seven books and author of more than 70 articles and book chapters in technical, management, and information security publications. His extensive technology, information security, and training experience make him a key resource who can help companies through technical challenges. Chris is a member of the AWS Community Builder Program.

Copyright

This article is Copyright © 2020, Chris Hare.

The Startup

Medium's largest active publication, followed by +754K people. Follow to join our community.
