The ABC of Amazon DynamoDB: An Introductory Guide

Published in TensorIoT on May 23, 2023

By Cami La Porte, TensorIoT Software Engineer and Nicholas Burden, Technical Evangelist for TensorIoT

Introduction

Amazon DynamoDB is a powerful database tool, but what happens if you’ve been given a DynamoDB project and have never worked with it before? Or are you looking to increase your understanding of databases but struggling to grasp the associated concepts? We’ve got you covered. In this blog, you will find a brief explanation of databases and the key features of Amazon DynamoDB. It sets the foundation for a future blog where we’ll provide an in-depth guide to designing schemas for DynamoDB, accompanied by several examples.

Whether you’re familiar with other databases or have no experience with databases at all, this article aims to lay the groundwork for your understanding of DynamoDB. After reading both posts, you’ll have the tools and confidence to tackle DynamoDB projects from the ground up.

In our first blog post, we will get you familiar with some basic database information and introduce the fundamentals of DynamoDB.

Database 101

Let’s begin by exploring what databases are, how they’re used, and the primary database types. (If you’re already conversant with databases, feel free to skip this section and proceed straight to the fundamentals of Amazon DynamoDB.)

What is a database?

At its simplest, a database is an organized collection of data, stored and accessed electronically. Think of a database as a digital file cabinet that stores and organizes information so you can find what you need quickly and efficiently.

Just like how a file cabinet can be organized using labeled dividers and categorized drawers, and expanded by adding more cabinets for extra paperwork, databases allow you to design a system to store your data and enable quick access to the information. You’ll be using digital equivalents of dividers, drawers, and cabinets, known as keys, rows, indexes, tables, and more, which we’ll discuss later in the blog.

When we picture a database, it usually resembles a table or spreadsheet, where rows and columns form the structure for the cells that hold the values. A spreadsheet’s data is organized by labeling the row and column “headers”, while most database tables are organized using labels commonly known as “keys”.

Defining keys gives structure to your database, enabling programs to find the exact piece of data you are looking for. This is like finding a location on a map by using latitude and longitude coordinates or, in our filing cabinet example, knowing the drawer and divider label for a specific document. The database term for this search action is “Query”.
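To make the idea of keys concrete, here is a minimal Python sketch. It is not DynamoDB code: the employees table, the key format and the function names are all made up for illustration. It contrasts a lookup by key with a search that has no key to rely on.

```python
# A tiny in-memory "table": each row is stored under its key.
# All names and values here are hypothetical.
employees = {
    "E-1001": {"name": "Ana", "department": "Sales"},
    "E-1002": {"name": "Ben", "department": "Support"},
    "E-1003": {"name": "Cho", "department": "Sales"},
}

def get_by_key(table, key):
    """Direct lookup: the key tells us exactly where the row lives."""
    return table.get(key)

def find_by_value(table, column, value):
    """No key to go on: every row has to be inspected one by one."""
    return [key for key, row in table.items() if row.get(column) == value]

print(get_by_key(employees, "E-1002"))
print(find_by_value(employees, "department", "Sales"))
```

The first function goes straight to the drawer and divider; the second has to open every folder in the cabinet. That difference is why choosing good keys matters so much.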

How are databases used?

A database is used whenever you want to store information and retrieve it on demand. In the digital space of the internet and software development, this is one of the core functionalities of any system or application. If you think of how any business operates and our file cabinet example, it would be extremely difficult to keep track of employee records, customer accounts, or purchase orders without a basic filing system. These days, most physical file cabinets have been replaced by software that utilizes database technology.

To expand on that example, when you browse an online store like Amazon with millions of products in its catalog, each product is stored in a database. Every time you click on a product, the site retrieves information about that product from the database and sends it back to you in a matter of milliseconds. A database can also be used to collect and store data. Think of any social networking site that you use: if you’ve ever uploaded a photo or posted a comment, you’ve performed actions that send information to a database, which is later retrieved anytime someone tries to view it.

It’s almost impossible for any digital system or application to function properly without the use of a database. This is why it’s important that we understand database technology and how to use it effectively.

What types of databases?

The final basic database concept to cover is the most common database types and the differences between them. In this section, we will explore those types and provide examples of each.

Deciding on a database type

There are several types of databases that currently exist and deciding which one to use depends on what your goals are for managing your information. While there are many factors that can go into making this decision (such as business requirements, technical limitations, budgets, etc.), the most important criteria will ultimately be determined by your data.

Having intimate knowledge of your data and a clear vision of how it will be used plays a vital role in selecting the best database option for your project.

Each database type primarily differs by these main factors:

the volume of information to store and access
the structure used to organize this data
how the database will be maintained

To reference our filing cabinet analogy, it would be like selecting a cabinet:

with four drawers instead of two, because you have a large number of documents to categorize
that stores legal-size documents instead of letter-size documents because all your files are legal contracts
that is lateral instead of vertical because you want to file documents chronologically/left-to-right

In this day and age, we have several options available to us, but in the scope of learning about Amazon DynamoDB, we focus on the two main types of databases: Relational and Non-Relational.

Relational vs. Non-Relational Databases

One of your first tasks is determining the optimal database type for your specific project, which will help you choose what database to use. The good news is since there are many ways to achieve the same goal, there isn’t a single correct choice. It’s not a matter of right or wrong, but what works best for your particular needs.

Relational Database

A relational database organizes data into separate tables that are linked together through a common value shared between them. Each table has a “primary key” that uniquely identifies its rows, and other tables store that same value (where it is called a “foreign key”) to point back to the row they relate to. This is how relationships between tables are defined, and it is the sense in which the database is “relational”: the tables are related to one another through these shared keys.

Relational databases are also known as “SQL databases”. SQL stands for Structured Query Language, the language used to manage and perform operations in relational databases. As the name suggests, it is a very specific language, and it applies specifically to SQL/relational databases.
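As a small illustration, the sketch below uses SQLite through Python’s built-in sqlite3 module. The customers and orders tables, their columns and their contents are invented for this example. Each order row stores the primary key of the customer it belongs to, and a SQL query follows that shared value to combine the two tables.

```python
import sqlite3

# In-memory database; table names, columns and rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        product     TEXT NOT NULL
    );
    INSERT INTO customers VALUES (1, 'Ana'), (2, 'Ben');
    INSERT INTO orders VALUES (10, 1, 'Keyboard'), (11, 1, 'Monitor'), (12, 2, 'Mouse');
""")

# The shared customer_id value is what relates the two tables.
rows = conn.execute("""
    SELECT c.name, o.product
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    WHERE c.customer_id = ?
    ORDER BY o.order_id
""", (1,)).fetchall()

print(rows)
```

The customer’s name lives in one table and the products in another; the query only works because both tables carry the same customer_id value.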

Commonly used SQL databases include:

MySQL
PostgreSQL
MariaDB
Oracle

Non-Relational Database

Conversely, a non-relational database is also known as a NoSQL database, which means it does not use SQL in order to manage and manipulate data. The main difference with a non-relational database is that it does not need to share a primary key to associate tables with one another. In fact, you can use fewer tables to categorize your data and still accomplish the same searches. This is because you’re able to store data in tables without needing to adhere to a relational data structure.

An easy way to remember this is that a NON-Relational database:

does NOT have to RELATE tables
does NOT have to share a primary key
does NOT require SQL (NoSQL) to perform operations
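Here is a rough sketch of what that can look like, using plain Python dictionaries rather than a real NoSQL service. The customer records and attribute names are hypothetical. Everything about a customer, including their orders, is stored together in one place, and two items in the same table do not need to have the same attributes.

```python
# One "table" of customer items. Names and values are hypothetical.
# Note that the two items do not share the same set of attributes.
customers = {
    "CUSTOMER#1": {
        "name": "Ana",
        "orders": [
            {"order_id": 10, "product": "Keyboard"},
            {"order_id": 11, "product": "Monitor"},
        ],
    },
    "CUSTOMER#2": {
        "name": "Ben",
        "loyalty_tier": "gold",  # only this item has this attribute
        "orders": [{"order_id": 12, "product": "Mouse"}],
    },
}

def products_for(table, customer_key):
    """Return the products a customer ordered, without joining tables."""
    item = table.get(customer_key, {})
    return [order["product"] for order in item.get("orders", [])]

print(products_for(customers, "CUSTOMER#1"))
```

The same question a relational database would answer by relating two tables is answered here by reading a single item.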

Commonly used NoSQL databases include:

Amazon DynamoDB
MongoDB
Redis
Amazon OpenSearch Service (formerly known as Amazon Elasticsearch Service)

Database Examples

Disclaimer: There are many ways of structuring data, and these examples aren’t all-inclusive. Their purpose is simply to demonstrate how information can be stored in relational vs. non-relational database types.

Now that you have a basic understanding of what databases are, we are ready to dive into the fundamentals of Amazon DynamoDB.

Amazon DynamoDB Fundamentals

We have finally arrived at the main course of this blog, where I serve you a wonderful platter of Amazon DynamoDB’s most significant features. I’ll go over what makes Amazon DynamoDB so special, and also focus on the how and why as it applies to your search for the optimal database.

After you’ve gained this knowledge, I’ll bring all the features I’ve mentioned together in a glorious Amazon DynamoDB example and walk you through how I design databases, using the methods I’ve learned over the course of working with Amazon DynamoDB.

What is Amazon DynamoDB?

Amazon DynamoDB is a serverless, non-relational database that specializes in supporting applications that require high scalability while still maintaining millisecond response times. You can find all the features of DynamoDB on the official AWS website, but here are the ones that could be most influential when choosing a database:

Fully-Managed & Serverless — minimal efforts spent on maintenance of the actual service
NoSQL & Key-Value — doesn’t require SQL and the key-value (dictionary or hash) paradigm maintains high scalability
Built-In Security, Backup and Restore — ensuring the security and redundancy of your data is included
Flexible Capacity Modes — more control of read/write with on demand and provisioned modes
Data Modeling Tools — bring data model concepts to fruition with tools to help build, test and deploy your tables
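To give a feel for the two capacity modes, the sketch below builds the parameters for a table in either mode. The parameter names follow the DynamoDB CreateTable API as exposed by boto3, but the table name, the "pk" attribute and the helper function are assumptions made for this example, and nothing is sent to AWS.

```python
def table_params(table_name, on_demand=True, read_units=5, write_units=5):
    """Build CreateTable parameters for a table with a single partition key.

    The helper itself and the attribute name "pk" are hypothetical.
    """
    params = {
        "TableName": table_name,
        "AttributeDefinitions": [{"AttributeName": "pk", "AttributeType": "S"}],
        "KeySchema": [{"AttributeName": "pk", "KeyType": "HASH"}],
    }
    if on_demand:
        # On-demand: pay per request, no capacity to plan up front.
        params["BillingMode"] = "PAY_PER_REQUEST"
    else:
        # Provisioned: reserve a fixed amount of read/write capacity.
        params["BillingMode"] = "PROVISIONED"
        params["ProvisionedThroughput"] = {
            "ReadCapacityUnits": read_units,
            "WriteCapacityUnits": write_units,
        }
    return params

on_demand_table = table_params("ExampleTable")
provisioned_table = table_params("ExampleTable", on_demand=False, read_units=10, write_units=5)
print(on_demand_table["BillingMode"], provisioned_table["BillingMode"])
```

With boto3 installed and AWS credentials configured, a dictionary like this could be passed along as boto3.client("dynamodb").create_table(**on_demand_table). On-demand tends to suit unpredictable traffic, while provisioned mode suits steady, well-understood workloads.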

Whether you’re building an online shopping platform or managing large volumes of user-generated content, DynamoDB can handle pretty much all kinds of situations.

Why choose Amazon DynamoDB?

In the world of databases, the ultimate crusade is finding the most cost-effective way to store, organize and retrieve information while maintaining flawless performance as the application grows. As a program evolves over time to support new features and capabilities, the total size of its data store can also increase to astronomical amounts. The true test of your infrastructure’s design is seeing how long it can withstand the weight of these changes with the least amount of impact.

Regardless of the database you choose, it will inevitably be pushed to its limits throughout the entire lifespan of the application. At the very least, the system you select should have all the capabilities to sustain the design as it matures. Most of DynamoDB’s key strengths are positioned to overcome these particular challenges and it all starts with a well-designed schema.

What is a Schema?

A schema is simply a blueprint that the database follows to sort and organize your data. It’s how you communicate with the database so you can manage the data within it. Defining the schema can be tricky when you’re dealing with a large volume of data and several complex access patterns. It’s especially tricky if you’re working with legacy data managed in a SQL-based infrastructure. While Amazon DynamoDB is an incredibly powerful tool, its only real limitation is whether or not the schema has been designed properly.

All database schemas rely on unique identifiers to sort and categorize data. We had previously described identifiers like the coordinates on a map to find a specific location. These types of identifiers can vary across all types of databases, but Amazon DynamoDB utilizes keys and partitions, which we will cover in the next section.

Primary Keys

At the beginning of this blog, we explained how the primary key is the unique identifier used in a SQL database to establish the relationship between data across different tables. A NoSQL database does not follow this requirement, which allows various different types of data to be stored within the same table regardless of how the information relates to one another, ultimately simplifying how the data is accessed and managed. However, a unique primary key is still required to keep items separate within the database. This is done by defining the partition key for your table.

The simplest way to understand how the partition key functions is to imagine a brand new hard drive and creating the first set of folders that will hold all your files. Essentially, you are designating a partition of the disk space in the cloud for where your database and all of its data will live. Depending on your specific use case, this can either be really easy or painfully elaborate, because in DynamoDB there are two types of primary keys that you can use for your table design.
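Behind the folder analogy, DynamoDB passes the partition key value through an internal hash function to decide which physical partition stores the item. The sketch below only imitates that idea: the use of SHA-256, the fixed count of four partitions and the sample keys are all assumptions for illustration, not how the service is actually implemented.

```python
import hashlib
from collections import defaultdict

NUM_PARTITIONS = 4  # hypothetical; the real service manages partitions for you

def partition_for(partition_key, num_partitions=NUM_PARTITIONS):
    """Map a partition key to a partition number by hashing it (illustration only)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions

# Group some made-up keys by the partition they land on.
keys = ["CUSTOMER#1", "CUSTOMER#2", "CUSTOMER#3", "PRODUCT#42", "PRODUCT#43"]
placement = defaultdict(list)
for key in keys:
    placement[partition_for(key)].append(key)

for partition, members in sorted(placement.items()):
    print(partition, members)
```

The same key always lands on the same partition, which is what makes lookups by partition key fast. It is also why a partition key with many distinct values spreads data and traffic more evenly than one with only a handful.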

Simple — Partition Key

A simple primary key is when you only need to define a single column, the partition key, to be the unique identifier for the items in your table. This is the most straightforward way to design a schema, but as the name suggests, it should only be used to support the most basic database functionalities and requirements. If you’re working with clearly defined data sets and managing that information is pretty obvious, then a simple primary key can easily be mapped to a single column.

Composite — Partition Key + Sort Key (Range)

In most cases, there are more challenging requirements expected from a database where the vast amounts of information to collect are as intricate and complex as the demands for retrieving them. DynamoDB’s solution for this complexity lies in its second option for assigning primary keys. Instead of using a single column to define your primary key, you’re allowed to use two. This is called a composite key and it’s by far one of my favorite features.

A composite key is when you combine the partition key with a secondary key, and together they form a unique identifier which is used as the primary key for that item. This secondary key is known as the “Sort” key, and it’s used to further categorize a subset of data that belongs to the same partition key. This method is the one I see and use most often for DynamoDB. It is the combination of these keys that allows you to categorize and store completely different data within a single table.

This can be the most difficult concept to grasp when first stepping into DynamoDB, but it’s the most rewarding once you’ve figured out the right combination that works for your particular situation. One thing to keep in mind when designing your schema: if you are using a composite key, it is possible to query knowing only the partition key, but it is not possible to query with just the sort key. There are solutions for this which we will cover in greater detail later on, along with an in-depth explanation of how to come up with the best composite key.
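The behavior of a composite key can be imitated with a small in-memory model. This is a sketch, not the DynamoDB API: the class, the "pk" and "sk" attribute names and the "CUSTOMER#"/"ORDER#" prefixes are conventions assumed for the example. It shows that the pair of keys identifies one item, that a query needs the partition key, and that the sort key can narrow the result within that partition.

```python
from collections import defaultdict

class CompositeKeyTable:
    """Toy model of a table with a partition key ("pk") and sort key ("sk")."""

    def __init__(self):
        self._partitions = defaultdict(dict)  # pk -> {sk: item}

    def put(self, item):
        # The (pk, sk) pair is the primary key: same pair overwrites the item.
        self._partitions[item["pk"]][item["sk"]] = item

    def get(self, pk, sk):
        return self._partitions.get(pk, {}).get(sk)

    def query(self, pk, sk_prefix=""):
        # The partition key is mandatory; the sort key only narrows the result.
        partition = self._partitions.get(pk, {})
        return [partition[sk] for sk in sorted(partition) if sk.startswith(sk_prefix)]

table = CompositeKeyTable()
table.put({"pk": "CUSTOMER#1", "sk": "PROFILE", "name": "Ana"})
table.put({"pk": "CUSTOMER#1", "sk": "ORDER#11", "product": "Monitor"})
table.put({"pk": "CUSTOMER#1", "sk": "ORDER#10", "product": "Keyboard"})
table.put({"pk": "CUSTOMER#2", "sk": "PROFILE", "name": "Ben"})

everything_for_ana = table.query("CUSTOMER#1")
orders_for_ana = table.query("CUSTOMER#1", sk_prefix="ORDER#")
print([item["sk"] for item in everything_for_ana])
print([item["product"] for item in orders_for_ana])
```

Profiles and orders are completely different kinds of data, yet they sit in one table and can be fetched together or separately. There is deliberately no method that takes only a sort key, since the model has no way of knowing which partition to look in.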

Secondary Indexes

You may encounter situations where you need to query for an item using a value that is neither a partition key nor a sort key. In the world of SQL you might solve this by building a new table to present the desired data, which is convenient but costly in terms of processing. Another exciting feature available in DynamoDB is the ability to designate other attributes within your table as additional keys to query against. When you assign the desired attributes as an alternate set of keys, this is known as creating a secondary index.

This secondary index gives you the option to query for items within your database using attributes that are not defined or accessible in your original primary key. This generates an alternate view of your database which is achieved by duplicating all the desired information and mapping them to the new index. There are two types of secondary indexes that can be created.

Local Secondary Indexes (LSI): A local secondary index is used when you need to access data within the same partition key as the table, but organized by a different sort key.

Global Secondary Indexes (GSI): A global secondary index is used when you need to access data across the entire table, using a partition key (and optionally a sort key) that differs from the table’s own.
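As a rough mental model, a global secondary index behaves like a second copy of selected attributes, filed under a different key. The sketch below imitates that in plain Python. The function, the order items and the "status" attribute are hypothetical, and the real service maintains the index automatically as items change.

```python
from collections import defaultdict

def build_index(items, key_attr, projected):
    """Re-file copies of the items under a different key (toy model of a GSI).

    Items that lack the index key attribute are simply left out of the index.
    """
    index = defaultdict(list)
    for item in items:
        if key_attr not in item:
            continue
        copy = {name: item[name] for name in projected if name in item}
        index[item[key_attr]].append(copy)
    return dict(index)

# Hypothetical table contents, keyed by pk/sk in the base table.
items = [
    {"pk": "CUSTOMER#1", "sk": "PROFILE", "name": "Ana"},
    {"pk": "CUSTOMER#1", "sk": "ORDER#10", "status": "SHIPPED", "product": "Keyboard"},
    {"pk": "CUSTOMER#2", "sk": "ORDER#12", "status": "PENDING", "product": "Mouse"},
    {"pk": "CUSTOMER#1", "sk": "ORDER#11", "status": "PENDING", "product": "Monitor"},
]

# "Which orders are pending?" cannot be answered from pk/sk alone,
# so we index on status and copy over only the attributes we need.
by_status = build_index(items, key_attr="status", projected=["pk", "sk", "product"])
print(by_status["PENDING"])
```

The profile item has no status, so it never appears in the index, and only the listed attributes are copied. Both details matter for the cost discussion that follows.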

Secondary Index Costs

It’s important to clarify that a secondary index works by maintaining a second copy of your table’s data, organized under the new keys, in order to satisfy the additional search requirements. If the index includes every attribute of every item, the storage and the write capacity needed to keep it up to date can roughly double; copying only the attributes you actually need keeps the index smaller. While you can’t see this second copy as a separate table, it’s there in the background, kept in sync with the original so that the secondary index can be used to access the desired information.

Secondary indexing increases the cost of your overall DynamoDB usage because additional reads/writes are performed to keep the data consistent across the table and each secondary index you create.
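A back-of-the-envelope estimate shows how this adds up. The sketch assumes the standard rule that one write unit covers a write of up to 1 KB, and that each index affected by a write is charged for the size of the data copied into it. The function names and item sizes are made up, and the model ignores special cases such as updates that change an indexed key value.

```python
import math

def write_units(size_bytes):
    """Standard write: one write unit per 1 KB of item size, rounded up."""
    return max(1, math.ceil(size_bytes / 1024))

def write_units_with_indexes(item_size_bytes, index_copy_sizes):
    """Rough estimate: the table write plus one write per affected index.

    index_copy_sizes holds the size in bytes of the data copied into each index.
    """
    return write_units(item_size_bytes) + sum(write_units(s) for s in index_copy_sizes)

item_size = 2500  # bytes, hypothetical

no_index = write_units_with_indexes(item_size, [])
full_copy = write_units_with_indexes(item_size, [2500])        # index copies everything
slim_copy = write_units_with_indexes(item_size, [300])         # index copies a few attributes
two_indexes = write_units_with_indexes(item_size, [2500, 300])

print(no_index, full_copy, slim_copy, two_indexes)
```

Under these assumptions a single index that copies the whole item doubles the write cost, while a slim index adds only one unit. Every additional index adds its own share on every write that touches it.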

When creating new tables and schemas, it’s more cost-effective to design a system where a secondary index is not necessary to accomplish your access patterns. However, if that isn’t an option for you because you’re working with either a pre-existing database or legacy data, then you at least have the ability to implement secondary indexes.

Amazon DynamoDB Pricing

I’ll refrain from going into too much detail on pricing since there’s not much more information to provide than what is available through AWS’ own official site, but I will say that Amazon DynamoDB’s free tier is extremely generous and you’ll be able to start your project without any concerns of going over budget. (Free Tier offers 25 GB of storage and up to 200 million read/write requests per month)
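For the curious, here is one plausible way to arrive at a figure of about 200 million requests. It rests on assumptions that are not stated above: that the free tier includes 25 write capacity units and 25 read capacity units, that one write unit allows one write per second, that one read unit allows two eventually consistent reads per second, and that the month has 31 days. Check the official pricing page for the current terms before relying on these numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def monthly_requests(write_units, read_units, days=31, eventually_consistent=True):
    """Upper bound on requests per month if capacity were fully used every second."""
    seconds = days * SECONDS_PER_DAY
    writes = write_units * seconds
    reads_per_unit = 2 if eventually_consistent else 1
    reads = read_units * reads_per_unit * seconds
    return writes + reads

# Assumed free-tier capacity: 25 write units and 25 read units.
total = monthly_requests(write_units=25, read_units=25)
print(f"{total:,} requests per month")
```

This is a theoretical ceiling that assumes small items and perfectly even traffic around the clock, so real-world usage will land well below it.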

The only time you would need to be careful is if you’re dealing with existing databases and legacy data. Depending on how you structure the new infrastructure, you may want to consider the budget before starting to avoid any mistakes later on down the line.

AWS Best Practices

When designing for Amazon DynamoDB and Amazon Web Services, there are some best practices that Amazon advises in order to achieve the best performance in developing your application. You can review them in greater detail on their website, but I’d like to highlight a couple of these points because they’re the catalyst of how I approach designing all my schemas for Amazon DynamoDB:

You shouldn’t start designing your schema for DynamoDB until you know the questions it will need to answer. Understanding the business problems and the application use cases up front is essential.
You should maintain as few tables as possible in a DynamoDB application.

Of course, every project is different and there are certainly situations where you would deviate from these guidelines, but in my experience, these are the two most important philosophies that I ground myself in when designing a database schema for DynamoDB.

In the next part of this blog, I’ll delve into my personal design methods for DynamoDB schemas. I’ll walk you through the thought processes and decision-making strategies that have shaped my approach to database design, and hopefully provide some helpful insights for your own projects.

Conclusion

In closing, I hope this blog has shed some light on the complexities and capabilities of Amazon DynamoDB and its role in delivering robust, high-performance solutions. Armed with this knowledge, you are now one step closer to making informed decisions that can bring transformative change to your business. Keep an eye out for our next installment where we delve even deeper into DynamoDB’s world, showcasing how to design the most effective database schemas based on real-world experiences and use-cases. If you’re eager to leverage the power of DynamoDB or other next-gen technologies, why wait? Reach out to us at TensorIoT. We’re ready to work with you, combining our expertise and innovative spirit to engineer solutions that drive your business into the future. Let’s create something extraordinary together!