Getting to Know the Basics of AWS DynamoDB

Published in Fortum Technology Blog, Nov 11, 2022

What is DynamoDB?

DynamoDB is a distributed NoSQL database offered as part of Amazon Web Services (AWS). The term NoSQL is defined broadly, but in general it means non-SQL or non-relational.

Relational databases, typically associated with SQL, have been around for a long time and have inevitably shaped how we think about databases. AWS DynamoDB, like NoSQL in general, is a paradigm shift compared to relational databases. Therefore, it is important to understand the strengths and weaknesses of this type of solution before using it in a production environment.

Recap: SQL schema design

In regular SQL solutions, the schema is based on tabular structures and “set theory”. A database has one or more tables. In a table, each column holds a fixed data type and each row represents one record.

Example schema definition:

CREATE TABLE Customers( 
   userId INTEGER PRIMARY KEY, 
   name VARCHAR(25) NOT NULL, 
   surname VARCHAR(25) NOT NULL, 
   address VARCHAR(255) NOT NULL 
);

This method works well for one-to-one mappings. If we needed to store a list of a user’s orders, we would normalize the data and create another table such as:

CREATE TABLE Orders(
   orderId INTEGER PRIMARY KEY,
   userId INTEGER NOT NULL,
   FOREIGN KEY(userId) REFERENCES Customers(userId)
);

This setup allows us to query user and order data together by joining the tables with a “JOIN” clause. This type of data and query modelling works in virtually every SQL database.
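To make the join concrete, here is a minimal sketch that runs it with Python's built-in sqlite3 module against an in-memory database. The sample rows and the helper name orders_for_customer are made up for illustration, and Orders.userId is declared as INTEGER so that it matches Customers.userId.

```python
import sqlite3

# In-memory database, so the example leaves nothing behind.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers(
        userId INTEGER PRIMARY KEY,
        name VARCHAR(25) NOT NULL,
        surname VARCHAR(25) NOT NULL,
        address VARCHAR(255) NOT NULL
    );
    CREATE TABLE Orders(
        orderId INTEGER PRIMARY KEY,
        userId INTEGER NOT NULL,
        FOREIGN KEY(userId) REFERENCES Customers(userId)
    );
    -- Sample rows are made up for illustration.
    INSERT INTO Customers VALUES (1, 'Ada', 'Lovelace', '1 Example Street');
    INSERT INTO Orders VALUES (100, 1), (101, 1);
""")


def orders_for_customer(user_id):
    """Return (name, orderId) pairs by joining Customers with Orders."""
    rows = conn.execute(
        """
        SELECT c.name, o.orderId
        FROM Customers AS c
        JOIN Orders AS o ON o.userId = c.userId
        WHERE c.userId = ?
        ORDER BY o.orderId
        """,
        (user_id,),
    )
    return rows.fetchall()


print(orders_for_customer(1))
```

The customer data lives in one table and the orders in another; the JOIN stitches them back together at query time.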

Comparison with DynamoDB

The same use case looks a bit different with NoSQL. A possible DynamoDB table definition for the orders could look like this:

{
   "TableName": "Orders",
   "AttributeDefinitions": [
      {
         "AttributeName": "orderId",
         "AttributeType": "S"
      }
   ],
   "KeySchema": [
      {
         "AttributeName": "orderId",
         "KeyType": "HASH"
      }
   ]
}

It looks quite different from a relational database schema and, at first glance, might look a bit cryptic. Let’s explore what makes this schema so different.

Record structure

As mentioned before, AWS DynamoDB is a NoSQL database, and more specifically a key-value database.

Key-value databases store information, known as documents, in the form of hash maps. Similarly to the dictionary concept used in programming languages, we must supply a key to access a certain document.

In DynamoDB, a “key” consists of a Partition (hash) key and an optional Sort (range) key. Together, the Partition key and the Sort key form the primary key. As in traditional SQL, the primary key uniquely identifies a single record.

On the other hand, the “value” (document) part isn’t defined in the schema. This is the first notable difference between DynamoDB and SQL.
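The dictionary analogy can be sketched in a few lines of Python. This is a toy in-memory model, not the DynamoDB API: the names table, put_item and get_item are chosen here for illustration, and the sample documents are made up.

```python
# A toy in-memory "table": a dict keyed by (partition_key, sort_key).
table = {}


def put_item(pk, sk, document):
    """Store a document under its primary key, replacing any existing one."""
    table[(pk, sk)] = document


def get_item(pk, sk):
    """Return the document stored under the exact primary key, or None."""
    return table.get((pk, sk))


# The two documents have different shapes; only the key is structured.
put_item("user-3451", "order-1", {"total": 25.5, "items": ["book"]})
put_item("user-3451", "order-2", {"status": "shipped"})

print(get_item("user-3451", "order-2"))
```

Only the key has a fixed structure. Each value is free to carry whatever fields it needs, which mirrors the fact that the document part is not declared in the schema.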

Scalar data types

Both the Partition key and the Sort key can be of type String, Number or Binary. By DynamoDB convention, each type is referred to by a one-letter abbreviation: S, N and B respectively.

Notice that data types are simplified compared to relational databases, where we often must deal with many varying types for numbers (int, float, double), strings (varchar, text, tinytext) and binary formats (longblob, mediumblob).

Complex data types

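Additionally, documents can hold JSON-like structures: List (L), Map (M) and Set (SS for strings, NS for numbers, BS for binary values). These attributes are not declared in the table schema, but the way we structure them shapes the overall data model, which helps with complex and advanced use cases.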
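To show how the type abbreviations appear in practice, here is a minimal hand-rolled sketch that converts plain Python values into DynamoDB-style typed attribute values. The function name to_attribute_value is hypothetical, only string sets are handled, and BOOL is included solely so that True and False are not mistaken for numbers. SDKs usually perform this conversion for you, so treat this as an illustration of the format rather than production code.

```python
def to_attribute_value(value):
    """Convert a Python value into a DynamoDB-style typed attribute value."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return {"BOOL": value}
    if isinstance(value, str):
        return {"S": value}
    if isinstance(value, (int, float)):
        return {"N": str(value)}  # numbers are represented as strings
    if isinstance(value, bytes):
        return {"B": value}  # raw bytes here; the JSON wire format base64-encodes them
    if isinstance(value, (set, frozenset)):
        if not value or not all(isinstance(v, str) for v in value):
            raise TypeError("this sketch only supports non-empty string sets")
        return {"SS": sorted(value)}  # sorted only to make the output deterministic
    if isinstance(value, list):
        return {"L": [to_attribute_value(v) for v in value]}
    if isinstance(value, dict):
        return {"M": {k: to_attribute_value(v) for k, v in value.items()}}
    raise TypeError(f"unsupported type: {type(value).__name__}")


print(to_attribute_value({"orderId": "o-1", "qty": 2, "tags": {"gift", "book"}}))
```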

Understanding the Fundamentals for Querying Data

In relational databases, we can use SQL to query any column available in a table. With AWS DynamoDB the SQL approach does not apply, and learning how to query and retrieve data takes some getting used to. The AWS Management Console is a great starting point for gaining this knowledge and can be quite enlightening to first-time DynamoDB users.

Scan

The Scan method reads every record in the table. It closely resembles the “full table scan” known from SQL. The shared name is a fitting design choice: it reminds users that, much like in a relational database, a scan has a negative performance impact and shouldn’t be used lightly.

Partition key

If the table schema has no Sort key defined, the Partition key alone is the primary key, so a query by Partition key returns at most one record.

Sort key

A Sort key allows us to filter records using range operators. As an example, if we want to store monthly reports of employees, a single record could look like this:

PK="employee-123", SK="2022-07"

Using this schema, we can provide both the Partition key and the Sort key to extract a single record. We can also add a Sort key condition, such as a prefix match on “2022”, to return the records for a single year, or leave out the Sort key altogether to retrieve all of the employee’s reports.
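The three access patterns for the employee reports can be sketched with a toy in-memory model in which each partition keeps its records sorted by Sort key. The names partitions and query and the sample hours are assumptions made for illustration; this models the idea of a key condition, not the DynamoDB API itself.

```python
from bisect import bisect_left

# Toy model: each partition keeps its records sorted by Sort key.
partitions = {
    "employee-123": [
        ("2021-12", {"hours": 150}),
        ("2022-06", {"hours": 160}),
        ("2022-07", {"hours": 152}),
        ("2023-01", {"hours": 158}),
    ],
}


def query(pk, sk_prefix=""):
    """Return (sort_key, record) pairs in one partition whose Sort key starts with sk_prefix."""
    records = partitions.get(pk, [])
    keys = [sk for sk, _ in records]
    start = bisect_left(keys, sk_prefix)  # jump straight to the first candidate
    result = []
    for sk, record in records[start:]:
        if not sk.startswith(sk_prefix):
            break  # keys are sorted, so no later key can match
        result.append((sk, record))
    return result


print(query("employee-123", "2022-07"))  # one monthly report
print(query("employee-123", "2022"))     # all reports for 2022
print(query("employee-123"))             # every report for the employee
```

Because the records in a partition are ordered by Sort key, a prefix or range condition only touches a contiguous slice of the partition, which is why choosing a Sort key with a meaningful ordering (here, year and month) pays off.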

Filters

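Filters allow us to narrow down the returned records using fields other than components of the primary key. In SQL, a WHERE clause on non-key columns gives similar results. Note that a filter is applied after the records have been read, so it reduces what is returned but not the amount of data read.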
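The difference between records read and records returned can be illustrated with a small simulation. The records, the approved field and the function name query_with_filter are made up for this sketch, which assumes the key condition has already selected the partition.

```python
# Records already selected by the key condition (made-up sample data).
partition = [
    {"SK": "2022-05", "hours": 140, "approved": True},
    {"SK": "2022-06", "hours": 160, "approved": False},
    {"SK": "2022-07", "hours": 152, "approved": True},
]


def query_with_filter(records, predicate):
    """Read every record first, then keep only those matching the filter."""
    read_count = 0
    returned = []
    for record in records:
        read_count += 1  # the read happens regardless of the filter
        if predicate(record):
            returned.append(record)
    return returned, read_count


approved, read_count = query_with_filter(partition, lambda r: r["approved"])
print(len(approved), "returned,", read_count, "read")
```

In this model the filter trims the response but not the work done, which is why a filter is no substitute for a well-chosen key.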

Index query

To accelerate data retrieval, one can query a custom index instead of reaching the table directly. While there are some tradeoffs to consider, this method provides an impressive performance boost when retrieving data.
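As a rough mental model, an index can be pictured as a second lookup structure keyed by a different attribute. The sketch below builds one in memory; the table contents and the names build_index and orders_for_user are assumptions made for illustration and do not represent how DynamoDB implements its indexes.

```python
from collections import defaultdict

# Base table keyed by orderId (sample data is made up).
orders = {
    "order-1": {"userId": "user-3451", "total": 20},
    "order-2": {"userId": "user-7", "total": 35},
    "order-3": {"userId": "user-3451", "total": 12},
}


def build_index(table, attribute):
    """Map each value of `attribute` to the primary keys of the records holding it."""
    index = defaultdict(list)
    for key, record in table.items():
        if attribute in record:  # records lacking the attribute are not indexed
            index[record[attribute]].append(key)
    return index


orders_by_user = build_index(orders, "userId")


def orders_for_user(user_id):
    """Look up a user's orders through the index instead of scanning the table."""
    return [orders[key] for key in orders_by_user.get(user_id, [])]


print(orders_for_user("user-3451"))
```

The tradeoff shows up even in this toy version: the index takes extra space and has to be kept up to date whenever the base table changes.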

Design schema

Tabular schema

In relational databases, table design is very structured. We create a table with columns that match exactly the parameters we want to store.

Since AWS DynamoDB is a key-value store, it does not have a static schema. This can give the wrong impression that schema design is not important (or even possible) in NoSQL databases.

In an earlier section we learned about querying, which relies on the Partition and Sort keys. Therefore, the most important questions about table schema are:
- What value should be used as Partition key?
- Do I need Sort key capabilities? What values should be used as a Sort key?
- Do I need indexes?

Thinking in terms of a classic tabular schema does not help us answer these questions, yet it is important to reflect on them before taking DynamoDB into real use. Is there a better way to design and plan a DynamoDB schema?

Instead of thinking of a classic schema design, a better practice is to model our data like files in a directory. We treat the database as a root directory, the Partition key as a subdirectory and each document as a file.

This metaphor makes it easy to grasp the concept of querying using Partition key. One needs to know the file path to be able to read the content inside it.

As for the content, DynamoDB records can hold JSON-like structures, hence we can give our files a “*.json” extension. This gives us simple models such as:

/Orders/user-3451/record.json

This approach makes it easier to design and plan a basic schema that addresses our user and orders use case.
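The directory analogy can be played out in code with nested dictionaries standing in for directories. The structure below follows the example path, with the table name as the first directory; the stored content and the function name read_document are made up for illustration.

```python
import json
from pathlib import PurePosixPath

# Root directory = database, first level = table, second level = Partition key.
database = {
    "Orders": {
        "user-3451": {"record.json": '{"orders": ["order-1", "order-3"]}'},
    },
}


def read_document(path):
    """Follow the path segment by segment and parse the JSON 'file' at the end."""
    node = database
    for segment in PurePosixPath(path).parts[1:]:  # skip the leading "/"
        node = node[segment]  # raises KeyError if any part of the path is unknown
    return json.loads(node)


print(read_document("/Orders/user-3451/record.json"))
```

Just as with a real file, there is no way to read the document without knowing its full path, which is the point of the analogy: the Partition key must be known before the data can be fetched efficiently.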

Conclusion

In this article, we highlighted the major differences between classic SQL tables and AWS DynamoDB, and covered the basics of data modeling and querying.

Moreover, we used a directory-oriented modelling analogy to make it easier to plan and design a basic AWS DynamoDB schema. In the next article we will expand this concept to involve the Sort key, indexes and other advanced functionality.

About the Author:

Krzysztof Kurczewski, Senior Software Engineer at Fortum