How to Build a High-Throughput System With DynamoDB

How to Utilize DynamoDB Correctly When You’ve Exclusively Worked with Relational Databases

Kim Gault
Nov 4, 2020 · 8 min read
A visual interpretation of a database in the 1995 film, Hackers. How far we’ve come!

Relational databases have long dominated the field of software engineering, but the adoption of NoSQL databases has been rapidly catching up. NoSQL (also known as “Not Only SQL”) databases emerged as a solution to the growing limitations of relational databases, storing and modeling data in fundamentally different ways. In 2012, Amazon unveiled its managed database service, DynamoDB, to the public, joining popular NoSQL databases of the time such as MongoDB and Apache Cassandra. This article will focus specifically on DynamoDB.

Why Choose DynamoDB

As they grow, relational databases tend to rely on scaling vertically, adding more resources to a single server, while NoSQL databases are designed to scale horizontally by adding more instances of the database. Scaling vertically is usually more expensive and more disruptive in production.

The upside of a relational database is the flexibility of its query language, but that flexibility is expensive: relational queries are slower and can’t handle high throughput as well as the narrow, targeted queries NoSQL databases are optimized for.

This performance and horizontal scalability comes at a cost in flexibility and transactional guarantees. If you are looking for a database with full ACID compliance (Atomicity, Consistency, Isolation, Durability), a relational database is the better choice. If your data is low volume and has consistent access patterns, a relational database is also the better choice. Essentially, when prioritizing scalability, performance, or schema freedom, NoSQL is the stronger contender.

Of the many NoSQL databases to choose from, there are a few factors that make DynamoDB preferable to the rest:

Fully Managed by Amazon

DynamoDB differs from some other NoSQL databases by being fully managed by Amazon: servers are provisioned for you, and availability and fault tolerance are built in. It also has the unique benefit of one of the only pay-per-use pricing models among NoSQL databases. On the other hand, it can be a black box, which may not suit a company that wants low-level control over its database.

Plays Well in Serverless Ecosystem

Another driving force for choosing DynamoDB is the adoption of AWS Lambda. When considering databases in the serverless ecosystem, there are really only two choices: DynamoDB and Aurora Serverless, which Amazon released years after DynamoDB as AWS’s relational serverless database. An AWS serverless database such as DynamoDB or Aurora Serverless works better with AWS Lambda than a database outside the serverless ecosystem does.

The serverless ecosystem is about hyper-ephemeral computing. Because instances are created and destroyed with each call, it can be difficult and costly to maintain a persistent TCP connection to a traditional database server.

Credential management is very straightforward when all products are Amazon serverless products. DynamoDB uses AWS IAM for authentication, and access is granted to a Lambda function through an IAM policy on its execution role. With a MySQL database on its own instance, each Lambda invocation would have to retrieve the database credentials before connecting. Additionally, if the database is not exposed to the public internet, the Lambda has to sit in the same VPC as the database, which results in prolonged cold starts.

Better Scalability and Reliability

Although DynamoDB is the perfect accompanying tool for AWS Lambda, it should be considered regardless of whether the application lives in a serverless ecosystem. First and foremost, NoSQL databases were created to outscale relational databases.

When considering databases for reliability and scalability, DynamoDB and Cassandra are favored contenders. Both support a multi-primary model, while NoSQL databases like MongoDB use a single-primary model. Databases with multi-primary models scale better and are more resilient than their peers, because write access is limited to primary servers: since MongoDB follows the single-primary model, its write capacity is constrained. Deploying more shards will increase write capability, but it will always fall short of a multi-primary model.

DynamoDB and Cassandra also handle failure and replication better. In a single-primary database, when the primary goes down, one of the replicas is promoted to primary, and that failover means downtime for write access. With a multi-primary model, this isn’t an issue.

How to Set Up DynamoDB for Success

Many companies choose DynamoDB for its effectiveness as a high-throughput database, but then fail to set it up properly for success. DynamoDB can query data more efficiently than a relational database, but only along the access patterns it was designed for. Outside of those intended patterns, DynamoDB is burdensome in both cost and time.

Breaking Away from the Relational Mindset

When designing a relational table, normalization is one of the most important considerations. The table is completely designed and populated before you even think about access patterns, which aren’t a consideration until you build the query. DynamoDB is the opposite: the design of the database is based on access patterns. The hardest part of designing for DynamoDB is getting developers to break away from the more intuitive relational mindset.

Generally speaking, most applications will be able to design a database where there is the same activity across all partition keys in the table and its secondary indexes, and most applications should be able to store all data per system using a one table design. Accomplishing this design pattern will require creative and non-traditional planning prior to building the database.

A great method is to keep attribute names agnostic so that partition and sort keys can be reused for multiple types of records. Utilizing secondary indexes is another way to leverage multiple access patterns in one table. Another important key difference is data redundancy. Relational database developers have been taught to avoid redundancy in relational table design. Redundancy is allowed and encouraged for DynamoDB as it can help fulfill the required access patterns needed by the business.

A good (albeit not quite official) indicator of success when designing a DynamoDB table is when the table is no longer as readable as a relational table. For instance, to maintain a sort key that serves the access patterns of multiple types of records, a common convention for sort key attributes is “{Type}#{attribute-specific sort data}”. Records can then be retrieved by matching a type and either part or all of the specific sort data. It’s a flexible yet unintuitive way to design a database, and it will leave the table far less readable than the almost Excel-like layout of a relational table. Key attributes will also seem foreign, consisting of hashes and other data used purely for sorting.
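The “{Type}#{sort data}” convention can be sketched in a few lines of plain Python; the entity name ORDER and its sort fields here are hypothetical, and the local prefix check simply mirrors what DynamoDB’s begins_with() key condition does server-side.

```python
def make_sk(record_type: str, *parts: str) -> str:
    """Compose a sort key like 'ORDER#2020-11-04#1234'."""
    return "#".join([record_type, *parts])

def matches_prefix(sk: str, record_type: str, *parts: str) -> bool:
    """Local stand-in for DynamoDB's begins_with() key condition.

    In a real query you would pass the composed prefix (e.g. 'ORDER#')
    to begins_with(); this helper just shows the prefix semantics.
    """
    return sk.startswith(make_sk(record_type, *parts))

sk = make_sk("ORDER", "2020-11-04", "1234")
assert matches_prefix(sk, "ORDER")                  # all orders in the partition
assert matches_prefix(sk, "ORDER", "2020-11-04")    # orders on a given day
```

Because the prefix narrows progressively, one sort key serves several access patterns: all records of a type, records of a type within a date, or one exact record.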

An example of DynamoDB records from Amazon.

Most business requirements can be accomplished by querying or doing a get by primary key if enough consideration is placed into access patterns when designing the database records. If the application is doing a scan in DynamoDB for a task other than migrations, something is wrong.

For those looking to adopt DynamoDB, Amazon’s documentation offers great guidance on best practices for modeling many-to-many relationships, as well as on designing and using partition keys effectively.

The philosophies behind a relational design and a DynamoDB design are completely different. With DynamoDB, data structures are tailored to the specific requirements of each use case, whereas with a relational database, query optimization is largely independent of schema design. Almost all of the work for DynamoDB is in planning the data model up front.

Build Out the Application Layer

There are multiple possible drawbacks to designing a table based on access patterns rather than relationships.

  1. Bad data could be written because no schema is enforced at the table level.
  2. A one-table design hurts readability.
  3. Adding an access pattern or running a data migration after the database is live can be costly and time-consuming.

These drawbacks are mitigated in the application code. With a relational database, validation and readability live in the table layer; with DynamoDB, they move into the application layer. It’s very important to write the data models and database layer efficiently and with scalability in mind so that everything can be optimized.

For example, classes can be created for each entity type. Each class type will know how to transform the entity object into a DynamoDB record, as well as validate itself. It can also map the response it gets when retrieving a DynamoDB record into a readable object for the application. A lot is happening in the application code before it gets to DynamoDB.
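A minimal sketch of such an entity class, assuming a hypothetical Order entity and the made-up CUSTOMER#/ORDER# key format from earlier: the class validates itself, serializes to a DynamoDB item, and maps an item back into a readable object.

```python
from dataclasses import dataclass

@dataclass
class Order:
    customer_id: str
    order_id: str
    total_cents: int

    def validate(self) -> None:
        # Schema enforcement lives here, not in the table.
        if self.total_cents < 0:
            raise ValueError("total_cents must be non-negative")

    def to_item(self) -> dict:
        """Serialize to a DynamoDB item, validating first."""
        self.validate()
        return {
            "pk": f"CUSTOMER#{self.customer_id}",
            "sk": f"ORDER#{self.order_id}",
            "total_cents": self.total_cents,
        }

    @classmethod
    def from_item(cls, item: dict) -> "Order":
        """Map a raw DynamoDB item back into a readable object."""
        return cls(
            customer_id=item["pk"].split("#", 1)[1],
            order_id=item["sk"].split("#", 1)[1],
            total_cents=item["total_cents"],
        )
```

With one such class per entity type, the unreadable single-table items never leak past the database layer.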

Configuring unplanned access patterns or data migration is hairy regardless of the type of database, but it can be more difficult with DynamoDB. It’s why the planning phase of DynamoDB is so important. Considering all access patterns and keeping the partition and sort key agnostic so those keys can be reusable plays a big part in the success of DynamoDB.

When an application’s database and data model layers are built using guidelines that promote modular-based design, such as SOLID principles or orthogonal design, the application will be able to leverage concurrency to make processes, such as adding another index or a migration (sped up by batch transactions), run quickly and efficiently.
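Fanning a migration out across workers might look like the sketch below; write_batch is a hypothetical callable that would wrap DynamoDB’s BatchWriteItem (which accepts at most 25 items per request), injected here so the concurrency logic stays independent of AWS.

```python
from concurrent.futures import ThreadPoolExecutor

def migrate(records, write_batch, batch_size=25, workers=8):
    """Split records into batches and write them concurrently.

    batch_size defaults to 25 because BatchWriteItem caps each request
    at 25 items.
    """
    batches = [records[i:i + batch_size]
               for i in range(0, len(records), batch_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() re-raises any worker exception when results are consumed.
        list(pool.map(write_batch, batches))
    return len(batches)
```

Because the writer is injected, the same routine works for a backfill, an index rebuild, or a test run against a stub.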

Switching Modes for Extremely High-Throughput Cases

A little trick picked up while working on a greenfield system expected to see high throughput immediately: provision read/write capacity first, then switch to On-Demand.

Although DynamoDB scales well in On-Demand mode, there is a known delay when traffic more than doubles its previous peak within thirty minutes. Knowing this could be an issue when the database first went into production, the table was provisioned with one million read and write capacity units and auto scaling was set up. Once the database was live and the spike in users became less volatile, capacity was switched to On-Demand. When a table moves from provisioned to On-Demand mode, the read and write throughput the application expects no longer needs to be specified. Most databases would be fine with provisioned reads/writes and auto scaling, but since this system was expected to see unpredictable bursts, some as high as twenty-five million requests per minute, it was safer to switch to On-Demand to absorb them.

Amazon DynamoDB Accelerator Cluster

Amazon DynamoDB Accelerator (DAX) is an in-memory cache for DynamoDB that is fully managed by Amazon: the servers, the scaling, and cache invalidation are all handled for you. A DAX cluster reduces read latency from single-digit milliseconds down to microseconds. It’s another nice tool in Amazon’s managed toolbox.

Conclusion

NoSQL is a great choice when performance, high throughput, and velocity are priorities for an application. Although DynamoDB was built as a solution for companies that have outgrown their relational databases, it can perform poorly when not designed and utilized correctly. Designing and coding for DynamoDB requires breaking away from the traditional relational mindset. When that paradigm shift is made, DynamoDB can be one of the best choices for a database solution.

The Startup

Medium's largest active publication, followed by +752K people.

Written by Kim Gault

Kim Gault is a backend software engineer in Portland, OR.
