How to schedule scaling (without writing a single line of code) on Amazon DynamoDB provisioned tables.

Ersoy Pembe
Insider Engineering
5 min read · Feb 6, 2023

Amazon DynamoDB is a fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.

DynamoDB helps us offload the administrative burdens of operating and scaling a distributed database, so we don’t have to worry about hardware provisioning, setup, configuration, replication, software patching, or cluster scaling. DynamoDB’s high availability and durability handle our throughput while maintaining consistent and fast performance.

These are some of the key reasons we switched from self-managed, on-premises Apache Cassandra to Amazon DynamoDB, after conducting research, benchmarks, and technical design work across these and several other database technologies.

It has extensive features that help us build critical features and components of Architect, such as sessions, user activities, starter users, etc. We have more than 3 terabytes of data across 12 tables, and some of the tables consume more than 1.5 million Write Capacity Units per minute.

One write capacity unit represents one write per second for an item up to 1 KB in size.
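To make that arithmetic concrete, here is a minimal Python sketch of how write capacity can be estimated; the item size and write rate in the example are illustrative, not our production numbers.

import math

def wcu_for_writes(item_size_kb: float, writes_per_second: int) -> int:
    # One WCU covers one standard write per second for an item up to 1 KB,
    # so each larger item consumes ceil(size / 1 KB) WCUs per write.
    return math.ceil(item_size_kb) * writes_per_second

# e.g. 5,000 writes/sec of 1.5 KB items -> 2 WCU per write -> 10,000 WCU
print(wcu_for_writes(1.5, 5000))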

One of the core configurations of DynamoDB is the capacity mode, which determines how you are charged for read and write throughput and how you manage capacity. There are two modes available: on-demand and provisioned.

  • On-demand offers pay-per-request pricing for read and write requests, so you pay only for what you use. It is a good option for unknown workloads and unpredictable application traffic.
  • Provisioned mode requires you to specify the number of reads and writes per second that your application needs. Additionally, the auto-scaling option is available to adjust the table’s provisioned capacity automatically in response to traffic changes. It is a good option when you have predictable application traffic and the workload is consistent or ramps gradually (a minimal sketch of both modes follows this list).
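To make the difference concrete, here is a minimal boto3 sketch of how the billing mode is chosen at table creation; the table names, key schema, and capacity values are placeholders, not our actual configuration.

import boto3

dynamodb = boto3.client("dynamodb")

# On-demand: no capacity to manage, pay per request.
dynamodb.create_table(
    TableName="example_on_demand",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PAY_PER_REQUEST",
)

# Provisioned: you declare the read/write throughput up front.
dynamodb.create_table(
    TableName="example_provisioned",
    AttributeDefinitions=[{"AttributeName": "pk", "AttributeType": "S"}],
    KeySchema=[{"AttributeName": "pk", "KeyType": "HASH"}],
    BillingMode="PROVISIONED",
    ProvisionedThroughput={"ReadCapacityUnits": 200, "WriteCapacityUnits": 1000},
)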

Automatic scaling is not exclusive to on-demand mode; provisioned mode also supports it through the auto-scaling option. You define upper and lower limits for both read and write capacity units, along with a target utilization percentage, and the rest is managed by AWS. DynamoDB scales your provisioned capacity up and down within the given ranges according to your application’s throughput.

With DynamoDB auto scaling, a table or a global secondary index can increase its provisioned read and write capacity to handle sudden increases in traffic, without request throttling. When the workload decreases, DynamoDB auto scaling can decrease the throughput so that you don't pay for unused provisioned capacity.
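DynamoDB auto scaling is driven by Application Auto Scaling behind the scenes. A minimal boto3 sketch of wiring it up for write capacity looks roughly like this; the limits and target utilization are placeholders, and the table name is only an example.

import boto3

autoscaling = boto3.client("application-autoscaling")

# Register the table's write capacity as a scalable target with lower/upper bounds.
autoscaling.register_scalable_target(
    ServiceNamespace="dynamodb",
    ResourceId="table/user_activities",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    MinCapacity=1000,
    MaxCapacity=10000,
)

# Track a target utilization; AWS adjusts provisioned capacity within the bounds.
autoscaling.put_scaling_policy(
    PolicyName="user-activities-write-scaling",
    ServiceNamespace="dynamodb",
    ResourceId="table/user_activities",
    ScalableDimension="dynamodb:table:WriteCapacityUnits",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "DynamoDBWriteCapacityUtilization"
        },
    },
)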

One of the most important factors in deciding between these two modes is cost. Depending on the use case, provisioned capacity pricing can be up to 7x cheaper than on-demand capacity. Additionally, you can purchase reserved capacity with a one-time upfront payment (1-year or 3-year term) to save 50% to 75% on your payments; reserved capacity applies only to tables with provisioned capacity enabled.

We created an activity table to track the activities of users across all the journeys. The table was initially created in on-demand mode so we could monitor our capacity usage and determine the application’s sweet spot. Before switching from on-demand to provisioned mode (mostly for writes), we had upper and lower targets for the table, with one exception: traffic is consistent throughout the day except around 00:00 UTC, when the throughput on the table increased abruptly, causing ProvisionedThroughputExceededException errors.

Even though the applications using this table have both circuit breakers and exponential backoff mechanisms, the throttles cause delays in the following minutes, which extends the throughput spike. If we increase the capacity for a period of time around the peak minutes, that extended stretch of throughput exceptions, and the resulting delays in data ingestion, can be avoided.
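For illustration, the client-side behavior described above boils down to a retry loop like the following; this is a generic backoff sketch, not our production circuit breaker.

import time
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

def put_item_with_backoff(item, max_retries=5):
    # Retry throttled writes with exponentially growing delays; during a long
    # spike these delays pile up and stretch out the ingestion backlog.
    for attempt in range(max_retries):
        try:
            return dynamodb.put_item(TableName="user_activities", Item=item)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(0.1 * 2 ** attempt)
    raise RuntimeError("write still throttled after retries")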

There are a few options to tackle this problem, such as creating an Amazon EventBridge rule with the event schedule option and defining a Lambda target; within the Lambda function, you then increase or decrease the capacity of the given table via the AWS SDK, which requires some amount of code and configuration. Instead, we decided to use one of the newer EventBridge capabilities, Amazon EventBridge Scheduler. EventBridge rules are limited to around 20 target types, whereas EventBridge Scheduler supports more than 270 services and, more importantly, over 6,000 API actions through AWS SDK targets.
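For comparison, the Lambda-based alternative would need a small handler roughly like this hedged sketch; the capacity values mirror the payload shown later, and the handler and table names are only examples.

import boto3

dynamodb = boto3.client("dynamodb")

def handler(event, context):
    # Triggered by an EventBridge rule on a cron schedule; the desired capacity
    # values could also be carried in the event payload instead of hard-coded.
    return dynamodb.update_table(
        TableName="user_activities",
        ProvisionedThroughput={
            "ReadCapacityUnits": 200,
            "WriteCapacityUnits": 9000,
        },
    )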

While creating the scheduler, you define schedule patterns such as the occurrence (one-time or recurring), the schedule type (cron-based or rate-based), and the flexible time window and timeframe configurations.

The differentiating part of the scheduler is the target, which supports services such as API Gateway, Lambda, Kinesis, DynamoDB, and hundreds of other services.

We selected DynamoDB as the service, then UpdateTable as the API. In the payload, you just paste your required fields and their respective values.

UpdateTable modifies the provisioned throughput settings, global secondary indexes, or DynamoDB Streams settings for a given table.

{
  "TableName": "user_activities",
  "ProvisionedThroughput": {
    "WriteCapacityUnits": 9000,
    "ReadCapacityUnits": 200
  }
}
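If you prefer to script the same setup instead of clicking through the console, an equivalent boto3 sketch might look like the following; the schedule name, cron expression, and IAM role ARN are placeholders, and the universal target ARN follows the aws-sdk target format.

import json
import boto3

scheduler = boto3.client("scheduler")

scheduler.create_schedule(
    Name="scale-up-user-activities",
    ScheduleExpression="cron(55 23 * * ? *)",  # shortly before the 00:00 UTC spike
    FlexibleTimeWindow={"Mode": "OFF"},
    Target={
        # Universal target: call DynamoDB UpdateTable directly, no Lambda in between.
        "Arn": "arn:aws:scheduler:::aws-sdk:dynamodb:updateTable",
        "RoleArn": "arn:aws:iam::123456789012:role/scheduler-dynamodb-role",
        "Input": json.dumps({
            "TableName": "user_activities",
            "ProvisionedThroughput": {
                "WriteCapacityUnits": 9000,
                "ReadCapacityUnits": 200,
            },
        }),
    },
)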

Then you are mostly done. Additionally, you can define a retry policy or a dead-letter queue (DLQ) to which failed events are delivered, which helps you debug your scheduler. Depending on the operation, you may run into authorization exceptions, API-level limits, permission issues, or quota problems, so I strongly recommend debugging your scheduler by defining a DLQ.
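Those reliability settings live on the schedule’s target. As a hedged sketch, these are the extra Target fields you would add to the example above; the retry values and the SQS queue ARN are placeholders.

# Additional Target fields for retries and a dead-letter queue (placeholders):
target_reliability = {
    "RetryPolicy": {
        "MaximumRetryAttempts": 3,
        "MaximumEventAgeInSeconds": 600,
    },
    "DeadLetterConfig": {
        # Failed invocations are delivered to this queue for inspection.
        "Arn": "arn:aws:sqs:eu-west-1:123456789012:scheduler-dlq",
    },
}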

If you are interested in cost optimization, please check out the post “How to reduce 40% cost in AWS Lambda without writing a line of code!” by Seza Akgun, Senior Software Engineer at Insider.
