DynamoDB — What you should know…

Miguel Mendez
Yik Yak Engineering
Jan 10, 2017 · 5 min read

DynamoDB has a lot of desirable traits. It is a NoSQL store with the usual horizontally scalable attributes. You provision and pay for the capacity that your use case demands. You can optimize your capacity dollars through the judicious use of the less expensive eventually consistent reads versus the more expensive strongly consistent reads. Last, but not least, you don’t have to worry about managing your own set of database servers.

These are really nice attributes of a data store, but it’s important to be aware of the gotchas. In this post, we wanted to share some of the potential trip-ups that we encountered along the way.

Provisioning — Where does my money go?

Provisioning is simply determining what set of tables you will need and what read and write request rates per second your application will have to support at peak load.

A common pattern is to think about the types of information your application manipulates and then provision a table per type of information. For example, you will likely need to track the users known to the application, as well as some other business-related information, such as a product catalogue that these users interact with.

Now, what read/write capacity does each of those tables need at peak load? Think carefully, because this is about to dictate a monthly financial commitment. Let’s say that your application can expect about 1000 requests per second coming in through the front door at peak load. If each one of those requests looks up and updates a user and also does a single lookup and update in your business-specific store, then you need to provision two tables, each with a capacity of 1000 read and 1000 write units, because these operations are 1:1 with the inbound requests. You just committed to roughly $1,200 a month for these tables.
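To make the arithmetic concrete, here is a rough back-of-the-envelope sketch in Python. The per-unit prices are approximate 2017 us-east-1 rates (about $0.0065 per hour for every 10 write units and every 50 read units); treat them as illustrative assumptions, not current pricing.

```python
# Rough DynamoDB provisioning cost estimate (approximate 2017 us-east-1 rates).
WRITE_PRICE_PER_HOUR = 0.0065 / 10   # ~$0.0065/hour for every 10 write capacity units
READ_PRICE_PER_HOUR = 0.0065 / 50    # ~$0.0065/hour for every 50 read capacity units
HOURS_PER_MONTH = 730

def monthly_cost(read_units, write_units):
    hourly = read_units * READ_PRICE_PER_HOUR + write_units * WRITE_PRICE_PER_HOUR
    return hourly * HOURS_PER_MONTH

# Two tables, each provisioned at 1000 reads and 1000 writes per second.
total = 2 * monthly_cost(read_units=1000, write_units=1000)
print(f"~${total:,.0f}/month")  # roughly $1,100-$1,200 per month
```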

In certain circumstances, you may be able to amortize the cost of multiple tables by combining them. However, doing this well can be tricky.

If your peak hours are relatively predictable you can explore dynamically adjusting your table’s capacity in anticipation of the load. AWS will allow you to increase a table’s provisioned capacity as often as you like, but you can only decrease a table’s capacity four times over the course of a UTC day.
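With boto3, for example, bumping a table’s throughput ahead of a predictable peak is a single UpdateTable call; the table name and numbers below are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Raise capacity ahead of the evening peak. The matching decrease later
# counts against the limited number of decreases allowed per UTC day.
dynamodb.update_table(
    TableName="users",  # hypothetical table name
    ProvisionedThroughput={
        "ReadCapacityUnits": 2000,
        "WriteCapacityUnits": 2000,
    },
)
```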

If you don’t have enough capacity to deal with your peak load, DynamoDB will start throttling requests once you exceed your provisioned capacity. In some scenarios, like a short-lived request burst here and there, throttling may be fine. However, consistent throttling can degrade your application’s performance when it matters most and should generally be avoided.
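When you do blow past your capacity, the SDKs surface throttling as a ProvisionedThroughputExceededException once their built-in retries are exhausted. A minimal boto3 sketch of detecting it, with a hypothetical table and key:

```python
import boto3
from botocore.exceptions import ClientError

dynamodb = boto3.client("dynamodb")

try:
    dynamodb.get_item(
        TableName="users",                       # hypothetical table
        Key={"user_id": {"S": "some-user-id"}},  # hypothetical key
    )
except ClientError as e:
    if e.response["Error"]["Code"] == "ProvisionedThroughputExceededException":
        # Throttled even after the SDK's automatic retries: back off,
        # shed load, or queue the work instead of hammering the table.
        pass
    else:
        raise
```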

Provisioning — Indexes aren’t always free

Once you have tables provisioned, your business logic may dictate queries that can’t be satisfied by the table’s hash and range key attributes. In that case, you are faced with the choice of creating a Local Secondary Index (LSI) or a Global Secondary Index (GSI). If you can satisfy your need using a Local Secondary Index, you are in luck — an LSI shares the table’s provisioned throughput, so there is no additional provisioning cost. However, if you end up needing a Global Secondary Index, the monthly cost for your tables just went up, because each GSI needs to have its own read and write capacity provisioned as well.
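That extra cost shows up right in the table definition: a GSI carries its own ProvisionedThroughput block, billed on top of the table’s. A boto3 sketch with hypothetical table, attribute, and index names:

```python
import boto3

dynamodb = boto3.client("dynamodb")

dynamodb.create_table(
    TableName="products",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "product_id", "AttributeType": "S"},
        {"AttributeName": "category", "AttributeType": "S"},
    ],
    KeySchema=[{"AttributeName": "product_id", "KeyType": "HASH"}],
    ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000},
    GlobalSecondaryIndexes=[
        {
            "IndexName": "category-index",  # hypothetical index
            "KeySchema": [{"AttributeName": "category", "KeyType": "HASH"}],
            "Projection": {"ProjectionType": "ALL"},
            # The index's capacity is provisioned -- and paid for -- separately.
            "ProvisionedThroughput": {"ReadCapacityUnits": 500, "WriteCapacityUnits": 500},
        }
    ],
)
```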

Read Consistency — Beware of Eventually versus Strongly

DynamoDB allows the user to specify the type of Read Consistency to use when reading data from a table. A Strongly Consistent read ensures that the latest data successfully written is what is read back. An Eventually Consistent read simply means that the server may return whatever copy of the data it happens to have at that moment, which may not be the latest data successfully written.
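The choice is made per read. In boto3, for instance, a GetItem is eventually consistent unless you ask otherwise; the table and key below are hypothetical:

```python
import boto3

dynamodb = boto3.client("dynamodb")

response = dynamodb.get_item(
    TableName="users",                       # hypothetical table
    Key={"user_id": {"S": "some-user-id"}},  # hypothetical key
    ConsistentRead=True,  # omit (or set False) for the cheaper eventually consistent read
)
```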

So, why have both? For starters, Strongly Consistent reads are twice the cost of an Eventually Consistent read. If your application will work without the latest and greatest copy of the data, then you can save some money when provisioning. Another reason, which is speculation on my part, has to do with how certain features are implemented by DynamoDB itself. For example, reads from Global Secondary Indexes are always eventually consistent, probably because they are updated asynchronously when the main table is written.

The good news is that if your application can get away with possibly stale data, an Eventually Consistent read will save you money and a slew of subtle bugs. However, if your system tries to compute a new system state based on the current system state as recorded in DynamoDB, only Strongly Consistent reads will do and you should never rely on a GSI as part of the computation.

DynamoDB Metrics — Accurate at Minute Resolution… When Seconds Matter

Early on at Yik Yak we built a feature to allow users to share yaks with other users in their contact list. Invariably we ran into some DynamoDB limits and throttling ensued. As part of debugging that issue we relied on the metrics published to CloudWatch. The curious thing is that our metrics showed that we were well under our tables’ provisioned capacity.

In the midst of my preparation for Harakiri, we dug into the metrics some more. Lo and behold, the DynamoDB metrics published are averages over a one-minute window! This can completely misrepresent what is actually happening capacity-wise, because the one-minute window averages out short-lived spikes above the provisioned amount. You are better off looking at the maximum value over the minute window.
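If you pull the numbers yourself, asking CloudWatch for the Maximum statistic alongside (or instead of) the Average makes those short bursts visible. A boto3 sketch against a hypothetical table:

```python
import datetime
import boto3

cloudwatch = boto3.client("cloudwatch")

now = datetime.datetime.utcnow()
response = cloudwatch.get_metric_statistics(
    Namespace="AWS/DynamoDB",
    MetricName="ConsumedWriteCapacityUnits",
    Dimensions=[{"Name": "TableName", "Value": "users"}],  # hypothetical table
    StartTime=now - datetime.timedelta(hours=1),
    EndTime=now,
    Period=60,
    Statistics=["Maximum"],  # the per-minute Average smooths bursts right out of view
)
for point in sorted(response["Datapoints"], key=lambda p: p["Timestamp"]):
    print(point["Timestamp"], point["Maximum"])
```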

Microservices — Good, Right?

This is not an indictment of microservice architecture — it can be used to good effect. However, you need to think about how the read and write patterns of these services affect your DynamoDB table provisioning.

Let’s say that as part of servicing a request we have three microservices in the backend and each one of them wants to update some part of the user record stored in DynamoDB. Nice and modular, but now each request can cause three separate writes to the user record, which effectively adds a 3x multiplier to the per-request capacity consumption. As the number of microservices that comprise the backend increases, it is very easy for these modularity-driven consumption multipliers to creep in and inflate your provisioning requirements.

The lesson here is to try to minimize the number of reads and writes of a record per request.
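One way to claw back that multiplier is to have the services hand their changes to a single writer that folds them into one UpdateItem per request instead of three separate writes. A boto3 sketch, with hypothetical table, key, and attribute names:

```python
import boto3

dynamodb = boto3.client("dynamodb")

# Instead of three services each issuing their own write, collect their updates
# and apply them to the user record in a single UpdateItem call.
dynamodb.update_item(
    TableName="users",                       # hypothetical table
    Key={"user_id": {"S": "some-user-id"}},  # hypothetical key
    UpdateExpression="SET last_seen = :ts, theme = :theme ADD yak_count :inc",
    ExpressionAttributeValues={
        ":ts": {"N": "1484006400"},
        ":theme": {"S": "dark"},
        ":inc": {"N": "1"},
    },
)
```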

Conclusion

None of the issues mentioned above are deal breakers. You can write great code using DynamoDB — as long as you are aware of the subtleties and plan accordingly.

If you have a read-intensive use case, you may be able to leverage an in-memory cache like Redis to ease the load off of DynamoDB. However, there are gotchas with this approach as well, which we’ll cover in our next post: Elasticache for Redis — Not a Datastore.
