DynamoDB/NoSQL — One practice that can save you in the future

Almir Mustafic
Jan 19, 2023

DynamoDB — try this approach

A common practice is to store a creation date and an update date attribute in every DynamoDB item (record), with both recorded as UTC timestamps.

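Because such a timestamp goes down to microsecond precision, almost every item ends up with a distinct value. An index keyed on this attribute therefore gives you little value: there is no shared key value you can query to fetch all the items from a given day or hour.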

A few additional attributes, each holding the date at a coarser granularity, could become a standard and lay a foundation that helps you in the future:

  • yearMonthDt (e.g. “202301”)
  • yearMonthDayDt (e.g. “20230118”)
  • yearMonthDayHourDt (e.g. “2023011813”)
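
As a minimal sketch, the three attributes can be derived from the same UTC timestamp that feeds the create date attribute. The helper name date_bucket_attributes and the attribute names orderId and createDt are hypothetical and only serve the illustration.

```python
from datetime import datetime, timezone


def date_bucket_attributes(created_at: datetime) -> dict:
    """Derive the coarse-grained date attributes from a timezone-aware timestamp."""
    if created_at.tzinfo is None:
        # A naive datetime would be interpreted as local time, so reject it.
        raise ValueError("created_at must be timezone-aware")
    # Normalize to UTC so the buckets agree with the UTC create/update attributes.
    utc = created_at.astimezone(timezone.utc)
    return {
        "yearMonthDt": utc.strftime("%Y%m"),
        "yearMonthDayDt": utc.strftime("%Y%m%d"),
        "yearMonthDayHourDt": utc.strftime("%Y%m%d%H"),
    }


# Example item; "orderId" and "createDt" are hypothetical attribute names.
created = datetime(2023, 1, 18, 13, 45, 12, 123456, tzinfo=timezone.utc)
item = {
    "orderId": "o-1001",
    "createDt": created.isoformat(),
    **date_bucket_attributes(created),
}
```

Storing the values as fixed-width strings keeps them sortable and lets the same value serve directly as a key in an index later.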

How can this DynamoDB approach help you in the future? Here are some examples:

(1) Near-time processing: Imagine you want to build a near-time process (a batch job or a Lambda) attached to your microservice that analyzes the data, checks its integrity, and corrects it. You can create a GSI on the table whose hash key is, for example, yearMonthDayDt. Then, instead of scanning the DynamoDB table (which reads every item and gets slow and costly on large tables), you query this GSI by the yearMonthDayDt hash key and iterate over the result set to process the records. The crucial part is that you are doing a query instead of a scan, and narrowing the query to a specific day keeps the result set under control.
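
A rough sketch of such a query, assuming a GSI whose hash key is yearMonthDayDt. The function name and the table and index names in the usage comment are hypothetical; the client is expected to be a boto3 low-level DynamoDB client, passed in so the function itself stays easy to test.

```python
from typing import Any, Dict, Iterator


def query_items_for_day(
    client: Any, table_name: str, index_name: str, day: str
) -> Iterator[Dict[str, Any]]:
    """Yield every item whose yearMonthDayDt equals `day`, page by page."""
    kwargs = {
        "TableName": table_name,
        "IndexName": index_name,
        "KeyConditionExpression": "yearMonthDayDt = :day",
        "ExpressionAttributeValues": {":day": {"S": day}},
    }
    while True:
        page = client.query(**kwargs)
        yield from page.get("Items", [])
        # A query response is capped in size; LastEvaluatedKey signals more pages.
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        kwargs["ExclusiveStartKey"] = last_key


# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   client = boto3.client("dynamodb")
#   for item in query_items_for_day(client, "orders", "yearMonthDayDt-index", "20230118"):
#       ...  # check and correct the item
```

Following LastEvaluatedKey matters here because a single query call returns only one page of results, so a busy day can span several pages.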

(2) Aggregate records: Let’s say you want an aggregate view of what your customers have been doing in the current hour, and the service that holds this data stores it in a DynamoDB table. You can use the yearMonthDayHourDt attribute as the hash key of a new GSI, and a Lambda can query this GSI repeatedly, in near-time, to produce an aggregate record and persist it into a table. Technically you can persist the aggregate record into the same table, as long as you can distinguish it from the regular records. Your real-time application code can then perform a fast read to find out what other customers are doing within this hour. One example is the travel/hotel websites that tell you “10 other people are viewing this hotel room and similar rooms”. Yes, there are many other ways to solve this, but I am just trying to give you an example here.
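
To make the aggregation step concrete, here is a small sketch of what such a Lambda might compute once it has the items for one hour. The attribute names hotelId, recordType, and viewCount are hypothetical, and the items are assumed to be plain dictionaries that were already read from the GSI.

```python
from collections import Counter
from typing import Dict, Iterable, List


def build_hourly_aggregates(
    items: Iterable[Dict[str, str]], hour: str
) -> List[Dict[str, object]]:
    """Count view events per hotel for one yearMonthDayHourDt bucket."""
    counts = Counter(item["hotelId"] for item in items)
    return [
        {
            # A marker attribute that distinguishes aggregates from regular records.
            "recordType": "HOURLY_AGGREGATE",
            "yearMonthDayHourDt": hour,
            "hotelId": hotel_id,
            "viewCount": count,
        }
        for hotel_id, count in sorted(counts.items())
    ]


views = [
    {"hotelId": "h-1", "yearMonthDayHourDt": "2023011813"},
    {"hotelId": "h-2", "yearMonthDayHourDt": "2023011813"},
    {"hotelId": "h-1", "yearMonthDayHourDt": "2023011813"},
]
aggregates = build_hourly_aggregates(views, "2023011813")
```

The recordType marker is one possible way to keep aggregate records apart from regular ones if both live in the same table.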

(3) Data Lake: Assuming you have DynamoDB Streams set up on your tables and your data is flowing into a data lake, these attributes can also help data engineers. They can use them in their distributed processing (e.g. Apache Spark or similar), for instance as partition or filter columns, to process the data somewhat more efficiently.
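
One way the attributes might be used on the data lake side, sketched in plain Python: records are grouped under key=value directory paths, a layout that engines such as Spark can use to skip partitions that a filter excludes. The prefix "events" and both function names are hypothetical, and the actual layout depends on how your data lake is organized.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def partition_path(record: Dict[str, str], prefix: str = "events") -> str:
    """Build a key=value style directory path from the date attributes."""
    return (
        f"{prefix}/yearMonthDt={record['yearMonthDt']}"
        f"/yearMonthDayDt={record['yearMonthDayDt']}/"
    )


def group_by_partition(
    records: Iterable[Dict[str, str]]
) -> Dict[str, List[Dict[str, str]]]:
    """Group records by the partition directory they would be written to."""
    groups = defaultdict(list)
    for record in records:
        groups[partition_path(record)].append(record)
    return dict(groups)


records = [
    {"id": "a", "yearMonthDt": "202301", "yearMonthDayDt": "20230118"},
    {"id": "b", "yearMonthDt": "202301", "yearMonthDayDt": "20230119"},
    {"id": "c", "yearMonthDt": "202301", "yearMonthDayDt": "20230118"},
]
partitions = group_by_partition(records)
```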

Let’s talk a bit about the pros and cons. The pro is that you are doing this processing as close as possible to the microservice that owns the actual data, so it is as near-time as it can be. A con is that these GSIs will cost you extra, because each GSI is technically a replica of your original table indexed differently. However, you can start by saving these attributes to your table without paying for an index; you don’t necessarily need to create a GSI at the beginning. If you later need to solve problems similar to the examples above, you have a choice.
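
When that day comes, a GSI can be added to the existing table. The sketch below only builds the request for the DynamoDB UpdateTable operation; the function name and the index naming scheme are hypothetical, and it assumes an on-demand (PAY_PER_REQUEST) table. A table with provisioned capacity would also need a ProvisionedThroughput entry inside "Create".

```python
from typing import Any, Dict


def gsi_create_request(table_name: str, attribute_name: str) -> Dict[str, Any]:
    """Build UpdateTable arguments that add a GSI keyed on one date attribute."""
    return {
        "TableName": table_name,
        # The new key attribute must be declared with its type (S = string).
        "AttributeDefinitions": [
            {"AttributeName": attribute_name, "AttributeType": "S"},
        ],
        "GlobalSecondaryIndexUpdates": [
            {
                "Create": {
                    "IndexName": f"{attribute_name}-index",  # hypothetical naming scheme
                    "KeySchema": [
                        {"AttributeName": attribute_name, "KeyType": "HASH"},
                    ],
                    # ALL copies every attribute; a narrower projection costs less.
                    "Projection": {"ProjectionType": "ALL"},
                }
            }
        ],
    }


request = gsi_create_request("orders", "yearMonthDayDt")

# Usage (assumes boto3 is installed and AWS credentials are configured):
#   import boto3
#   boto3.client("dynamodb").update_table(**request)
```

Only items that already carry the key attribute appear in the new index, which is the reason to start writing the attributes early even if no GSI exists yet.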

What is an alternative? You could have all the data streaming from your DynamoDB tables into a data lake and some form of reporting platform, which generally has a relational table structure with the data in a more normalized format. You can then perform the analysis on this data and extract the results into a file. Your microservice (or a component of your microservice ecosystem) then needs to fetch this file and process the data accordingly to solve what I described in examples 1 and 2 above. The pro of this solution is that you are not paying for the extra GSIs. The con is that you are not processing this data in near-time.

In conclusion:

If your microservices use NoSQL tables (e.g. DynamoDB), consider the above options and decide which solution best fits your platform.

Thank you for reading this article. Keep geeking out!

Almir Mustafic

Director of Software Engineering, Solutions Architect, Car Enthusiast. Opinions are my own. (http://AlmirMustafic.com)