Avoid Duplication in DynamoDB — A Simple Design Pattern Using Lambda and Randomness

Dinh-Cuong DUONG
Problem Solving Blog
Oct 15, 2020

Data deduplication is a technique for solving a common problem in distributed software architecture: a specific data record is unintentionally duplicated in a distributed, non-locking table such as DynamoDB.

Stateless functions as a service are a promising architecture for the future of the software development industry, but architecting them well is not always easy. You may run into trouble at any time when applying them to real-world business. Don’t rush to dive into the problem, though: finding a smart, simple solution is far more rewarding.

What is data duplication?

“Create an Order” is a classic example of a data duplication issue in many software systems. Suppose you have a DynamoDB table that records orders from your e-commerce website:

Figure 1 — Data sample of Orders table per one user “John Smith”.

A buyer can have many orders, and each order represents a unique shopping cart containing its own list of items.

Figure 2 — Data sample of Order Duplication in OrderProducts table.

OrderID is auto-generated, so every “createOrder” request produces a new auto-increment value or GUID. Why can duplication happen in a serverless or stateless software architecture?

Whether the client behavior is intended or not, a duplicated request easily produces duplicated data.

Figure 3 — The “Error” state never happens because, at time T2, the “If not existed” check is always true.

This figure shows a scenario in which a user accidentally triggers two actions at nearly the same time. That could be a double-click on the “Purchase” button, or an attacker trying to get a double package with a single payment.
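The race described above can be reproduced with a minimal sketch. The in-memory orders array and the 20ms write time are assumptions standing in for the DynamoDB table, and createOrder here is a hypothetical helper rather than the author's implementation.

```javascript
// In-memory stand-in for the Orders table (an assumption for illustration only).
const orders = [];
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Naive createOrder: check "if not existed" first, then write.
// The write only becomes visible after writeMs milliseconds.
async function createOrder(orderId, productIds, writeMs = 20) {
  const key = productIds.join(",");
  const existed = orders.some((o) => o.key === key); // the check at T1 / T2
  if (existed) {
    throw new Error("Order already exists");
  }
  await sleep(writeMs); // the record is not visible to other requests yet
  orders.push({ orderId, key });
  return orderId;
}

// Two near-simultaneous requests: both checks run before either write lands,
// so the "Error" branch is never reached and two orders are stored.
async function doubleClick() {
  await Promise.all([
    createOrder("order-1", ["1A", "1B"]),
    createOrder("order-2", ["1A", "1B"]),
  ]);
  return orders.length;
}
```

Calling doubleClick() resolves with the number of stored orders, which shows that both requests passed the existence check.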

How to avoid data duplication?

It’s hard to claim that any single approach is the best way to deal with this problem. In a large-scale application design, keeping your solution simple and 100% stateless is a key to success. In this article, we implement a simple pattern that decreases the probability that the “If not existed” checks of two actions collide.

Figure 4 — A stateless design pattern to avoid data duplication using Lambda.

The idea is to delay each request by a random amount, ΔT for one action and ΔT’ for the other, so that the condition ΔT - ΔT’ > “time for Order-A to be present in the database” holds. For example, if the “createOrder-A” process takes 100ms to finish, the gap ΔT - ΔT’ between the two colliding actions must be greater than 100ms.
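To get a feel for how often this condition holds, here is a small worked calculation. It assumes (this is not stated in the article) that both requests arrive at the same instant and that the two delays are independent and uniformly distributed between 0 and a maximum delay; under those assumptions the probability that the delays differ by more than the write time w is (1 - w / maxDelay)². The function name avoidanceProbability is hypothetical.

```javascript
// Probability that two independent uniform delays in [0, maxDelayMs]
// differ by more than writeMs. Assumes simultaneous arrival of both requests.
function avoidanceProbability(maxDelayMs, writeMs) {
  if (writeMs >= maxDelayMs) {
    return 0; // the delay window is too short to ever separate the requests
  }
  const ratio = 1 - writeMs / maxDelayMs;
  return ratio * ratio;
}

// With a 100ms write and delays of up to 1000ms, the gap is large enough
// in roughly 81% of double requests.
console.log(avoidanceProbability(1000, 100));
```

The numbers illustrate that the pattern lowers the probability of a collision rather than eliminating it, and that a larger delay window buys a higher success rate at the cost of latency.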

How to generate high entropy randomness?

In JavaScript, generating ΔT with Math.random() or crypto.randomInt() may not deliver a high-entropy value, because both depend on a common clock source and on seeds from the same underlying Lambda runtime.

But WHAT IF you could provide a “pseudo” random seed value so that every request has a different seed? If we can find such a source, an algorithm is your friend for writing your own random function.

Here is the full source code for the random function:

Figure 5 — A pseudo-random generator using JavaScript for createOrder delaying ΔT.
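As a rough illustration of the idea, a seeded delay could look like the sketch below. It is not the code from the figure: the names delayFromSeed and handler, the SHA-256 based derivation, and the 1000ms maximum are all assumptions chosen for the example. Only context.awsRequestId is taken from the article.

```javascript
const crypto = require("crypto");

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Derive a delay in [0, maxDelayMs) from a per-request seed.
// The same seed always yields the same delay; different seeds spread out.
function delayFromSeed(seed, maxDelayMs = 1000) {
  const digest = crypto.createHash("sha256").update(String(seed)).digest();
  // Use the first 4 bytes as an unsigned integer; the slight modulo bias
  // does not matter for choosing a delay.
  return digest.readUInt32BE(0) % maxDelayMs;
}

// Hypothetical Lambda handler: wait ΔT derived from the request ID, then return.
// Export it as the function's handler when deploying.
async function handler(event, context) {
  const delayMs = delayFromSeed(context.awsRequestId);
  await sleep(delayMs);
  return { delayMs };
}
```

Because each invocation receives its own awsRequestId, two near-simultaneous requests are very likely to get different delays.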

The tricky part is making checkIfExisted([“1A”, “1B”]) run efficiently. Hashing the product list, SHA256([“1A”, “1B”]) = ProductHash, and storing this value with every order in the Orders table is a good strategy.

With simplify-cli, you can easily deploy a Lambda function that works as a common random source for every application. The Lambda context.awsRequestId is used as the random seed for every request. If this value is not unique, there will be something the AWS team must fix :)
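One possible way to compute such a hash in Node.js is sketched below. The function name productHash is hypothetical, and sorting the IDs before hashing is an assumption made here so that the same cart produces the same hash regardless of item order.

```javascript
const crypto = require("crypto");

// Compute a stable SHA-256 hash for a list of product IDs.
function productHash(productIds) {
  // Copy and sort so ["1A", "1B"] and ["1B", "1A"] hash identically (assumption).
  const canonical = JSON.stringify([...productIds].sort());
  return crypto.createHash("sha256").update(canonical).digest("hex");
}

// Store the result as the ProductHash attribute of the order, then compare
// hashes instead of comparing whole product lists.
const hash = productHash(["1A", "1B"]);
```

With the hash stored on every order, the existence check becomes a comparison of one fixed-length string per order.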

npm i -g simplify-cli
simplify-cli init --template Randomness
simplify-cli deploy --stack Randomness

After deploying the Randomness stack, find the function name in the .simplify/StackConfig.json file:

{
  "Randomness": {
    "LastUpdate": 1602752389713,
    "Type": "CF-Stack",
    "FunctionName": "example-Randomness-demo-LambdaFunction-1SZ795FL819YT"
  }
}

Change your existing createOrder Lambda function so that it invokes the randomness Lambda example-Randomness-demo-LambdaFunction-1SZ795FL819YT; that function returns after a random delay seeded by each Lambda context.awsRequestId.

For more information about randomness, see the article “RNG — The Secret of Cryptography” published in Problem Solving Blog.
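A minimal sketch of that change, assuming the AWS SDK for JavaScript v3 (@aws-sdk/client-lambda) is available to the function. The helper names waitRandomly and createOrderWithDelay, and the injected checkIfExisted and putOrder functions, are hypothetical placeholders for your own code.

```javascript
// Function name taken from .simplify/StackConfig.json.
const RANDOMNESS_FUNCTION = "example-Randomness-demo-LambdaFunction-1SZ795FL819YT";

// Invoke the randomness Lambda synchronously; it returns after a random delay.
// The SDK is loaded lazily so the order logic can be tested without it.
async function waitRandomly() {
  const { LambdaClient, InvokeCommand } = require("@aws-sdk/client-lambda");
  const client = new LambdaClient({});
  await client.send(new InvokeCommand({ FunctionName: RANDOMNESS_FUNCTION }));
}

// Hypothetical createOrder flow: delay first, then check, then write.
// deps supplies waitRandomly, checkIfExisted and putOrder.
async function createOrderWithDelay(order, deps) {
  await deps.waitRandomly(); // shifts this request by a random ΔT
  if (await deps.checkIfExisted(order.productIds)) {
    throw new Error("Duplicate order");
  }
  return deps.putOrder(order);
}
```

In a real handler you would pass waitRandomly together with your own checkIfExisted and putOrder implementations; keeping them injectable makes the flow easy to exercise with fakes.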

(MSc) Cloud Security | Innovator | Creator | FinTech CTO | Senior Architect.