DynamoDB: Query vs Scan, SQL syntax and JOIN tables — Part 1

Amos Shahar
7 min readMay 24, 2019

--

So, you need a robust solution, quick response to thousands of requests per second, you decide of no-SQL DB and the AWS DynamoDB is prince charming: great performance, unlimited scale and fully managed.

But then, as time pass by, you have more and more DynamoDB tables, you have multiple regions, the primary Key (e.g. userId) is duplicated to multiple tables and the application suddenly needed to query data using fields you hadn’t anticipated…

The business starts to ask complicated questions about the data. You need to count, sum, filter, group data and join tables. The app developers start to query too many tables and do calculations and joins within the code or maybe export the data to excel to provide the answers???

mmm… prince charming is not that shining now :-\

In this, three-parts article, I will demonstrate how you can solve these three key issues:

1. Querying DynamoDB using AWS Javascript SDK, Knowing Keys and Indexes, and Query vs. Scan

2. How to use simple SQL syntax to query DynamoDB, and how to connect to it using a standard MySQL client.

3. How to query your DynamoDB as if it was a real relational database (RDBMS):

  • Use Aggregation functions (count, min, max…) using group by
  • JOIN and UNION DynamoDB tables
  • Cross-region queries

Part 1 — Querying DynamoDB using AWS Javascript SDK, Knowing Keys and Indexes, and Query vs. Scan

Before choosing which DynamoDB API to use, you should be well aware of the keys of your table:

◦ What is the primary key and its sort key?

◦ What are the indexes? What is the sort key of each index?

After answering these questions, we need to decide which API to use: Scan or a Query:

Ideally, we would like to use Query as our first option: It is the quickest way to query dynamo and get the results in a fraction of a second. If this isn’t possible, we will use Scan which may take several seconds or even more, depends on the table size.

Scan filtering is simple, you can use any column (key, sort, index, index sort or simple column) and have multiple filters in the same API.

To query, as mentioned above, you need to consider the table keys structure.

Here are the three rules of thumb:

◦ Query can have one or two fields to filter (but not more!)

◦ One of the filter fields must be the primary key or the index key

◦ The second filter field must be the related sort key

Here an example of a table called testTable:

It has a primary key (id1), with an associated sort key (sort1),

and two indexes with their own associated sort keys: id2 with sort2 and id3 with sort3. Id4 is a regular field.

you can see the primary key and it sort key in the “overview” tab in the AWS console.

The indexes and their work keys can be seen in the indexes tab

testTable Indexes

or you can get this information using the dynamoDB SDK’s describeTable API:

it("describe dynamo table", async () => {
let table = "testTable";
let params = {
TableName: table
};
let [err,tableInfo] = await new Promise((resolved) => {
let dynamodb = new AWS.DynamoDB({region: "us-west-1"});
dynamodb.describeTable(params, function (err, data) {
if (err) {
console.log(`ERROR: ${err}`);
}
resolved([err, data]);
});
});
console.log(tableInfo)
});

This is the result:

Let’s see which query filters we can execute on this table:

  • Key (id1)
  • Key (id1) AND sortKey (sort1)
  • Index1 (id2)
  • Index1 (id2) AND index1sort (sort2)
  • Index2 (id3) AND index2sort2 (sort3)

But we can NOT query by:

  • SortKey (sort1) -> no key
  • Index1sort (sort2) -> sort without the index key
  • Key AND index (id1,id2) -> can be by key or index not both
  • Key AND index1sort (id1, sort2) -> key can only be with its related sort
  • Index1 AND index2sort (id2, sort3) -> mixed index and sort
  • Key1 AND sortKey AND index (id1,sort1,id2) -> more than 2 filters

Now, that we chose the right API for our use case, let’s see the implementation using the AWS SDK for JavaScript:

  1. query with key and sort:
// select id1, sort1, id2 from testTable where id1='a' and sort1='a'
// using dynamoDB query , key filter
it("dynamo basic query - key", async () => {
this.AWS = require('aws-sdk');
this.AWS.config.region = 'us-west-1';
let dynamoDB = new AWS.DynamoDB();

let table = "testTable";
let totalRowsCount = 0;
let totalRows = [];
let [rows, keepWork] = [null,true];
let limit = 1000;

// for fields list
let fields = "id1, sort1, id2";

// for the query filter "where"
let filterValues = {
':v1': {S: 'a'},
':v2': {S: 'a'},
}
// key filter(id1) is mandatory, sort1 is optional
let KeyFilterExpression = 'id1 = :v1 and sort1 = :v2'
var params = {
TableName: table,
Limit: limit,
ProjectionExpression: fields,
ExpressionAttributeValues: filterValues,
KeyConditionExpression: KeyFilterExpression
}

while (keepWork) { //loop to get all results (paging)
[rows, keepWork] = await new Promise((resolved) => {
dynamoDB.query(params, (err, data) => {
if (err) {
console.error(err);
throw err;
}
Array.prototype.push.apply(totalRows, data.Items)
keepWork = data.LastEvaluatedKey;
if (keepWork){
params.ExclusiveStartKey = data.LastEvaluatedKey
}
resolved([data,keepWork])
});
})
totalRowsCount+= rows.Items.length
}
console.log(totalRows)
})

2. query with index and sort:

// select id1, sort1, id2 from testTable where id1='a' and sort1='a'
// using dyamoDB query, index filter
it("dynamo basic query - index", async () => {
this.AWS = require('aws-sdk');
this.AWS.config.region = 'us-west-1';
let dynamoDB = new AWS.DynamoDB();

let table = "testTable";
let totalRowsCount = 0;
let totalRows = [];
let [rows, keepWork] = [null,true];
let limit = 1000;

// for fields list
let fields = "id1, sort1, id2";

// for the query filter "where"
let filterValues = {
':v1': {S: 'a'},
':v2': {S: 'a'},
}
// index key filter(id2) is mandatory, sort2 is optional let IndexFilterExpression = 'id2 = :v1 and sort2 = :v2'
var params = {
TableName: table,
Limit: limit,
IndexName: 'id2-sort2-index',
ProjectionExpression: fields,
ExpressionAttributeValues: filterValues,
KeyConditionExpression: IndexFilterExpression
}

while (keepWork) { //loop to get all results (paging)
[rows, keepWork] = await new Promise((resolved) => {
dynamoDB.query(params, (err, data) => {
if (err) {
console.error(err);
throw err;
}
Array.prototype.push.apply(totalRows, data.Items)
keepWork = data.LastEvaluatedKey;
if (keepWork){
params.ExclusiveStartKey = data.LastEvaluatedKey
}
resolved([data,keepWork])
});
})
totalRowsCount+= rows.Items.length
}
console.log(totalRows)
})

3. scan:

// select id1, sort1, id2 from testTable where id1='a' and sort1='a'
// using dynamoDB scan API
it("dynamo basic scan", async () => {
this.AWS = require('aws-sdk');
this.AWS.config.region = 'us-west-1';
let dynamoDB = new AWS.DynamoDB();

let table = "testTable";
let totalRowsCount = 0;
let totalRows = [];
let [rows, keepWork] = [null,true];
let limit = 1000;

// for fields list
let fields = "id1, sort1, id2";

// for the query filter "where"

let filterValues = {
':v1': {S: 'a'},
':v2': {S: 'a'},
}
let FilterExpression = 'id1 = :v1 and sort1 = :v2'

var
params = {
TableName: table,
Limit: limit,
ProjectionExpression: fields,
ExpressionAttributeValues: filterValues,
FilterExpression: FilterExpression
}

while (keepWork) { //loop to get all results (paging)
[rows, keepWork] = await new Promise((resolved) => {
dynamoDB.scan(params, (err, data) => {
if (err) {
console.error(err);
throw err;
}
Array.prototype.push.apply(totalRows, data.Items)
keepWork = data.LastEvaluatedKey;
if (keepWork){
params.ExclusiveStartKey = data.LastEvaluatedKey
}
resolved([data,keepWork])
});
})
totalRowsCount+= rows.Items.length
}
console.log(totalRows)
})

The three long functions shown above are DynamoDB’s cumbersome way of executing the following three simple SQL queries:

  • select id1, sort1, id2 from testTable where id1 = ‘a’ and sort1 = ‘a’
  • select id1, sort1, id2 from testTable where id2 = ‘a’ and sort2 = ‘a’
  • select id1, sort1, id2 from testTable where id4 = ‘a’ and sort1 = ‘a’

Conclusion:

Dynamo is great noSQL solution but it’s keys/indexes logic wouldn’t be as trivial if you could use SQL. The APIs are too long and complicated as well.

My wish list from DynamoDB is the following:

1. Simplify the APIs using SQL syntax

2. Do the best effort (behind the scene!!) to use Query and if there are no other options, return the results using Scan. Automatically do all the keys, indexes, and sort logic for me as well.

3. Allow all relational DB functionalities:

a. Aggregation functions like count, min, max…, group by and order by…

b. Allow JOIN two different DynamoDB tables within and across regions.

c. Allow UNION of query results from different DynamoDB tables and regions

d. Connect to a DynamoDB using a standard MySQL client

Believe it or not, all of this is possible! But this is for Part II and III of this series .

Can’t wait? Want a hint? superQuery is doing this magic!

Stay tuned!

--

--

Amos Shahar

Director of DevOps & Server Development at superQuery