A Real-Time Streaming Channel
From NoSQL to SQL
In this article, I demonstrate a solution for moving data from a staging table (DynamoDB) to storage (an S3 bucket), and ultimately into an Athena table for data warehouse management. This approach lets us, as data warehouse developers, manage data in near real time.
Background
In my day-to-day work, I constantly run into situations such as:
- the attributes of a data record don’t all arrive at once; it takes time for a single record to be fully collected;
- data records have different schemas, because they usually come to us in JSON format;
- despite the two challenges above, we still have to store all of them in one single table.
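To make these situations concrete, here is a minimal Python sketch, not the pipeline built in this article. It works purely in memory and assumes hypothetical field names (`id`, `customer`, `amount`): partial JSON records that share a key are merged as they arrive, nested attributes are flattened, and the result is laid out as one table whose columns are the union of every attribute seen so far.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with underscores."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


def merge_events(events: list[dict[str, Any]], key: str = "id") -> dict[Any, dict[str, Any]]:
    """Combine partial records sharing the same key; later values overwrite earlier ones."""
    merged: dict[Any, dict[str, Any]] = {}
    for event in events:
        merged.setdefault(event[key], {}).update(flatten(event))
    return merged


def to_table(merged: dict[Any, dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Build one table: columns are the union of all attributes, gaps become None."""
    columns = sorted({col for row in merged.values() for col in row})
    rows = [[row.get(col) for col in columns] for row in merged.values()]
    return columns, rows


# Hypothetical events: record "a1" arrives in two pieces with different shapes.
events = [
    {"id": "a1", "customer": {"name": "Ann"}},
    {"id": "b2", "amount": 12.5},
    {"id": "a1", "amount": 30.0, "customer": {"country": "NZ"}},
]

columns, rows = to_table(merge_events(events))
print(columns)
for row in rows:
    print(row)
```

Record `a1` ends up as a single complete row once its second piece arrives, while `b2` keeps empty cells for the attributes it never received.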
From an engineer’s perspective, the first issue can be interpreted as handling slowly changing dimensions in a data warehouse. The second requires a balance between relational and non-relational databases: what we want in the end is a data warehouse for online analytical processing (OLAP), so a structured, flattened, column-oriented table is preferable when it comes to aggregation. To deal with those, we are going to leverage the compatibility…