Plumbers Of Data Science

The Data Engineering Community, we publish your Data Engineering stories

A Real-Time Streaming Channel

10 min read · Jul 29, 2023

In this article, I demonstrate a solution for moving and transforming data from a staging table (DynamoDB) to a storage layer (an S3 bucket), and ultimately into an Athena table for data warehouse management. This approach lets us, as data warehouse developers, manage data in near real time.

Image by Author

Background

In my day-to-day work, I constantly run into situations such as:

  1. the attributes of a data record don’t all arrive together; it takes time for an individual record to be fully collected;
  2. data records have different schemas, because they usually come to us in JSON format;
  3. despite the two challenges above, we still have to store all of them in one single table.
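
To make these three situations concrete, here is a minimal Python sketch. It is an illustration only and not the pipeline built in this article: the field names (`id`, `player`, `team`, `score`) and the helper functions are hypothetical. It flattens nested JSON records, merges partial records that share a key as they arrive, and lays everything out under one shared set of columns.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = "_") -> dict[str, Any]:
    """Flatten nested dicts into one level, e.g. {"a": {"b": 1}} -> {"a_b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


def merge_events(events: list[dict[str, Any]], key: str = "id") -> dict[Any, dict[str, Any]]:
    """Merge partial records that share the same key; later non-null values win."""
    merged: dict[Any, dict[str, Any]] = {}
    for event in events:
        flat = flatten(event)
        row = merged.setdefault(flat[key], {})
        row.update({k: v for k, v in flat.items() if v is not None})
    return merged


def to_table(merged: dict[Any, dict[str, Any]]) -> tuple[list[str], list[list[Any]]]:
    """Build one table: the union of all columns, with None for missing attributes."""
    columns = sorted({col for row in merged.values() for col in row})
    rows = [[row.get(col) for col in columns] for row in merged.values()]
    return columns, rows


# Hypothetical events: record "a1" arrives in two pieces with different shapes.
events = [
    {"id": "a1", "player": {"name": "Kim"}},
    {"id": "b2", "team": "Hawks", "score": 3},
    {"id": "a1", "player": {"age": 27}, "score": 12},
]

columns, rows = to_table(merge_events(events))
print(columns)
for row in rows:
    print(row)
```

Each record ends up as a single row, and attributes a record never received simply stay empty, which is the shape a column-oriented table expects.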

From an engineer’s perspective, the first issue can be interpreted as handling slowly changing dimensions in a data warehouse. The second requires a balance between relational and non-relational databases: what we want in the end is a data warehouse for online analytical processing (OLAP), so a structured, flattened, column-oriented table is preferable when it comes to aggregation. To deal with those, we are going to leverage the compatibility…

Published in Plumbers Of Data Science

Written by Memphis Meng

I write data, sports and more.