Data Architectures

Antonio Soto
3 min read · Jan 24, 2022


After discussing the importance of data in today's business ecosystem, and the maturity levels we must consider to truly extract value from our data projects, today I want to look at the architectural options we have at our disposal for implementing the different stages discussed in the previous post.

To focus the conversation, it is worth defining some concepts that we will use throughout the article and that will also help explain the differences between the options we will analyze. The first step, obviously, is always to connect to the source and read the data we need for our analysis. This is the first phase of the process known as ETL, short for Extraction, Transformation and Load: we extract the data from a source, perform the necessary transformations, and load the result into the destination. Historically, this is the process used to load our Data Warehouses, or Data Marts. The key concept to understand is that we need to know, on the one hand, the transformations to be carried out and, on the other, the schema or model that the destination must have.
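As a minimal sketch of that ETL flow (using hypothetical column names and an in-memory SQLite database as the "warehouse"), note how both the transformations and the target schema must be known before anything is loaded:

```python
import sqlite3

# --- Extract: stand-in for reading from a source system ---
def extract():
    # Hypothetical raw records, as they might come from a CSV or API
    return [
        {"customer": " alice ", "amount": "10.5"},
        {"customer": "bob", "amount": ""},  # missing amount, will be dropped
    ]

# --- Transform: clean and type the data BEFORE loading (schema-on-write) ---
def transform(rows):
    return [
        (r["customer"].strip().title(), float(r["amount"]))
        for r in rows
        if r["amount"]  # drop rows with missing amounts
    ]

# --- Load: write into a destination table whose schema we defined up front ---
def load(conn, rows):
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
print(conn.execute("SELECT * FROM sales").fetchall())  # → [('Alice', 10.5)]
```

The point of the sketch is the ordering: the cleaning rules and the `sales` schema are decided before the first row reaches the destination.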

This is not always possible, or convenient, especially if our source data is unstructured, or we are not clear about the target model because the analyses we need to perform are not yet defined. In these scenarios, we "simply" change the order of the acronym and implement an ELT process: we extract, load into the destination, and leave the transformation phase for a later moment and a later process. This pattern is widely used to load source data into our Data Lakes.
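The same idea can be sketched for ELT: the raw records land in the destination untouched, and the transformation happens later, inside the destination, once the analysis is clear. This sketch assumes SQLite's built-in JSON functions (available in the builds bundled with recent Python versions); the event fields are hypothetical.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")

# --- Extract + Load: land the raw payloads as-is, no upfront modeling ---
conn.execute("CREATE TABLE raw_events (payload TEXT)")
events = [
    {"user": "alice", "action": "login"},
    {"user": "bob", "action": "purchase", "amount": 30},
]
conn.executemany(
    "INSERT INTO raw_events VALUES (?)",
    [(json.dumps(e),) for e in events],
)

# --- Transform later, in the destination, when the analysis is defined ---
rows = conn.execute(
    "SELECT json_extract(payload, '$.user'), json_extract(payload, '$.amount') "
    "FROM raw_events WHERE json_extract(payload, '$.action') = 'purchase'"
).fetchall()
print(rows)  # → [('bob', 30)]
```

Notice that the `raw_events` table imposes no schema on the data; the structure is only applied at query time (schema-on-read), which is exactly the trade-off a Data Lake makes.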

With these concepts in place, I think we can begin to understand the big difference between a Data Lake architecture and a Data Warehouse architecture: they are not competing options but complementary ones, and it is increasingly common to see both in current data architectures. Traditional Data Warehouses are architectures built on organizing our data for analysis using dimensional modeling, which is the truly relevant concept in that scenario. In a Data Lake, on the other hand, the data, as we mentioned earlier, is loaded without modeling, which makes more varied types of analysis possible in a simpler way. But as I noted, today they are complementary options, combined in what has been called the Data Lakehouse: on top of the Data Lake layer we incorporate a governance layer, in which we can implement ETL processes, or modeling layers that allow us to query that data from our analytical solutions.
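One very simplified way to picture that lakehouse modeling layer is a curated view sitting on top of the raw lake data: analytical tools query the view, never the raw table. This is only an illustration of the idea (table and column names are hypothetical, and real lakehouses use table formats and engines far beyond SQLite):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Raw "lake" zone: data landed as-is, untyped and uncleaned
conn.execute("CREATE TABLE lake_sales (raw_customer TEXT, raw_amount TEXT)")
conn.executemany(
    "INSERT INTO lake_sales VALUES (?, ?)",
    [(" alice ", "10.5"), ("BOB", "7")],
)

# Governed/modeled layer: a view that cleans and types the raw data,
# giving analytical solutions a stable, well-defined schema to query
conn.execute(
    "CREATE VIEW sales AS "
    "SELECT trim(raw_customer) AS customer, "
    "CAST(raw_amount AS REAL) AS amount "
    "FROM lake_sales"
)
print(conn.execute("SELECT customer, amount FROM sales").fetchall())
# → [('alice', 10.5), ('BOB', 7.0)]
```

The raw zone keeps the flexibility of the lake, while the view plays the role of the governance and modeling layer that makes the data consumable.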


Antonio Soto

More than 20 years managing information systems, mainly in Microsoft environments, with a special focus on Business Intelligence systems and data management.