2. Types of Data Architectures
Database
A database is a collection of structured, organized data that is usually manipulated through Structured Query Language (SQL) queries. Databases are optimized for data accessibility and retrieval, which makes them well suited to transactional operations.
A well-known example is the relational database, which requires a schema to be defined in advance, so it cannot readily accept semi-structured or unstructured information. For this reason, relational databases are more popular in monolithic architectures.
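To make the ideas of a predefined schema and a transactional operation concrete, here is a minimal sketch using Python's built-in sqlite3 module. The accounts table, its columns, and the transfer helper are hypothetical names chosen for illustration, and SQLite merely stands in for any relational database.

```python
import sqlite3

# In-memory database so the sketch leaves no files behind.
conn = sqlite3.connect(":memory:")

# The schema is declared before any row is inserted (hypothetical table).
conn.execute(
    """
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        owner TEXT NOT NULL,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
    """
)
conn.executemany(
    "INSERT INTO accounts (id, owner, balance) VALUES (?, ?, ?)",
    [(1, "alice", 100), (2, "bob", 50)],
)
conn.commit()


def transfer(conn, source_id, target_id, amount):
    """Move `amount` between two accounts atomically; return True on success."""
    try:
        # `with conn` commits if the block succeeds and rolls back on an exception.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (amount, target_id),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?",
                (amount, source_id),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected a negative balance; nothing was saved.
        return False


def balances(conn):
    """Return the current balance of every account as {owner: balance}."""
    return dict(conn.execute("SELECT owner, balance FROM accounts ORDER BY id"))
```

A transfer that would overdraw the source account fails as a whole: the credit to the target account is rolled back together with the rejected debit, which is the behavior transactional workloads rely on.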
Data Warehouse
Data warehouses are commonly used to store large amounts of structured, treated data for analysis. The data is usually loaded through an ETL (extract, transform, load) pipeline, and one of the differences from relational databases is that a warehouse can consolidate data from a variety of sources.
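The ETL flow described above can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the two sources (a CSV export and a list of shop events), their field names, and the sales table. An in-memory SQLite database stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different shapes and units.
CRM_CSV = "customer,amount_usd\nalice,10.50\nbob,4.00\n"
SHOP_EVENTS = [
    {"user": "Alice ", "total_cents": 250},
    {"user": "carol", "total_cents": 1999},
]


def extract():
    """Extract: read raw records from each source without changing them."""
    crm_rows = list(csv.DictReader(io.StringIO(CRM_CSV)))
    return crm_rows, SHOP_EVENTS


def transform(crm_rows, shop_events):
    """Transform: normalize both sources to (customer, amount_cents, source)."""
    rows = []
    for row in crm_rows:
        cents = round(float(row["amount_usd"]) * 100)
        rows.append((row["customer"].strip().lower(), cents, "crm"))
    for event in shop_events:
        rows.append((event["user"].strip().lower(), event["total_cents"], "shop"))
    return rows


def load(conn, rows):
    """Load: write the already-cleaned rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales ("
        "customer TEXT NOT NULL, amount_cents INTEGER NOT NULL, source TEXT NOT NULL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()


def run_etl(conn):
    """Run the three steps in order: extract, transform, then load."""
    load(conn, transform(*extract()))


def total_by_customer(conn):
    """Analytical query over the consolidated data."""
    query = "SELECT customer, SUM(amount_cents) FROM sales GROUP BY customer"
    return dict(conn.execute(query))
```

The point of the sketch is the ordering: data is cleaned and unified before it reaches the warehouse, so analysts query one consistent table regardless of where each row came from.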
Data Mart
A data mart is a data storage system focused on a specific line of business, department, or subject area. It allows a defined group of users to analyze this information more efficiently and make informed decisions.
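One simple way to picture a data mart is as a department-specific slice of a larger warehouse table. The sketch below models that slice as a SQL view in SQLite. The table, view, and column names are hypothetical, and real data marts are often separate physical stores rather than views.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical warehouse table holding data for every department.
conn.execute(
    "CREATE TABLE warehouse_sales (department TEXT, region TEXT, revenue INTEGER)"
)
conn.executemany(
    "INSERT INTO warehouse_sales VALUES (?, ?, ?)",
    [
        ("marketing", "north", 120),
        ("marketing", "south", 80),
        ("finance", "north", 300),
    ],
)

# The "mart": a view that exposes only the marketing subject area.
conn.execute(
    """
    CREATE VIEW marketing_mart AS
    SELECT region, revenue
    FROM warehouse_sales
    WHERE department = 'marketing'
    """
)


def revenue_by_region(conn):
    """Query the mart the way a marketing analyst might."""
    query = "SELECT region, SUM(revenue) FROM marketing_mart GROUP BY region"
    return dict(conn.execute(query))
```

Users of the mart see only the rows and columns relevant to their area, which keeps queries simple and narrows the data they need to understand.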
Data Lake
Data lakes can store not only structured data but also semi-structured and unstructured data, offering more flexibility for your business solution. In other words, a data lake is a less restrictive place where any kind of data can be stored. However, a data lake takes more work to implement and govern well, because the data lands in raw form; if everything were structured and treated before loading, it would effectively be a data warehouse. It is worth mentioning that data lakes are commonly used by data scientists or specialized data analysts to extract insights from data, while data warehouses are typically used by business analysts. In addition, data lakes usually follow ELT (extract, load, transform), whereas data warehouses traditionally use ETL.
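The ELT idea can be sketched as follows: records are loaded exactly as they arrive, and structure is applied only when someone reads them. The events.jsonl file name, the sample events, and both helper functions are assumptions made for this illustration. A local directory stands in for real lake storage.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical raw input: two semi-structured records and one free-form line.
RAW_EVENTS = [
    {"type": "click", "user": "alice", "page": "/home"},
    {"type": "purchase", "user": "bob", "amount": 19.99, "items": ["book"]},
    "free-form log line: sensor 7 offline",
]


def load_raw(lake_dir, events):
    """Load step: write every record as-is, one JSON value per line."""
    path = Path(lake_dir) / "events.jsonl"
    with path.open("w", encoding="utf-8") as f:
        for event in events:
            f.write(json.dumps(event) + "\n")
    return path


def read_purchases(path):
    """Transform step, applied at read time: keep only purchase records."""
    purchases = []
    with path.open(encoding="utf-8") as f:
        for line in f:
            record = json.loads(line)
            if isinstance(record, dict) and record.get("type") == "purchase":
                purchases.append((record["user"], record["amount"]))
    return purchases


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as lake_dir:
        lake_file = load_raw(lake_dir, RAW_EVENTS)
        print(read_purchases(lake_file))
```

Nothing is rejected at load time, including the free-form line, so a different reader could later apply a different transformation to the same raw file.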
Data Mesh
Data Mesh attempts to invert the challenges of centralized data architecture, taking the concepts of domain-driven design and applying them to data architecture.
The term was first used by Zhamak Dehghani in 2019. Below is a brief description:
Data mesh is a decentralized sociotechnical approach to share, access, and manage analytical data in complex and large-scale environments — within or across organizations, as defined in Data Mesh: Delivering Data-Driven Value at Scale.
This architecture effectively unites the disparate data sources and links them together through centrally managed data sharing and governance guidelines.
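As a rough illustration of domain-owned data combined with central governance, the sketch below lets each domain publish its own data product while a shared catalog enforces one common rule. The DataProduct and MeshCatalog classes and the required metadata keys are hypothetical and do not come from any data mesh tool or from the book quoted above.

```python
from dataclasses import dataclass, field

# Hypothetical central guideline: every product must declare these keys.
REQUIRED_METADATA = {"owner", "description"}


@dataclass
class DataProduct:
    """A dataset published and owned by a single domain team."""
    domain: str
    name: str
    metadata: dict
    rows: list = field(default_factory=list)


class MeshCatalog:
    """Shared catalog that applies the governance rule on registration."""

    def __init__(self):
        self._products = {}

    def register(self, product):
        """Accept a product only if it carries all required metadata."""
        missing = REQUIRED_METADATA - set(product.metadata)
        if missing:
            raise ValueError(
                f"{product.domain}.{product.name} is missing metadata: {sorted(missing)}"
            )
        self._products[(product.domain, product.name)] = product

    def get(self, domain, name):
        """Look up a product by its owning domain and name."""
        return self._products[(domain, name)]

    def list_products(self):
        """Return all (domain, name) pairs in sorted order."""
        return sorted(self._products)
```

Each domain keeps control of its own rows, and the only thing centralized is the rule that decides which products may be shared.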
Data Fabric
Data Fabric is an architecture that enables seamless integration between various data pipelines and cloud environments through the use of intelligent and automated systems. It provides a unified and consistent view of data, allowing organizations to work with data effectively for analysis.
3. References
Reis, J. and Housley, M. (2022) Fundamentals of Data Engineering: Plan and Build Robust Data Systems. O’Reilly Media.
Databases, Data Lakes, and data warehouses compared: UK (no date) Confluent. Available at: https://www.confluent.io/en-gb/learn/databases-data-lakes-and-data-warehouses-compared/ (Accessed: 17 May 2023).
Data Mesh VS data fabric architectures: What you should know (2023) StreamSets. Available at: https://streamsets.com/blog/data-mesh-vs-data-fabric-architectures/ (Accessed: 12 June 2023).
Fernández, E.Patricia. (2017) Big data, Amazon. Available at: https://aws.amazon.com/big-data/datalakes-and-analytics/what-is-a-data-lake/ (Accessed: 17 May 2023).
Yu, A. (2023) Database vs. Data Lake vs. Data Warehouse: What’s the difference?, Redpanda. Available at: https://redpanda.com/blog/database-data-lake-data-warehouse-differences (Accessed: 17 May 2023).
Taylor, D. (2023) Star schema vs snowflake schema — difference between them, Guru99. Available at: https://www.guru99.com/star-snowflake-data-warehousing.html (Accessed: 12 June 2023).