From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 1: Architecture)

Tahnik Ahmed
5 min readApr 21, 2024

--

Simulation Scenario:

Businesses need to consider their current, ideal, and future state of data infrastructure when modernizing their data warehouses. This includes determining a central location for data storage, analysis, and processing, as well as building a scalable infrastructure that can handle emerging technologies like machine learning and AI. To achieve data maturity, businesses should prioritize investment in data infrastructure from the early stages, ensuring compliance with data protection regulations and implementing security measures. This can help avoid costly re-architecting in the future. The hybrid approach to modernizing data warehouses offers a balance between cost efficiency, ease of maintenance, and future extensibility, allowing businesses to make informed decisions about their data infrastructure and analytical infrastructure based on their specific needs and goals.

In this series of articles, I have tried to simulate query load, optimize performance, and offer practical guidance for building relational data warehouses with Azure Synapse Analytics, specifically tailored for retail companies.

Series of Articles on “From Design to Deployment: Data Warehousing with Azure Synapse Analytics”

The articles provide a comprehensive set of tools, best practices, and insights to enable retail organizations to replicate real-world scenarios, fine-tune performance parameters, and implement efficient data warehouse architectures that cater to their unique needs. Additionally, it furnishes detailed documentation, tutorials, and troubleshooting tips that are specific to the retail industry, ensuring that users can maximize the capabilities of Azure Synapse Analytics for their data warehousing needs in a retail context.

Some Context Before Jumping In:

Data warehouses are prerequisites for all analytics and automation operation within an organization. Data warehouse as a repository of the all the necessary and contextual data drives the organizational growth towards data maturity [1]. And, without a solid foundation of the data systems, the organization either will fail to meet the internal data needs or fail to build data products that scale or fulfill any need [2].

The following article provides a good surface level idea of a data warehouse.

Data Warehouse Foundational Architecture [3]

So, how to build data warehouses for small and medium companies (or, large enterprises, why not?)? Here is a proposed workflow to build a data warehouse,

Ingesting data from various sources such as APIs or databases and “sinking” them into a data lake storage.

Querying the files using a serverless (or, dedicated) SQL pool, which supports CSV, JSON, and Parquet formats.

Creating an abstraction layer on top of the data lake storage, known as an “EXTERNAL TABLE,” to access the data like a relational table.

This abstraction simplifies the data ingestion process for data consumers (i.e. data analysts, data scientists or other applications), as it allows them to query the data as if it were in a relational database, even though it is stored in the data lake storage.

Proposed Dataflow:

You can observe the proposed architecture for the data pipeline in the very beginning of this article. The proposed architecture is the following,

Data Warehouse Architecture for Small and Medium Businesses (Diagram: Author)

The diagram is discussed in detail below,

  1. Data is ingested from legacy data warehouses or other cloud data warehouses, SQL servers and migrated or partially refactored legacy databases and SSIS packages are orchestrated into Azure SQL Database through Azure Synapse Analytics pipelines. [4] [5]
  2. Unstructured, semi-structured, and structured data is passed into Azure Data Lake Storage for centralized storage and analysis with other sources. [6]
  3. Real-time data from streaming sources enters the system via Azure Event Hubs and is analyzed immediately using Azure Stream Analytics. [7]
  4. Data is also passed into the centralized Data Lake for further analysis, storage, and reporting.
  5. Serverless analysis tools are used in the Azure Synapse Analytics workspace for ad hoc data science explorations, early prototyping for data warehouse entities, and defining views for Power BI. For the reason of simulating the large-scale data transformation and processing operations, a dedicated SQL pool has been instantiated as you can observe in the architecture diagram. [8]
  6. Azure Synapse Analytics is tightly integrated with potential consumers of fused datasets, such as Azure Machine Learning, Power Apps, Azure Logic Apps, Azure Functions apps, and Azure App Service web apps or other web applications. [9]
  7. Microsoft Entra Managed Identities ensure the security and role management within the whole pipeline.

So, how to design and build such data warehouses in Azure?

The second part of this blogpost discusses about the designing, data modeling, warehousing processes and they are implemented thoroughly with Azure data services.

Please follow the next part of this article : From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part Two: Design and Creation)”, where I attempted to simulate the process of designing and building the data model for a data warehouse on Azure for a retail company.

Thumbnail: “From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part Two: Designing and Creating the Data)”

To read more stories and update,

Let’s get connected at LinkedIn,

To get (almost) real time updates about my works, follow me at Twitter/X

All the very best for your endeavors!

References:

  1. Chapter 1, Reis, J., & Housley, M. (2022). Fundamentals of Data Engineering. In Fundamentals of Data Engineering. O’Reilly Media, Inc.
  2. Building a Solid Data Foundation — Metric Labs
  3. ‘Data Warehouse’, Data warehouse — Wikipedia
  4. Azure-SSIS Integration Runtime now available in Azure Synapse Analytics (Public Preview)
  5. Migrate SQL Server to Azure SQL Database (offline) — Azure Database Migration Service | Microsoft Learn
  6. Azure Data Lake Storage Gen2 Introduction — Azure Storage | Microsoft Learn
  7. Introduction to Azure Stream Analytics — Azure Stream Analytics | Microsoft Learn
  8. Serverless SQL pool — Azure Synapse Analytics | Microsoft Learn
  9. Get started analyze data with dedicated SQL pools — Azure Synapse Analytics | Microsoft Learn
  10. Azure Machine Learning documentation | Microsoft Learn

Additional Resources:

  1. Enterprise data warehouse — Azure Solution Ideas | Microsoft Learn;(https://learn.microsoft.com/en-us/azure/architecture/solution-ideas/articles/enterprise-data-warehouse)
  2. Modern data warehouse for small and medium business — Azure Architecture Center | Microsoft Learn; (https://learn.microsoft.com/en-us/azure/architecture/example-scenario/data/small-medium-data-warehouse)
  3. Data warehousing and analytics — Azure Architecture Center | Microsoft Learn; (https://learn.microsoft.com/en-us/azure/architecture/example-scenario/data/data-warehouse)

Appendix:

Thumbnail: “From Design to Deployment: Data Warehousing with Azure Synapse Analytics (Part 1: Architecture)”

--

--