What is Azure Synapse ?

Published in

Microsoft Azure

4 min readAug 20, 2020

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either server-less or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

Architecture and Design of a data warehouse depends on many factors. A dynamic and well performing DW should guaranty the data validity, concurrency, low latency and capability to integrate with other systems.

Azure Synapse provides you the platform to build and mange a modern DW with limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either server less on-demand or provisioned resources at scale.

Recently I had to work on a Synapse project and I collected the following references and now I’m sharing it with you.

Cheatsheet

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/cheat-sheet

Best Practices

Best practice and guide on Loading Data

Designing Tables

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/sql-data-warehouse-tables-overview

Indexing Tables

https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/sql-data-warehouse-tables-index

Heaps

https://docs.microsoft.com/en-us/sql/relational-databases/indexes/heaps-tables-without-clustered-indexes?view=sql-server-ver15

Clustered Columnstore Index

Designing Distributed Tables — Round Robin vs Hash vs Replicated

Hash-Distribution — Use this distribution methodology for large fact tables with clustered columnstore index.

Replicated — Use this distribution strategy for smaller tables (<2GB), which can be replicated to each compute node, typically used for Dimension Tables.

Round-Robin — Use this strategy for staging tables or for loading data or when there is no clear choice for distribution.

Load data from external tables using polybase

https://docs.microsoft.com/en-us/azure/sql-data-warehouse/load-data-from-azure-blob-storage-using-polybase#load-the-data-into-your-data-warehouse

Azure Synapse SQL Extension

The Azure Synapse SQL Extension contain a collection of User Defined Functions (UDFs) and Views that extend the capabilities of Azure Synapse SQL. This is especially useful when you’re migrating from legacy on premise systems such as Teradata and need to emulate functionality in Synapse.

Azure-Samples/Synapse

The Azure Synapse SQL Extension contain a collection of User Defined Functions (UDFs) and Views that extend the…

github.com

Most often the question on how to size a proper cluster is asked. The general guidelines are:

Number of concurrent Queries — https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits. Result set caching can reduce the number of concurrent queries.
Size of the Active data set being queried Daily. — 1.5 TB per compute node
Size of the Memory need to perform Calculations in Memory. — 300GB per compute node. This often matches or exceeds existing source systems.
Necessary size of Temp DB the ETL and Queries need (difficult to get from source systems). This is rare, but, I have seen companies need to increase DWU to get additional TempDB space. This typically happens with very large Datasets but have few concurrent queries.

There is also bench marking practices that are published here:

https://techcommunity.microsoft.com/t5/azure-synapse-analytics/performance-benchmark-azure-synapse-analytics-data-warehouse/ba-p/1381302

If you are interested to check the GitHub Azure Samples for Synapse see the link

Azure-Samples/Synapse

The following samples may span across several of the technology specific samples: Shows .NET for Spark and shared meta…

github.com

Attention to Resource Classes will help you !

Resource classes for workload management - Azure Synapse Analytics

Guidance for using resource classes to manage memory and concurrency for Synapse SQL pool queries in Azure Synapse. The…

docs.microsoft.com

Materialized Views

Materialized views in Synapse SQL pool provide a low maintenance method for complex analytical queries to get fast performance without any query change. This article discusses the general guidance on using materialized views.

Performance tune with materialized views — Azure Synapse Analytics | Microsoft Docs

CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) — SQL Server | Microsoft Docs

DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD (Transact-SQL) — SQL Server | Microsoft Docs

What is Azure Synapse ?

Azure-Samples/Synapse

The Azure Synapse SQL Extension contain a collection of User Defined Functions (UDFs) and Views that extend the…

Azure-Samples/Synapse

The following samples may span across several of the technology specific samples: Shows .NET for Spark and shared meta…

Resource classes for workload management - Azure Synapse Analytics

Guidance for using resource classes to manage memory and concurrency for Synapse SQL pool queries in Azure Synapse. The…

Written by Patrick Alexander