What is Azure Synapse ?

Patrick Alexander
Microsoft Azure
Published in
4 min readAug 20, 2020

Azure Synapse is a limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either server-less or provisioned resources at scale. Azure Synapse brings these two worlds together with a unified experience to ingest, prepare, manage, and serve data for immediate BI and machine learning needs.

Architecture and Design of a data warehouse depends on many factors. A dynamic and well performing DW should guaranty the data validity, concurrency, low latency and capability to integrate with other systems.

Azure Synapse provides you the platform to build and mange a modern DW with limitless analytics service that brings together enterprise data warehousing and Big Data analytics. It gives you the freedom to query data on your terms, using either server less on-demand or provisioned resources at scale.

Recently I had to work on a Synapse project and I collected the following references and now I’m sharing it with you.

Cheatsheet

Best Practices

Best practice and guide on Loading Data

Designing Tables

Indexing Tables

Heaps

Clustered Columnstore Index

Designing Distributed Tables — Round Robin vs Hash vs Replicated

Hash-Distribution — Use this distribution methodology for large fact tables with clustered columnstore index.

Replicated — Use this distribution strategy for smaller tables (<2GB), which can be replicated to each compute node, typically used for Dimension Tables.

Round-Robin — Use this strategy for staging tables or for loading data or when there is no clear choice for distribution.

Load data from external tables using polybase

Azure Synapse SQL Extension

The Azure Synapse SQL Extension contain a collection of User Defined Functions (UDFs) and Views that extend the capabilities of Azure Synapse SQL. This is especially useful when you’re migrating from legacy on premise systems such as Teradata and need to emulate functionality in Synapse.

Most often the question on how to size a proper cluster is asked. The general guidelines are:

  1. Number of concurrent Queries — https://docs.microsoft.com/en-us/azure/synapse-analytics/sql-data-warehouse/memory-concurrency-limits. Result set caching can reduce the number of concurrent queries.
  2. Size of the Active data set being queried Daily. — 1.5 TB per compute node
  3. Size of the Memory need to perform Calculations in Memory. — 300GB per compute node. This often matches or exceeds existing source systems.
  4. Necessary size of Temp DB the ETL and Queries need (difficult to get from source systems). This is rare, but, I have seen companies need to increase DWU to get additional TempDB space. This typically happens with very large Datasets but have few concurrent queries.

There is also bench marking practices that are published here:

https://techcommunity.microsoft.com/t5/azure-synapse-analytics/performance-benchmark-azure-synapse-analytics-data-warehouse/ba-p/1381302

If you are interested to check the GitHub Azure Samples for Synapse see the link

Attention to Resource Classes will help you !

Materialized Views

Materialized views in Synapse SQL pool provide a low maintenance method for complex analytical queries to get fast performance without any query change. This article discusses the general guidance on using materialized views.

Performance tune with materialized views — Azure Synapse Analytics | Microsoft Docs

CREATE MATERIALIZED VIEW AS SELECT (Transact-SQL) — SQL Server | Microsoft Docs

DBCC PDW_SHOWMATERIALIZEDVIEWOVERHEAD (Transact-SQL) — SQL Server | Microsoft Docs

--

--