Table Geometries of Azure Synapse Analytics SQL Pools

chamathka maddugoda
Quick Insight
Published in
2 min readNov 24, 2023

Azure synapse analytics uses massively parallel processing architecture within a SQL pool. This requires us to understand table geometries.

Table Geometry defines how the data is sharded into distributions on the available compute nodes. The purpose here is to optimize the performance.

Table Geometry is designed at the table creation. And there are 3 types of table geometries.

  1. Round-Robin
  2. Hash
  3. Replicated

Round Robin

Round Robin distribution table distributes data evenly across all the compute nodes. This is the default distribution for Azure Synapse Analytics and most straight forward table to create. Choice of the distribution is performed randomly and buffers of rows are assigned to distribution sequentially.

Advantages:

  • Delivers fast performance when used as a data loading staging table.
  • Quick to load data

Disadvantages:

  • Joins in round robin may negatively affect query workloads as data gathered for processing has to be reshuffled to other compute nodes with additional time and processing.

Hash

With hash distributed table, Azure synapse Analytics uses hash function to assign each row to one distribution. In this table distribution, one of the columns in the table is used as the distribution column. Hash function uses the values in this distribution column to assign each row to a distribution.

Advantages :

  • Provides highest query performance for joins and aggregations in large tables.

Replicated

Replicated table caches full copy on each compute node.

Advantages:

  • provides the fastest query performance for small tables.

Understanding table geometry is essentially important in understanding Azure synapse analytics in order to maximize the performance of the database.

Wrap up Quiz

  1. What is the default distribution type in Azure Synapse Analytics SQL pool?
  2. Which of the distribution type caches full copy on each node?
  3. True or False?
  • Query performance is better for larger data sets with round robin

Answers:

  1. Round Robin (Replicated is the default for parallel data warehouse)
  2. Replicated
  3. False: Query performance is better for larger data sets with Hash distribution tables.

Cheers! That’s all on this brief blog on table geometries. Hope you got all the answers correct at wrap up quiz.

The purpose of the quick insight series is to provide quick insights in to topics in data engineering and data science with shortest possible blogs.

You can find how above concepts are checked in Azure Data Engineer Associate Certification Exam DP 203 at https://www.examtopics.com/exams/microsoft/dp-203/view/

Hope you learnt something with this quick reading. If you found the above blog helpful leave a response and stay connected 😊.

Thank you !

--

--