Testing SQL Pool Performance in Azure Synapse Analytics

Azure Synapse Analytics went GA (general availability) last December, introducing a remarkable range of functionality: a PaaS database plus a complete platform for data engineers, data scientists and business users, with both SQL and Spark capabilities.


The feature set of Synapse is considerably richer than that of a “plain” old Azure SQL Database, but what benefits do we get from the smallest dedicated pool? With this question in mind, I decided to run a test to see if a very moderately sized pool (DW100c) can compete against an Azure SQL Database scaled to a similar price point.

Of course, I know that Synapse is a massively parallel processing (MPP) database and Azure SQL Database is not, and I could have run the same technical shootout between an SQL database and the former SQL Data Warehouse, but the new pricing and software bundle make the comparison especially timely. Right now, if you are using an on-demand SQL pool or an on-demand Spark cluster, Synapse’s pricing can seem very competitive. In my experience, 400–800DTU databases are usually the minimum in an enterprise environment, but that’s precisely why I’m interested to see whether the smallest SQL pool can achieve comparable, or at least decent, performance.

So I spun up the databases. For Azure SQL Database, I used a standard 800DTU, which is approximately the same price as the Synapse DW100c instance noted above. I also tried a Premium P2 250 tier that confirmed my assumption — but more on that later. For testing purposes, this gives us an idea of how well — or poorly — the small Synapse dedicated SQL pool performs.

I used Microsoft’s New York Taxicab dataset for the test.

The original code was tailor-made for Synapse, so I just ran the scripts and then ingested some of the tables (Trip, Date and Time) into the SQL database with a pipeline from the Synapse workspace. Sidenote: the data ingestion took around eight minutes into Synapse and 91 minutes into the 800DTU SQL database — yes, the Trip data was massive and ultimately consumed 32GiB in the SQL database as an uncompressed, row-store table.
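As an aside, a pipeline is not the only way to load a dedicated pool: the same data could be pulled straight from blob storage with the T-SQL COPY statement. A minimal sketch — the storage path placeholders and CSV options are mine, not from the original scripts:

```sql
-- Load the Trip table directly from blob storage into the dedicated pool.
-- The URL and file options are illustrative placeholders.
COPY INTO dbo.Trip
FROM 'https://<storage-account>.blob.core.windows.net/<container>/trip/'
WITH (
    FILE_TYPE = 'CSV',
    FIRSTROW  = 2   -- skip the header row
);
```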

Initially, I did not use any indexes, to see the sheer performance difference. As you will see, it was immediately obvious that there is no real competition between the two (three, with the Premium tier included) databases at this data volume. I executed each SQL statement multiple times — here are the average times:

The first one is a record count:
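The query itself is trivial; assuming the `dbo.Trip` table name from the taxi dataset scripts, it looks roughly like this:

```sql
-- Count the ~100M rows in the Trip table
SELECT COUNT_BIG(*) AS trip_count
FROM dbo.Trip;
```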

Synapse gave the 100M record answer in around three seconds.
The Azure SQL database @ 800DTU responded in 200+ seconds — the variance was huge, and it sometimes took over 10 minutes to finish. The Premium P2 250, however, came in at around 77 seconds.

Synapse finished in 10 seconds, Azure SQL @ 800DTU in 62 seconds, Premium P2 250 in 51 seconds.

At this point, I felt that the comparison was simply not fair. As the numbers above show, standard-tier DTUs are very I/O-limited: even when a query executes with decent performance, ingesting data into a standard-tier database remains a struggle. Premium is somewhat better, but at a significantly higher cost. So let’s bring out the great weapon of columnar storage on the standard tier and see that result as well:
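Converting the row-store Trip table to columnar storage in the Azure SQL database is a one-liner (the index name is mine; in Synapse dedicated pools, clustered columnstore is already the default table structure):

```sql
-- Rebuild the Trip table as a clustered columnstore index:
-- the whole table becomes compressed, column-oriented storage.
CREATE CLUSTERED COLUMNSTORE INDEX cci_Trip
    ON dbo.Trip;
```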

Synapse finished in seven seconds, Azure SQL @ 800DTU in 103 seconds (and in under one second with columnar storage), while Premium P2 250 finished in 54 seconds.

Synapse finished in 44 seconds, Azure SQL @ 800DTU in 266 seconds (and in around three seconds with columnar storage). Premium P2 250 finished in 60 seconds.

Synapse finished in seven seconds, Azure SQL @ 800DTU in 60 seconds (and four seconds with columnar storage) and Premium P2 250 in 59 seconds.

So, what does this tell us? First, the Synapse DW100c offers very good value: it delivers decent performance, but, being an MPP database, it also has its share of disadvantages — for example, joining huge tables with different distributions forces data movement between the distributions and degrades performance.
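The usual remedy is to hash-distribute both large tables on the join key, so each join runs locally within a distribution instead of shuffling data. A sketch using Synapse’s CTAS syntax — the `DateID` key and the staging-table name are illustrative choices, not from the original scripts:

```sql
-- Hash-distribute the Trip table on the (assumed) join key so that
-- joins to tables with the same distribution avoid shuffle moves.
CREATE TABLE dbo.Trip_distributed
WITH
(
    DISTRIBUTION = HASH(DateID),
    CLUSTERED COLUMNSTORE INDEX
)
AS
SELECT * FROM dbo.Trip;
```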

The Azure SQL databases are quite I/O-limited, even at 800DTU. Still, as I showed above, columnar storage does the trick here as well — albeit at the cost of ETL performance, since daily updates and deletes execute with higher overhead against a columnstore. At a higher price point, the Premium tier provides additional resources (IOPS, sizes) that can make a positive difference in performance, but it’s the richer feature set that really makes it an appealing option.

Andras is a Solution Architect at Starschema, specializing in BI, SQL and data lakes … and finding solutions.
