How to maintain SQL Server statistics on PolyBase external table

Updating statistics is not supported on SQL Server external tables, try the workaround I found!

Sergio Govoni
CodeX

--

Recently I had the opportunity to configure a maintenance plan for a SQL Server database with external tables and external data source for PolyBase queries. External data sources are used to establish connectivity and support these primary use cases:

  • Data virtualization and data load using PolyBase
  • Bulk load operations using SQL Server or SQL Database using BULK INSERT or OPENROWSET

You can find more information about what SQL Server PolyBase is in this article: Polybase for beginners.

Polybase for beginners
Polybase for beginners

It was not possible to test database maintenance at the same time as the maintenance plan configuration, so I waited for the time window dedicated to these activities.

The maintenance plan (classic configuration) has been completed with an error. The error was in the update statistics task, and it returned the following error message:

UPDATE STATISTICS [dbo].[ExternalTableName] WITH FULLSCAN, COLUMNS

Message 46519, level 16, state 22

The object Update Statistics isn't supported on External Table

The scenario is described in the CREATE STATISTICS documentation page here where this sentence is highlighted: “Updating statistics is not supported on external tables. To update statistics on an external table, drop and re-create the statistics” that tells us explicitly that update statistics is not supported on external tables.

When creating external table statistics, SQL Server imports the external table into a temporary SQL Server table and then creates the statistics. For sample statistics, only the sampled rows are imported. If you have a large external table, it is faster to use the default sampling instead of the full scan option.

In addition, when the external table uses DELIMITEDTEXT, CSV, PARQUET, or DELTA as data types, external tables only support statistics for one column per CREATE STATISTICS command.

The error can be worked around using one of the following options:

  1. Ignore the external table (this option is not available for maintenance plan managed by SQL Server Management Studio maintenance plan; third-party solutions are needed)
  2. Drop and create statistics as a part of the maintenance plan (drop statistics before starting the update statistics task and recreate them after the statistics maintenance)

I decided to use the second option, so I developed the stored procedure sp_drop_create_stats_external_table that can generate all the T-SQL statements for drop and create statistics defined on external tables; it supports statistics on multiple columns!

The sp_drop_create_stats_external_table stored procedure is available for download from GitHub:

From the maintenance plan prospective, CREATE and DROP STATISTICS statements can be stored on a temporary table or working table and executed separately. DROP STATISTICS statements can be executed before the maintenance statistics task and afterward the CREATE STATISTICS statements. Pay attention to the error handling during the maintenance plan because, at the end of the maintenance, statistics have to be in place on the external table. The output of the stored procedure cannot be missed.

Enjoy the sp_drop_create_stats_external_table and… anyone who wants to contribute is welcome!

--

--

Sergio Govoni
CodeX
Writer for

CTO at Centro Software, Microsoft Data Platform MVP