Choosing OLAP Storage: Microsoft SQL Server With Columnstore Indexes

Aleh Belausau
Towards Data Engineering
16 min readMay 18, 2024

Previously, I described key metrics that should be considered when making the critical choice of OLAP storage for your needs in this article — How to Choose the Right OLAP Storage. Now, I have decided to apply this approach in practice and examine the most popular, as well as some of the not-so-popular, OLAP storages. The main goal is to discern the strengths and weaknesses of each OLAP storage solution and determine the most fitting use case for each.

What is OLAP Storage?

What is online analytical processing? Online analytical processing (OLAP) is software technology you can use to analyze business data from different points of view. In OLAP scenarios, datasets can be massive — billions or trillions of rows. Data is organized in tables that contain many columns, and only a few columns are selected to answer any particular query. Results must be returned in milliseconds or seconds. Basically, OLAP storage refers to storage optimized for analytical workloads.

Here is the list of OLAP storages in this research:

SQL Server logo

Overview

  • Developers — Microsoft
  • Written in — C, C++
  • Type — Relational database management system

Famed for its excellence in handling transactional operations, Microsoft SQL Server has firmly positioned itself as a premier Relational Database Management System. Yet, its capabilities go beyond Online Transaction Processing, as it demonstrates exceptional performance in Online Analytical Processing storage, contributing significantly to data analysis and decision-making processes.

What drives SQL Server, known primarily for its strength in transactional tasks, to excel in the realm of OLAP storage? The answer lies in the transformative influence of Columnstore Indexes.

Marking a significant milestone in the evolution of SQL Server, Columnstore indexes revolutionize its support for OLAP storage. Departing from traditional row-wise storage methods, these indexes arrange data in a columnar format, fundamentally reshaping query performance for analytical workloads.

Storage Architecture & Semi-Structured Data Support

1. Storage Format:

Microsoft SQL Server supports two primary types of data storage formats: Rowstore and Columnstore:

  • Columnstore: A columnstore is data that’s logically organized as a table with rows and columns, and physically stored in a column-wise data format. A rowgroup is a group of rows that are compressed into columnstore format at the same time. A rowgroup usually contains the maximum number of rows per rowgroup, which is 1,048,576 rows. For high performance and high compression rates, the columnstore index slices the table into rowgroups, and then compresses each rowgroup in a column-wise manner.
  • Rowstore: A rowstore is data that’s logically organized as a table with rows and columns, and physically stored in a row-wise data format. This format is the traditional way to store relational table data.

You can transform your row-store table into the columnstore format using the CREATE COLUMNSTORE INDEX T-SQL syntax. However, it’s important to note that columnstore indexes work best when the data is stable, and queries should be updating and deleting less than 10% of the rows.

The choice between rowstore and columnstore depends on the specific use case and requirements of your data storage. For example, rowstore might be more suitable for OLTP workloads with lots of small transactions, while columnstore might be more suitable for OLAP workloads that involve complex queries over large amounts of data.

2. Separation of Compute and Storage:

Microsoft SQL Server, particularly in its Azure SQL Managed Instance offering, implements the concept of separation of compute and storage. This architecture consists of 2 layers:

  • Compute Layer: This is a stateless layer that runs the sqlservr.exe process and contains only transient and cached data. This includes the tempdb database that resides on the attached SSD Disk, and memory structures such as the plan cache, the buffer pool, and columnstore pool that reside in memory.
  • Storage Layer: This is a stateful data layer where the user database data & log files reside in Azure Blob storage. This type of repository has built-in data availability and redundancy features. It guarantees that every record in the log file or page in the data file will be preserved even if the sqlservr.exe process crashes.

The separation of compute and storage allows for independent scaling of these two components. This means you can scale your compute resources (CPU, memory) separately from your storage resources, which can lead to significant cost savings. You can adjust compute resources based on your workload’s demands without having to add unnecessary storage, and vice versa.

3. Semi-Structured Data Support:

Microsoft SQL Server provides robust support for semi-structured data, which is a type of data that does not conform to the formal structure of data models but contains tags or other markers to separate semantic elements and enforce hierarchies of records and fields.

Here are the most popular ways SQL Server supports semi-structured data:

  • XML Data Type: SQL Server has an xml data type that allows you to store XML documents and fragments in a SQL Server database. XML is a form of semi-structured data that can represent complex data relationships in a hierarchical manner. You can also perform XQuery queries against XML data stored in columns and variables of the xml type.
  • JSON Support: Starting from SQL Server 2016, JSON is supported in SQL Server, where JSON data can be queried and stored. JSON is another popular format for semi-structured data, allowing complex and varied data structures to be represented.
  • Large UDTs: SQL Server 2008 introduced support for large value User-Defined Types (UDTs), which can be used to store and manage semi-structured and unstructured data.
  • New Data Types: SQL Server 2008 and later versions introduced new data types designed to work with structured, semi-structured, and unstructured data. These data types allow for more flexibility in handling semi-structured data.
  • FOR XML and OPENXML: SQL Server provides the FOR XML clause to retrieve relational data in XML format, and the OPENXML function to retrieve XML data in relational format. These features provide flexibility in working with semi-structured data in XML format.

Deployment & Pricing

1. Deployment Model:

SQL Server can be deployed through different models:

  • On-Premises: Managed within the organization’s infrastructure, providing full control over hardware and software.
  • Cloud: Offers IaaS, PaaS, and SaaS options, where the infrastructure and management responsibilities vary.
  • Containerized: SQL Server instances run within containers, offering flexibility and consistency across environments.

2. Fully Managed Service Options:

SQL Server can be used as a fully managed service through various providers. Here are a few options:

  • Azure SQL Database: Azure SQL Database is a fully managed platform as a service (PaaS) Database Engine that handles most of the database management functions such as upgrading, patching, backups, and monitoring without user involvement.
  • Azure SQL Managed Instance: Azure SQL Managed Instance is a fully managed SQL Server Database Engine hosted in Azure cloud which provides near 100% compatibility with on-premises SQL Server database instances. It’s a best migration path to cloud for SQL Server enterprise customers.
  • Amazon RDS for SQL Server: Amazon Relational Database Service offers managed SQL Server databases on Amazon Web Services. It automates common administrative tasks such as patching, backups, and scaling.
  • Google Cloud SQL for SQL Server: Google Cloud SQL is a fully managed relational database service on Google Cloud Platform. It offers automated backups, failover, and scaling capabilities.

3. Scalability:

SQL Server’s scalability largely depends on the edition and configuration. SQL Server Enterprise Edition supports scaling out with features like Always On Availability Groups, but it may not scale as seamlessly or cost-effectively as some purpose-built OLAP solutions.

4. Pricing Model:

Microsoft SQL Server offers a variety of pricing models to meet different business needs:

  • Enterprise Edition: This edition is designed for large-scale and high-traffic databases.
  • Standard Edition (Per Core):This edition is suitable for medium-scale applications and data marts.
  • Standard Edition (Server): This edition is designed for small-scale applications and provides basic database, reporting, and analytics capabilities.
  • Standard Edition (CAL): This model requires a CAL for every user or device accessing a server.
  • Developer Edition: This edition is available for free and is intended for development and testing.
  • Express Edition: This edition is also available for free and is suitable for lightweight databases and learning.

Azure SQL is a fully managed relational database service provided by Microsoft on the Azure cloud platform. Here’s some information about its pricing options:

  • Azure SQL Database Single Database: This service is part of the Azure SQL family of SQL database services. The pricing for Azure SQL Database Single Database is based on a vCore-based purchase model. This model allows you to scale compute, memory, and storage based upon your workload needs.
  • SQL Database serverless compute tier: This tier optimizes price-performance and simplifies performance management for single databases with intermittent, unpredictable usage by auto-scaling compute and billing for compute used per second.
  • General purpose: Ideal for most business workloads, offering balanced and scalable compute and storage options.
  • Elastic pools: For environments with multiple databases that have varying and unpredictable usage demands, elastic pools can provide cost savings compared to provisioning the same number of single databases.

Please note that Columnstore indexes are supported in Azure SQL Database with tier S3 and higher. This includes both General Purpose and Business Critical vCore based databases

Managemant

1. Community/Support:

Microsoft SQL Server has a robust community and support systems:

  • Tech Community Forums: Microsoft has Tech Community forums for SQL Server and Azure databases. These forums allow you to connect with Microsoft Engineering experts and other members of the community in message boards, and access additional resources.
  • SQL Server Support Blog: The SQL Server Support Blog discusses all areas of SQL Server: Database Engine, Management Tools, Analysis Services, Reporting Services, Integration Services, and Data Access. It also includes popular troubleshooting content.
  • Professional Association for SQL Server (PASS): You can join the PASS, a global community for data professionals who use the Microsoft data platform.
  • Contact Microsoft Support: If you need further assistance, you can contact Microsoft Support.

2. Documentation:

Microsoft provides a comprehensive SQL Server documentation library that includes tutorials, how-to guides, and resources to help you get started with SQL Server.

3. Ease of Management:

Microsoft SQL Server simplifies database administration with tools like SQL Server Management Studio for graphical management, centralized management for multiple instances, policy-based management for enforcing standards, automated maintenance plans, and features like DMVs and Extended Events for monitoring. It also offers resource allocation with the Resource Governor, high availability with features like AlwaysOn, and integrates seamlessly with Azure for cloud scalability and management. Overall, it’s designed to make database management easier and more efficient.

4. Learning curve:

Microsoft SQL Server is a comprehensive database system with a wide range of features, which means the learning curve can vary depending on your prior experience and the specific features you need to use. For those with some experience in databases, you might find SQL Server easier to learn. The SQL language used in SQL Server is based on standard SQL, so if you’re familiar with SQL, you’ll be able to get up and running quickly. For advanced users, SQL Server offers a host of advanced features like Analysis Services, Reporting Services, and Integration Services. Learning these features might require more time and practice.

It’s also important to remember that you don’t need to learn everything at once. You can start with the basics and gradually explore more advanced features as needed.

5. SQL Support:

Microsoft SQL Server supports Transact-SQL.

Integration

1. Supported Data Sources:

Microsoft SQL Server supports a wide range of data sources, providing flexibility and interoperability with various systems:

  • Microsoft SQL Server: SQL Server can connect to other SQL Server databases, including Azure SQL Database and Azure Synapse Analytics.
  • Microsoft Access: You can connect to Microsoft Access databases.
  • Microsoft Excel: SQL Server can import data from Excel files.
  • Text Files: SQL Server supports importing data from text files, such as .txt, .tab, .csv.
  • ODBC and OLE DB: SQL Server supports a variety of ODBC data sources and OLE DB providers.
  • Other Relational Databases: SQL Server can connect to other relational databases such as PostgreSQL, Oracle, IBM DB2, Sybase, Informix, and Teradata using respective OLE DB providers or .NET Data Providers.
  • Analysis Services: SQL Server can connect to Analysis Services cubes.
  • Azure Blob Storage: SQL Server can connect to Azure Blob Storage.

In addition to these, SQL Server also supports creating external data sources for querying using SQL Server, Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, Analytics Platform System, or Azure SQL Edge.

2. Cloud Services Integration:

While Microsoft SQL Server has some capability to interact with AWS and GCP services, its most comprehensive and seamless integrations are with Azure’s own suite of services.

Microsoft Azure:

  • SQL Server Integration Services (SSIS): You can move your SQL Server Integration Services (SSIS) projects, packages, and workloads to the Azure cloud. SSIS works with any managed SQL Server instance.
  • Azure Synapse Analytics: SQL Server Integration Services (SSIS) provides a variety of options for connecting to, and loading data into, Azure Synapse Analytics.
  • Azure Data Studio: For customers needing a cross-platform companion to SSMS for managing SQL and other Azure databases, Microsoft recommends using Azure Data Studio.
  • Azure Data Factory: Azure Data Factory seamlessly integrates with SQL SSIS by providing a platform for orchestrating and managing data workflows in the Azure cloud. With Azure Data Factory, you can easily lift and shift your SSIS projects, packages, and workloads to the cloud environment, leveraging its scalable and serverless architecture. Furthermore, Azure Data Factory offers native connectors and activities specifically designed for integrating with SQL Server and other Azure services, including Azure Synapse Analytics.

AWS Cloud:

  • SQL Server on AWS: SQL Server on AWS provides the option to run Microsoft SQL Server in a cloud environment.
  • Amazon EC2: You can run SQL Server on Amazon Elastic Compute Cloud.
  • Amazon RDS: Amazon RDS for Microsoft SQL Server is a managed database service that simplifies the provisioning and management of SQL Server on AWS.
  • Amazon RDS Custom: Amazon RDS Custom for Microsoft SQL Server is another option for running SQL Server on AWS.

GCP Cloud:

  • Cloud SQL: Cloud SQL for SQL Server is a managed database service that helps you set up, maintain, manage, and administer your SQL Server databases on Google Cloud.
  • Cloud Data Fusion: Cloud Data Fusion provides an interface that lets you deploy code-free ETL (extract, transform, load) and ELT (extract, load, transform) data pipelines.

Please note that the availability of these integrations may depend on the version of SQL Server and the specific tools you are using.

3. SDK Support:

Microsoft SQL Server provides a variety of SDKs to help developers build applications that interact with SQL Server:

  • .NET/C#: The Microsoft SqlClient Data Provider for SQL Server is a .NET Framework and .NET Core data provider.
  • Java: Microsoft provides a Java Database Connectivity (JDBC) driver for use with SQL Server, Azure SQL Database, and Azure SQL Managed Instance at no extra charge.
  • Node.js: The tedious module is a JavaScript implementation of the TDS protocol, which is supported by all modern versions of SQL Server.
  • Microsoft Extensibility SDK for C# for SQL Server: This SDK is an interface for the .NET language extension that is used to exchange data with SQL Server and to execute C# code from SQL Server.
  • Microsoft.Build.Sql: This is a package introduced in the SQL Database Projects extension for Azure Data Studio and VS Code.
  • pyodbc: Microsoft recommends using the pyodbc driver to connect to SQL Server from Python. This driver allows Python applications to access SQL Server using the ODBC interface.
  • pymssql: This is another Python SQL driver that you can use to interact with SQL Server.
  • Azure SQL SDK: The Azure SQL SDK for Python allows you to work with data stored in Azure SQL Database using the pyodbc ODBC database driver.

4. Supported Visualization Tools:

Microsoft SQL Server supports a variety of visualization tools that can help you analyze and present your data:

  • Power BI: Power BI is a business analytics tool developed by Microsoft. It provides interactive visualizations with self-service business intelligence capabilities.
  • Looker Studio: Looker Studio is a modern data platform that can connect to SQL Server for creating data models and visualizations.
  • Tableau: Tableau is a powerful data visualization tool that can connect to SQL Server for creating interactive dashboards.

Performance

  1. Insert operations

Columnstore indexes in SQL Server are specifically optimized for insert operations, offering several advantages due to their unique architecture. Bulk insert operations benefit from the columnar storage format, which compresses data and reduces the required I/O, resulting in faster data loading times compared to traditional rowstore indexes. Additionally, bulk inserts into columnstore indexes are optimized for minimal logging. SQL Server efficiently organizes inserted data into segments, with each segment containing data from a specific range of rows. This organization allows for effective segment elimination during subsequent queries, thereby enhancing query performance.

2. Updates operations

While Columnstore indexes are optimized for read-heavy workloads, SQL Server supports efficient update operations through Delta stores, which temporarily store updates before compressing and merging them into the main columnstore.

3. Join operations

SQL Server with Columnstore indexes supports high-performance join operations. The optimized storage format and advanced query processing techniques enhance join performance, particularly for large datasets.

4. Aggregation queries

Aggregation queries benefit significantly from columnstore indexes due to their ability to scan only the relevant columns and employ advanced compression techniques. By reading only the necessary columns, SQL Server reduces the volume of data processed, leading to lower I/O and faster query execution. The use of compression further minimizes data size, optimizing storage and retrieval times. Additionally, batch mode execution for aggregation queries enhances performance by processing multiple rows simultaneously, taking full advantage of modern CPU architectures. This combination of selective column scanning, advanced compression, and efficient execution makes columnstore indexes ideal for data-intensive operations, particularly in data warehousing and analytics scenarios.

2. Materialized View Support:

SQL Server does not support Materialized views.

3. Indexing:

SQL Server provides multiple indexing options, each designed to optimize different types of queries and workloads. These indexing options include:

  • Clustered Indexes: A clustered index determines the physical order of data in a table. There can only be one clustered index per table because the data rows themselves are stored in the order of the clustered index.
  • Non-Clustered Indexes: Description: Non-clustered indexes create a separate structure from the data rows, containing a sorted list of key values and pointers to the data rows. A table can have multiple non-clustered indexes.
  • Unique Indexes: Enforce the uniqueness of the index key values, ensuring that no two rows have the same value in the indexed columns.
  • Full-Text Indexes: Enable efficient searching of text data within a column, supporting complex queries that involve words and phrases.
  • Spatial Indexes: Optimize the storage and querying of spatial data types, such as geometry and geography.
  • Filtered Indexes: Create an index on a subset of rows in a table, defined by a filter predicate.
  • XML Indexes: Enhance the performance of queries on XML data stored in XML columns. There are two types: primary and secondary XML indexes.
  • Columnstore Indexes: Store data in a columnar format rather than row-based, optimized for read-heavy and analytical workloads. There are two types: clustered columnstore indexes and non-clustered columnstore indexes.
  • Memory-Optimized Indexes: Designed for tables stored in memory-optimized filegroups, supporting in-memory OLTP (Online Transaction Processing).
  • Hash Indexes: A type of memory-optimized index that uses a hash function to determine the index’s location. It is particularly efficient for equality searches.

4. Streaming Ingestion:

SQL Server supports streaming ingestion through various tools and technologies within the Microsoft ecosystem, such as SSIS, Azure Stream Analytics, Azure Data Factory, Change Data Capture, Kafka connectors, and Azure Event Hubs. These tools enable efficient real-time data ingestion and processing, making SQL Server a versatile platform for handling streaming data scenarios.

Strengths

  1. Exceptional Performance with Columnstore Indexes: SQL Server excels in OLAP scenarios due to Columnstore indexes, which offer high compression rates, efficient data storage, and faster query performance for large datasets.
  2. Comprehensive Indexing Options: Supports a wide range of indexing types, including clustered, non-clustered, full-text, spatial, and XML indexes, catering to diverse query and data requirements.
  3. Robust Support for Semi-Structured Data: Native support for XML and JSON, as well as large user-defined types, facilitates handling semi-structured data effectively.

Weaknesses

  1. Complexity in Managing Updates: While optimized for read-heavy workloads, managing updates in Columnstore indexes can be complex due to the need for Delta stores.
  2. Higher Costs for Enterprise Features: Some advanced features and high scalability options are only available in the Enterprise Edition, which can be cost-prohibitive for smaller organizations.
  3. Learning Curve: The comprehensive feature set and advanced functionalities can present a steep learning curve for new users or those with limited database experience.

Best use case

SQL Server with Columnstore indexes is ideal for data warehousing and analytics workloads, where fast query performance and efficient storage of large datasets are crucial. It excels in scenarios involving complex queries, aggregations, and data analysis tasks. Furthermore, SQL Server allows the combination of both Columnstore and rowstore indexes within the same database, providing the flexibility to optimize for both analytical and transactional workloads. This hybrid approach supports not only OLAP scenarios but also traditional relational models, making SQL Server suitable for environments that require both high-performance analytics and robust transactional processing. This capability is particularly beneficial for businesses that need to maintain real-time transaction processing while performing in-depth data analysis on the same system, ensuring seamless integration and operational efficiency.

Worst use case

While SQL Server with its combination of Columnstore and rowstore indexes supports both analytical and transactional workloads, it may still face challenges in scenarios that require extremely high-frequency real-time transactional processing with minimal latency. Systems that demand ultra-low latency for millions of transactions per second, such as high-frequency trading platforms or real-time bidding systems, might experience performance overhead due to the complexity of managing updates in Columnstore indexes and the overhead of maintaining hybrid indexing strategies. In these cases, specialized in-memory databases or purpose-built transactional systems might be more suitable.

In summary, Microsoft SQL Server with Columnstore Indexes excels in handling OLAP workloads, offering exceptional query performance, efficient data storage, and high compression rates. Its strengths lie in its diverse indexing options and robust support for semi-structured data. Additionally, SQL Server is capable of supporting both OLTP and OLAP workloads, providing flexibility for various types of database applications. However, it presents challenges in managing updates due to the complexity of Delta stores and may be cost-prohibitive for smaller organizations given the higher costs of its Enterprise Edition features. SQL Server’s comprehensive feature set also entails a steep learning curve.

Always check the official documentation for the latest information on Microsoft SQL Server.

For a more comprehensive understanding of how to assess this information, please refer to the key metrics outlined in the article How to Choose the Right OLAP Storage when making the crucial decision for your OLAP storage needs.

If you are interested in optimizing SQL Server performance through effective locking management, you should check out How to Manage Locking in SQL Server for Optimal Performance.

To enhance your proficiency in data management, explore the Strategic guide on mastering data for software developers.

--

--

Aleh Belausau
Towards Data Engineering

Data and Software Engineer specializing in using cloud technologies for business growth. https://www.linkedin.com/in/aleh-belausau/