Table Partitioning in SQL Server: an Overview

Published in

Tech Blog

7 min readJan 8, 2023

This article explains the concept of table partitioning in SQL Server and why it is essential. Then, it illustrates the different steps needed to create a partitioned table.

Database file types

Before we begin, it is worth remembering the different database files in SQL Server.

MDF, NDF, and LDF are database files used by Microsoft SQL Server, a relational database management system.

MDF stands for “master database file.” It is the primary file for a database and contains the data and objects such as tables, indexes, and stored procedures.

NDF stands for “secondary database file.” It is used to store additional data for a database and is usually used to split large tables or store data accessed less frequently.

LDF stands for “log database file.” It stores the transaction log for a database, which records all changes made to the data. The transaction log is used to recover the database after a failure or to revert changes made to the database.

In summary, MDF files contain the primary data and objects for a database, NDF files contain additional data for a database, and LDF files contain the transaction log for a database.

What is table partitioning?

In Microsoft SQL Server, table partitioning is a way to divide a large table into smaller, more manageable pieces called partitions. This can improve query performance because the database engine can quickly and efficiently access and manipulate the data in each partition. There are two types of table partitioning in SQL Server: horizontal partitioning and vertical partitioning.

Horizontal partitioning involves dividing a table into multiple tables with the same structure; each called a partition. For example, you might have a large table that stores customer orders, and you could divide it into separate partitions by year. This would allow you to easily query and manage the data for a specific year without searching through the entire table.

Vertical partitioning involves dividing a table into smaller tables with a subset of the original table’s columns. This can be useful if you have a table with many columns, some of which are rarely used or are not needed for specific queries. By separating these columns into a separate table, you can improve the performance of queries that only need to access a subset of the columns.

To implement table partitioning in SQL Server, you will need to use the CREATE PARTITION FUNCTION and CREATE PARTITION SCHEME statements. You will then use the ON clause of the CREATE TABLE statement to specify the partition scheme for the table.

For example, the following code creates a partition function and scheme for a table that is horizontally partitioned by year:

CREATE PARTITION FUNCTION pf_Orders (int)
AS RANGE LEFT FOR VALUES (2018, 2019, 2020, 2021);

CREATE PARTITION SCHEME ps_Orders
AS PARTITION pf_Orders
ALL TO ([PRIMARY]);

CREATE TABLE Orders (
    OrderId int NOT NULL,
    CustomerId int NOT NULL,
    OrderDate datetime NOT NULL,
    TotalAmount money NOT NULL,
    PRIMARY KEY (OrderId)
) ON ps_Orders (OrderDate);

This creates a partition function pf_Orders that defines four partitions for the table based on the year of the OrderDate column. It then creates a partition scheme ps_Orders that maps the partition function to the PRIMARY filegroup, which is the default location for storing table data. Finally, it creates the Orders table and specifies the ps_Orders partition scheme using the ON clause.

Why is it important?

There are several benefits to using table partitioning in SQL Server:

Improved Query Performance: Queries against a partitioned table can be faster because they only need to access a subset of the data rather than the entire table.
Improved Data Loading Performance: You can load data into a partitioned table more quickly because you can simultaneously load data into multiple partitions.
Easier Data Management: You can easily manage large tables by partitioning them. For example, you can drop or truncate a partition instead of the entire table, which can be faster and more efficient.
Better Data Archiving: You can use partitioning to archive old or rarely accessed data by moving it to a separate partition or filegroup. This can help improve the performance of queries against the active data.

Overall, table partitioning can be an essential tool for improving the performance and manageability of large tables in SQL Servers. It can help you more quickly and efficiently manage and query large amounts of data, which can be especially useful in data warehousing and other scenarios where you need to manage large amounts of data.

Creating a partition function

In SQL Server, you can use the PARTITION BY clause in a SELECT statement to divide the result set into partitions to which an aggregate function is applied. The PARTITION BY clause is often used with the ROW_NUMBER() function to assign a unique number to each row within a partition.

Here is an example of how you can use the PARTITION BY clause in a SELECT statement in SQL Server:

SELECT
  column1,
  column2,
  ROW_NUMBER() OVER (PARTITION BY column1 ORDER BY column2) AS row_number
FROM
  table_name

This SELECT statement will retrieve the values in column1 and column2 from table_name, and use the PARTITION BY clause to divide the result set into partitions based on the values in column1. Within each partition, the ROW_NUMBER() function will assign a unique number to each row based on the values in column2. The ORDER BY clause is used to specify the order in which the rows are numbered within each partition.

You can also use the PARTITION BY clause with other aggregate functions, such as SUM(), AVG(), and MAX(). For example:

SELECT
  column1,
  SUM(column2) OVER (PARTITION BY column1) AS total
FROM
  table_name

This SELECT statement will retrieve the values in column1 from table_name, and use the PARTITION BY clause to divide the result set into partitions based on the values in column1. The SUM() function will then be applied to each partition to calculate the total of column2 for each value in column1.

Creating a partition schema

In SQL Server, a partition schema is a collection of partition functions and the filegroups associated with each partition. A partition function is a mathematical function that divides the rows of a table or index into a specific number of partitions based on the values of a specified column. A filegroup is a logical container for data files in a database.

To create a partition schema in SQL Server, you must create a partition function and map the partitions to filegroups. Here is an example of the basic syntax for creating a partition function and a partition schema in SQL Server:

CREATE PARTITION FUNCTION partition_function_name (input_parameter_type)
AS RANGE FOR VALUES (partition_boundary_value1, partition_boundary_value2, ...)

CREATE PARTITION SCHEMA partition_schema_name
AS PARTITION partition_function_name
TO (filegroup1, filegroup2, ...)

In this example, partition_function_name is the name of the partition function, input_parameter_type is the data type of the input parameter for the partition function, partition_boundary_value1, partition_boundary_value2, etc. are the values that define the boundaries between the partitions, and filegroup1, filegroup2, etc. are the names of the filegroups that the partitions will be mapped to.

Apply partition schema to a table

To apply a partition schema to a table in SQL Server, you will need to use the ON clause of the CREATE TABLE statement and specify the partition schema and the column on which the table will be partitioned.

Here is an example of the basic syntax for creating a table and applying a partition schema in SQL Server:

CREATE TABLE table_name
(
  column1 data_type,
  column2 data_type,
  ...
)
ON partition_schema_name (column1)

In this example, table_name is the name of the table, column1 and column2 are the names of the columns in the table, and partition_schema_name is the name of the partition schema that you want to use to partition the table. The table will be partitioned based on the values in column1.

It is important to note that once a partition schema has been created and a table has been partitioned using the schema, you cannot drop the schema or remove the partitioning from the table without first dropping the table.

You can also use the ALTER TABLE statement to add partitioning to an existing table. Here is an example of the basic syntax for adding partitioning to a table in SQL Server using the ALTER TABLE statement:

ALTER TABLE table_name
ADD PARTITION scheme_name (column1)

In this example, table_name is the name of the table that you want to partition, scheme_name is the name of the partition schema that you want to use, and column1 is the name of the column on which the table will be partitioned.

It is important to note that you cannot use the ALTER TABLE statement to remove partitioning from a table. To remove partitioning from a table, you will need to drop the table and re-create it without partitioning.

Summary

Table partitioning is crucial to storing massive data in SQL Server. It has several benefits regarding data storage, data ingestion, and query performance. This article explained the concept of table partitioning and are steps required to apply table partitioning in SQL Server.