SQL Server Partitioning (Cold/Warm/Hot)

Lorenzo Uriel
8 min readOct 24, 2023

--

Source: Author

The solution is responsible for managing the partitioning of your database daily, monthly and annually.

The rule is: Daily Filegroups are hot data / Monthly Filegroups are warm data / Yearly Filegroups are cold data.

The nomenclature when creating Filegroups is as follows:

  • The years that passed (FGY0000, FGY2020, FGY2021, FGY2022 …)
  • Total months (FGM01, FGM02, FGM03, … FGM10, FGM11, FGM12)
  • Total days (FGD01, FGD02, FGD03, … FGD29, FGD30, FGD31)

Cold Data (Annual):

  • Cold data will be stored on an HDD, data that will be consulted infrequently or infrequently;
  • It will have all the years that have passed and the current year;
  • In the current year there will be data that has recently cooled;
  • Cold Filegroups compression is PAGE (DATA_COMPRESSION=PAGE).

Warm Data (Monthly):

  • The warm data will be stored on another disk, it can be an HDD or an SSD, data that is consulted with low or medium frequency;
  • There will only be two full months and the current month;
  • When the current month closes, a new one will begin and the last month will become cold data;
  • In the current month, there will be data that has become warm recently;
  • There will be no manual compression on warm filegroups.

Hot Data (Daily):

  • Hot data will be stored on an SSD, aiming for query performance, data that is queried with high or medium frequency;
  • Hot data is just 7 days, one week of data for performant queries; When a new day arrives, the last day will be warm;
  • There will be no manual compression on hot filegroups.

Before We Start

We need to create Files and filegroups on our disks. Create all 31 days (FGD00), create all 12 months (FGM00), as for years, you will only create the current one and those that exist in your database (FGY0000).

Example:

Source: Author

Link to create Files and Filegroups: https://bityli.com/p0BZa

Link to understand file extensions: https://bityli.com/1lLK5

After creating the filegroups, we will create the Partition Function and the Partition Scheme. Every year, three months and days.

NOTE: You can already create the 07 days or create them at zero. Creation can start with the Job.

NOTE: FGY0000 serves as a final FG for the years, instead of having several FGs for all the years that have passed, in this database I had data from 2010 onwards, but I am only interested in those from 2020 onwards.

Example:

Source: Author

Solution Environment in SQL Server

1 — We partition the original table at creation or by index.

(Note that the index was ordered, I did this for performance, most of the queries will be on new data.)

Source: Author

2 — We created a staging identical to our original table, also partitioned.

(Nothing can be different between the two, any difference can lead to errors in partitioning.)

Source: Author
  • We can check our tables and filegroups with the query below:
Source: Author
  • We will use this query to monitor and improve the understanding of sanitation as the explanation progress.

Before we follow the next steps, let’s load some data into the table?

(If you are testing with your table, skip this step.)

Source: Author

The FGs were loaded like this:

Source: Author

The logic of the daily solution

The logic of the daily solution is made up of 03 steps, which are:

1 — Prepare the variables that will be used:

  • FG_MERGE (The Filegroup in which a Merge will occur, in the daily solution it will be Day-7 and in the monthly solution it will be month-3)
  • RANGE_MERGE (The date of FG_MERGE. Ex: 2023–03–24 00:00:00.00000)
  • FG_SPLIT (The Filegroup in which the split will occur, that is, in the daily solution it will be the new current day and in the monthly solution the new current month)
  • RANGE_SPLIT (The date of the FG_SPLIT. Ex: 2023–03–31 00:00:00.000000)
  • FG_MONTH (The FG of the month in which the MERGE is occurring. EX: FGM03)
Source: Author

2 — Identify whether we need to create the new filegroup and its creation

  • We identify whether we need to create it or whether it already exists. If it already exists, the query will return the partition number, if it does not exist, it will return 0.

(All queries shown are used in a format that Visual Studio’s Execute SQL Task accepts, so some will be more complicated, or even have logic, like the one shown below.)

Source: Author

(The step below will only be executed if the query above returned 0.)

  • We added a new FG to the Partition Scheme and a new Range to the Partition Function
Source: Author

Shall we check what happened? ; |

Source: Author

(The new FG will now be ready to be used for new data, this second step will be carried out every day for the Daily Solution, until the first FG created becomes the famous Day-7.)

3 — Performs the treatment of partitions and MERGE of FG Day-7

Let’s imagine that 07 days have passed and my FGD31 is now Day-7. As in the examples below:

Source: Author
Source: Author
  • We start with the same query as in step 2, but now it has different logic. We want to identify the partition number where the FG that will be merged is, if it returns 0, we stop here.
Source: Author
  • If it exists, we perform a Switch to Staging.
Source: Author

We can identify that the FGD31 of [part_data] was empty.

Source: Author

If we check [staging_part_data], we will find the files.

Source: Author

Simple, right? :)

Now we will get into the “complicated” and smart part of this solution.

  • Realizamos o MERGE do Range que queremos eliminar
Source: Author

Note that FGD31 no longer exists in [part_data]

Source: Author

And in [staging_part_data] the data was moved to the FG above, FGM03.

Source: Author

Now, let’s divide the data that our monthly Filegroup (FGM03) just received, we will divide it with the Range of the dates it received, the difference is that this Range will be in the same Filegroup.

Source: Author

Let’s check [part_data]

Source: Author

Let’s check [staging_part_data]

Source: Author
  • We perform a SWITCH to return the data to our main table, the partition number will remain the same as the first query
Source: Author

Let’s check [part_data]

Source: Author

Let’s check [staging_part_data]

Source: Author
  • We finish with a MERGE on the same Range.

(We do not perform this step if the Range date is equal to the 1st of any month, redo the logic with a date of the 1st, you will notice that the new month has already been created. This will save us work for the monthly solution.)

Source: Author

Let’s check [part_data]

Source: Author

Vamos checar a [staging_part_data]

Source: Author

It seems like magic, doesn’t it?

What is the objective of this last stage?

The main objective is to save processing and data integrity, since we cannot merge two Filegroups that contain data. The logic is created precisely to save a MERGE between two Filegroups. That is, we move the data to the same FG and then perform a Merge on identical FGs.

The Logic of the Monthly Solution

The logic of the monthly solution is the same as the daily solution, the main changes and differences are:

  • We only need to run steps 01 and 03, that is, the step that takes care of the variables (step 01) and the step that handles the partitions and the MERGE of month-3 (step 03). The creation of the new monthly Filegroup (step 02) is already carried out at the end of step 03 of the daily solution.
  • The variables will now receive different values, month will become year and day will become month.
Source: Author

When to run? It can run daily or every first day of the month. Queries that check the partition will do the job. They will define whether we will run the solution or stop the Job. Example below:

Source: Author

What you can do now?

Now you can automate this flow by adding this same sequence into a solution. After that, you can schedule the solutions to run daily.

Example Visual Studio — SSIS solutions:

  • Daily Sanitization — Creates the current day and merges Day-7 with the month
  • Monthly Sanitization — Creates the current month and merges month-3 with the year

--

--