Last-Page Insert Contention in SQL Server

Alpay Kurbaloğlu · Trendyol Tech · Dec 3, 2021

Hello,

Today, we will talk about one of the common problems you can encounter in today’s database world: last-page insert contention, also known as convoy waiting. We will look at how concurrent inserts from applications affect SQL Server performance, what Microsoft offers as a solution, how effective that solution is, and what else we can do about the problem.

Concurrent programming is not a new thing. I’m saying “today’s database world” because, at peak traffic, an application can easily scale itself out to 10–20 pods, and those 10–20 (or more) pods all try to insert into the same table on a single primary database instance.

Picture a table with an Id column, where Id is the PK and also has a clustered index on it. Sound familiar? This is the most common table design. With this design, if you find yourself in a situation like the one above, you will see waits like PAGELATCH_EX and PAGELATCH_SH when the application runs concurrent inserts. (Note that PAGELATCH differs from PAGEIOLATCH; the latter is used to synchronize physical access to pages when they are read from or written to disk.) First, let’s discuss what PAGELATCH is and why we see these waits.

Latches are lightweight synchronization primitives used by the SQL Server engine to guarantee the consistency of in-memory structures, including index and data pages as well as internal structures such as the non-leaf pages of a B-tree.

SQL Server uses buffer latches to protect pages in the buffer pool and I/O latches to protect pages not yet loaded into it. Whenever data is written to or read from a page in the buffer pool, a worker thread must first acquire a buffer latch for that page.

PAGELATCH (a latch on a data or index page) is a thread-synchronization mechanism used to synchronize short-term physical access to database pages located in the buffer cache.

So SQL Server uses page latches to protect physical pages in the buffer cache. With a clustered index, data is ordered by the leading key column of the index. If that column is sequential, each new row is inserted at the end of the same page until the page fills up, so high-density concurrent inserts all compete for the same page, and that is where the waits come from. By the way, the same thing can happen on nonclustered indexes, because their leaf-level pages are also ordered by the leading column of the index.
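
If you want to watch this contention live while the load runs, a minimal sketch like the following lists the sessions currently waiting on page latches (this is my own helper, not one of the test scripts):

-- Minimal sketch: list sessions currently waiting on page latches.
-- resource_description shows the page as <db_id>:<file_id>:<page_id>,
-- so contention on a single hot page is easy to spot.
SELECT wt.session_id,
       wt.wait_type,
       wt.wait_duration_ms,
       wt.resource_description
FROM sys.dm_os_waiting_tasks AS wt
WHERE wt.wait_type LIKE N'PAGELATCH%';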

First, we will create a standard table with three columns: Id, an identity column that is the PK with a clustered index; CreateDate, with a default of sysdatetime(); and a Status column. The create script is:

CREATE TABLE [dbo].[TestTable_V1] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    CONSTRAINT [PK_TestTable_V1] PRIMARY KEY CLUSTERED ([Id] ASC)
)
GO

ALTER TABLE [dbo].[TestTable_V1]
    ADD CONSTRAINT [DF_TestTable_V1_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO

Second, we will prepare an insert statement with a dummy Status value; Id and CreateDate will be generated automatically.

INSERT INTO dbo.[TestTable_V1] ([Status]) VALUES ('CREATED')

We will generate the load with the OStress tool, which ships as part of Microsoft’s RML Utilities for SQL Server.

The OStress command looks like this:

ostress -S"TestServer" -Utestuser -P"password" -dTestDB -Q"INSERT INTO dbo.[TestTable_V1] ([Status]) VALUES ('CREATED');" -n200 -r10000 -q

I will clear the wait stats after every load with:

DBCC SQLPERF('sys.dm_os_wait_stats', CLEAR);
GO

We will run 10,000 iterations with 200 threads to simulate a very high insert load.

With the standard table, we can see lots of PAGELATCH_EX waits.

If we look at the top waits, we can see two latch wait types right behind WRITELOG, which is expected for a high-volume insert load.
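
For reference, a minimal top-wait query looks like the sketch below; it is only a stand-in for the script linked in the original post, and the benign-wait exclusion list can be trimmed to taste:

-- Sketch: aggregate waits since the last DBCC SQLPERF clear and rank them
-- by total wait time, skipping a few well-known benign wait types.
SELECT TOP (10)
       wait_type,
       waiting_tasks_count,
       wait_time_ms,
       CAST(100.0 * wait_time_ms / SUM(wait_time_ms) OVER () AS decimal(5, 2)) AS pct_of_total
FROM sys.dm_os_wait_stats
WHERE wait_time_ms > 0
  AND wait_type NOT IN (N'SLEEP_TASK', N'LAZYWRITER_SLEEP',
                        N'SQLTRACE_BUFFER_FLUSH', N'XE_TIMER_EVENT',
                        N'DIRTY_PAGE_POLL')
ORDER BY wait_time_ms DESC;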

It took 1 minute 46 seconds (106 seconds) to complete the 10,000 iterations.

Let’s create the same table as V2, this time with the OPTIMIZE_FOR_SEQUENTIAL_KEY option turned ON for the clustered index. This option was introduced in SQL Server 2019 specifically to address last-page contention. Let’s see whether it can help us with this load. We will create TestTable_V2 with the option enabled.

CREATE TABLE [dbo].[TestTable_V2] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    CONSTRAINT [PK_TestTable_V2] PRIMARY KEY CLUSTERED ([Id] ASC)
        WITH (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON)
)
GO

ALTER TABLE [dbo].[TestTable_V2]
    ADD CONSTRAINT [DF_TestTable_V2_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO
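
As a side note, for an existing table the option can also be switched on in place, without recreating the index. A sketch against the V1 table (not part of the test run):

-- Sketch: enable the option on an existing index without rebuilding it.
ALTER INDEX [PK_TestTable_V1] ON [dbo].[TestTable_V1]
SET (OPTIMIZE_FOR_SEQUENTIAL_KEY = ON);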

After the test, we can see BTREE_INSERT_FLOW_CONTROL waits on the active sessions; this is a wait type specific to this option.

If we look at the top waits, this wait type takes the lead, but there are still PAGELATCH_EX waits present.

It took 1 minute 45 seconds to complete the 10,000 iterations, essentially the same result as the standard table. A little disappointing.

Third, we will create the same table as V3, this time with a NONCLUSTERED index on the PK. We said earlier that this situation can also occur on nonclustered indexes, because leaf-level pages are ordered by the leading column of the index. Let’s see whether that is true.

CREATE TABLE [dbo].[TestTable_V3] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    CONSTRAINT [PK_TestTable_V3] PRIMARY KEY NONCLUSTERED ([Id] ASC)
)
GO

ALTER TABLE [dbo].[TestTable_V3]
    ADD CONSTRAINT [DF_TestTable_V3_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO

Here, we can observe PAGELATCH_EX waits again.

In the top wait stats, we can see results similar to the standard table.

It took 1 minute 28 seconds to complete the 10,000 iterations. Quicker, but not optimal.

Next, we will try a uniqueidentifier as the PK with a clustered index on it. The GUIDs generated by newid() are not sequential, so we expect concurrent inserts not to compete for the same page, and the PAGELATCH waits to disappear.

CREATE TABLE [dbo].[TestTable_V4] (
    [Id] [uniqueidentifier] NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    CONSTRAINT [PK_TestTable_V4] PRIMARY KEY CLUSTERED ([Id] ASC)
)
GO

ALTER TABLE [dbo].[TestTable_V4] ADD DEFAULT (newid()) FOR [Id]
GO

ALTER TABLE [dbo].[TestTable_V4]
    ADD CONSTRAINT [DF_TestTable_V4_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO

In the top wait stats, we don’t see any latch-related wait types, just WRITELOG.

It took 51 seconds to complete the 10,000 iterations. That is a good improvement in performance.

Next, we will create a computed column, HashValue, from the Id. The hash is computed as the Id modulo the number of CPUs on the system, which is 16 in our case. We will create a composite PK with HashValue as the leading key and put the clustered index on it. This way, the inserts are spread across 16 key ranges instead of one hot page.

CREATE TABLE [dbo].[TestTable_V5] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    [HashValue] AS (CONVERT([tinyint], abs([Id]) % (16))) PERSISTED NOT NULL,
    CONSTRAINT [PK_TestTable_V5] PRIMARY KEY CLUSTERED ([HashValue] ASC, [Id] ASC)
)
GO

ALTER TABLE [dbo].[TestTable_V5]
    ADD CONSTRAINT [DF_TestTable_V5_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO

In the top wait stats, we can see only WRITELOG. No latch waits.

It took 50 seconds to complete the 10,000 iterations. This method gives the best result from a performance perspective.

As a final touch, let’s try this scenario with a memory-optimized table.

First, we will add the necessary file and filegroup for the table:

ALTER DATABASE TestDB
    ADD FILEGROUP TestDB_mod CONTAINS MEMORY_OPTIMIZED_DATA;

ALTER DATABASE TestDB
    ADD FILE (NAME = 'TestDB_mod1', FILENAME = 'D:\DATA\TestDB')
    TO FILEGROUP TestDB_mod;

Then we create the table with schema-and-data durability:

CREATE TABLE [dbo].[TestTable_V6] (
    [Id] [bigint] IDENTITY(1,1) NOT NULL,
    [Status] [nvarchar](max) NOT NULL,
    [CreateDate] [datetime] NOT NULL,
    CONSTRAINT [PK_TestTable_V6] PRIMARY KEY NONCLUSTERED ([Id] ASC)
)
WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO

ALTER TABLE [dbo].[TestTable_V6]
    ADD CONSTRAINT [DF_TestTable_V6_CreateDate] DEFAULT (sysdatetime()) FOR [CreateDate]
GO

When the test finishes, the top wait stats again show only WRITELOG. No latch waits.

It took 54 seconds to complete the 10,000 iterations.

CONCLUSION

Microsoft’s built-in solution for this problem, OPTIMIZE_FOR_SEQUENTIAL_KEY, did not perform well in our tests.

To solve the last-page contention problem, we have three options:

1-) Using a UUID as the PK with a clustered index:

Using non-sequential unique identifiers as the PK with a clustered index can definitely get rid of your PAGELATCH waits; we did not see those wait types again. We also observed good performance: the run dropped from 106 seconds to 51 seconds, a huge improvement. Note that you also have to change the application side, since you can no longer use an auto-increment Id for your records. If you still want such a column, you will have to keep it without an index on it, and if you want to use the power of the clustered index in your selects, the application has to look records up by UUID.
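
If the application still needs the generated key back after an insert, one option (a sketch, assuming the NEWID() default from TestTable_V4) is to capture it with an OUTPUT clause:

-- Sketch: let the NEWID() default generate the key and capture it with OUTPUT,
-- since the app can no longer rely on an auto-increment identity.
DECLARE @NewIds TABLE ([Id] uniqueidentifier);

INSERT INTO [dbo].[TestTable_V4] ([Status])
OUTPUT inserted.[Id] INTO @NewIds
VALUES (N'CREATED');

SELECT [Id] FROM @NewIds;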

2-) Using a computed value as the leading column of your clustered index:

Well, this method gave the best performance in our experiments: we didn’t see any PAGELATCH waits, and it was fast. The main advantage is that you can still use the clustered index for your selects, so you don’t have to change much. You only need to rewrite the selects that filter by Id: compute the modulo of the Id and send the HashValue to SQL Server along with it, as shown in the sketch below. By the way, you can use another column as the hash source if your application has one that doesn’t increase sequentially.
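
A sketch of such a rewritten lookup, assuming the 16-way modulo from TestTable_V5 (@Id stands in for the application’s parameter):

-- Sketch: recompute the hash on the lookup side so the clustered index
-- (HashValue, Id) can be seeked instead of scanned.
DECLARE @Id bigint = 123456;  -- hypothetical lookup value

SELECT [Id], [Status], [CreateDate]
FROM [dbo].[TestTable_V5]
WHERE [HashValue] = CONVERT(tinyint, ABS(@Id) % 16)
  AND [Id] = @Id;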

I have to say that page splits can still occur in these structures, and you should be aware of it, because you are no longer writing to pages strictly sequentially. It is a trade-off.

3-) Using a memory-optimized table:

This method also eliminates PAGELATCH waits, and its result is similar to the UUID and computed-value methods; performance was good. But converting a table with a heavy insert load into a memory-optimized table can be risky. You should carefully monitor the table’s row count, bucket count, and memory footprint. You would also need an archive mechanism that moves old records out of the table, which is how you keep its size under control.
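
Such an archive job could look roughly like the sketch below; TestTable_V6_Archive (a disk-based copy of the schema) and the 30-day cutoff are hypothetical:

-- Hypothetical archive step: copy old rows to a disk-based archive table,
-- then delete them from the memory-optimized table to cap its memory use.
INSERT INTO [dbo].[TestTable_V6_Archive] ([Id], [Status], [CreateDate])
SELECT [Id], [Status], [CreateDate]
FROM [dbo].[TestTable_V6]
WHERE [CreateDate] < DATEADD(DAY, -30, SYSDATETIME());

DELETE FROM [dbo].[TestTable_V6]
WHERE [CreateDate] < DATEADD(DAY, -30, SYSDATETIME());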

Thanks for reading.

Disclaimer: The code and approaches I’ve provided in this article may not be optimal. If you have any suggestions, I would like to hear them.

You can find me on LinkedIn: https://www.linkedin.com/in/alpaykurbal

Feel free to ask any questions and discuss technology topics.

Alpay
