patternsqlMinor
Is pre-sorting valuable when using table value constructor to insert multiple values into a table with clustered index?
Viewed 0 times
sortingclusteredvaluableinsertwithintovaluepreconstructorusing
Problem
This is related to this question: Efficient INSERT INTO a Table With Clustered Index
From that question it sounds like SQL Server will sometimes pre-sort the data when inserting multiple values into a table with a clustered index to reduce fragmentation. Will SQL Server also do this when I use a Table Value Constructor to insert multiple values in an insert statement?
For example, if I have a table
Will SQL Server pre-sort for me to ensure the data goes in efficiently without fragmentation or should I sort the data in my app before generating the SQL?
From that question it sounds like SQL Server will sometimes pre-sort the data when inserting multiple values into a table with a clustered index to reduce fragmentation. Will SQL Server also do this when I use a Table Value Constructor to insert multiple values in an insert statement?
For example, if I have a table
event_log that is clustered on a timestamp column and I insert data like this:INSERT INTO event_log (timestamp, data)
VALUES
('03:00:02', 'data string...'),
('03:00:01', 'data string...'),
('03:00:00', 'data string...')Will SQL Server pre-sort for me to ensure the data goes in efficiently without fragmentation or should I sort the data in my app before generating the SQL?
Solution
I've constructed a test-bed to see what happens:
The first column is the primary key, and as you can see the values are listed in random(ish) order. Listing the values in random order should make SQL Server either:
The
Once you run the above insert, you can check fragmentation like this:
Running this on my SQL Server 2012 Developer Edition instance shows average fragmentation of 90%, indicating SQL Server did not sort during the insert.
The moral of this particular story is likely to be, "when in doubt, sort, if it will be beneficial". Having said that, adding and
On non-production systems you can use trace flag 2332 as an option on the insert statement to "force" SQL Server to sort the input prior to inserting it. @PaulWhite has an interesting article, Optimizing T-SQL queries that change data covering that, and other details. Be aware, that trace flag is unsupported, and should NOT be used in production systems, since that might void your warranty. In a non-production system, for your own education, you can try adding this to the end of the
Once you have that appended to the insert, take a look at the plan, you'll see an explicit sort:
It would be great if Microsoft would make this a supported trace flag.
Paul White made me aware that SQL Server does automatically introduce a sort operator into the plan when it thinks one will be helpful. For the sample query above, if I run the insert with 250 items in the
USE tempdb;
CREATE TABLE dbo.TestSort
(
Sorted INT NOT NULL
CONSTRAINT PK_TestSort
PRIMARY KEY CLUSTERED
, SomeData VARCHAR(2048) NOT NULL
);
INSERT INTO dbo.TestSort (Sorted, SomeData)
VALUES (1797604285, CRYPT_GEN_RANDOM(1024))
, (1530768597, CRYPT_GEN_RANDOM(1024))
, (1274169954, CRYPT_GEN_RANDOM(1024))
, (-1972758125, CRYPT_GEN_RANDOM(1024))
, (1768931454, CRYPT_GEN_RANDOM(1024))
, (-1180422587, CRYPT_GEN_RANDOM(1024))
, (-1373873804, CRYPT_GEN_RANDOM(1024))
, (293442810, CRYPT_GEN_RANDOM(1024))
, (-2126229859, CRYPT_GEN_RANDOM(1024))
, (715871545, CRYPT_GEN_RANDOM(1024))
, (-1163940131, CRYPT_GEN_RANDOM(1024))
, (566332020, CRYPT_GEN_RANDOM(1024))
, (1880249597, CRYPT_GEN_RANDOM(1024))
, (-1213257849, CRYPT_GEN_RANDOM(1024))
, (-155893134, CRYPT_GEN_RANDOM(1024))
, (976883931, CRYPT_GEN_RANDOM(1024))
, (-1424958821, CRYPT_GEN_RANDOM(1024))
, (-279093766, CRYPT_GEN_RANDOM(1024))
, (-903956376, CRYPT_GEN_RANDOM(1024))
, (181119720, CRYPT_GEN_RANDOM(1024))
, (-422397654, CRYPT_GEN_RANDOM(1024))
, (-560438983, CRYPT_GEN_RANDOM(1024))
, (968519165, CRYPT_GEN_RANDOM(1024))
, (1820871210, CRYPT_GEN_RANDOM(1024))
, (-1348787729, CRYPT_GEN_RANDOM(1024))
, (-1869809700, CRYPT_GEN_RANDOM(1024))
, (423340320, CRYPT_GEN_RANDOM(1024))
, (125852107, CRYPT_GEN_RANDOM(1024))
, (-1690550622, CRYPT_GEN_RANDOM(1024))
, (570776311, CRYPT_GEN_RANDOM(1024))
, (2120766755, CRYPT_GEN_RANDOM(1024))
, (1123596784, CRYPT_GEN_RANDOM(1024))
, (496886282, CRYPT_GEN_RANDOM(1024))
, (-571192016, CRYPT_GEN_RANDOM(1024))
, (1036877128, CRYPT_GEN_RANDOM(1024))
, (1518056151, CRYPT_GEN_RANDOM(1024))
, (1617326587, CRYPT_GEN_RANDOM(1024))
, (410892484, CRYPT_GEN_RANDOM(1024))
, (1826927956, CRYPT_GEN_RANDOM(1024))
, (-1898916773, CRYPT_GEN_RANDOM(1024))
, (245592851, CRYPT_GEN_RANDOM(1024))
, (1826773413, CRYPT_GEN_RANDOM(1024))
, (1451000899, CRYPT_GEN_RANDOM(1024))
, (1234288293, CRYPT_GEN_RANDOM(1024))
, (1433618321, CRYPT_GEN_RANDOM(1024))
, (-1584291587, CRYPT_GEN_RANDOM(1024))
, (-554159323, CRYPT_GEN_RANDOM(1024))
, (-1478814392, CRYPT_GEN_RANDOM(1024))
, (1326124163, CRYPT_GEN_RANDOM(1024))
, (701812459, CRYPT_GEN_RANDOM(1024));The first column is the primary key, and as you can see the values are listed in random(ish) order. Listing the values in random order should make SQL Server either:
- Sort the data, pre-insert
- Not sort the data, resulting in a fragmented table.
The
CRYPT_GEN_RANDOM() function is used to generate 1024 bytes of random data per row, to allow this table to consume multiple pages, which in turn allows us to see the effects of fragmented inserts.Once you run the above insert, you can check fragmentation like this:
SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TestSort'), 1, 0, 'SAMPLED') ips;Running this on my SQL Server 2012 Developer Edition instance shows average fragmentation of 90%, indicating SQL Server did not sort during the insert.
The moral of this particular story is likely to be, "when in doubt, sort, if it will be beneficial". Having said that, adding and
ORDER BY clause to an insert statement does not guarantee the inserts will occur in that order. Consider what happens if the insert goes parallel, as an example.On non-production systems you can use trace flag 2332 as an option on the insert statement to "force" SQL Server to sort the input prior to inserting it. @PaulWhite has an interesting article, Optimizing T-SQL queries that change data covering that, and other details. Be aware, that trace flag is unsupported, and should NOT be used in production systems, since that might void your warranty. In a non-production system, for your own education, you can try adding this to the end of the
INSERT statement:OPTION (QUERYTRACEON 2332);Once you have that appended to the insert, take a look at the plan, you'll see an explicit sort:
It would be great if Microsoft would make this a supported trace flag.
Paul White made me aware that SQL Server does automatically introduce a sort operator into the plan when it thinks one will be helpful. For the sample query above, if I run the insert with 250 items in the
values clause, no sort is implemented automatically. However, at 251 items, SQL Server automatically sorts the values prior to the insert. Why the cutoff is 250/251 rows remains a mystery to me, other than it seems to be hard-coded. If I reduce the size of the data inserted in the SomeData column to just one byte, the cutoff is still 250/251 rows, even though the size of the table in both cases is just a single page. Interestingly, looking at the insert with SET STATISTICS IO, TIME ON; shows the inserts with a single byte SomeData value take twiceCode Snippets
USE tempdb;
CREATE TABLE dbo.TestSort
(
Sorted INT NOT NULL
CONSTRAINT PK_TestSort
PRIMARY KEY CLUSTERED
, SomeData VARCHAR(2048) NOT NULL
);
INSERT INTO dbo.TestSort (Sorted, SomeData)
VALUES (1797604285, CRYPT_GEN_RANDOM(1024))
, (1530768597, CRYPT_GEN_RANDOM(1024))
, (1274169954, CRYPT_GEN_RANDOM(1024))
, (-1972758125, CRYPT_GEN_RANDOM(1024))
, (1768931454, CRYPT_GEN_RANDOM(1024))
, (-1180422587, CRYPT_GEN_RANDOM(1024))
, (-1373873804, CRYPT_GEN_RANDOM(1024))
, (293442810, CRYPT_GEN_RANDOM(1024))
, (-2126229859, CRYPT_GEN_RANDOM(1024))
, (715871545, CRYPT_GEN_RANDOM(1024))
, (-1163940131, CRYPT_GEN_RANDOM(1024))
, (566332020, CRYPT_GEN_RANDOM(1024))
, (1880249597, CRYPT_GEN_RANDOM(1024))
, (-1213257849, CRYPT_GEN_RANDOM(1024))
, (-155893134, CRYPT_GEN_RANDOM(1024))
, (976883931, CRYPT_GEN_RANDOM(1024))
, (-1424958821, CRYPT_GEN_RANDOM(1024))
, (-279093766, CRYPT_GEN_RANDOM(1024))
, (-903956376, CRYPT_GEN_RANDOM(1024))
, (181119720, CRYPT_GEN_RANDOM(1024))
, (-422397654, CRYPT_GEN_RANDOM(1024))
, (-560438983, CRYPT_GEN_RANDOM(1024))
, (968519165, CRYPT_GEN_RANDOM(1024))
, (1820871210, CRYPT_GEN_RANDOM(1024))
, (-1348787729, CRYPT_GEN_RANDOM(1024))
, (-1869809700, CRYPT_GEN_RANDOM(1024))
, (423340320, CRYPT_GEN_RANDOM(1024))
, (125852107, CRYPT_GEN_RANDOM(1024))
, (-1690550622, CRYPT_GEN_RANDOM(1024))
, (570776311, CRYPT_GEN_RANDOM(1024))
, (2120766755, CRYPT_GEN_RANDOM(1024))
, (1123596784, CRYPT_GEN_RANDOM(1024))
, (496886282, CRYPT_GEN_RANDOM(1024))
, (-571192016, CRYPT_GEN_RANDOM(1024))
, (1036877128, CRYPT_GEN_RANDOM(1024))
, (1518056151, CRYPT_GEN_RANDOM(1024))
, (1617326587, CRYPT_GEN_RANDOM(1024))
, (410892484, CRYPT_GEN_RANDOM(1024))
, (1826927956, CRYPT_GEN_RANDOM(1024))
, (-1898916773, CRYPT_GEN_RANDOM(1024))
, (245592851, CRYPT_GEN_RANDOM(1024))
, (1826773413, CRYPT_GEN_RANDOM(1024))
, (1451000899, CRYPT_GEN_RANDOM(1024))
, (1234288293, CRYPT_GEN_RANDOM(1024))
, (1433618321, CRYPT_GEN_RANDOM(1024))
, (-1584291587, CRYPT_GEN_RANDOM(1024))
, (-554159323, CRYPT_GEN_RANDOM(1024))
, (-1478814392, CRYPT_GEN_RANDOM(1024))
, (1326124163, CRYPT_GEN_RANDOM(1024))
, (701812459, CRYPT_GEN_RANDOM(1024));SELECT *
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('TestSort'), 1, 0, 'SAMPLED') ips;OPTION (QUERYTRACEON 2332);;WITH s AS (
SELECT v.Item
FROM (VALUES (0), (1), (2), (3), (4), (5), (6), (7), (8), (9)) v(Item)
)
, v AS (
SELECT Num = CONVERT(int, CRYPT_GEN_RANDOM(10), 0)
)
, o AS (
SELECT v.Num
, rn = ROW_NUMBER() OVER (PARTITION BY v.Num ORDER BY NEWID())
FROM s s1
CROSS JOIN s s2
CROSS JOIN s s3
CROSS JOIN v
)
SELECT TOP(50) ', ('
+ REPLACE(CONVERT(varchar(11), o.Num), '*', '0')
+ ', CRYPT_GEN_RANDOM(1024))'
FROM o
WHERE rn = 1
ORDER BY NEWID();Context
StackExchange Database Administrators Q#160200, answer score: 8
Revisions (0)
No revisions yet.