snippetsqlMinor
SQL Server: Create New GUID Values for Existing Index
Viewed 0 times
guidvaluesnewcreateexistingsqlforserverindex
Problem
I have an existing table with about 6 million rows. The relevant columns are:
Read/write on this table by
The existing rows were all created using
Given that there are already millions of rows in the table, is there any benefit to changing the GUID generation to sequential using
Going forward it would generate better data, but since it's unknown where within the index the new sequential GUID series would land, would it just screw up the index faster? Would it be better to leave the GUID generation non-sequential so that the remaining page space in the index fills more evenly?
As far as the growth rate of the table, it represents about 3 years of data.
Where 2014 is halfway through the year as of this question. There were some code changes at some point that reduced the rate of inserts, so I believe 2013 would be a typical annual row count.
ID int not null PK
Key uniqueidentifier not nullRead/write on this table by
Key lookup is probably something like 100 to 1.The existing rows were all created using
newid() so they are not in sequence. An index exists for Key.CREATE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
[Key] ASC
) WITH (PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, DROP_EXISTING = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
, FILLFACTOR = 80
) ON [PRIMARY]Given that there are already millions of rows in the table, is there any benefit to changing the GUID generation to sequential using
newsequentialid()?Going forward it would generate better data, but since it's unknown where within the index the new sequential GUID series would land, would it just screw up the index faster? Would it be better to leave the GUID generation non-sequential so that the remaining page space in the index fills more evenly?
As far as the growth rate of the table, it represents about 3 years of data.
Year NumRows
2012 3962660
2013 1661189
2014 711241Where 2014 is halfway through the year as of this question. There were some code changes at some point that reduced the rate of inserts, so I believe 2013 would be a typical annual row count.
Solution
You may be able to reduce index rebuild/reorg times by moving to
You are currently inserting around 4,500 rows into the table per day. The table has around 6.3 million rows at present. That's around 0.7% of the table per day. Assuming every insert on the table results in an index-page split, this will result in 9,000 pages per day written to the storage system. This will create some unnecessary I/O pressure both for inserts into the index, and for rebuild/reorgs. Switching the
If you switch to
This change would make index scans less likely. Paul White has an excellent article about this at https://sqlkiwi.blogspot.com/2011/02/seeking-without-indexes.html
Since you haven't given any indication of the type of queries you run against this table, I will assume you are at some points scanning the entire index nothing about your situation. If your data is truly stored on a single spinning disk, then having your index non-fragmented should reduce the time needed for queries to return information.
Having said that, you could likely get a whole lot more for your money (time is money) by improving the IO subsystem so index rebuild/reorgs are unnecessary. See http://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/ .
If you are interested in measuring the effects of the index settings you are using for the
(You may want to adjust the parameters passed to only show the index on the
Once the
If you did decide to use
Since you are only inserting new rows at a rate of about 0.07% of the total table per day, it may take a very long time before the system automatically triggers a histogram update on the statistics for the table involved. You could use the
NEWSEQUENTIALID().You are currently inserting around 4,500 rows into the table per day. The table has around 6.3 million rows at present. That's around 0.7% of the table per day. Assuming every insert on the table results in an index-page split, this will result in 9,000 pages per day written to the storage system. This will create some unnecessary I/O pressure both for inserts into the index, and for rebuild/reorgs. Switching the
Key field to sequential ID generation will vastly reduce this I/O pattern.If you switch to
NEWSEQUENTIALID(), the code that performs INSERTs on the table will need to be sensitive the possibility that it will attempt to insert a duplicate Key. This is a remote possibility, however it is still a possibility, and is the same for both NEWID() and NEWSEQUENTIALID() - since nothing is truly random that is generated by a computer. Queries against the index could gain some benefit by making the index unique, as in:CREATE UNIQUE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
[Key] ASC
) WITH (
PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, DROP_EXISTING = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
, FILLFACTOR = 80
) ON [PRIMARY];This change would make index scans less likely. Paul White has an excellent article about this at https://sqlkiwi.blogspot.com/2011/02/seeking-without-indexes.html
Since you haven't given any indication of the type of queries you run against this table, I will assume you are at some points scanning the entire index nothing about your situation. If your data is truly stored on a single spinning disk, then having your index non-fragmented should reduce the time needed for queries to return information.
Having said that, you could likely get a whole lot more for your money (time is money) by improving the IO subsystem so index rebuild/reorgs are unnecessary. See http://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/ .
If you are interested in measuring the effects of the index settings you are using for the
Key column, you could monitor page free percentage (among other things) for pages used by the index by using the system dmv, sys.dm_exec_index_physical_stats. For example:SELECT o.name, ps.avg_page_space_used_in_percent, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('KeyTable','Table'), NULL, NULL, 'DETAILED') ps
INNER JOIN sys.objects o ON ps.object_id = o.object_id;(You may want to adjust the parameters passed to only show the index on the
Key column.)Once the
avg_page_space_used_in_percent crosses some boundary value such as 95% (or more), you would then rebuild the index, which both defragments it, and with an appropriate FILL_FACTOR setting, leaves some space free on each page for new rows with randomly generated NEWID() values.If you did decide to use
NEWSEQUENTIALID() your index could use a FILL_FACTOR of 100% since index pages would not longer be split on a frequent basis. This has the benefit of reducing the number of pages required to hold the index by 20% (against your current settings), thereby reducing the IO required to rebuild the index when that does become necessary.Since you are only inserting new rows at a rate of about 0.07% of the total table per day, it may take a very long time before the system automatically triggers a histogram update on the statistics for the table involved. You could use the
STATS_DATE() function to ensure the statistics are never more than x days old. Up-to-date statistics are vital for the query engine to produce optimal query plans.Code Snippets
CREATE UNIQUE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
[Key] ASC
) WITH (
PAD_INDEX = OFF
, STATISTICS_NORECOMPUTE = OFF
, SORT_IN_TEMPDB = OFF
, DROP_EXISTING = OFF
, ONLINE = OFF
, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON
, FILLFACTOR = 80
) ON [PRIMARY];SELECT o.name, ps.avg_page_space_used_in_percent, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('KeyTable','Table'), NULL, NULL, 'DETAILED') ps
INNER JOIN sys.objects o ON ps.object_id = o.object_id;Context
StackExchange Database Administrators Q#72684, answer score: 5
Revisions (0)
No revisions yet.