HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

SQL Server: Create New GUID Values for Existing Index

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
guidvaluesnewcreateexistingsqlforserverindex

Problem

I have an existing table with about 6 million rows. The relevant columns are:

ID int not null PK
Key uniqueidentifier not null


Read/write on this table by Key lookup is probably something like 100 to 1.

The existing rows were all created using newid() so they are not in sequence. An index exists for Key.

CREATE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
    [Key] ASC
) WITH (PAD_INDEX = OFF
    , STATISTICS_NORECOMPUTE = OFF
    , SORT_IN_TEMPDB = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON
    , FILLFACTOR = 80
    ) ON [PRIMARY]


Given that there are already millions of rows in the table, is there any benefit to changing the GUID generation to sequential using newsequentialid()?

Going forward it would generate better data, but since it's unknown where within the index the new sequential GUID series would land, would it just screw up the index faster? Would it be better to leave the GUID generation non-sequential so that the remaining page space in the index fills more evenly?

As far as the growth rate of the table, it represents about 3 years of data.

Year    NumRows
2012    3962660
2013    1661189
2014    711241


Where 2014 is halfway through the year as of this question. There were some code changes at some point that reduced the rate of inserts, so I believe 2013 would be a typical annual row count.

Solution

You may be able to reduce index rebuild/reorg times by moving to NEWSEQUENTIALID().

You are currently inserting around 4,500 rows into the table per day. The table has around 6.3 million rows at present. That's around 0.7% of the table per day. Assuming every insert on the table results in an index-page split, this will result in 9,000 pages per day written to the storage system. This will create some unnecessary I/O pressure both for inserts into the index, and for rebuild/reorgs. Switching the Key field to sequential ID generation will vastly reduce this I/O pattern.

If you switch to NEWSEQUENTIALID(), the code that performs INSERTs on the table will need to be sensitive the possibility that it will attempt to insert a duplicate Key. This is a remote possibility, however it is still a possibility, and is the same for both NEWID() and NEWSEQUENTIALID() - since nothing is truly random that is generated by a computer. Queries against the index could gain some benefit by making the index unique, as in:

CREATE UNIQUE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
    [Key] ASC
) WITH (
    PAD_INDEX = OFF
    , STATISTICS_NORECOMPUTE = OFF
    , SORT_IN_TEMPDB = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON
    , FILLFACTOR = 80
    ) ON [PRIMARY];


This change would make index scans less likely. Paul White has an excellent article about this at https://sqlkiwi.blogspot.com/2011/02/seeking-without-indexes.html

Since you haven't given any indication of the type of queries you run against this table, I will assume you are at some points scanning the entire index nothing about your situation. If your data is truly stored on a single spinning disk, then having your index non-fragmented should reduce the time needed for queries to return information.

Having said that, you could likely get a whole lot more for your money (time is money) by improving the IO subsystem so index rebuild/reorgs are unnecessary. See http://www.brentozar.com/archive/2012/08/sql-server-index-fragmentation/ .

If you are interested in measuring the effects of the index settings you are using for the Key column, you could monitor page free percentage (among other things) for pages used by the index by using the system dmv, sys.dm_exec_index_physical_stats. For example:

SELECT o.name, ps.avg_page_space_used_in_percent, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('KeyTable','Table'), NULL, NULL, 'DETAILED') ps
    INNER JOIN sys.objects o ON ps.object_id = o.object_id;


(You may want to adjust the parameters passed to only show the index on the Key column.)

Once the avg_page_space_used_in_percent crosses some boundary value such as 95% (or more), you would then rebuild the index, which both defragments it, and with an appropriate FILL_FACTOR setting, leaves some space free on each page for new rows with randomly generated NEWID() values.

If you did decide to use NEWSEQUENTIALID() your index could use a FILL_FACTOR of 100% since index pages would not longer be split on a frequent basis. This has the benefit of reducing the number of pages required to hold the index by 20% (against your current settings), thereby reducing the IO required to rebuild the index when that does become necessary.

Since you are only inserting new rows at a rate of about 0.07% of the total table per day, it may take a very long time before the system automatically triggers a histogram update on the statistics for the table involved. You could use the STATS_DATE() function to ensure the statistics are never more than x days old. Up-to-date statistics are vital for the query engine to produce optimal query plans.

Code Snippets

CREATE UNIQUE NONCLUSTERED INDEX [idx_Robert] ON [dbo].[Aleksander]
(
    [Key] ASC
) WITH (
    PAD_INDEX = OFF
    , STATISTICS_NORECOMPUTE = OFF
    , SORT_IN_TEMPDB = OFF
    , DROP_EXISTING = OFF
    , ONLINE = OFF
    , ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON
    , FILLFACTOR = 80
    ) ON [PRIMARY];
SELECT o.name, ps.avg_page_space_used_in_percent, ps.avg_fragmentation_in_percent
FROM sys.dm_db_index_physical_stats(DB_ID(),OBJECT_ID('KeyTable','Table'), NULL, NULL, 'DETAILED') ps
    INNER JOIN sys.objects o ON ps.object_id = o.object_id;

Context

StackExchange Database Administrators Q#72684, answer score: 5

Revisions (0)

No revisions yet.