HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Statistics disappear after incremental update

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
afterupdatestatisticsdisappearincremental

Problem

We have a large partitioned SQL Server database utilizing incremental statistics. All of the indexes are partitioned aligned. When we try to rebuild a partition online by partition all of the statistics disappear after the index is rebuilt.

Below is a script to replicate the issue in SQL Server 2014 with the AdventureWorks2014 database.

```
--Example against AdventureWorks2014 Database

CREATE PARTITION FUNCTION TransactionRangePF1 (DATETIME)
AS RANGE RIGHT FOR VALUES
(
'20130501', '20130601', '20130701', '20130801',
'20130901', '20131001', '20131101', '20131201',
'20140101', '20140201', '20140301'
);
GO

CREATE PARTITION SCHEME TransactionsPS1 AS PARTITION TransactionRangePF1 TO
(
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY], [PRIMARY],
[PRIMARY], [PRIMARY], [PRIMARY]
);
GO

CREATE TABLE dbo.TransactionHistory
(
TransactionID INT NOT NULL, -- not bothering with IDENTITY here
ProductID INT NOT NULL,
ReferenceOrderID INT NOT NULL,
ReferenceOrderLineID INT NOT NULL DEFAULT (0),
TransactionDate DATETIME NOT NULL DEFAULT (GETDATE()),
TransactionType NCHAR(1) NOT NULL,
Quantity INT NOT NULL,
ActualCost MONEY NOT NULL,
ModifiedDate DATETIME NOT NULL DEFAULT (GETDATE()),
CONSTRAINT CK_TransactionType
CHECK (UPPER(TransactionType) IN (N'W', N'S', N'P'))
)
ON TransactionsPS1 (TransactionDate);

INSERT INTO dbo.TransactionHistory
SELECT * FROM Production.TransactionHistory
-- SELECT * FROM sys.partitions
-- WHERE object_id = OBJECT_ID('dbo.TransactionHistory');

CREATE NONCLUSTERED INDEX IDX_ProductId ON dbo.TransactionHistory (ProductId)
WITH (DATA_COMPRESSION = ROW, STATISTICS_INCREMENTAL=ON)
ON TransactionsPS1 (TransactionDate)

DBCC SHOW_STATISTICS('dbo.TransactionHistory', IDX_ProductId);
PRINT 'Stats are avialable'

ALTER INDEX [IDX_ProductId] ON [dbo].[TransactionHist

Solution

Not sure if it's a bug, per se but it's definitely an interesting occurrence. Online partition rebuilds are new in SQL Server 2014 so there may be some internals to sort through with this.

Here's my best explanation for you. Incremental statistics absolutely require that all partitions be sampled at the same rate so that when the engine merges the stats pages it can be confident that the sampled distribution is comparable. REBUILD necessarily samples data at a 100% sample rate. There's no guarantee that the 100% sample rate on partition 9 is always going to be the exact sample rate of the rest of the partitions. Because of this, it appears as though the engine cannot merge the samples and you end up with an empty stats blob. However, the statistics object is still there:

select 
    check_time = sysdatetime(),                         
    schema_name = sh.name,
    table_name = t.name,
    stat_name = s.name,
    index_name = i.name,
    stats_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
    s.stats_id,
    s.has_filter,                       
    s.is_incremental,
    s.auto_created,
    sp.last_updated,    
    sp.rows,
    sp.rows_sampled,                        
    sp.unfiltered_rows,
    modification_counter 
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sh
    on t.schema_id = sh.schema_id
left join sys.indexes i 
    on s.object_id = i.object_id
    and s.name = i.name
outer apply sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
where t.name = 'TransactionHistory' and sh.name = 'dbo'


You can fill the blob through any number of means:

UPDATE STATISTICS dbo.TransactionHistory (IDX_ProductId) WITH RESAMPLE;

or

UPDATE STATISTICS dbo.TransactionHistory (IDX_ProductId) WITH RESAMPLE ON PARTITIONS (9);

or you can wait for AutoStats to update on the first compilation of a query plan using that object:

-- look at my creative query
select * 
from dbo.TransactionHistory
where TransactionDate = '20140101';


Having said all of that, this enlightening post by Erin Stellato highlights what has come to be perceived as a major deficiency of incremental stats. Their partition-level data are not are not used by the optimizer in query plan generation, reducing the presumed benefit of incremental statistics. What, then, is the current benefit of incremental statistics? I'd submit that their primary utility is in an ability to sample large tables more consistently at a higher rate than with traditional statistics.

Using your example, here's how things look:

set statistics time on;

update statistics dbo.TransactionHistory(IDX_ProductId)
with fullscan;

--SQL Server Execution Times:
--  CPU time = 94 ms,  elapsed time = 131 ms.

update statistics dbo.TransactionHistory(IDX_ProductId)
with resample on partitions(2);

 --SQL Server Execution Times:
 --  CPU time = 0 ms,  elapsed time = 5 ms.

drop index IDX_ProductId On dbo.TransactionHistory;

CREATE NONCLUSTERED INDEX IDX_ProductId ON dbo.TransactionHistory (ProductId) 
  WITH (DATA_COMPRESSION = ROW)  
  ON [PRIMARY]

update statistics dbo.TransactionHistory(IDX_ProductId)
with fullscan;

 --SQL Server Execution Times:
 --  CPU time = 76 ms,  elapsed time = 66 ms.


A fullscan statistics update on the incremental statistic costs 131 ms. A fullscan statistics update on the non partition-aligned statistic costs 66 ms. The non-aligned statistic is slower most likely due to overhead incurred with merging the individual statistics pages back into the main histogram. However, using the partition-aligned statistic object, we can update one partition and merge it back into the main histogram blob in 5 ms. So at this point the administrator with the incremental statistic is faced with a decision. They can decrease their overall statistics maintenance time by only updating partitions would traditionally need to be updated, or they can experiment with higher sample rates such that they potentially get more rows sampled in the same period of time as their previous maintenance timeframe. The former allows breathing room in the maintenance window, the latter might push statistics on a very large table to a place where queries get better plans based on more accurate statistics. This is not a guarantee and your mileage may vary.

The reader can see that 66 ms is not a painful statistics update time on this table so I tried setting up a test on the stackexchange data set. There are 6,418,608 posts (excluding StackOverflow posts and all posts from 2012 - a data error on my part) in the recent dump that I downloaded.

I've partitioned the data by [CreationDate] because ... demo.

Here are some timings for some pretty standard scenarios (100% - index rebuild, default - stats auto-update or UPDATE STATISTICS without a specified sample rate:

  • Create Non-Incremental Statistic with Fullscan: CPU time = 23500 ms, elapsed time = 22521 ms.



  • Create Incremental Statistic WI

Code Snippets

select 
    check_time = sysdatetime(),                         
    schema_name = sh.name,
    table_name = t.name,
    stat_name = s.name,
    index_name = i.name,
    stats_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
    s.stats_id,
    s.has_filter,                       
    s.is_incremental,
    s.auto_created,
    sp.last_updated,    
    sp.rows,
    sp.rows_sampled,                        
    sp.unfiltered_rows,
    modification_counter 
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sh
    on t.schema_id = sh.schema_id
left join sys.indexes i 
    on s.object_id = i.object_id
    and s.name = i.name
outer apply sys.dm_db_stats_properties(s.object_id, s.stats_id) sp
where t.name = 'TransactionHistory' and sh.name = 'dbo'
-- look at my creative query
select * 
from dbo.TransactionHistory
where TransactionDate = '20140101';
set statistics time on;

update statistics dbo.TransactionHistory(IDX_ProductId)
with fullscan;

--SQL Server Execution Times:
--  CPU time = 94 ms,  elapsed time = 131 ms.


update statistics dbo.TransactionHistory(IDX_ProductId)
with resample on partitions(2);

 --SQL Server Execution Times:
 --  CPU time = 0 ms,  elapsed time = 5 ms.

drop index IDX_ProductId On dbo.TransactionHistory;

CREATE NONCLUSTERED INDEX IDX_ProductId ON dbo.TransactionHistory (ProductId) 
  WITH (DATA_COMPRESSION = ROW)  
  ON [PRIMARY]

update statistics dbo.TransactionHistory(IDX_ProductId)
with fullscan;

 --SQL Server Execution Times:
 --  CPU time = 76 ms,  elapsed time = 66 ms.
select 
    sysdatetime(),                          
    schema_name = sh.name,
    table_name = t.name,
    stat_name = s.name,
    index_name = i.name,
    leading_column = index_col(quotename(sh.name)+'.'+quotename(t.name),s.stats_id,1),
    s.stats_id,
    parition_number = isnull(sp.partition_number,1),
    s.has_filter,                       
    s.is_incremental,
    s.auto_created,
    sp.last_updated,    
    sp.rows,
    sp.rows_sampled,                        
    sp.unfiltered_rows,
    modification_counter = coalesce(sp.modification_counter, n1.modification_counter) 
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sh
    on t.schema_id = sh.schema_id
left join sys.indexes i 
    on s.object_id = i.object_id
        and s.name = i.name
cross apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) sp
outer apply sys.dm_db_stats_properties_internal(s.object_id, s.stats_id) n1
where n1.node_id = 1
    and (
            (is_incremental = 0)
               or
            (is_incremental = 1 and sp.partition_number is not null)
         )
    and t.name = 'Posts'
    and s.name like 'st_posts%'
order by s.stats_id,isnull(sp.partition_number,1)

Context

StackExchange Database Administrators Q#111590, answer score: 18

Revisions (0)

No revisions yet.