patternsqlModerate
How does sampling work when updating statistics?
Viewed 0 times
howsamplingupdatingstatisticsworkdoeswhen
Problem
I have several massive tables. I'd like to ensure their statistics are up to date via a weekly maintenance plan.
However, doing so is taking too much time.
If I specify
does SQL Server then sample:
BOL is not clear on this.
However, doing so is taking too much time.
If I specify
WITH SAMPLE 50 PERCENTdoes SQL Server then sample:
- the first 50% of pages
- every other page
- or some other strategy?
BOL is not clear on this.
Solution
For
Tracing the
With Plan
generates a random value for each physical 8-KB page in the table.
Based on the random value for a page and the percentage specified in
the query, a page is either included in the sample or excluded. Each
page that is included returns all rows in the sample result set.
The documentation also states
Although the plan shows a table scan is performed, only those pages
that are included in the result set are actually required to be read
from the data file.
The
WITH SAMPLE 50 PERCENT it works as though for each data page in the table SQL Server flips a coin. If it lands heads then it reads all the rows on the page. If it lands tails then it reads none.Tracing the
UPDATE STATISTICS T WITH SAMPLE 50 PERCENT call in Profiler shows the following query is emittedSELECT StatMan([SC0], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0],
step_direction([SC0]) OVER (ORDER BY NULL) AS [SB0000]
FROM (SELECT [C] AS [SC0]
FROM [dbo].[T] TABLESAMPLE SYSTEM (5.000000e+001 PERCENT)
WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY [SC0],
[SB0000]) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)With Plan
TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) is responsible for the sampling and is documented hereTABLESAMPLE SYSTEM returns an approximate percentage of rows andgenerates a random value for each physical 8-KB page in the table.
Based on the random value for a page and the percentage specified in
the query, a page is either included in the sample or excluded. Each
page that is included returns all rows in the sample result set.
The documentation also states
Although the plan shows a table scan is performed, only those pages
that are included in the result set are actually required to be read
from the data file.
The
STATMAN call is to an internal aggregate function briefly described hereCode Snippets
SELECT StatMan([SC0], [SB0000])
FROM (SELECT TOP 100 PERCENT [SC0],
step_direction([SC0]) OVER (ORDER BY NULL) AS [SB0000]
FROM (SELECT [C] AS [SC0]
FROM [dbo].[T] TABLESAMPLE SYSTEM (5.000000e+001 PERCENT)
WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
ORDER BY [SC0],
[SB0000]) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)Context
StackExchange Database Administrators Q#34883, answer score: 18
Revisions (0)
No revisions yet.