HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

How does sampling work when updating statistics?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
howsamplingupdatingstatisticsworkdoeswhen

Problem

I have several massive tables. I'd like to ensure their statistics are up to date via a weekly maintenance plan.

However, doing so is taking too much time.

If I specify

WITH SAMPLE 50 PERCENT


does SQL Server then sample:

  • the first 50% of pages



  • every other page



  • or some other strategy?



BOL is not clear on this.

Solution

For WITH SAMPLE 50 PERCENT it works as though for each data page in the table SQL Server flips a coin. If it lands heads then it reads all the rows on the page. If it lands tails then it reads none.

Tracing the UPDATE STATISTICS T WITH SAMPLE 50 PERCENT call in Profiler shows the following query is emitted

SELECT StatMan([SC0], [SB0000])
FROM   (SELECT TOP 100 PERCENT [SC0],
                               step_direction([SC0]) OVER (ORDER BY NULL) AS [SB0000]
        FROM   (SELECT [C] AS [SC0]
                FROM   [dbo].[T] TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) 
                WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
        ORDER  BY [SC0],
                  [SB0000]) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)


With Plan

TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) is responsible for the sampling and is documented here

TABLESAMPLE SYSTEM returns an approximate percentage of rows and
generates a random value for each physical 8-KB page in the table.
Based on the random value for a page and the percentage specified in
the query, a page is either included in the sample or excluded. Each
page that is included returns all rows in the sample result set.

The documentation also states

Although the plan shows a table scan is performed, only those pages
that are included in the result set are actually required to be read
from the data file.

The STATMAN call is to an internal aggregate function briefly described here

Code Snippets

SELECT StatMan([SC0], [SB0000])
FROM   (SELECT TOP 100 PERCENT [SC0],
                               step_direction([SC0]) OVER (ORDER BY NULL) AS [SB0000]
        FROM   (SELECT [C] AS [SC0]
                FROM   [dbo].[T] TABLESAMPLE SYSTEM (5.000000e+001 PERCENT) 
                WITH (READUNCOMMITTED)) AS _MS_UPDSTATS_TBL_HELPER
        ORDER  BY [SC0],
                  [SB0000]) AS _MS_UPDSTATS_TBL
OPTION (MAXDOP 1)

Context

StackExchange Database Administrators Q#34883, answer score: 18

Revisions (0)

No revisions yet.