HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

How often should you update your stats?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
updatehowyouyouroftenstatsshould

Problem

How often should you update your stats?
What is "too seldom"?
How often is "too often"?

The answer is "it depends" on your database, users, data, etc.

So I've tried to log what our stats look like over time, in two tables.
Here they are:

DROP TABLE /*IF EXISTS */ dbo.dm_db_stats_histogram
DROP TABLE /*IF EXISTS */ dbo.dm_db_stats_properties
go
CREATE TABLE dbo.dm_db_stats_properties(
  dm_db_stats_propertiesID INT IDENTITY(1,1) NOT NULL constraint PK_dm_db_stats_properties PRIMARY KEY CLUSTERED,
  DatabaseId INT NOT NULL,
  object_id int NOT NULL,
  stats_id  int NOT NULL,
  last_updated  DATETIME2 NOT NULL,
  rows  BIGINT NOT NULL,
  rows_sampled  BIGINT NOT NULL,
  steps int NOT NULL,
  unfiltered_rows   BIGINT NOT NULL,
  modification_counter  BIGINT NOT NULL,
  persisted_sample_percent  FLOAT  NULL
  , SampleDate DATETIME2 NOT NULL CONSTRAINT df_dm_db_stats_properties_SampleDate DEFAULT SYSUTCDATETIME()
)
GO
ALTER TABLE dbo.dm_db_stats_properties ADD StatsName NVARCHAR(128) NOT NULL CONSTRAINT df_dm_db_stats_properties_StatsName DEFAULT ('')

GO
CREATE TABLE dbo.dm_db_stats_histogram(
  dm_db_stats_histogramID INT IDENTITY(1,1) NOT NULL constraint PK_dm_db_stats_histogram PRIMARY KEY CLUSTERED,
  dm_db_stats_propertiesID INT NOT NULL, 
  object_id int NOT NULL,
  stats_id  int NOT NULL,
  step_number   int NOT NULL,
  range_high_key    sql_variant NOT NULL,
  range_rows    real    NOT NULL,
  equal_rows    real    NOT NULL,
  distinct_range_rows   bigint  NOT NULL,
  average_range_rows    REAL NOT NULL
)
go
ALTER TABLE dbo.dm_db_stats_histogram ADD CONSTRAINT fk_dm_db_stats_properties FOREIGN KEY(dm_db_stats_propertiesID) REFERENCES dbo.dm_db_stats_properties(dm_db_stats_propertiesID)
ALTER TABLE dbo.dm_db_stats_histogram ALTER COLUMN range_high_key SQL_VARIANT NULL
GO


And here is the code I use to log our stats:

```
SET NOCOUNT ON
BEGIN TRY
DROP TABLE #Stat_Header
END TRY
BEGIN CATCH
END CATCH
CREATE TABLE #Stat_Header (Name sysname, Updated DATE

Solution

b) which tables, index, columns that benefit from a WITH FULLSCAN command?

From my point of view you aren't gathering the right data to answer that question. If you're looking for improvements that you can make just by analyzing statistics on the database, I can only think of two query performance issues that can be caused by using sampled statistics instead of FULLSCAN:

-
The density is off by at least an order of magnitude.

Some data distributions aren't a good fit for some of the assumptions that SQL Server makes when turning sampled data into a histogram. In those situations you can end up with density which is off by 10X, 100X, or even more. That can cause performance issues for queries that use the density vector in the statistics object.

You can search for possible issues by saving off density information for all sampled statistics, gathering statistics on all relevant columns with FULLSCAN, and comparing the densities between the two result sets. Anything which is too inaccurate is a candidate for seeing a benefit from gathering stats in full.

-
The query is vulnerable to the ascending key problem.

You have SQL Server 2008 listed as a tag so this still might be relevant to you. Consider a column that stores the datetime when a row was inserted. If you have queries that filter on that column looking for very recent data they might be searching for data outside of the histogram. With the legacy CE you can end up with very low cardinality estimates with can lead to query performance issues.

This can be addressed with FULLSCAN stats, although it feels a bit overkill to me. You could gather stats in full on all statistics with a relevant data type (hopefully no need to worry about ascending key for VARCHAR) a few different times and see how the maximum high key changes.

For both of the above issues, I can't think of a way to programmatically find them just by looking at sampled statistics alone. That's why I said that you weren't gathering the right data to answer your question.

If you care for my opinion, the way to truly minimize statistics maintenance is to analyze the workload for queries that aren't performing well enough, do a careful root cause analysis to figure out when statistics issues are contributing, identify the precise type of statistics issue, and finally adjust the statistics maintenance job accordingly.

Context

StackExchange Database Administrators Q#226800, answer score: 6

Revisions (0)

No revisions yet.