HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

SQL server Database statistics degrading after execution

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
aftersqldegradingdatabasestatisticsserverexecution

Problem

I have been experiencing timeouts while querying my SQL Server DB in particular one table. I tried updating the statistics and it fixed the performance issues for a couple of days.

I started digging deeper and found out that there is a scheduled synchronization task that retrieves data from a web-service and updates the database. This task uses Entity framework for this purpose. Several colleagues tried to fix this but so far no luck.
I wanted to take a step back.

  • Throw EF out of the picture



  • this should not be a piece of C#/EF code to do this but is a perfect job for an ETL tool. A delta update should (according to my understanding of DB statistics do the job). So far, every time the scheduled task runs, everything is being updated.



Can someone provide me with more information or insights in this situation for clues so that I can try to fix the code before bringing in new tools and solutions.

Solution

Starting at the beginning:

SQL Server uses statistics objects to generate cardinality estimates for the result set steps during the optimization phase. Cardinality estimates are used to generate a query plan, which is the logical plan by which SQL Server will physically access and retrieve the data in order to satisfy the request. Generally speaking, more accurate statistics lead to a better plan because SQL Server chooses the logical and physical operators to use based on these cardinality estimates. Good estimates give SQL Server a better chance at choosing the best operation for the amount of data it expects to process.

A SQL Server Statistics object consists of a header, a density vector, and a histogram. The density vector and the histogram are used under different circumstances when estimating cardinality.

Modifications (via INSERT, UPDATE, and DELETE) necessarily change the data profile (INSERT and DELETE by modifying the number of rows in the table itself, and UPDATE by modifying the number of rows that satisfy a condition) so any of these "writes" can ultimately affect the optimization and planning phases of query execution. For a minor number of writes, this generally isn't an issue as a few rows here-and-there won't necessarily change the operators that SQL Server uses. A notable exception is if the written data are inserts that "add on" to the existing data set (e.g. an IDENTITY column, or inserting the next day's data). This is called an ascending key and can be problematic because the values exist outside of the boundaries of the histogram and (up until SQL Server 2014) SQL Server has no knowledge of the number of rows that would satisfy a request with the new key as a filter value. Sometimes you'll hear about statistics that have taken many modifications as "stale."

Because you've not shared the nature of your scheduled synchronization task, I can't tell you why updating statistics helps, in your case. It could be ascending key, in which case the update will include the new key values in the histogram. Or it could be that you've squeezed in enough data to affect the cardinality of the result set of an extant value, but your cached plan is not fit for this new amount of data. It also depends on the queries with which you're experiencing this performance degradation. Are they queries on the new data, old data, or all data? Additionally, I've seen some poor SQL generated by Entity Framework, which adds layers of complexity to the optimization and planning phase so that when the poor SQL is combined with "stale" statistics, you get a bad plan that degrades over time (if you're lucky). So, calls to scrap EF are not bad advice, but let's stay focused on the statistics portion.

Depending on your version of SQL Server, updating statistics might cause plan cache invalidation. If you're running 2005->2008R2, invalidation will only occur for relevant plans if the AUTO_UPDATE_STATISTICS database option is enabled on the database. Additionally, for versions >= 2012, you'll get an invalidated plan after a stats update if there were any modifications to the stats objects. What this means is that blindly running a stats update (especially in 2012 and newer) won't give you a better plan if you've not modified any data.

So how can you check to see if you've modified any data? Here's a query that I use (of particular interest to you would be the ModificationCounter column):

select 
    SchemaName = sch.name,
    TableName = t.name,
    StatName = s.name,
    IndexName = i.name,
    StatLeadingColumn = index_col(quotename(sch.name)+'.'+quotename(t.name),s.stats_id,1),
    StatColumns = sc.StatColumns,
    IsFiltered = s.has_filter,
    StatFilter = s.filter_definition,
    IsAutoCreated = s.auto_created,

    StatLastUpdated = sp.last_updated,
    ModificationCounter = sp.modification_counter,
    StatRows = sp.rows,
    StatRowsSampled = sp.rows_sampled,
    StatUnfilteredRows = sp.unfiltered_rows,
    StatSteps = sp.steps
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sch 
    on t.schema_id = sch.schema_id
cross apply sys.dm_db_stats_properties(s.object_id,s.stats_id) sp
cross apply (
                select
                stuff(
                        (
                            select ', '+c.name
                            from sys.stats_columns scol
                            join sys.columns c
                                on scol.object_id = c.object_id
                                and scol.column_id = c.column_id
                            where scol.object_id = s.object_id
                                and scol.stats_id = s.stats_id
                            for xml path(''), type
                        ).value('.','nvarchar(max)')
                        ,1,2,''
                    ) StatColumns
            ) sc
left join sys.indexes i 
    on s.object_id = i.object_id
    and s.name = i.name


In order to run t

Code Snippets

select 
    SchemaName = sch.name,
    TableName = t.name,
    StatName = s.name,
    IndexName = i.name,
    StatLeadingColumn = index_col(quotename(sch.name)+'.'+quotename(t.name),s.stats_id,1),
    StatColumns = sc.StatColumns,
    IsFiltered = s.has_filter,
    StatFilter = s.filter_definition,
    IsAutoCreated = s.auto_created,

    StatLastUpdated = sp.last_updated,
    ModificationCounter = sp.modification_counter,
    StatRows = sp.rows,
    StatRowsSampled = sp.rows_sampled,
    StatUnfilteredRows = sp.unfiltered_rows,
    StatSteps = sp.steps
from sys.stats s 
join sys.tables t 
    on s.object_id = t.object_id
join sys.schemas sch 
    on t.schema_id = sch.schema_id
cross apply sys.dm_db_stats_properties(s.object_id,s.stats_id) sp
cross apply (
                select
                stuff(
                        (
                            select ', '+c.name
                            from sys.stats_columns scol
                            join sys.columns c
                                on scol.object_id = c.object_id
                                and scol.column_id = c.column_id
                            where scol.object_id = s.object_id
                                and scol.stats_id = s.stats_id
                            for xml path(''), type
                        ).value('.','nvarchar(max)')
                        ,1,2,''
                    ) StatColumns
            ) sc
left join sys.indexes i 
    on s.object_id = i.object_id
    and s.name = i.name

Context

StackExchange Database Administrators Q#124247, answer score: 4

Revisions (0)

No revisions yet.