HiveBrain v1.2.0
Get Started
← Back to all entries
patternMinor

Never ending Query Store search

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
neversearchquerystoreending

Problem

I'll say from the beginning that my question/problem looks similar to this previous one, but since I'm not sure if the cause or the starting info is the same, I decided to post my question with some more details.

Issue at hand:

  • at a strange hour (near the end of business day) a production instance starts to behave erratically:



  • high CPU for the instance (from a baseline of ~30% it went to about double and was still growing)



  • increased number of transactions/sec (although the app load hasn't seen any change)



  • increased number of idle sessions



  • strange blocking events between sessions that never displayed this behavior (even read uncommitted sessions were causing blocking)



  • top waits for the interval were non page latch on 1st place, with locks taking 2nd place



Initial investigation:

  • using sp_whoIsActive we saw that a query executed by our monitoring tool decides to run extremely slow and grab lots of CPU, something that didn't happen before;



  • its isolation level was read uncommitted;



  • we looked at the plan we saw wacky numbers: StatementEstRows="3.86846e+010" with some 150 TB of estimated data to be returned



  • we suspected a query monitor feature of the monitoring tool was the cause, so we disabled the feature (we also opened a ticket with our provider to check if they're aware of any issue)



  • from that first event, it happened a few more times, with every time we kill the session, everything goes back to normal;



  • we realize the query is extremely similar to one of the queries used by MS in BOL for Query Store monitoring - Queries that recently regressed in performance (comparing different points in time)



  • we run the same query manually and see the same behavior (CPU used ever increasing, increasing latch waits, unexpected locks.. etc)



Guilty query:

```
Select qt.query_sql_text,
q.query_id,
qt.query_text_id,
rs1.runtime_stats_id AS runtime_stats_id_1,
interval_1 = DateAdd(minute, -(DateDiff(minute, getdate(), getutcdat

Solution

As I said in the answer, the empirical test showed that there were indexes on sys.plan_persisted* system objects without any (none) statistics created over them.
I suspect that is because the database is migrated from a SQL 2005 instance, and kept for a while at compatibility level 100, thus the new CE didn't get to be used.

Row counts check:

Select count(1) from NoNameDB.sys.plan_persist_runtime_stats with (nolock) --60362   
Select count(1) from NoNameDB.sys.plan_persist_plan with (nolock) --1853    
Select count(1) from NoNameDB.sys.plan_persist_runtime_stats_interval with (nolock) --671    
Select count(1) from NoNameDB.sys.plan_persist_query with (nolock) --1091    
Select count(1) from NoNameDB.sys.plan_persist_query_text with (nolock) --911


This showed that the initial estimates were wrong. Done with a DAC connection, otherwise the tables are not available to query.

Stats check:

DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats_interval', plan_persist_runtime_stats_interval_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats', plan_persist_runtime_stats_idx1);    
DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats', plan_persist_runtime_stats_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_plan', plan_persist_plan_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_plan', plan_persist_plan_idx1);    
DBCC SHOW_STATISTICS ('sys.plan_persist_query', plan_persist_query_cidx)    
DBCC SHOW_STATISTICS ('sys.plan_persist_query_text', plan_persist_query_text_cidx);


This showed that some indexes had empty stats (missing, none, zero).

Initial fix:

UPDATE STATISTICS sys.plan_persist_runtime_stats WITH fullscan;
UPDATE STATISTICS sys.plan_persist_plan WITH fullscan;
UPDATE STATISTICS sys.plan_persist_runtime_stats_interval WITH fullscan;
UPDATE STATISTICS sys.plan_persist_query WITH fullscan;
UPDATE STATISTICS sys.plan_persist_query_text WITH fullscan;


This kind of fixed the stats and made the query finish in 10-12 seconds.

Second fix:

(verified only on a testing environment) and most likely the proper one, as it showed the best stats for the query, was to change the database's compatibility level to 130. The end result was that the query ended in about 10-12 seconds with normal number stats (10k rows).

Intermediate fix:

DBCC TRACEON (2312) -- new CE


Some related help about stats on system hidden tables.

Code Snippets

Select count(1) from NoNameDB.sys.plan_persist_runtime_stats with (nolock) --60362   
Select count(1) from NoNameDB.sys.plan_persist_plan with (nolock) --1853    
Select count(1) from NoNameDB.sys.plan_persist_runtime_stats_interval with (nolock) --671    
Select count(1) from NoNameDB.sys.plan_persist_query with (nolock) --1091    
Select count(1) from NoNameDB.sys.plan_persist_query_text with (nolock) --911
DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats_interval', plan_persist_runtime_stats_interval_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats', plan_persist_runtime_stats_idx1);    
DBCC SHOW_STATISTICS ('sys.plan_persist_runtime_stats', plan_persist_runtime_stats_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_plan', plan_persist_plan_cidx);    
DBCC SHOW_STATISTICS ('sys.plan_persist_plan', plan_persist_plan_idx1);    
DBCC SHOW_STATISTICS ('sys.plan_persist_query', plan_persist_query_cidx)    
DBCC SHOW_STATISTICS ('sys.plan_persist_query_text', plan_persist_query_text_cidx);
UPDATE STATISTICS sys.plan_persist_runtime_stats WITH fullscan;
UPDATE STATISTICS sys.plan_persist_plan WITH fullscan;
UPDATE STATISTICS sys.plan_persist_runtime_stats_interval WITH fullscan;
UPDATE STATISTICS sys.plan_persist_query WITH fullscan;
UPDATE STATISTICS sys.plan_persist_query_text WITH fullscan;
DBCC TRACEON (2312) -- new CE

Context

StackExchange Database Administrators Q#205389, answer score: 9

Revisions (0)

No revisions yet.