Running query against DMVs for Query Stats and Execution Plans joined to sys.databases
Problem
I originally posted this as a different question on Twitter under #sqlhelp, but wanted to post it here in more detail.
I'm trying to create a job that tracks the worst-performing statements for various statistics such as reads, writes, CPU, execution count, etc. using the DMVs dm_exec_query_stats, dm_exec_cached_plans, and dm_exec_sql_text, and originally sys.databases.
What happened was that this query would sometimes (but not always) take over 8 minutes to run on a busy server, yet as soon as I removed the join to sys.databases, it took only 9 seconds. Most of this duration was CPU time, with very little waits, no blocking, no scans, only 13,406 logical reads, and 1,562 lob reads.
So what I want to know is: why the huge performance hit from joining to sys.databases? And why is it inconsistent?
When I ran the following test query, it would sometimes take over 8 minutes for no apparent reason, and other times complete in under 11 seconds.
```
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SET NOCOUNT ON;

DECLARE @snapshot_timeoffset AS datetimeoffset(3) = CAST(SYSDATETIMEOFFSET() AS datetimeoffset(3));

SELECT
     @snapshot_timeoffset AS [snapshot_timeoffset]
    ,db.name AS [database_name]
    ,OBJECT_SCHEMA_NAME(st.objectid, st.dbid) AS [schema_name]
    ,OBJECT_NAME(st.objectid, st.dbid) AS [object_name]
    ,cp.objtype
    ,cp.usecounts
    ,cp.refcounts
    -- find the offset of the actual statement being executed
    ,SUBSTRING(st.text,
        CASE
            WHEN qs.statement_start_offset = 0 OR qs.statement_start_offset IS NULL THEN 1
            ELSE qs.statement_start_offset / 2 + 1
        END,
        CASE
            WHEN qs.statement_end_offset = 0 OR qs.statement_end_offset = -1 OR qs.statement_end_offset IS NULL THEN LEN(st.text)
            ELSE qs.statement_end_offset / 2
        END -
        CASE
            WHEN qs.statement_start_offset = 0 OR qs.statement_start_offset IS NULL THEN 0
            ELSE qs.statement_start_offset / 2
        END
    ) AS [statement_text]
-- the query was truncated in the original post; the FROM clause below is
-- reconstructed from the DMVs and aliases named in the question
FROM sys.dm_exec_query_stats AS qs
JOIN sys.dm_exec_cached_plans AS cp
    ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
JOIN sys.databases AS db
    ON db.database_id = st.dbid;
```
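As an aside on the offset arithmetic above: `statement_start_offset` and `statement_end_offset` are byte offsets into the `nvarchar` batch text, which is why the query halves them to get character positions for `SUBSTRING`. A quick standalone illustration (nvarchar stores 2 bytes per character, for characters in the base plane):

```
DECLARE @batch nvarchar(100) = N'SELECT 1; SELECT 2;';

-- DATALENGTH counts bytes, LEN counts characters, so for nvarchar
-- DATALENGTH is twice LEN -- the same 2:1 ratio the /2 above corrects for.
SELECT DATALENGTH(@batch) AS byte_length,  -- 38
       LEN(@batch)        AS char_length;  -- 19
```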
Solution
...why the huge performance hit from joining to sys.databases? And why is it inconsistent?
There's nothing special about joining to sys.databases. The optimizer happens to choose a plan that is inefficient for the first query. Specifically, in this area of the plan:
...the optimizer chooses a nested loops join to SYSDMEXECCACHEDPLANS, presumably based on an expectation of a very small number of driving rows on the outer input to the join (from the merge join). The optimizer introduces a table spool to reduce the processing cost of any duplicate values on the outer input to the nested loops join, but this is unlikely to be terribly effective - a hash join would have been "better"...the same function now has an Actual Number of Rows of 327,605,402, which is a huge discrepancy.
This is a common mistake when reading execution plans in SSMS (SQL Sentry's Plan Explorer does not suffer from this problem). In SSMS, the estimated number of rows on the inner side of a nested loops join is shown per iteration, whereas the actual number of rows is shown over all iterations. This is the result of a "questionable design decision". (Here, 327,605,402 total rows over 6,269 iterations works out to roughly 52,000 rows per scan of the cache.)
The root cause of the poor performance is therefore most likely simply that the complete set of all cached plans is scanned rather more times than the optimizer expected. Scanning the whole cache 6,269 times will be slow (unless the cache is tiny at the time).
Cardinality estimation errors do not occur only with user tables, so there is nothing too surprising about this. Don't blame it on sys.databases specifically. It is quite common for "an extra join" or a "slight change to a predicate" to have a dramatic effect on plan selection.
You have fewer options than usual to help the optimizer with queries against system tables, views, and functions. For the expert analyst, hints may be the appropriate solution. In other cases, breaking the query up into more than one step may be best. In your case, simply replacing the "extra join" with a metadata function seems to be adequate, on the basis of the information provided.
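A sketch of the suggested metadata-function fix, assuming the join to sys.databases existed only to retrieve the database name: DB_NAME() takes a database id directly, so the problematic join disappears from the plan entirely.

```
-- Replace the join to sys.databases with the DB_NAME() metadata function.
-- Columns here are illustrative; the full offset logic from the question
-- would slot in unchanged.
SELECT
     DB_NAME(st.dbid) AS [database_name]
    ,cp.objtype
    ,cp.usecounts
    ,qs.execution_count
    ,qs.total_worker_time
FROM sys.dm_exec_query_stats AS qs
JOIN sys.dm_exec_cached_plans AS cp
    ON cp.plan_handle = qs.plan_handle
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st;
```

If the join had to stay, the answer's analysis suggests the alternative of steering the optimizer toward the hash join it rejected, e.g. with `OPTION (HASH JOIN)` - a hint best reserved for the expert analyst, as noted above.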
Context
StackExchange Database Administrators Q#138929, answer score: 10