HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Bad execution plan after stats update due to temp table

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
afterdueupdatetabletempstatsbadplanexecution

Problem

A stored procedure query sometimes gets a bad plan after stats update on one of the tables, but can be recompiled to the good plan right after. Same compiled parameters.

Issue seems to come from a small temp table created in the SP and then joined to. The bad plan has a warning on the temp table that the join column has no stats. What gives?

SQL Server 2016 SP1 CU4, with 2014 compatibility level

Bad plan:

Good plan:

Stored procedure

USE AppDB
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE PROCEDURE [MySchema].[MySP]
    @MyId VARCHAR(50),
    @Months INT
AS
BEGIN

    SET NOCOUNT ON

    SELECT * 
    INTO #MyTemp
    FROM AppDB.MySchema.View_Feeder vf WITH (NOLOCK)
    WHERE vf.MyId = @MyId AND vf.Status IS NOT NULL

    SELECT wd.Col1
         , vp.Col2
         , vp.Col3 
    FROM AppDB.MySchema.View_VP vp WITH (FORCESEEK)
    INNER JOIN #MyTemp wd ON wd.Col1 = vp.Col1
    WHERE vp.Col3 > DATEADD(MONTH, @Months * -1, GETDATE())

END


Inner view

USE AppDB
GO
SET QUOTED_IDENTIFIER ON
SET ANSI_NULLS ON
GO
CREATE VIEW [MySchema].[View_VP]
AS

    SELECT pp.Col1,
           pd.Col2 AS Col2, 
           MAX(pp.Col4) AS Col3
    FROM P_DB..LargeTable pp WITH (NOLOCK)
    INNER JOIN P_DB..SmallTable pd WITH (NOLOCK) ON pp.P_Id = pd.P_Id
    WHERE pp.[Status] IN (3, 4)
    GROUP BY pp.Col1, pd.Col2


Plans

Redacted good plan and bad plan.

Additional information

The FORCESEEK hint was added at the time to try to handle this very same issue and stabilize the plan. And anyway, with or without it, I'd really like to understand what's happening here.

I'm not able to reproduce the issue at will, so it's hard to say if replacing the SELECT INTO with an explicit table would make a difference. However, I believe it should behave the same way.

SELECT
    database_id, 
    is_auto_create_stats_on, 
    is_auto_update_stats_on, 
    is_auto_update_stats_async_on
FROM sys.databases
WHERE
    database_id IN (2, )


returns:

```
database_i

Solution

The bad plan has a warning on the temp table that the join column has no stats. What gives?

There may be a more esoteric reason for this, but it is more likely a simple statistics creation failure. This might, for example, occur when the task fails to get the memory resources it needs, or when statistics creation is being throttled (too many concurrent compilations). See the Microsoft White Paper Statistics Used by the Query Optimizer in Microsoft SQL Server 2008. You may be able to debug this further by looking at the auto stats Profiler or Extended Events and other events around the same time.

That said, a great deal more information and investigation would be needed to place the blame for the plan selection at the door of the missing temporary table statistics. Even without detailed statistics, the optimizer can still see the total cardinality of the temporary table, and this would seem to be an important factor here.


...but can be recompiled to the good plan right after. Same compiled parameters.

The @Months parameter might be the same, but the number of rows in the temporary table (from the unknown view View_Feeder) is different, and the plans provided do not show the value of @MyId.

Going from the information available: The 'good' plan (estimates only, no performance data presented) is based on a temporary table containing 4 rows. The 'bad plan' is based on a temporary table with 114 rows. Certainly a lack of density and histogram information may not be helpful, but it is easy to see how the optimizer might choose a different plan for 4 vs. 114 rows, albeit ones with unknown density and distribution.

If estimates on plan operators not dependent on the temporary table are wildly off, this is a strong signal that the current main-table statistics are not representative of the underlying data. The lack of information in the question makes this impossible to assess.

Nevertheless, it is possible to see that the optimizer is being asked to choose between sub-optimal alternatives here. Neither plan presented represents an 'obviously good' choice, since both involve lookups (lack of a 'covering' index) and late filtering (see next). Lookups in particular have a high cost associated with them, which depends sensitively on cardinality estimates.

Using a view restricts the optimizer and hinting choices:

  • The view contains a GROUP BY that prevents the predicate vp.Col3 > DATEADD(MONTH, @Months * -1, GETDATE()) from being pushed down, even though the transformation would be valid in this very specific case.



  • In-lining the view to the query would provide a natural way to filter the date/time column earlier (though the question does not state whether refactoring the query is an option).



  • It is not possible to hint an index on a view, and FORCESEEK simply asks the optimizer to find any index-seeking plan (not necessarily using the index you would prefer). Removing the view would likewise remove this restriction.



Allowing the predicate to push down should open up indexing opportunities on the large table as well. For example:

CREATE INDEX give_me_a_good_name
ON dbo.LargeTable (Col1, [Status], Col4) 
INCLUDE (P_Id);


...provides a good access path for the rewritten query:

DECLARE @Date datetime = DATEADD(MONTH, @Months * -1, GETDATE());

SELECT
    MT.Col1,
    ST.Col2,
    MAX(LT.Col4)
FROM #MyTemp AS MT
JOIN dbo.LargeTable AS LT
    ON LT.Col1 = MT.Col1
JOIN dbo.SmallTable AS ST
    ON ST.P_id = LT.P_Id
WHERE
    LT.[Status] IN (3, 4)
    AND LT.Col4 > @Date
GROUP BY
    MT.Col1,
    ST.Col2
OPTION (RECOMPILE);


Another consideration is the effect of temporary table and statistics caching as described in my articles Temporary Tables in Stored Procedures and Temporary Table Caching Explained. If a good plan depends on the current content of the temporary object, an explicit UPDATE STATISTICS #MyTemp; before the main query, and adding OPTION (RECOMPILE) to the main query may be a good solution.

Alternatively, if one particular plan shape is always optimal for this query, you have many options available, including a variety of hints, plan guides, and query store plan forcing. You might find that using a table variable instead of a temporary table is the better choice, since it favours the low cardinality case, and does not provide (or rely on) statistics.

In summary, there are several general improvements that should be undertaken before worrying about the reasons for (of effect of) occasional missing statistics on the temporary table:

  • Ensure statistics are representative and useful to the optimizer



  • Check actuals vs. estimates for a range of parameter values



  • Provide good data access path(s) for the query by improving existing indexes



  • Remove the view if possible; or consider a 'parameterized view' (inline table-valued function) with an explicit predicate for the date/time parameter.



  • Ensure that automatic statistics creation is not being throttled unnece

Code Snippets

CREATE INDEX give_me_a_good_name
ON dbo.LargeTable (Col1, [Status], Col4) 
INCLUDE (P_Id);
DECLARE @Date datetime = DATEADD(MONTH, @Months * -1, GETDATE());

SELECT
    MT.Col1,
    ST.Col2,
    MAX(LT.Col4)
FROM #MyTemp AS MT
JOIN dbo.LargeTable AS LT
    ON LT.Col1 = MT.Col1
JOIN dbo.SmallTable AS ST
    ON ST.P_id = LT.P_Id
WHERE
    LT.[Status] IN (3, 4)
    AND LT.Col4 > @Date
GROUP BY
    MT.Col1,
    ST.Col2
OPTION (RECOMPILE);
DROP VIEW IF EXISTS dbo.View_VP;
DROP TABLE IF EXISTS dbo.SmallTable, dbo.LargeTable, #MyTemp;
GO
CREATE TABLE LargeTable (P_Id integer NOT NULL, Status integer NOT NULL, Col1 integer NOT NULL, Col4 datetime NOT NULL);
CREATE TABLE SmallTable (P_id integer NOT NULL, Col2 integer NOT NULL)
CREATE TABLE #MyTemp (Col1 integer NOT NULL);
GO
CREATE VIEW dbo.View_VP 
AS
    SELECT
        pp.Col1,
        pd.Col2 AS Col2,
        MAX(pp.Col4) AS Col3
    FROM LargeTable pp
    JOIN SmallTable pd
        ON pd.P_id = pp.P_Id
    WHERE 
        pp.[Status] IN (3, 4)
    GROUP BY 
        pp.Col1, pd.Col2;
GO
CREATE UNIQUE CLUSTERED INDEX PK_SmallTable ON dbo.SmallTable (P_id)
CREATE CLUSTERED INDEX ix_P_id ON dbo.LargeTable (P_Id)
CREATE INDEX ix_Col1 ON dbo.LargeTable (Col1)
CREATE INDEX ix_Status ON dbo.LargeTable ([Status])
GO
UPDATE STATISTICS dbo.LargeTable WITH ROWCOUNT = 32268200, PAGECOUNT = 322682;
UPDATE STATISTICS dbo.SmallTable WITH ROWCOUNT = 6349, PAGECOUNT = 63;
UPDATE STATISTICS #MyTemp WITH ROWCOUNT = 4;
DECLARE @Months integer = 6;

SELECT wd.Col1
         , vp.Col2
         , vp.Col3 
    FROM dbo.View_VP vp WITH (FORCESEEK)
    INNER JOIN #MyTemp wd ON wd.Col1 = vp.Col1
    WHERE vp.Col3 > DATEADD(MONTH, @Months * -1, GETDATE())

Context

StackExchange Database Administrators Q#189020, answer score: 12

Revisions (0)

No revisions yet.