HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Subselect of MAX(Version) takes many minutes, though there are only ~20K records

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
versionarerecordssubselect20kmaxmanyminutesthoughthere

Problem

I have a pretty complicated data model. I can't understand most SQL examples without an explanation of what's being modeled, so I'll try to explain.

Mainlines -> Releases -> Overlays -> Calibrations <- Parameters

Calibrations, then, is the child of the the Mainline-Release-Overlay chain, and Parameters. This is the set I want to return.

Now, the complexity is that -- to save space -- we are storing the base Mainline's calibrations, and then just the DIFFERENCES when changes happen at the Release and Overlay levels. This produces a "base" set of Calibrations of about 16K rows, and then a few hundred changes each Release, and perhaps just a few changes each Overlay. Deletions of parameters happen. To keep track of this, Calibrations has a Status field (tinyint) which is set to 1 for a deletion.

Overlays get versioned successively. So, to get a complete "calibration," we need to query the Calibrations table for the latest version of a Parameter's data, up to a particular Overlay version number. Parameter metadata may change, but the names stay the same, so these Calibrations may refer to differing versions of Parameters, albeit with the same name.

Up till now, the following has worked perfectly, and instantaneously (with the following caveats):

```
SELECT c.Parameter_ParameterID, p.Designation AS Name, c.Data, o.Version
FROM Calibrations c, Parameters p, Overlays o, Releases r
WHERE r.Mainline_MainlineID = 9
AND o.Release_ReleaseID = r.ReleaseID
AND c.Overlay_OverlayID = o.OverlayID
AND c.Parameter_ParameterID = p.ParameterID
AND o.Version =
(SELECT MAX(o1.Version)
FROM Parameters p1, Calibrations c1, Overlays o1, Releases r1
WHERE r1.Mainline_MainlineID = 9
AND o1.Release_ReleaseID = r1.ReleaseID
AND c1.Overlay_OverlayID = o1.OverlayID
AND c1.Parameter_ParameterID = p1.ParameterID
AND p1.Designation = p.Designation -- New condition
AND o1.Version <= 68)

Solution

A full analysis would require access to the execution plans, table and index definitions, and database statistics (or a copy of the database itself). That's possibly unrealistic, so here's some general observations, and a possible solution for you to try. (Strictly, this question is probably beyond this site's remit.)

General background

The SQL Server query optimizer is the component responsible for choosing an execution plan for the logical results specification represented by your query. For even modestly complex queries, there are an enormous number of possible physical plans. The optimizer uses heuristics and cost estimation to choose from the limited plan space it explores.

The quality of information provided to the optimizer (including database design, indexing, and the accuracy of the current statistics) all have a great impact on the likelihood of the plan chosen by the optimizer performing well in practice. Where the design is relational, well-indexed, and with representative statistics, good plan choices will be the norm. Otherwise, all bets are off.

The query in question is moderately complex, and while its intent will be reasonably clear to most humans, the optimizer sees it as a SQL query like any other. The correlated sub-query with multiple joins and a MAX aggregate means the potential plan search space will be large, and estimates may not be accurate (due to accumulated errors, if nothing else). The optimizer may well end up choosing a plan that would work well if its assumptions were borne out, but which may perform horribly in reality.

The query and a potential rewrite

In broad terms, it appears you are asking SQL Server for the highest version per designation, with a limit on the version number. An alternative way to express this query (that is likely to have more predictable performance) is to number the rows (partitioned and ordered appropriately) then return the single row per group that is ranked #1:

WITH MaxVersionPerDesignation AS
(
    -- Do the joins and number the rows
    -- per designation, in descending
    -- Version order
    SELECT 
        c.Parameter_ParameterID, 
        p.Designation AS Name, 
        c.Data, 
        o.[Version],
        rn = ROW_NUMBER() OVER (
            PARTITION BY p.Designation
            ORDER BY o.[Version] DESC)
    FROM dbo.Releases AS r
    JOIN dbo.Overlays AS o
        ON o.Release_ReleaseID = r.ReleaseID
    JOIN dbo.Calibrations AS c
        ON c.Overlay_OverlayID = o.OverlayID
    JOIN dbo.[Parameters] AS p
        ON p.ParameterID = c.Parameter_ParameterID
    WHERE 
        r.Mainline_MainlineID = 9
        AND c.[Status] <> 1
        AND o.[Version] <= 68
)
-- Return the row with the highest Version per designation
SELECT
    MVPD.Parameter_ParameterID,
    MVPD.Name,
    MVPD.Data,
    MVPD.[Version]
FROM MaxVersionPerDesignation AS MVPD
WHERE
    -- Row #1 per Designation
    MVPD.rn = 1
ORDER BY 
    MVPD.[Version] DESC, 
    MVPD.Designation;


Hopefully, the logic in the SQL above is reasonably easy to understand.

Analysis of the uploaded plan

The query optimizer produces an execution plan that attempts the execute the subquery once per outer row. This would be a good idea if the outer query did only produce one row, but sadly it qualifies 16,794 rows at runtime (execution plans captured using SQL Sentry Plan Explorer):



This mis-estimation is the root cause of your problem. Most probably, statistics on the base tables are not representative of the current data. You should refresh these statistics and have an ongoing plan to ensure statistics stay reasonably current. Another side-effect of the erroneous 1-row estimation is that the optimizer considers the whole query to be very low cost, almost trivial, and so does not spend long examining alternatives.

Anyway, the subquery (which is very similar to the outer query aside from the correlation and an extra predicate) is fully executed 16,794 times (with a different value for Designation and Version each time). This is an awful strategy of course (but it would have been just fine for 1 outer row).

The cumulative effect is to generate over 282 million rows (over all 16k iterations). This results in 282 million executions of the Key Lookup in the plan fragment shown above. This latter fact is likely the primary cause of the poor performance, though executing the correlated subquery 16,794 times would never be quick in any case (and the optimizer would not choose this strategy if it knew 16,794 rows would be driving it).

With optimal indexing (following a more detailed analysis than I can do right now), it would be possible to rewrite the query using CROSS APPLY and TOP 1 instead of MAX (see this related question) for possibly-optimal performance, but you may find the ROW_NUMBER alternative (again, see the related question) performs well enough, with good plan-choice stability.

Key Lookups:

You can eliminate the Key Lookups (in the long plan) by

Code Snippets

WITH MaxVersionPerDesignation AS
(
    -- Do the joins and number the rows
    -- per designation, in descending
    -- Version order
    SELECT 
        c.Parameter_ParameterID, 
        p.Designation AS Name, 
        c.Data, 
        o.[Version],
        rn = ROW_NUMBER() OVER (
            PARTITION BY p.Designation
            ORDER BY o.[Version] DESC)
    FROM dbo.Releases AS r
    JOIN dbo.Overlays AS o
        ON o.Release_ReleaseID = r.ReleaseID
    JOIN dbo.Calibrations AS c
        ON c.Overlay_OverlayID = o.OverlayID
    JOIN dbo.[Parameters] AS p
        ON p.ParameterID = c.Parameter_ParameterID
    WHERE 
        r.Mainline_MainlineID = 9
        AND c.[Status] <> 1
        AND o.[Version] <= 68
)
-- Return the row with the highest Version per designation
SELECT
    MVPD.Parameter_ParameterID,
    MVPD.Name,
    MVPD.Data,
    MVPD.[Version]
FROM MaxVersionPerDesignation AS MVPD
WHERE
    -- Row #1 per Designation
    MVPD.rn = 1
ORDER BY 
    MVPD.[Version] DESC, 
    MVPD.Designation;
CREATE INDEX [IX_Overlay_OverlayID]
ON dbo.Callibrations (Overlay_OverlayID)
INCLUDE (Parameter_ParameterID, Data, Status)
WITH (DROP_EXISTING = ON);

Context

StackExchange Database Administrators Q#94046, answer score: 18

Revisions (0)

No revisions yet.