HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMinor

How to rewrite a query with MAX in WHERE clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
withquerywhererewritemaxhowclause

Problem

I have a query with a max in the where clause, which is slow.

select count(*)
from   TableName tbl1
where  tbl1.id = (
                select max(tbl2.id) 
                from TableName tbl2
                where tbl2.companyId = tbl1.companyId)
and    tbl1.active = 2


Please note that both tables are the same.

I rewrote it as:

select count(*) 
from   TableName tbl1 
inner join (
                select max(id) as id, companyId 
                from TableName tbl2 
                where tbl2.active= 2
                group by companyId
        ) tbl2 
        on  tbl2.companyId = tbl1.companyId 
        and tbl2.id=tbl1.id


but number of records is different.

Can anyone say how can I rewrite it for better performance?

Solution

Summary

Add the index:

CREATE INDEX index_name 
ON TableName 
    (companyid DESC, id DESC) 
INCLUDE 
    (active);


Try this rewrite:

SELECT COUNT_BIG(*) 
FROM
(
    SELECT TOP (9223372036854775807) 
        TN1.active
    FROM TableName AS TN1
    WHERE
        TN1.id =
        (
            SELECT MAX(TN2.id) 
            FROM TableName AS TN2
            WHERE
                TN2.companyid = TN1.companyid
        )
) AS SQ1
WHERE
    SQ1.active = 2;


Detail

The original query specification appears to be:

How many rows are there in the table where active = 2; and the id column equals the highest id value across all rows within the same companyid (without the active = 2 restriction, for clarity).

There is a way to get these results with a single scan of a suitable index.

For example, given the table:

CREATE TABLE #TableName
(
    companyid integer NOT NULL,
    id integer NOT NULL,
    active integer NOT NULL
);


A few rows of sample data:

INSERT #TableName
    (companyid, id, active)
VALUES
    (1, 1, 2),
    (1, 2, 2),
    (1, 3, 0),
    (2, 2, 2),
    (2, 3, 2),
    (2, 3, 0);


With the index:

CREATE INDEX index_name 
ON #TableName 
    (companyid DESC, id DESC) 
INCLUDE 
    (active);


A query (based on the original) to find rows without the predicate (active = 2) is:

SELECT
    TN1.active
FROM #TableName AS TN1
WHERE
    TN1.id =
    (
        SELECT MAX(TN2.id) 
        FROM #TableName AS TN2
        WHERE
            TN2.companyid = TN1.companyid
    );


The execution plan for that query features a single scan of the index, despite the query referencing the same table twice:

The basic idea is to scan the index in key order, returning the row(s) with the highest id within each group of companyid values.

Unfortunately, we cannot simply add the predicate active = 2 to that query without breaking the optimizer transformation that gives us the single scan. The problem (if you're interested) is that the new predicate gets pushed down the plan tree, resulting in a shape that no longer matches the 'group by top' transform.

One workaround adds a meaningless TOP specification around the part of the query we want transformed, preventing the new predicate being pushed past it:

SELECT COUNT_BIG(*) 
FROM
(
    SELECT TOP (9223372036854775807) 
        TN1.active
    FROM #TableName AS TN1
    WHERE
        TN1.id =
        (
            SELECT MAX(TN2.id) 
            FROM #TableName AS TN2
            WHERE
                TN2.companyid = TN1.companyid
        )
) AS SQ1
WHERE
    SQ1.active = 2;


The execution plan is similar to before, with an extra Top (the meaningless, bigint.max one we added), a Filter for the active = 2 predicate, and a final Stream Aggregate to count the rows:

I cannot promise you will get this plan shape with the real data, or that it will be faster. It all rather depends on the number and distribution of the rows.

The new Top also has the effect of disfavouring a parallel plan (the group by top is parallel-compatible, but a global top is not; parallelism would have to stop and restart around it).

Nevertheless, until a better option comes along, or more detail is provided, it is something you could try.

Code Snippets

CREATE INDEX index_name 
ON TableName 
    (companyid DESC, id DESC) 
INCLUDE 
    (active);
SELECT COUNT_BIG(*) 
FROM
(
    SELECT TOP (9223372036854775807) 
        TN1.active
    FROM TableName AS TN1
    WHERE
        TN1.id =
        (
            SELECT MAX(TN2.id) 
            FROM TableName AS TN2
            WHERE
                TN2.companyid = TN1.companyid
        )
) AS SQ1
WHERE
    SQ1.active = 2;
CREATE TABLE #TableName
(
    companyid integer NOT NULL,
    id integer NOT NULL,
    active integer NOT NULL
);
INSERT #TableName
    (companyid, id, active)
VALUES
    (1, 1, 2),
    (1, 2, 2),
    (1, 3, 0),
    (2, 2, 2),
    (2, 3, 2),
    (2, 3, 0);
CREATE INDEX index_name 
ON #TableName 
    (companyid DESC, id DESC) 
INCLUDE 
    (active);

Context

StackExchange Database Administrators Q#136389, answer score: 7

Revisions (0)

No revisions yet.