patternsqlMinor
Optimization for branched WHERE
Viewed 0 times
optimizationbranchedwherefor
Problem
I have a query that executes against a table that can grow to millions of rows. The query comes out of a QA tool that we use that is outside the standard functionality of the DB (as far as what is indexed and how and why). The query is:
Basically, let's say the table has
The query should never pick the rows for id in (1, 2) because they are referred to by other rows. It should only grab rows where
The overall query is significantly larger,
I'm also reordering the rows by
I've looked at the execution plan. There are indices that would be added that would improve this query. However, performance of internal QA tools do not take precedence over perfo
SELECT id FROM thisTable t
WHERE col = 'val'
AND ((not exists (SELECT 1 FROM thisTable WHERE refid = t.id) and refbool = 0) or refbool = 1)
ORDER BY newid()Basically, let's say the table has
id, refid, refbool, and col columns. So you could have data as follows:id | refid | refbool | col
------------------------------------
1 | NULL | 0 | val
2 | NULL | 0 | val
3 | NULL | 0 | val
4 | 2 | 1 | val
5 | NULL | 0 | val
6 | 1 | 1 | valThe query should never pick the rows for id in (1, 2) because they are referred to by other rows. It should only grab rows where
refbool = 1, OR refbool = 0 AND that row's id is not any other row's refid. This statement is horribly non-performant, but I'm not sure what a better query would look like for this. Assume that no indexes, views, stored procedures, or other underlying machinations can be added - it must be a query.The overall query is significantly larger,
JOINS to two additional tables and gathers quite a bit of data. However, I've narrowed it does to this particular bit as commenting out this line takes the query execution time from 16s to <1s.I'm also reordering the rows by
newid() as I need to randomly select a sample item. Removing the ORDER BY also make the query significantly faster even leaving the third row in. It appears that the two operations combined causes the slowness. I've tried designing a CTE, but have failed to increase performance doing so.I've looked at the execution plan. There are indices that would be added that would improve this query. However, performance of internal QA tools do not take precedence over perfo
Solution
An execution plan was not included, but the typical issue with queries like this (sorting aside) is the optimizer choosing a nested loops anti semi join without a good supporting index. It can also be a rogue top (1), or a poorly-performing transformation to a semi-join with nested start-up filters and an anti-semi join.
Regardless, there are two usual workarounds:
The following incorporates both:
db<>fiddle online demo
For more alternatives on random ordering see the existing Q & A:
Don't just try the top answer.
Regardless, there are two usual workarounds:
- Rewrite the
ORmanually as aUNION(or, if guaranteed disjoint, as aUNION ALL).
- Rewrite the
NOT EXISTSas a left join filtering the preserved side forNULL.
The following incorporates both:
DECLARE @thisTable table
(
id integer PRIMARY KEY,
refid integer NULL,
refbool bit NOT NULL,
col varchar(10) NOT NULL
);
INSERT @thisTable
(id, refid, refbool, col)
VALUES
(1, NULL, 0, 'val'),
(2, NULL, 0, 'val'),
(3, NULL, 0, 'val'),
(4, 2 , 1, 'val'),
(5, NULL, 0, 'val'),
(6, 1 , 1, 'val');SELECT
U.id
FROM
(
-- T.refbool = 1
SELECT T.id
FROM @thisTable AS T
WHERE
T.col = 'val'
AND T.refbool = 1
-- Or (disjoint)
UNION ALL
-- T.refbool = 0 and not exists
SELECT T.id
FROM @thisTable AS T
LEFT JOIN @thisTable AS T2
ON T2.refid = T.id
WHERE
T.col = 'val'
AND T.refbool = 0
AND T2.id IS NULL
) AS U
ORDER BY
CHECKSUM(NEWID());db<>fiddle online demo
For more alternatives on random ordering see the existing Q & A:
- What is the best way to get a random ordering?.
Don't just try the top answer.
Code Snippets
DECLARE @thisTable table
(
id integer PRIMARY KEY,
refid integer NULL,
refbool bit NOT NULL,
col varchar(10) NOT NULL
);
INSERT @thisTable
(id, refid, refbool, col)
VALUES
(1, NULL, 0, 'val'),
(2, NULL, 0, 'val'),
(3, NULL, 0, 'val'),
(4, 2 , 1, 'val'),
(5, NULL, 0, 'val'),
(6, 1 , 1, 'val');SELECT
U.id
FROM
(
-- T.refbool = 1
SELECT T.id
FROM @thisTable AS T
WHERE
T.col = 'val'
AND T.refbool = 1
-- Or (disjoint)
UNION ALL
-- T.refbool = 0 and not exists
SELECT T.id
FROM @thisTable AS T
LEFT JOIN @thisTable AS T2
ON T2.refid = T.id
WHERE
T.col = 'val'
AND T.refbool = 0
AND T2.id IS NULL
) AS U
ORDER BY
CHECKSUM(NEWID());Context
StackExchange Database Administrators Q#273355, answer score: 5
Revisions (0)
No revisions yet.