patternsqlCritical
What are different ways to replace ISNULL() in a WHERE clause that uses only literal values?
Viewed 0 times
whatarewayswherereplaceliteralusesdifferentthatonly
Problem
What this isn't about:
This is not a question about catch-all queries that accept user input or use variables.
This is strictly about queries where
Why don't you have a seat over there?
Our example query is against a local copy of the Stack Overflow database on SQL Server 2016, and looks for users with a
The query plan shows a Scan of a quite thoughtful nonclustered index.
The scan operator shows (thanks to additions to actual execution plan XML in more recent versions of SQL Server) that we read every stinkin' row.
Overall, we do 9157 reads and use about half a second of CPU time:
The question:
What are ways to rewrite this query to make it more efficient, and perhaps even SARGable?
Feel free to offer other suggestions. I don't think my answer is necessarily the answer, and there are enough smart people out there to come up with alternatives that may be better.
If you want to play along on your own computer, head over here to download the SO database.
Thanks!
This is not a question about catch-all queries that accept user input or use variables.
This is strictly about queries where
ISNULL() is used in the WHERE clause to replace NULL values with a canary value for comparison to a predicate, and different ways to rewrite those queries to be SARGable in SQL Server.Why don't you have a seat over there?
Our example query is against a local copy of the Stack Overflow database on SQL Server 2016, and looks for users with a
NULL age, or an age < 18.SELECT COUNT(*)
FROM dbo.Users AS u
WHERE ISNULL(u.Age, 17) < 18;The query plan shows a Scan of a quite thoughtful nonclustered index.
The scan operator shows (thanks to additions to actual execution plan XML in more recent versions of SQL Server) that we read every stinkin' row.
Overall, we do 9157 reads and use about half a second of CPU time:
Table 'Users'. Scan count 1, logical reads 9157, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 485 ms, elapsed time = 483 ms.The question:
What are ways to rewrite this query to make it more efficient, and perhaps even SARGable?
Feel free to offer other suggestions. I don't think my answer is necessarily the answer, and there are enough smart people out there to come up with alternatives that may be better.
If you want to play along on your own computer, head over here to download the SO database.
Thanks!
Solution
Answer section
There are various ways to rewrite this using different T-SQL constructs. We'll look at the pros and cons and do an overall comparison below.
First up: Using
Using
Also note that the Seek is executed twice here, which really should be more obvious from the graphical operator:
Second up: Using derived tables with
Our query can also be rewritten like this
This yields the same type of plan, with far less malarkey, and a more apparent degree of honesty about how many times the index was seeked (sought?) into.
It does the same amount of reads (8233) as the
However, you have to be really careful here, because if this plan attempts to go parallel, the two separate
This can be avoided by changing our query slightly.
Now both nodes performing a Seek are fully parallelized until we hit the concatenation operator.
For what it's worth, the fully parallel version has some good benefit. At the cost of about 100 more reads, and about 90ms of additional CPU time, the elapsed time shrinks to 93ms.
What about CROSS APPLY?
No answer is complete without the magic of
Unfortunately, we run into more problems with
This plan is horrible. This is the kind of plan you end up with when you show up last to St. Patrick's Day. Though nicely parallel, for some reason it's scanning the PK/CX. Ew. The plan has a cost of 2198 query bucks.
Which is a weird choice, because if we force it to use the nonclustered index, the cost drops rather significantly to 1798 query bucks.
Hey, seeks! Check you out over there. Also note that with the magic of
Cross apply does end up faring better without the
```
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT 1
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age < 18
There are various ways to rewrite this using different T-SQL constructs. We'll look at the pros and cons and do an overall comparison below.
First up: Using
ORSELECT COUNT(*)
FROM dbo.Users AS u
WHERE u.Age < 18
OR u.Age IS NULL;Using
OR gives us a more efficient Seek plan, which reads the exact number of rows we need, however it adds what the technical world calls a whole mess of malarkey to the query plan.Also note that the Seek is executed twice here, which really should be more obvious from the graphical operator:
Table 'Users'. Scan count 2, logical reads 8233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 469 ms, elapsed time = 473 ms.Second up: Using derived tables with
UNION ALLOur query can also be rewritten like this
SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records);This yields the same type of plan, with far less malarkey, and a more apparent degree of honesty about how many times the index was seeked (sought?) into.
It does the same amount of reads (8233) as the
OR query, but shaves about 100ms of CPU time off.CPU time = 313 ms, elapsed time = 315 ms.However, you have to be really careful here, because if this plan attempts to go parallel, the two separate
COUNT operations will be serialized, because they're each considered a global scalar aggregate. If we force a parallel plan using Trace Flag 8649, the problem becomes obvious.SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);This can be avoided by changing our query slightly.
SELECT SUM(Records)
FROM
(
SELECT 1
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT 1
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);Now both nodes performing a Seek are fully parallelized until we hit the concatenation operator.
For what it's worth, the fully parallel version has some good benefit. At the cost of about 100 more reads, and about 90ms of additional CPU time, the elapsed time shrinks to 93ms.
Table 'Users'. Scan count 12, logical reads 8317, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 500 ms, elapsed time = 93 ms.What about CROSS APPLY?
No answer is complete without the magic of
CROSS APPLY!Unfortunately, we run into more problems with
COUNT.SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT COUNT(Id)
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age IS NULL
) x (Records);This plan is horrible. This is the kind of plan you end up with when you show up last to St. Patrick's Day. Though nicely parallel, for some reason it's scanning the PK/CX. Ew. The plan has a cost of 2198 query bucks.
Table 'Users'. Scan count 7, logical reads 31676233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 29532 ms, elapsed time = 5828 ms.Which is a weird choice, because if we force it to use the nonclustered index, the cost drops rather significantly to 1798 query bucks.
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT COUNT(Id)
FROM dbo.Users AS u2 WITH (INDEX(ix_Id_Age))
WHERE u2.Id = u.Id
AND u2.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u2 WITH (INDEX(ix_Id_Age))
WHERE u2.Id = u.Id
AND u2.Age IS NULL
) x (Records);Hey, seeks! Check you out over there. Also note that with the magic of
CROSS APPLY, we don't need to do anything goofy to have a mostly fully parallel plan.Table 'Users'. Scan count 5277838, logical reads 31685303, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 27625 ms, elapsed time = 4909 ms.Cross apply does end up faring better without the
COUNT stuff in there.```
SELECT SUM(Records)
FROM dbo.Users AS u
CROSS APPLY
(
SELECT 1
FROM dbo.Users AS u2
WHERE u2.Id = u.Id
AND u2.Age < 18
Code Snippets
SELECT COUNT(*)
FROM dbo.Users AS u
WHERE u.Age < 18
OR u.Age IS NULL;Table 'Users'. Scan count 2, logical reads 8233, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 469 ms, elapsed time = 473 ms.SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records);CPU time = 313 ms, elapsed time = 315 ms.SELECT SUM(Records)
FROM
(
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age < 18
UNION ALL
SELECT COUNT(Id)
FROM dbo.Users AS u
WHERE u.Age IS NULL
) x (Records)
OPTION(QUERYTRACEON 8649);Context
StackExchange Database Administrators Q#168276, answer score: 65
Revisions (0)
No revisions yet.