patternsqlMinor
Unusual column comparisons and query performance
Viewed 0 times
unusualcolumnqueryperformanceandcomparisons
Problem
We have some consultants working on expanding an inhouse data warehouse. I was doing a code review and ran across this pattern in all of the load procs:
The gist is, if we have a new business key, insert but if the business key exists and the hash of the attributes don't match our current row then update the old row and insert a new one (later in the code). It all works fine but I paused when I got to this code
It seems overly complicated compared to SOURCE.[HASH] <> TARGET.[Hash]. The EXCEPT will do an accurate NULL comparison but in our case hashes will never be NULL (or we have bigger problems). I want our code to be easy to read so that when someone has to maintain it, it doesn't confuse. I asked our consultants about it and they speculated that it might be faster because of set operations but I decided to write a simple test (test code below).
The first thing I noticed was the EXISTS/EXCEPT had a more complicated query plan but that's not always bad
I ran each select client statistics on and the <> join yielded total execution time of 12,000 vs 25,000 with the EXISTS/EXCEPT. I want to take this to our consultants with the request to refactor that statement but wanted to get feedback here on:
MERGE [EDHub].[Customer].[Class] AS TARGET
USING (
SELECT
FROM [dbo].[vw_CustomerClass]
WHERE JHAPostingDate = @PostingDate
) AS SOURCE
ON TARGET.BankId = SOURCE.BankId -- This join is on the business keys
AND TARGET.Code = SOURCE.Code
WHEN NOT MATCHED BY TARGET
THEN
WHEN MATCHED
AND TARGET.IsLatest = 1
AND EXISTS (
SELECT SOURCE.[HASH]
EXCEPT
SELECT TARGET.[Hash]
)
THEN
The gist is, if we have a new business key, insert but if the business key exists and the hash of the attributes don't match our current row then update the old row and insert a new one (later in the code). It all works fine but I paused when I got to this code
AND EXISTS (
SELECT SOURCE.[HASH]
EXCEPT
SELECT TARGET.[Hash]
)It seems overly complicated compared to SOURCE.[HASH] <> TARGET.[Hash]. The EXCEPT will do an accurate NULL comparison but in our case hashes will never be NULL (or we have bigger problems). I want our code to be easy to read so that when someone has to maintain it, it doesn't confuse. I asked our consultants about it and they speculated that it might be faster because of set operations but I decided to write a simple test (test code below).
The first thing I noticed was the EXISTS/EXCEPT had a more complicated query plan but that's not always bad
I ran each select client statistics on and the <> join yielded total execution time of 12,000 vs 25,000 with the EXISTS/EXCEPT. I want to take this to our consultants with the request to refactor that statement but wanted to get feedback here on:
- is this a good test? - am I missing anything?
- is
Solution
I don't like
For your test rig I tried four different ways of expressing the query and got the stated results.
SQL Server Execution Times: CPU time = 30968 ms, elapsed time =
8230 ms.
SQL Server Execution Times: CPU time = 31594 ms, elapsed time =
9230 ms.
SQL Server Execution Times: CPU time = 46531 ms, elapsed time =
13191 ms.
SQL Server Execution Times: CPU time = 23812 ms, elapsed time =
6760 ms.
So on that basis the last one would be a clear winner - along with a code comment with a link to Undocumented Query Plans: Equality Comparisons for anyone unfamiliar with the pattern.
But you should test whether this pattern is reproducible with your actual
ISNULL with sentinel values, it requires picking values that can't ever legitimately appear in the data now or forever after and personally I find expressions containing these more difficult to reason about.For your test rig I tried four different ways of expressing the query and got the stated results.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE r.hash <> l.hash
OR ( r.hash IS NULL
AND l.hash IS NOT NULL )
OR ( l.hash IS NULL
AND r.hash IS NOT NULL )SQL Server Execution Times: CPU time = 30968 ms, elapsed time =
8230 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE ISNULL(r.hash, 0) <> ISNULL(l.hash, 0)SQL Server Execution Times: CPU time = 31594 ms, elapsed time =
9230 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE EXISTS(SELECT r.hash
EXCEPT
SELECT l.HASH)SQL Server Execution Times: CPU time = 46531 ms, elapsed time =
13191 ms.
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE NOT EXISTS(SELECT r.hash
INTERSECT
SELECT l.HASH)SQL Server Execution Times: CPU time = 23812 ms, elapsed time =
6760 ms.
So on that basis the last one would be a clear winner - along with a code comment with a link to Undocumented Query Plans: Equality Comparisons for anyone unfamiliar with the pattern.
But you should test whether this pattern is reproducible with your actual
MERGE query too.Code Snippets
SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE r.hash <> l.hash
OR ( r.hash IS NULL
AND l.hash IS NOT NULL )
OR ( l.hash IS NULL
AND r.hash IS NOT NULL )SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE ISNULL(r.hash, 0) <> ISNULL(l.hash, 0)SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE EXISTS(SELECT r.hash
EXCEPT
SELECT l.HASH)SELECT COUNT(1)
FROM dbo.l
CROSS JOIN dbo.r
WHERE NOT EXISTS(SELECT r.hash
INTERSECT
SELECT l.HASH)Context
StackExchange Database Administrators Q#204011, answer score: 9
Revisions (0)
No revisions yet.