patternMinor
Improve performance with the WHERE NOT IN sub-select clause
Viewed 0 times
thewithwheresubimproveperformanceselectnotclause
Problem
In the following query, I have to count transactions for each customer. However, I have to exclude from the result set entirely, customers that have a transaction older than one year.
Shouldn't the query optimizer be smart enough to only evaluate existence once for each customer?
There are no naturals in my query plan. This query is performing as if the database is running the existence clause for every transaction instead of running it for every customer. Performance is the same if I remove the
Is there a better way to do this so that I may get better performance out of the database? Hopefully a simple
Due to other
Shouldn't the query optimizer be smart enough to only evaluate existence once for each customer?
--Count transactions on customers that are less than 1 year old
SELECT t1.CUSTID, COUNT(*)
FROM CUST_TRX t1
WHERE NOT EXISTS (
SELECT FIRST 1 1
FROM CUST_TRX t2
WHERE
t2.CUSTID = t1.CUSTID AND
t2.DATED < CURRENT_DATE - 365
GROUP BY t2.CUSTID
)
GROUP BY t1.CUSTIDThere are no naturals in my query plan. This query is performing as if the database is running the existence clause for every transaction instead of running it for every customer. Performance is the same if I remove the
GROUP BY in the subquery.Is there a better way to do this so that I may get better performance out of the database? Hopefully a simple
SELECT query will work avoiding a CTE if possible (that would introduce other challenges).Due to other
GROUP BY criteria (not shown here) I'm not able to simply check MIN(DATED), I really need to perform another query.Solution
With queries like this is it often more efficient to perform a
Try something like:
(note: I'm not familiar with firebird, so the above syntax may need tweaks but should illustrate the point)
Without the
Depending on the DB engine's abilities, especially if the amount of data in the reference object (
Also if you do it this way around leave a comment in the code (and/or supporting documentation) to say that you are doing this as an equivalent to
LEFT OUTER JOIN instead of the NOT EXISTS style check, it often implies a full index scan (or table scan without the right indexes in place) but with many rows in the main table(s) this is less expensive than the large number of index seeks (one on the reference table for each row returned from the main table) that would otherwise result. Some query planners are quite bright about spotting this equivalence and using the alternate plan where it is the better choice, but it doesn't sound like this has happened in your case.Try something like:
SELECT t1.CUSTID, COUNT(*)
FROM CUST_TRX t1
LEFT OUTER JOIN
CUST_TRX t2
ON t2.CUSTID=t1.CUSTID
AND t2.DATED<CURRENT_DATE-365
WHERE t2.CUSTID IS NULL
GROUP BY t1.CUSTID(note: I'm not familiar with firebird, so the above syntax may need tweaks but should illustrate the point)
Without the
WHERE t2.CUSTID IS NULL every row from t1 with matches in t2 will be output once for every match found in t2 and those with no matches in t2 will be output once but with any columns selected from that object set to NULL. The WHERE clause then screens out the matches.Depending on the DB engine's abilities, especially if the amount of data in the reference object (
CUST_TRX with a filter applied here) is huge, this may be significantly less efficient than the WHERE NOT IN or WHERE NOT EXISTS options, so benchmark over realistic data sets first before using the method. It often works out much more efficient with MS SQL Server in cases where the query planner doesn't notice that the WHERE NOT IN arrangement can be performed this way more efficiently.Also if you do it this way around leave a comment in the code (and/or supporting documentation) to say that you are doing this as an equivalent to
WHERE NOT IN or WHERE NOT EXISTS which you expect to be more efficient. You'll remember it and an experienced SQL person will recognise the pattern, but other people looking at the code might not immediately understand the intent/reason and flip it back to using WHERE NOT EXISTS for clarity as that reads better as on English sentence.Code Snippets
SELECT t1.CUSTID, COUNT(*)
FROM CUST_TRX t1
LEFT OUTER JOIN
CUST_TRX t2
ON t2.CUSTID=t1.CUSTID
AND t2.DATED<CURRENT_DATE-365
WHERE t2.CUSTID IS NULL
GROUP BY t1.CUSTIDContext
StackExchange Database Administrators Q#48830, answer score: 5
Revisions (0)
No revisions yet.