patternsqlMinor
Reducing Temp Table Scans when joining two temp tables using OR conditions
Viewed 0 times
conditionstablestempreducingscanstwousingwhentablejoining
Problem
I'm working on a complicated query. I have up to this point been able to refactor to reduce execution time as well as number of scans and reads. At this point in the query, we have two temp tables which are structured exactly the same; the difference is that one table is a subset of the other (one was created using a larger date range). These tables are created by querying ~6 physical tables in a CTE, filtering down, etc. The part of the query I'm struggling with here is when we join the two tables on three fields, and then in the where clause, we further compare 5 columns in the tables using inequality operators and OR conditions. This query seems to be the costliest in the whole batch by ~200,000 logical reads and 30,000+ table scans. See the paste of the plan link below for the execution plan for this exact part of the query as well as the DML statement.
https://www.brentozar.com/pastetheplan/?id=H16jNYvXK
As you can see, we're doing table scans on the temp tables and then doing a merge join. The plan looks OK enough except that the merge join's row estimate is WAY too high [est: 38335 vs actual: 209].
I have indeed attempted to create indexes for the temp tables, partly out of desperation. It didn't seem to help in this case. The indexes I have tested were nonclustered indexes using the three fields in the join condition. This just changed the execution plan to use RID lookups in the heaps and did nothing to change the estimate or reduce the number of scans/reads. I have also tried a nonclustered index on the fields used in the WHERE clause, but due to a couple of the fields being varchar(max) fields (poor schema design choice that is from before my time and something I've been told to just deal with), I can't use these in an index. I have tried casting them down but some index inserts are failing because they are too long. Not only that, but my understanding is that creating indexes on temp tables are in many cases not really super useful (https://www.brento
https://www.brentozar.com/pastetheplan/?id=H16jNYvXK
As you can see, we're doing table scans on the temp tables and then doing a merge join. The plan looks OK enough except that the merge join's row estimate is WAY too high [est: 38335 vs actual: 209].
I have indeed attempted to create indexes for the temp tables, partly out of desperation. It didn't seem to help in this case. The indexes I have tested were nonclustered indexes using the three fields in the join condition. This just changed the execution plan to use RID lookups in the heaps and did nothing to change the estimate or reduce the number of scans/reads. I have also tried a nonclustered index on the fields used in the WHERE clause, but due to a couple of the fields being varchar(max) fields (poor schema design choice that is from before my time and something I've been told to just deal with), I can't use these in an index. I have tried casting them down but some index inserts are failing because they are too long. Not only that, but my understanding is that creating indexes on temp tables are in many cases not really super useful (https://www.brento
Solution
Testing
This query runs in ~1.2 seconds, so I'm not sure how much improvement you're expecting. Looking at things like scans and costs can often be misleading and fruitless when it comes to improving the overall timing of the query.
I'm cutting out some of the columns in your query just to highlight the changes more easily.
At the top, it may be worth going back to the
This may be a limiting factor overall.
Since the query time jumps up from ~443ms in the two child branches to ~1.2ms at the merge join, we can infer that ~743ms is spent at the merge join. Anton is correct that it's likely from it being a many to many merge join.
To avoid that, we can add a hash join hint, which doesn't have a many-to-many type.
It may also be instructive to try a
To your point about indexing temp tables, there are a lot of factors to consider that are not covered in Brent's post.
This query runs in ~1.2 seconds, so I'm not sure how much improvement you're expecting. Looking at things like scans and costs can often be misleading and fruitless when it comes to improving the overall timing of the query.
I'm cutting out some of the columns in your query just to highlight the changes more easily.
At the top, it may be worth going back to the
SELECT INTO #... method, or adding a tablock hint to the select to allow for a parallel plan. You may not get one, and it may not make a huge difference since your query plan shows EstimatedAvailableDegreeOfParallelism="2".This may be a limiting factor overall.
INSERT INTO
#temp WITH(TABLOCK)
select
IPS.ACCT as ACCT
/*SNIP*/
from
#SmallerRange IPS
join #LargerRange V on V.VID = IPS.VID and V.IorO = IPS.IorO and V.OtherLocationID = IPS.OtherLocationID
WHERE
(
IPS.CDate <> V.CDate or
IPS.ACCT != V.ACCT or
IPS.Unit != V.Unit or
IPS.ARRVLTIME <> V.ARRVLTIME or
IPS.LocationID != V.LocationID
)
OPTION(HASH JOIN);Since the query time jumps up from ~443ms in the two child branches to ~1.2ms at the merge join, we can infer that ~743ms is spent at the merge join. Anton is correct that it's likely from it being a many to many merge join.
To avoid that, we can add a hash join hint, which doesn't have a many-to-many type.
It may also be instructive to try a
OPTION(FORCE ORDER); hint as well.To your point about indexing temp tables, there are a lot of factors to consider that are not covered in Brent's post.
- Indexing Temp Tables
- When Should You Index Temp Tables?
Code Snippets
INSERT INTO
#temp WITH(TABLOCK)
select
IPS.ACCT as ACCT
/*SNIP*/
from
#SmallerRange IPS
join #LargerRange V on V.VID = IPS.VID and V.IorO = IPS.IorO and V.OtherLocationID = IPS.OtherLocationID
WHERE
(
IPS.CDate <> V.CDate or
IPS.ACCT != V.ACCT or
IPS.Unit != V.Unit or
IPS.ARRVLTIME <> V.ARRVLTIME or
IPS.LocationID != V.LocationID
)
OPTION(HASH JOIN);Context
StackExchange Database Administrators Q#299923, answer score: 3
Revisions (0)
No revisions yet.