Why would writing the CTE to an actual table provide a performance improvement?
Problem
I'm a huge fan of CTEs. They are the best. You know it, I know it, we all know it.
Today, I got burned with a query that is about 1000x faster if the CTE is written to a real table.
The query itself is in the link, but I suspect it isn't important. To simplify, I needed a list of all the claims in an insurance policy where one claim met a certain condition (cat = 1).
A simplified version looks like this:
with cat_claims as (select distinct pol_nbr from claims where cat = 1)
select * from claims c
inner join policies p on p.polnbr = c.pol_nbr
inner join cat_claims cat on c.pol_nbr = cat.pol_nbr

which was super duper slow. If I rewrite it to look like this:
with cat_claims as (select distinct pol_nbr from claims where cat = 1)
select * into cat_claims
from cat_claims

select * from claims c
inner join policies p on p.polnbr = c.pol_nbr
inner join cat_claims cat on c.pol_nbr = cat.pol_nbr

drop table cat_claims

it's way, way faster. This is odd to me.
Question
Why would writing the CTE to an actual table provide a performance improvement?
Slow plan:
https://www.brentozar.com/pastetheplan/?id=HkyrpAtH4
Fast Plan:
https://www.brentozar.com/pastetheplan/?id=ByGkAAKrN
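The linked plans are estimated plans. For anyone reproducing this, the actual plans (with runtime row counts) can be captured by turning on session statistics before running each variant; a minimal sketch against the simplified schema above, assuming SQL Server (which the pastetheplan links indicate):

```sql
-- Return the actual execution plan as XML alongside the results
-- (rendered as a clickable graphical plan in SSMS).
SET STATISTICS XML ON;

with cat_claims as (select distinct pol_nbr from claims where cat = 1)
select * from claims c
inner join policies p on p.polnbr = c.pol_nbr
inner join cat_claims cat on c.pol_nbr = cat.pol_nbr;

SET STATISTICS XML OFF;
```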
Solution
Even though they are estimated plans and not the actual ones, the difference to me looks like it is coming from joining the tables RATING_CONTRIB_LOSS, RATING_CONTRIB_USR, cmbgrp, and rating_revision_set with the Rating table from the ratings CTE.

Joining these tables spawns huge table spools (lazy spools), which could be even bigger in the actual plan. For every row in the top part of the nested loops operator, the bottom result set is scanned, resulting in

3361 rows * 3359 in the first (table spool --> nested loops --> Top(1))

and

1541 rows * 3359 in the second (table spool --> nested loops --> Top(1)),

that is, roughly 11.3 million and 5.2 million rows touched. These table spools are not present in either of the two plans of the temp table queries.

What I am getting at is that the query part that inserts into the table is much like the corresponding query part of the CTE. There are some differences, but I think the major cause of the slowdown is the table spools. An example of a difference is that the full table scan on RATING_CONTRIB_LOSS in the CTE plan is changed to a scan with a residual predicate in the temp table plan.

CTE

Temp table

And you could argue that the second part of the query that uses the temp table might access the RATING_CONTRIB_LOSS table again. I would have to agree, because it will be doing an expensive key lookup. However, the second table spool in the CTE plan is also based on a nested loops join with the RATING_CONTRIB_LOSS table, which is not present in the temp table plan, and that is a big plus.

Conclusion
I do believe that the difference in execution time comes from the temp table query using the materialized result in such a way that the costly operators, specifically the table spools, are not used. I would need the actual plans to be 100% sure.
As a side note, rewriting CTEs into temp tables, or just splitting up queries in general, can in many cases give better execution times. It is easier to guess how many apples are in a bag with apples and pears than to guess how many apples are in a bag with apples, pears, mangoes, oranges, peaches, and so on (yes, the fruit are the tables: fewer joins per statement means simpler cardinality estimates).
I would also use actual #temp tables to store your intermediary results in.

Comparison CTE & Temp table insert query parts
Plan with CTE part 1
Plan with CTE part 2
Plan with temp table part 1
The difference is on RATING_CONTRIB_LOSS, and a Hash Match join instead of a Nested Loops join; the rest is very much alike, with the same operations (2x) on the Rating table.
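The #temp table suggestion can be sketched against the simplified schema from the question (a hedged illustration, not the original query): a #temp table is session-local, lives in tempdb, and gets its own statistics, which gives the optimizer the smaller "bag of fruit" to estimate.

```sql
-- Materialize the distinct policy numbers first.
select distinct pol_nbr
into #cat_claims
from claims
where cat = 1;

-- The optimizer now joins against an already-computed set with known
-- size and its own statistics, instead of inlining the CTE definition.
select *
from claims c
inner join policies p on p.polnbr = c.pol_nbr
inner join #cat_claims cat on c.pol_nbr = cat.pol_nbr;

-- Optional: #temp tables are dropped automatically when the session ends.
drop table #cat_claims;
```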
Context
StackExchange Database Administrators Q#230173, answer score: 6