
Why would writing the CTE to an actual table provide a performance improvement?

Submitted by: @import:stackexchange-dba

Problem

I'm a huge fan of CTEs. They are the best. You know it, I know it, we all know it.

Today, I got burned with a query that is about 1000x faster if the CTE is written to a real table.

The query itself is in the link, but I suspect it isn't important. To simplify, I needed a list of all the claims in an insurance policy where one claim met a certain condition (cat = 1).

A simplified version looks like this:

with cat_claims as (select distinct pol_nbr from claims where cat = 1)

    select * from claims c
    inner join policies p on p.pol_nbr = c.pol_nbr
    inner join cat_claims cat on c.pol_nbr = cat.pol_nbr


which was super duper slow. If I rewrite it to look like this:

with cat_claims as (select distinct pol_nbr from claims where cat = 1)

    select * into cat_claims
    from cat_claims

    select * from claims c
    inner join policies p on p.pol_nbr = c.pol_nbr
    inner join cat_claims cat on c.pol_nbr = cat.pol_nbr

    drop table cat_claims


it's way, way faster. This is odd to me.

Question: Why would writing the CTE to an actual table provide a performance improvement?

Slow plan:

https://www.brentozar.com/pastetheplan/?id=HkyrpAtH4

Fast Plan:

https://www.brentozar.com/pastetheplan/?id=ByGkAAKrN

Solution

Even though these are estimated plans and not the actual ones, the difference looks to me like it comes from joining these tables:
RATING_CONTRIB_LOSS,
RATING_CONTRIB_USR,
cmbgrp,
rating_revision_set,
and the Rating table from the ratings CTE.

Joining these tables spawns huge table spools (lazy spools), which could be even bigger in the actual plan.

For every row in the top (outer) input of the nested loops operator, the bottom result set is scanned, resulting in

3361 rows * 3359 in the first table spool --> Nested loops --> Top(1)

and

1541 rows * 3359 in the second table spool --> Nested loops --> Top(1)
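
In round numbers, and assuming the estimated row counts hold in the actual plan, that works out to:

    3361 * 3359 ≈ 11.3 million row comparisons for the first spool
    1541 * 3359 ≈  5.2 million row comparisons for the second spool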

These table spools are not present in either of the two plans of the temp table version.

What I am getting at is that the part of the query that inserts into the temp table is similar to the corresponding part of the CTE query. There are some differences, but I think the major cause of the slowdown is the table spools.

An example of a difference is that the full table scan on RATING_CONTRIB_LOSS in the CTE plan is changed to a scan with a residual predicate in the temp table plan.

(Plan screenshots from the original answer: the CTE scan and the temp table scan on RATING_CONTRIB_LOSS.)

You could argue that the second part of the query, which uses the temp table, might access the RATING_CONTRIB_LOSS table again. I would have to agree, because it will be doing an expensive key lookup.

However, the second table spool in the CTE plan is also based on a nested loops join with the RATING_CONTRIB_LOSS table, which is not present in the temp table plan, and that is a big plus.
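
As an aside (not something the original answer suggests), a key lookup like that typically disappears if the nonclustered index covers every column the query reads from RATING_CONTRIB_LOSS. A minimal sketch, with made-up column names, assuming pol_nbr is the seek key:

    -- hypothetical covering index; column names are placeholders for illustration only
    create nonclustered index ix_rating_contrib_loss_cover
        on RATING_CONTRIB_LOSS (pol_nbr)            -- assumed seek column
        include (contrib_amt, rating_revision_id);  -- assumed columns fetched by the key lookup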

Conclusion

I do believe that the difference in execution time comes from the temp table version using the intermediate result in such a way that the costly operators, specifically the table spools, are not used. I would need the actual plans to be 100% sure.
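
For reference, the actual plans (plus I/O and timing numbers) can be captured with the standard T-SQL session settings below; this is generic tooling, not something taken from the original post:

    set statistics xml on;   -- returns the actual execution plan as XML alongside the results
    set statistics io on;    -- logical reads per table, per statement
    set statistics time on;  -- CPU and elapsed time per statement

    -- run the CTE version and the temp table version here, then turn the settings back off:
    set statistics xml off;
    set statistics io off;
    set statistics time off;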

As a side note, rewriting CTEs into temp tables, or just splitting up queries in general, can in many cases give better execution times.

It is easier to guess how many apples are in a bag with apples and pears than to guess how many apples are in a bag with apples, pears, mangoes, oranges, peaches, ... (Yes, the fruit = tables.)

I would also use actual #temp tables to store your intermediate results.
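
A minimal sketch of that rewrite with a session-scoped #temp table, reusing the table and column names from the simplified query above (whether the #temp table also needs an index depends on the data):

    select distinct pol_nbr
    into #cat_claims
    from claims
    where cat = 1;

    select * from claims c
    inner join policies p on p.pol_nbr = c.pol_nbr
    inner join #cat_claims cat on c.pol_nbr = cat.pol_nbr;

    drop table #cat_claims;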

Comparison of the CTE and temp table insert query parts

(Plan screenshots from the original answer: CTE plan part 1, CTE plan part 2, temp table plan part 1.)

Differences: the access to RATING_CONTRIB_LOSS, and a Hash Match join versus a Nested Loops join. The rest is very much alike, with the same operations (2x) on the Rating table.

Context

StackExchange Database Administrators Q#230173, answer score: 6
