snippetsqlMinor
Does the SQL Server generate the contents for a derived table contents for every matching record in a join
Viewed 0 times
thesqlcontentseveryjoingeneraterecordfordoesserver
Problem
I am on SQL Server 2008 R2. I am executing the following SQL a
The result of the query is
I was expecting the GUID values in c3 to be the same. It appears that derived table t2 is being "reevaluated" for each row in the table t1 even though it is not a co-related sub-query. Is this expected behavior ?
use tempdb
go
create table t1 ( c1 int, c2 int, c3 uniqueidentifier null)
go
insert t1(c1,c2) values (1,2),(1,3)
go
update t1
set c2 = 5 + c2,
c3 = t2.c3
from t1 join ( select c1=1, c3=newid()) as t2 on t1.c1 = t2.c1
go
select * from t1
go
drop table t1
goThe result of the query is
c1 c2 c3
----------- ----------- ------------------------------------
1 7 D6BE2119-CECB-4413-94E4-6099E5CC9028
1 8 0BF8A3E2-4E66-4091-A1B0-4FB8E130B347I was expecting the GUID values in c3 to be the same. It appears that derived table t2 is being "reevaluated" for each row in the table t1 even though it is not a co-related sub-query. Is this expected behavior ?
Solution
You would expect the
Simply put, for each row
Here is a part of a statement from Microsoft regarding this:
Closing the loop . . . I've discussed this question with the Dev team.
And eventually we have decided not to change current behavior, for the
following reasons:
1) The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-established tenet. It's the
fundamental 'leeway' that allows the optimizer enough freedom to gain
significant improvements in query-plan execution....
As such the optimizer can choose the timing or number of executions, to get the best performance.
The source of the quote and some more information and examples and the response from microsoft
To explain it we best go through the query plan that is created for the
Right of the bat we don't see a join predicate, but simply a table scan.
A predicate is added to that table scan, to only get the rows
This is the '
After this, 2 rows are found that match this predicate, and are processed by the first
And
A workaround
The obvious fix is to use a variable to hold the
If you would have more than 1 source columns (T2) you would need to store the
Another interesting part:
So, we have a choice: if we want to guarantee a certain behavior in
the presence of non-deterministic (side-effecting) code – so that
results of JOINs, for example, follow the semantics of a nested-loop
execution – then we can use appropriate OPTIONs to force that behavior
– as UC points out. But the resulting code will run slow – that's the
cost of, in effect, hobbling the Query Optimizer.
Source
By using functions we could force the optimizer to first calculate the NEWID() on the source table, before the join. But I am not able to do that on the query at hand (probably not the query's fault).
NEWID() to be calculated once, for the derived table t2. This is not the case, as it is calculated for each row of the join resultset of T1 and T2.Simply put, for each row
where T1.C1 = T2.C1 or in other words where T1.C1=1.Here is a part of a statement from Microsoft regarding this:
Closing the loop . . . I've discussed this question with the Dev team.
And eventually we have decided not to change current behavior, for the
following reasons:
1) The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-established tenet. It's the
fundamental 'leeway' that allows the optimizer enough freedom to gain
significant improvements in query-plan execution....
As such the optimizer can choose the timing or number of executions, to get the best performance.
The source of the quote and some more information and examples and the response from microsoft
To explain it we best go through the query plan that is created for the
UPDATE statement.Right of the bat we don't see a join predicate, but simply a table scan.
A predicate is added to that table scan, to only get the rows
where C1 = 1This is the '
join' reduced to a predicate.After this, 2 rows are found that match this predicate, and are processed by the first
Compute Scalar operator.And
NEWID() is calculated for each column of the joined table + derived table.A workaround
The obvious fix is to use a variable to hold the
NEWID() value as to calculate it before the join & reuse it.create table t1 ( c1 int, c2 int, c3 uniqueidentifier null)
go
insert t1(c1,c2) values (1,2),(1,3)
go
DECLARE @test uniqueidentifier = newid()
update t1
set c2 = 5 + c2,
c3 = t2.c3
from t1 join
( select c1=1, c3=@test) as t2 on t1.c1 = t2.c1
go
select * from t1
go
drop table t1
goIf you would have more than 1 source columns (T2) you would need to store the
NEWID() values in a temporary table or table variableAnother interesting part:
So, we have a choice: if we want to guarantee a certain behavior in
the presence of non-deterministic (side-effecting) code – so that
results of JOINs, for example, follow the semantics of a nested-loop
execution – then we can use appropriate OPTIONs to force that behavior
– as UC points out. But the resulting code will run slow – that's the
cost of, in effect, hobbling the Query Optimizer.
Source
By using functions we could force the optimizer to first calculate the NEWID() on the source table, before the join. But I am not able to do that on the query at hand (probably not the query's fault).
Code Snippets
create table t1 ( c1 int, c2 int, c3 uniqueidentifier null)
go
insert t1(c1,c2) values (1,2),(1,3)
go
DECLARE @test uniqueidentifier = newid()
update t1
set c2 = 5 + c2,
c3 = t2.c3
from t1 join
( select c1=1, c3=@test) as t2 on t1.c1 = t2.c1
go
select * from t1
go
drop table t1
goContext
StackExchange Database Administrators Q#231385, answer score: 2
Revisions (0)
No revisions yet.