patternsqlMajor
NEWID() In Joined Virtual Table Causes Unintended Cross Apply Behavior
Viewed 0 times
joinedcrossunintendedapplybehaviornewidvirtualtablecauses
Problem
My actual work query was an inner join, but this simple example with cross join seems to nearly always reproduce the problem.
With my inner join I had many rows for which I added to each a GUID using the NEWID() function, and for about 9 out of 10 such rows the multiplication with the 2-row virtual table produced the expected results, just 2 copies of the same GUID, while 1 out of 10 would produce different results. This was unexpected to say the least and gave me a really hard time trying to find this bug in my test data generation script.
If you take a look at the following queries using as well non-deterministic getdate and sysdatetime functions, you won't see this, I don't anyway-I always see the same datetime value in both final result rows.
I'm currently using SQL Server 2008 and my work around for now is to load my rows with GUIDs into a table variable before finishing out my random data generation script. Once I have them as values in a table as opposed to virtual table, the problem goes away.
I have a workaround, but I'm looking for the ways to workaround without actual tables or table variables.
While writing this I tried without success these possibilities:
1) placing the newid() into a nested virtual table:
2) wrapping the newid() within a cast expression such as:
3) reversing the order of appearance of the virtual tables within the join expression
```
SELECT *
FROM (
SELECT *
FROM (
SELECT 1 UNION ALL
SELECT 2
) AA ( A )
CROSS JOIN (
SELECT NEWID() TEST_ID
) BB ( B )With my inner join I had many rows for which I added to each a GUID using the NEWID() function, and for about 9 out of 10 such rows the multiplication with the 2-row virtual table produced the expected results, just 2 copies of the same GUID, while 1 out of 10 would produce different results. This was unexpected to say the least and gave me a really hard time trying to find this bug in my test data generation script.
If you take a look at the following queries using as well non-deterministic getdate and sysdatetime functions, you won't see this, I don't anyway-I always see the same datetime value in both final result rows.
SELECT *
FROM (
SELECT 1 UNION ALL
SELECT 2
) AA ( A )
CROSS JOIN (
SELECT GETDATE() TEST_ID
) BB ( B )
SELECT *
FROM (
SELECT 1 UNION ALL
SELECT 2
) AA ( A )
CROSS JOIN (
SELECT SYSDATETIME() TEST_ID
) BB ( B )I'm currently using SQL Server 2008 and my work around for now is to load my rows with GUIDs into a table variable before finishing out my random data generation script. Once I have them as values in a table as opposed to virtual table, the problem goes away.
I have a workaround, but I'm looking for the ways to workaround without actual tables or table variables.
While writing this I tried without success these possibilities:
1) placing the newid() into a nested virtual table:
SELECT *
FROM (
SELECT 1 UNION ALL
SELECT 2
) AA ( A )
CROSS JOIN (
SELECT TEST_ID
FROM (
SELECT NEWID() TEST_ID
) TT
) BB ( B )2) wrapping the newid() within a cast expression such as:
SELECT CAST(NEWID() AS VARCHAR(100)) TEST_ID3) reversing the order of appearance of the virtual tables within the join expression
```
SELECT *
FROM (
Solution
This behaviour is by design, as explained in detail on this Connect bug report. The most pertinent Microsoft reply is reproduced below for convenience (and in case the link dies at some point):
Posted by Microsoft on 7/7/2008 at 9:27 AM
Closing the loop . . . I've discussed this question with the Dev team.
And eventually we have decided not to change current behavior, for the
following reasons:
-
The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-estabilished tenet. It's the
fundamental 'leeway' tha allows the optimizer enough freedom to gain
significant improvements in query-plan execution.
-
This "once-per-row behavior" is not a new issue, although it's not
widely discussed. We started to tweak its behavior back in the Yukon
release. But it's quite hard to pin down precisely, in all cases,
exactly what it means! For example, does it a apply to interim rows
calculated 'on the way' to the final result? - in which case it
clearly depends on the plan chosen. Or does it apply only to the rows
that will eventually appear in the completed result? - there's a nasty
recursion going on here, as I'm sure you'll agree!
-
As I mentioned earlier, we default to "optimize performance" -
which is good for 99% of cases. The 1% of cases where it might change
results are fairly easy to spot - side-effecting 'functions' such as
NEWID - and easy to 'fix' (trading perf, as a consequence). This
default to "optimize performance" again, is long-established, and
accepted. (Yes, it's not the stance chosen by compilers for
conventional programming languages, but so be it).
So, our recommendations are:
semantics.
OPTION to force a particular behavior (trading perf)
Hope this explanation helps clarify our reasons for closing this bug
as "won't fix".
The
None of the 'workarounds' in the question are safe; there is no guarantee the behaviour will not change the next time the plan is compiled, when you next apply a service pack or cumulative update...or for other reasons.
The only safe solution is to use a temporary object of some kind - a variable, table, or multi-statement function for example. Using a workaround that appears to work today based on observation is a great way to experience unexpected behaviours in future, typically in the form of a paging alert at 3am on Sunday morning.
Posted by Microsoft on 7/7/2008 at 9:27 AM
Closing the loop . . . I've discussed this question with the Dev team.
And eventually we have decided not to change current behavior, for the
following reasons:
-
The optimizer does not guarantee timing or number of executions of
scalar functions. This is a long-estabilished tenet. It's the
fundamental 'leeway' tha allows the optimizer enough freedom to gain
significant improvements in query-plan execution.
-
This "once-per-row behavior" is not a new issue, although it's not
widely discussed. We started to tweak its behavior back in the Yukon
release. But it's quite hard to pin down precisely, in all cases,
exactly what it means! For example, does it a apply to interim rows
calculated 'on the way' to the final result? - in which case it
clearly depends on the plan chosen. Or does it apply only to the rows
that will eventually appear in the completed result? - there's a nasty
recursion going on here, as I'm sure you'll agree!
-
As I mentioned earlier, we default to "optimize performance" -
which is good for 99% of cases. The 1% of cases where it might change
results are fairly easy to spot - side-effecting 'functions' such as
NEWID - and easy to 'fix' (trading perf, as a consequence). This
default to "optimize performance" again, is long-established, and
accepted. (Yes, it's not the stance chosen by compilers for
conventional programming languages, but so be it).
So, our recommendations are:
- Avoid reliance on non-guaranteed timing and number-of-executions
semantics.
- Avoid using NEWID() deep in table expressions.
- Use
OPTION to force a particular behavior (trading perf)
Hope this explanation helps clarify our reasons for closing this bug
as "won't fix".
The
GETDATE and SYSDATETIME functions are indeed non-deterministic, but they are treated as runtime constants for a particular query. Broadly, this means the function's value is cached when query execution starts, and the result re-used for all references within the query.None of the 'workarounds' in the question are safe; there is no guarantee the behaviour will not change the next time the plan is compiled, when you next apply a service pack or cumulative update...or for other reasons.
The only safe solution is to use a temporary object of some kind - a variable, table, or multi-statement function for example. Using a workaround that appears to work today based on observation is a great way to experience unexpected behaviours in future, typically in the form of a paging alert at 3am on Sunday morning.
Context
StackExchange Database Administrators Q#30345, answer score: 24
Revisions (0)
No revisions yet.