gotchasqlModerate
Unexpected results with random numbers and join types
Viewed 0 times
randomwithnumbersjoinunexpectedtypesandresults
Problem
I have a simple script that gets four random numbers (1 through 4) and then joins back to get the matching database_id number. When I run the script with a LEFT JOIN, I get four rows back every time (the expected result). However, when I run it with an INNER JOIN, I get a varying number of rows -- sometimes two, sometimes eight.
Logically, there shouldn't be any difference because I know rows with database_ids 1-4 exist in sys.databases. And because we're selecting from the random number table with four rows (as opposed to joining to it), there should never be any more than four rows returned.
This happens in both SQL Server 2012 and 2014. What is causing the INNER JOIN to return varying numbers of rows?
Logically, there shouldn't be any difference because I know rows with database_ids 1-4 exist in sys.databases. And because we're selecting from the random number table with four rows (as opposed to joining to it), there should never be any more than four rows returned.
This happens in both SQL Server 2012 and 2014. What is causing the INNER JOIN to return varying numbers of rows?
/* Works as expected -- always four rows */
SELECT rando.RandomNumber, d.database_id
FROM
(SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM sys.databases WHERE database_id <= 4) AS rando
LEFT JOIN sys.databases d ON rando.RandomNumber = d.database_id;
/* Returns a varying number of rows */
SELECT rando.RandomNumber, d.database_id
FROM
(SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM sys.databases WHERE database_id <= 4) AS rando
INNER JOIN sys.databases d ON rando.RandomNumber = d.database_id;
/* Also returns a varying number of rows */
WITH rando AS (
SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM sys.databases WHERE database_id <= 4
)
SELECT r.RandomNumber, d.database_id
FROM rando AS r
INNER JOIN sys.databases d ON r.RandomNumber = d.database_id;Solution
This might give some insight until one of the smarter folks on the site chimes in.
I put the random results into a temporary table and I consistently get 4 result regardless of the join type.
If I compare query plans between your second query and the variation with a table variable, I can see there's a definite difference between the two. The red X is
If I eliminate the random bit of the query to a constant
It's computing the expression for the random number after the join. Whether that's expected, I still leave to the internal wizards on the site but at least that's why you're getting variable results in your join.
2014
For those playing along at home, the above query plans were generated from a 2008 R2 instance. The 2014 plans look different but the Compute Scalar operation remains after the join.
This is the query plan for a 2014 using the constant expression
This is the query plan for a 2014 instance using the newid expression.
This apparently is by design, Connect issue here. Thanks to @paulWhite for knowing that existed.
I put the random results into a temporary table and I consistently get 4 result regardless of the join type.
/* Works as expected -- always four rows */
DECLARE @Rando table
(
RandomNumber int
);
INSERT INTO
@Rando
(
RandomNumber
)
-- This generates 4 random numbers from 1 to 4, endpoints inclusive
SELECT
1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM
sys.databases
WHERE
database_id <= 4;
SELECT
*
FROM
@Rando AS R;
SELECT
rando.RandomNumber
, d.database_id
FROM
@Rando AS rando
LEFT JOIN
sys.databases d
ON rando.RandomNumber = d.database_id
ORDER BY 1,2;
/* Returns a varying number of rows */
SELECT rando.RandomNumber, d.database_id
FROM
@Rando AS rando
INNER JOIN
sys.databases d
ON rando.RandomNumber = d.database_id
ORDER BY 1,2;
/* Also returns a varying number of rows */
WITH rando AS
(
SELECT * FROM @Rando AS rando
)
SELECT r.RandomNumber, d.database_id
FROM
rando AS r
INNER JOIN
sys.databases d
ON r.RandomNumber = d.database_id
ORDER BY 1,2;If I compare query plans between your second query and the variation with a table variable, I can see there's a definite difference between the two. The red X is
No Join Predicate so that seems really odd to my caveman developer brainIf I eliminate the random bit of the query to a constant
1 % (4), my plan looks better but the Compute Scalar was eliminated so that led me to look closerIt's computing the expression for the random number after the join. Whether that's expected, I still leave to the internal wizards on the site but at least that's why you're getting variable results in your join.
2014
For those playing along at home, the above query plans were generated from a 2008 R2 instance. The 2014 plans look different but the Compute Scalar operation remains after the join.
This is the query plan for a 2014 using the constant expression
This is the query plan for a 2014 instance using the newid expression.
This apparently is by design, Connect issue here. Thanks to @paulWhite for knowing that existed.
Code Snippets
/* Works as expected -- always four rows */
DECLARE @Rando table
(
RandomNumber int
);
INSERT INTO
@Rando
(
RandomNumber
)
-- This generates 4 random numbers from 1 to 4, endpoints inclusive
SELECT
1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM
sys.databases
WHERE
database_id <= 4;
SELECT
*
FROM
@Rando AS R;
SELECT
rando.RandomNumber
, d.database_id
FROM
@Rando AS rando
LEFT JOIN
sys.databases d
ON rando.RandomNumber = d.database_id
ORDER BY 1,2;
/* Returns a varying number of rows */
SELECT rando.RandomNumber, d.database_id
FROM
@Rando AS rando
INNER JOIN
sys.databases d
ON rando.RandomNumber = d.database_id
ORDER BY 1,2;
/* Also returns a varying number of rows */
WITH rando AS
(
SELECT * FROM @Rando AS rando
)
SELECT r.RandomNumber, d.database_id
FROM
rando AS r
INNER JOIN
sys.databases d
ON r.RandomNumber = d.database_id
ORDER BY 1,2;Context
StackExchange Database Administrators Q#80374, answer score: 12
Revisions (0)
No revisions yet.