HiveBrain v1.2.0
Get Started
← Back to all entries
gotchasqlModerate

Unexpected results with random numbers and join types

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
randomwithnumbersjoinunexpectedtypesandresults

Problem

I have a simple script that gets four random numbers (1 through 4) and then joins back to get the matching database_id number. When I run the script with a LEFT JOIN, I get four rows back every time (the expected result). However, when I run it with an INNER JOIN, I get a varying number of rows -- sometimes two, sometimes eight.

Logically, there shouldn't be any difference because I know rows with database_ids 1-4 exist in sys.databases. And because we're selecting from the random number table with four rows (as opposed to joining to it), there should never be any more than four rows returned.

This happens in both SQL Server 2012 and 2014. What is causing the INNER JOIN to return varying numbers of rows?

/* Works as expected -- always four rows */

SELECT rando.RandomNumber, d.database_id
FROM 
  (SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber 
   FROM sys.databases WHERE database_id <= 4) AS rando
LEFT JOIN sys.databases d ON rando.RandomNumber = d.database_id;

/* Returns a varying number of rows */

SELECT rando.RandomNumber, d.database_id
FROM 
  (SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber 
   FROM sys.databases WHERE database_id <= 4) AS rando
INNER JOIN sys.databases d ON rando.RandomNumber = d.database_id;

/* Also returns a varying number of rows */

WITH rando AS (
  SELECT 1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
  FROM sys.databases WHERE database_id <= 4
)

SELECT r.RandomNumber, d.database_id
FROM rando AS r
INNER JOIN sys.databases d ON r.RandomNumber = d.database_id;

Solution

This might give some insight until one of the smarter folks on the site chimes in.

I put the random results into a temporary table and I consistently get 4 result regardless of the join type.

/* Works as expected -- always four rows */

DECLARE @Rando table
(
    RandomNumber int
);

INSERT INTO
    @Rando
(
    RandomNumber
)
-- This generates 4 random numbers from 1 to 4, endpoints inclusive
SELECT
    1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM
    sys.databases
WHERE
    database_id <= 4;

SELECT
    *
FROM
    @Rando AS R;

SELECT
    rando.RandomNumber
,   d.database_id
FROM 
    @Rando AS rando
    LEFT JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;

/* Returns a varying number of rows */

SELECT rando.RandomNumber, d.database_id
FROM 
    @Rando AS rando
    INNER JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;

/* Also returns a varying number of rows */

WITH rando AS 
(
    SELECT * FROM @Rando AS rando
)
SELECT r.RandomNumber, d.database_id
FROM 
    rando AS r
    INNER JOIN 
        sys.databases d 
        ON r.RandomNumber = d.database_id
ORDER BY 1,2;


If I compare query plans between your second query and the variation with a table variable, I can see there's a definite difference between the two. The red X is No Join Predicate so that seems really odd to my caveman developer brain

If I eliminate the random bit of the query to a constant 1 % (4), my plan looks better but the Compute Scalar was eliminated so that led me to look closer

It's computing the expression for the random number after the join. Whether that's expected, I still leave to the internal wizards on the site but at least that's why you're getting variable results in your join.

2014

For those playing along at home, the above query plans were generated from a 2008 R2 instance. The 2014 plans look different but the Compute Scalar operation remains after the join.

This is the query plan for a 2014 using the constant expression

This is the query plan for a 2014 instance using the newid expression.

This apparently is by design, Connect issue here. Thanks to @paulWhite for knowing that existed.

Code Snippets

/* Works as expected -- always four rows */

DECLARE @Rando table
(
    RandomNumber int
);

INSERT INTO
    @Rando
(
    RandomNumber
)
-- This generates 4 random numbers from 1 to 4, endpoints inclusive
SELECT
    1 + ABS(CHECKSUM(NEWID())) % (4) AS RandomNumber
FROM
    sys.databases
WHERE
    database_id <= 4;

SELECT
    *
FROM
    @Rando AS R;

SELECT
    rando.RandomNumber
,   d.database_id
FROM 
    @Rando AS rando
    LEFT JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;


/* Returns a varying number of rows */

SELECT rando.RandomNumber, d.database_id
FROM 
    @Rando AS rando
    INNER JOIN 
        sys.databases d 
        ON rando.RandomNumber = d.database_id
ORDER BY 1,2;

/* Also returns a varying number of rows */

WITH rando AS 
(
    SELECT * FROM @Rando AS rando
)
SELECT r.RandomNumber, d.database_id
FROM 
    rando AS r
    INNER JOIN 
        sys.databases d 
        ON r.RandomNumber = d.database_id
ORDER BY 1,2;

Context

StackExchange Database Administrators Q#80374, answer score: 12

Revisions (0)

No revisions yet.