HiveBrain v1.2.0
Get Started
← Back to all entries
snippetsqlMajor

How to efficiently check EXISTS on multiple columns?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
efficientlyhowcolumnsmultipleexistscheck

Problem

This is an issue I come up against periodically and have not yet found a good solution for.

Supposing the following table structure

CREATE TABLE T
(
A INT PRIMARY KEY,
B CHAR(1000) NULL,
C CHAR(1000) NULL
)


and the requirement is to determine whether either of the nullable columns B or C actually contain any NULL values (and if so which one(s)).

Also assume the table contains millions of rows (and that no column statistics are available that could be peeked at as I am interested in a more generic solution for this class of queries).

I can think of a few ways of approaching this but all have weaknesses.

Two separate EXISTS statements. This would have the advantage of allowing the queries to stop scanning early as soon as a NULL is found. But if both columns in fact contain no NULLs then two full scans will result.

Single Aggregate Query

SELECT 
    MAX(CASE WHEN B IS NULL THEN 1 ELSE 0 END) AS B,
    MAX(CASE WHEN C IS NULL THEN 1 ELSE 0 END) AS C
FROM T


This could process both columns at the same time so have a worst case of one full scan.
The disadvantage is that even if it encounters a NULL in both columns very early on the query will still end up scanning the whole of the rest of the table.

User variables

I can think of a third way of doing this

BEGIN TRY
DECLARE @B INT, @C INT, @D INT

SELECT 
    @B = CASE WHEN B IS NULL THEN 1 ELSE @B END,
    @C = CASE WHEN C IS NULL THEN 1 ELSE @C END,
    /*Divide by zero error if both @B and @C are 1.
    Might happen next row as no guarantee of order of
    assignments*/
    @D = 1 / (2 - (@B + @C))
FROM T  
OPTION (MAXDOP 1)       
END TRY
BEGIN CATCH
IF ERROR_NUMBER() = 8134 /*Divide by zero*/
    BEGIN
    SELECT 'B,C both contain NULLs'
    RETURN;
    END
ELSE
    RETURN;
END CATCH

SELECT ISNULL(@B,0),
       ISNULL(@C,0)


but this is not suitable for production code as the correct behavior for an aggregate concatenation query is undefined. and terminating the

Solution

How about:

SELECT TOP 3 *
FROM (SELECT DISTINCT 
        CASE WHEN B IS NULL THEN NULL ELSE 'foo' END AS B
        , CASE WHEN C IS NULL THEN NULL ELSE 'bar' END AS C
  FROM T 
  WHERE 
    (B IS NULL AND C IS NOT NULL) 
    OR (B IS NOT NULL AND C IS NULL) 
    OR (B IS NULL AND C IS NULL)
) AS DT

Code Snippets

SELECT TOP 3 *
FROM (SELECT DISTINCT 
        CASE WHEN B IS NULL THEN NULL ELSE 'foo' END AS B
        , CASE WHEN C IS NULL THEN NULL ELSE 'bar' END AS C
  FROM T 
  WHERE 
    (B IS NULL AND C IS NOT NULL) 
    OR (B IS NOT NULL AND C IS NULL) 
    OR (B IS NULL AND C IS NULL)
) AS DT

Context

StackExchange Database Administrators Q#19344, answer score: 20

Revisions (0)

No revisions yet.