HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Interchanging IN with EXISTS produces different result sets

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
resultwithexistsinterchangingdifferentproducessets

Problem

I am attempting to update a query that utilizes the IN operator within a WHERE clause predicate with EXISTS to compare potential performance improvements and better understand what is happening behind the scenes when the two are interchanged. It is my understanding that in practice, the query optimizer treats EXISTS and IN the same way whenever it can.

I'm noticing that when the query is ran with the IN operator, it returns the desired result set. However, when I replace it with the EXISTS equivalent, it pulls in all values from the primary table I want to filter. It is ignoring the provided input values passed to EXISTS (SELECT ... and returning all possible distinct values.

The use case is relatively straightforward: the query accepts a pipe delimited @Series string that can contain up to 4 values, e.g. S1|S2|S3|S4 or S2|S4. From here I string_split the input into a table variable @SeriesSplit to determine the corresponding internal [SeriesId] for the [Series]. The result set returned is then filtered to exclude the [Series] that were not passed.

To illustrate, here is a similar table definition:

DROP TABLE IF EXISTS [dbo].[Document]
IF OBJECT_ID('[dbo].[Document]', 'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[Document] (
        [DocumentId] bigint IDENTITY(1,1) NOT NULL
        ,[DocumentSeriesId] [tinyint] NOT NULL
        ,CONSTRAINT [PK_Document] PRIMARY KEY CLUSTERED ([DocumentId] ASC)
        ,INDEX [IX_Document_SeriesId] NONCLUSTERED ([DocumentSeriesId] ASC)
    );
END;
GO


Populate the test table with dummy data.

```
SET IDENTITY_INSERT [dbo].[Document] ON;
;WITH [DocumentSeed] AS (
SELECT
1 AS [DocumentId]
UNION ALL
SELECT
[DocumentId] + 1
FROM
[DocumentSeed]
WHERE
[DocumentId] < 2048)

INSERT INTO [dbo].[Document] ([DocumentId], [DocumentSeriesId])
SELECT
[DocumentId]
,ABS(CHECKSUM(NEWID()) % 4) + 1
FROM
[DocumentSeed] OPTION (MAXRECURSION 2048);
S

Solution

You didn't rewrite the query properly. When converting an IN subqquery to an EXISTS one, your EXISTS subquery should be correlated. But you defined the Document table again in the subquery, making it uncorrelated.

Your query basically says


Give me all data from Documents if @SeriesSplit has at least one row - and nothing if @SeriesSplit is empty.

It should be:

SELECT DISTINCT
    D1.[DocumentSeriesId]
FROM
    [dbo].[Document] D1
WHERE
    EXISTS (SELECT 1 FROM @SeriesSplit SS 
            WHERE SS.[SeriesId] = D1.[DocumentSeriesId]) ;

Code Snippets

SELECT DISTINCT
    D1.[DocumentSeriesId]
FROM
    [dbo].[Document] D1
WHERE
    EXISTS (SELECT 1 FROM @SeriesSplit SS 
            WHERE SS.[SeriesId] = D1.[DocumentSeriesId]) ;

Context

StackExchange Database Administrators Q#203070, answer score: 5

Revisions (0)

No revisions yet.