HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Redundant entries in an IN clause

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
redundantclauseentries

Problem

I have a dynamically built query with a where clause that looks like this:

where column in ('a', 'a', 'a', 'b')

The actual data is a LOT longer with a lot of duplicates, as in the example above where a is repeated three times.

Is there a potential performance hit for a very large WHERE clause using an IN with a large number of values, many of which are duplicates?

Solution

This is pretty easy to test.

USE tempdb;

CREATE TABLE dbo.d
(
    col varchar(1)
);

INSERT INTO dbo.d (col)
VALUES ('a')
    , ('b');
GO


Enable "actual" execution plans, and run this:

SELECT *
FROM dbo.d
WHERE d.col IN ('a', 'b', 'a', 'b', 'c');


The results pretty clearly show that SQL Server eliminates duplicate items from the IN (...) clause before executing the query.

Be aware, if the IN (...) contains a sufficiently large number of duplicated items, the query may perform more poorly than if those duplicates weren't present. Causes for the poor performance might include:

  • The client-end might have difficulty compiling the list of items for the IN clause.



  • The time spent by the query optimizer will obviously increase somewhat with more items in the IN clause.



  • If the query is being sent over the network, a long list of items in the IN clause may result in higher latency if multiple TCP packets need to be compiled, sent, and received.



The items listed above will be compounded if the query is being sent to the server very frequently.

If the list is big enough, the size of the text alone can cause performance issues in other ways, such as compilation time (just parsing the thing) and sending the query over the wire. So just because the execution plan doesn't show it, doesn't mean those additional values can't have an effect on performance.

Code Snippets

USE tempdb;

CREATE TABLE dbo.d
(
    col varchar(1)
);

INSERT INTO dbo.d (col)
VALUES ('a')
    , ('b');
GO
SELECT *
FROM dbo.d
WHERE d.col IN ('a', 'b', 'a', 'b', 'c');

Context

StackExchange Database Administrators Q#197929, answer score: 12

Revisions (0)

No revisions yet.