patternsqlModerate
Redundant entries in an IN clause
Viewed 0 times
redundantclauseentries
Problem
I have a dynamically built query with a where clause that looks like this:
The actual data is a LOT longer with a lot of duplicates, as in the example above where
Is there a potential performance hit for a very large
where column in ('a', 'a', 'a', 'b')The actual data is a LOT longer with a lot of duplicates, as in the example above where
a is repeated three times.Is there a potential performance hit for a very large
WHERE clause using an IN with a large number of values, many of which are duplicates?Solution
This is pretty easy to test.
Enable "actual" execution plans, and run this:
The results pretty clearly show that SQL Server eliminates duplicate items from the
Be aware, if the
The items listed above will be compounded if the query is being sent to the server very frequently.
If the list is big enough, the size of the text alone can cause performance issues in other ways, such as compilation time (just parsing the thing) and sending the query over the wire. So just because the execution plan doesn't show it, doesn't mean those additional values can't have an effect on performance.
USE tempdb;
CREATE TABLE dbo.d
(
col varchar(1)
);
INSERT INTO dbo.d (col)
VALUES ('a')
, ('b');
GOEnable "actual" execution plans, and run this:
SELECT *
FROM dbo.d
WHERE d.col IN ('a', 'b', 'a', 'b', 'c');The results pretty clearly show that SQL Server eliminates duplicate items from the
IN (...) clause before executing the query.Be aware, if the
IN (...) contains a sufficiently large number of duplicated items, the query may perform more poorly than if those duplicates weren't present. Causes for the poor performance might include:- The client-end might have difficulty compiling the list of items for the
INclause.
- The time spent by the query optimizer will obviously increase somewhat with more items in the
INclause.
- If the query is being sent over the network, a long list of items in the
INclause may result in higher latency if multiple TCP packets need to be compiled, sent, and received.
The items listed above will be compounded if the query is being sent to the server very frequently.
If the list is big enough, the size of the text alone can cause performance issues in other ways, such as compilation time (just parsing the thing) and sending the query over the wire. So just because the execution plan doesn't show it, doesn't mean those additional values can't have an effect on performance.
Code Snippets
USE tempdb;
CREATE TABLE dbo.d
(
col varchar(1)
);
INSERT INTO dbo.d (col)
VALUES ('a')
, ('b');
GOSELECT *
FROM dbo.d
WHERE d.col IN ('a', 'b', 'a', 'b', 'c');Context
StackExchange Database Administrators Q#197929, answer score: 12
Revisions (0)
No revisions yet.