patternsqlMinor
Is the IN() condition always an equality search?
Viewed 0 times
equalitytheconditionsearchalways
Problem
I am trying to understand if the IN() condition is an equality or inequality search when it comes to indexing.
I have been looking on the internet for an answer and have found responses ranging from, yes, it is an equality search, no it's not an equality search and sometimes it is an equality search.
i.e some have advised if there is one value in the condition it is an equality search and if you have over one value it becomes and inequality search, but I have also found answers contradicting that advising it is always equality.
Note: I am using SQL server as the DBMS as I am not sure if it would differ between them.
Edit: I may be getting somethings confused here apologies if so, but when you create an index do you not need to understand if the search that will be completed using that index will be using equality or inequality parameters when you search. So, you would design your index so it will always look for the equality matches first. I am more taking about if the SQL "engine" will treat this as an equality search when it comes to doing a seek on an index. As I have read sometimes it will not.
The SQL server "engine" will treat an IN() operate as an equality unless it thinks it can speed up the query by treating it as a inequality, then looking at the execution plan it may ignore an index that has been setup and will be searched for as an inequality.
Firstly, is this correct? Secondly, I am trying to understand under what circumstance this would happen.
I have been looking on the internet for an answer and have found responses ranging from, yes, it is an equality search, no it's not an equality search and sometimes it is an equality search.
i.e some have advised if there is one value in the condition it is an equality search and if you have over one value it becomes and inequality search, but I have also found answers contradicting that advising it is always equality.
Note: I am using SQL server as the DBMS as I am not sure if it would differ between them.
Edit: I may be getting somethings confused here apologies if so, but when you create an index do you not need to understand if the search that will be completed using that index will be using equality or inequality parameters when you search. So, you would design your index so it will always look for the equality matches first. I am more taking about if the SQL "engine" will treat this as an equality search when it comes to doing a seek on an index. As I have read sometimes it will not.
The SQL server "engine" will treat an IN() operate as an equality unless it thinks it can speed up the query by treating it as a inequality, then looking at the execution plan it may ignore an index that has been setup and will be searched for as an inequality.
Firstly, is this correct? Secondly, I am trying to understand under what circumstance this would happen.
Solution
but and but not
The
Let's say we have this index:
It's pretty well-suited for an equality search on
We can prove that with a query like this:
The query plan will look something like this:
With a seek into our index for a single value.
Now take this query, which uses
The query plan looks quite a bit different now. Even though we still seek into the index with both predicates as equality searches, now we need a Sort operator to support the windowing function.
The Sort operator is putting
The query can be rewritten like this to also use multiple equality predicates and retain the sorting guarantees of the index:
The query plan goes back to how it was before, where no sort operator is needed:
Where I usually see
Instead of searching on
For a little more details, see my post here:
The
IN operator is usually treated like an equality search on an index, but does not always result in the same guarantees as an equality search.Let's say we have this index:
CREATE INDEX
v
ON dbo.Votes
(
VoteTypeId,
PostId,
CreationDate
)
WITH
(
SORT_IN_TEMPDB = ON,
DATA_COMPRESSION = PAGE
);It's pretty well-suited for an equality search on
VoteTypeId, and providing sorted data for PostId and CreationDate.We can prove that with a query like this:
SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 1
) AS x
WHERE x.r = 0;The query plan will look something like this:
With a seek into our index for a single value.
Now take this query, which uses
IN:SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.VoteTypeId IN (1, 2)
) AS x
WHERE x.r = 0;The query plan looks quite a bit different now. Even though we still seek into the index with both predicates as equality searches, now we need a Sort operator to support the windowing function.
The Sort operator is putting
PostId and CreationDate in the order that the ROW_NUMBER windowing function is looking for.The query can be rewritten like this to also use multiple equality predicates and retain the sorting guarantees of the index:
SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM
(
SELECT
v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 1
UNION ALL
SELECT
v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
) AS v
) AS x
WHERE x.r = 0;The query plan goes back to how it was before, where no sort operator is needed:
Where I usually see
IN transformed to multiple OR predicates is when there's no useful index around, anyway. This is not always the case, but horseshoes and hand grenades seem to have a lot in common with query optimizers.SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.UserId IN
(
1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 15, 16
)
) AS x
WHERE x.r = 0;Instead of searching on
VoteTypeId we're searching on UserId, which is not indexed aside from being part of the clustered index, where it's not a key column and therefore not ordered in any way.For a little more details, see my post here:
- How To Choose Key Column Order In SQL Server Indexes
Code Snippets
CREATE INDEX
v
ON dbo.Votes
(
VoteTypeId,
PostId,
CreationDate
)
WITH
(
SORT_IN_TEMPDB = ON,
DATA_COMPRESSION = PAGE
);SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 1
) AS x
WHERE x.r = 0;SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.VoteTypeId IN (1, 2)
) AS x
WHERE x.r = 0;SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM
(
SELECT
v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 1
UNION ALL
SELECT
v.*
FROM dbo.Votes AS v
WHERE v.VoteTypeId = 2
) AS v
) AS x
WHERE x.r = 0;SELECT
x.*
FROM
(
SELECT
r =
ROW_NUMBER() OVER
(
PARTITION BY
v.PostId
ORDER BY
v.CreationDate
)
FROM dbo.Votes AS v
WHERE v.UserId IN
(
1, 2, 3, 4, 5, 6, 7, 8, 9,
10, 11, 12, 13, 15, 16
)
) AS x
WHERE x.r = 0;Context
StackExchange Database Administrators Q#317857, answer score: 8
Revisions (0)
No revisions yet.