patternsqlMinor
Bad performance using "NOT IN"
Viewed 0 times
usingnotbadperformance
Problem
In my application, I have a query which performs a search in "files" table.
The table
I'm using this query from the answer to my previous question, Slow order by SQL Server:
```
WITH PartitionNumbers AS
(
-- Each partition of the table
SELECT P.partition_number
FROM sys.partitions AS P
WHERE P.[object_id] = OBJECT_ID(N'dbo.files', N'U')
AND P.index_id = 1
)
SELECT
FF.id,
FF.[name],
FF.[year],
FF.cid,
FF.created,
vnVE0.keywordValueCol0_numeric
FROM PartitionNumbers AS PN
CROSS APPLY
(
SELECT
F100.*
FROM
(
-- 50 rows in order for year 2013
SELECT
F.id,
F.[name],
F.[year],
F.cid,
F.created
FROM dbo.files AS F
WHERE
F.grapado IS NULL
AND F.masterversion IS NULL
AND F.[year] = 2013
AND F.cid = 19
AND F.eid = 8
AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
ORDER BY
F.[name]
OFFSET 0 ROWS
FETCH FIRST 50 ROWS ONLY
UNION ALL
-- 50 rows in order for year 0
SELECT
F.id,
F.[name],
F.[year],
F.cid,
F.created
FROM dbo.files AS F
WHERE
F.grapado IS NULL
AND F.masterversion IS NULL
AND F.[year] = 0
AND F.cid = 19
AND F.eid = 8
AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
ORDER BY
F.[name]
OFFSET 0 ROWS
FETCH FIRST 50 ROWS ONLY
) AS F100
) AS FF
OUTER APPLY
(
-- Lookup distinct values
SELECT
keywordValueCol0_numeric =
CASE
WHEN VN.[value] IS NOT NULL AND VN.[value] <> ''
The table
files is partitioned by f.created (see the table definition) and has ~100 million rows for the client 19 (f.cid = 19).I'm using this query from the answer to my previous question, Slow order by SQL Server:
```
WITH PartitionNumbers AS
(
-- Each partition of the table
SELECT P.partition_number
FROM sys.partitions AS P
WHERE P.[object_id] = OBJECT_ID(N'dbo.files', N'U')
AND P.index_id = 1
)
SELECT
FF.id,
FF.[name],
FF.[year],
FF.cid,
FF.created,
vnVE0.keywordValueCol0_numeric
FROM PartitionNumbers AS PN
CROSS APPLY
(
SELECT
F100.*
FROM
(
-- 50 rows in order for year 2013
SELECT
F.id,
F.[name],
F.[year],
F.cid,
F.created
FROM dbo.files AS F
WHERE
F.grapado IS NULL
AND F.masterversion IS NULL
AND F.[year] = 2013
AND F.cid = 19
AND F.eid = 8
AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
ORDER BY
F.[name]
OFFSET 0 ROWS
FETCH FIRST 50 ROWS ONLY
UNION ALL
-- 50 rows in order for year 0
SELECT
F.id,
F.[name],
F.[year],
F.cid,
F.created
FROM dbo.files AS F
WHERE
F.grapado IS NULL
AND F.masterversion IS NULL
AND F.[year] = 0
AND F.cid = 19
AND F.eid = 8
AND $PARTITION.PF_files_partitioning(F.created) = PN.partition_number
ORDER BY
F.[name]
OFFSET 0 ROWS
FETCH FIRST 50 ROWS ONLY
) AS F100
) AS FF
OUTER APPLY
(
-- Lookup distinct values
SELECT
keywordValueCol0_numeric =
CASE
WHEN VN.[value] IS NOT NULL AND VN.[value] <> ''
Solution
The performance difference is because you have an index that's suited to the first query (
You don't have these indexes defined in the question. You have a definition for this one:
I will assume that the index definitions follow the same pattern based on their names.
Let's start with the fast query that uses the index with key columns of
For the slow query, the filter on
Without an equality filter on the third key column, the engine doesn't support taking advantage of the ordered nature of the index to avoid a sort on the fourth column (
Instead, the query optimizer picks an index with key columns of
That should avoid the key lookup and improve performance, but it's possible that you'll do some residual IO because you can't immediately seek to the first 50 rows that match all of the filters.
You might be wondering why SQL Server choose such a poorly performing plan or why a plan with an estimated cost of just 0.777646 optimizer units takes over 10 minutes to execute. The answer has to do with row goals. Let's look at the key lookup in the slow plan:
The query optimizer thinks that it will need to do 151 key lookups before it finds 50 rows that satisfy
"f"."eid" = 8) but not for the second one (F.eid NOT IN (8,10)). One thing that was very confusing to me was that the queries actually use different indexes:[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, eid, name : grapado IS NULL AND masterversion IS NULL] [F]You don't have these indexes defined in the question. You have a definition for this one:
CREATE NONCLUSTERED INDEX [IX dbo.files cid, year, eid : grapado IS NULL AND masterversion IS NULL] ON [dbo].[archivos]
(
[cid] ASC,
[year] ASC,
[eid] ASC
)
INCLUDE ( [grapado],
[masterversion])
WHERE ([grapado] IS NULL AND [masterversion] IS NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])I will assume that the index definitions follow the same pattern based on their names.
Let's start with the fast query that uses the index with key columns of
cid, year, id, and name. Your fast query has equality predicates on the first three key columns and an ORDER BY defined on the fourth key column. That allows you to seek directly to the rows that you want. There is no residual IO. You only need to read 50 rows from the index and the fourth key column means that the data is already ordered in the way that you want it. All in all the query probably does something like a few thousand index seeks so it's not surprising that it's fast.For the slow query, the filter on
eid is no longer an equality predicate. This is a very important difference. The index with key columns on cid, year, eid, name is still covering, but using it looks like this:- first key column: equality filter
- second key column: equality filter
- third key column: not equal to filter
- fourth key column:
ORDER BY
Without an equality filter on the third key column, the engine doesn't support taking advantage of the ordered nature of the index to avoid a sort on the fourth column (
name). If the query optimizer used this index for this query it would need to read all matching rows from the index and sort them to find the first 50 rows. That type of operation can have a large estimated cost.Instead, the query optimizer picks an index with key columns of
cid, year, and name. With equality filters on the first two key columns, SQL Server is able to take advantage of the ordered nature of the index to avoid an explicit sort by name. However, the eid column is not present in this index. This leads to the order-preserving key lookup to filter on eid, which is almost certainly the cause of your performance issue. I would try adding eid as an included column to [dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]: CREATE NONCLUSTERED INDEX
[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
ON [dbo].[files]
( cid, year, name )
INCLUDE ( eid )
WHERE ( grapado IS NULL AND masterversion IS NULL )
... ;That should avoid the key lookup and improve performance, but it's possible that you'll do some residual IO because you can't immediately seek to the first 50 rows that match all of the filters.
You might be wondering why SQL Server choose such a poorly performing plan or why a plan with an estimated cost of just 0.777646 optimizer units takes over 10 minutes to execute. The answer has to do with row goals. Let's look at the key lookup in the slow plan:
The query optimizer thinks that it will need to do 151 key lookups before it finds 50 rows that satisfy
F.eid NOT IN (8,10). Perhaps your statistics are out of date (there are warnings on some columns in your plan) or it is making an overly optimistic assumption. I suspect that it needs to do many more seeks than 151. You can try disabling row goals in the plan as a test, but I suspect that you won't see good performance without some kind of change in your index definitions.Code Snippets
[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
[dd_produccion_test2].[dbo].[files].[IX dbo.files cid, year, eid, name : grapado IS NULL AND masterversion IS NULL] [F]CREATE NONCLUSTERED INDEX [IX dbo.files cid, year, eid : grapado IS NULL AND masterversion IS NULL] ON [dbo].[archivos]
(
[cid] ASC,
[year] ASC,
[eid] ASC
)
INCLUDE ( [grapado],
[masterversion])
WHERE ([grapado] IS NULL AND [masterversion] IS NULL)
WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON)
ON PS_files_partitioning([created])CREATE NONCLUSTERED INDEX
[IX dbo.files cid, year, name : grapado IS NULL AND masterversion IS NULL]
ON [dbo].[files]
( cid, year, name )
INCLUDE ( eid )
WHERE ( grapado IS NULL AND masterversion IS NULL )
... ;Context
StackExchange Database Administrators Q#202707, answer score: 7
Revisions (0)
No revisions yet.