patternsqlMinor
Index scan of table with one record with 2.2 billion executions
Viewed 0 times
billionscanwithonerecordexecutionsindextable
Problem
There's something I'm not sure how to tackle with in the query I have.
First, definitions:
Courier Services table. With one record.
Calendar DB and table, code by Genius Jim Horn :
```
CREATE TABLE [dbo].days NULL,
[calendar_month] [tinyint] NULL,
[calendar_month_name_long] varchar NULL,
[calendar_month_name_short] varchar NULL,
[calendar_week_in_year] [tinyint] NULL,
[calendar_week_in_month] [tinyint] NULL,
[calendar_day_in_year] [smallint] NULL,
[calendar_day_in_week] [tinyint] NULL,
[calendar_day_in_month] [tinyint] NULL,
[dmy_name_long] varchar NULL,
[dmy_name_long_with_suffix] varchar NULL,
[day_name_long] varchar NULL,
[day_name_short] varchar NULL,
First, definitions:
Courier Services table. With one record.
CREATE TABLE [dbo].[CS](
[ServiceID] [int] IDENTITY(1,1) NOT NULL,
[CSID] [nvarchar](6) NULL,
[CSDescription] [varchar](50) NULL,
[OperatingDays] [int] NULL,
[DefaultService] [bit] NULL,
CONSTRAINT [CourierServices_PK] PRIMARY KEY CLUSTERED
(
[ServiceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
) ON [PRIMARY]
GO
SET IDENTITY_INSERT [dbo].[CS] ON
INSERT [dbo].[CS] ([ServiceID], [CSID], [OperatingDays], [DefaultService])
VALUES (1, N'RM48', 2, 1)
SET IDENTITY_INSERT [dbo].[CS] OFF
SET ANSI_PADDING ON
GO
/****** Object: Index [ix_CourierServices] Script Date: 19/04/2017 14:27:03 ******/
CREATE NONCLUSTERED INDEX [ix_CourierServices] ON [dbo].[CS]
(
[CSID] ASC,
[DefaultService] ASC,
[OperatingDays] ASC
)
INCLUDE ( [CSDescription]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GOCalendar DB and table, code by Genius Jim Horn :
```
CREATE TABLE [dbo].days NULL,
[calendar_month] [tinyint] NULL,
[calendar_month_name_long] varchar NULL,
[calendar_month_name_short] varchar NULL,
[calendar_week_in_year] [tinyint] NULL,
[calendar_week_in_month] [tinyint] NULL,
[calendar_day_in_year] [smallint] NULL,
[calendar_day_in_week] [tinyint] NULL,
[calendar_day_in_month] [tinyint] NULL,
[dmy_name_long] varchar NULL,
[dmy_name_long_with_suffix] varchar NULL,
[day_name_long] varchar NULL,
[day_name_short] varchar NULL,
Solution
Let's start by looking at the top right of the plan. That part calculates the
Since we get back 1.72 M rows for the outer row set we can expect around 1.72 M index seeks against
Note that the index that you have isn't the most efficient one possible for this query. We can only do a seek predicate against
To make the point more clear let's go through a simple example:
Let's get to the more interesting part which is the branch to calculate the
I suspect that what you hoped the optimizer would do is to calculate this as a scalar:
And to use the value of that to do an index seek using
As a general rule you should carefully examine any scans on the inner side of a nested loop. Of course, there are some queries for which that is what you want. And just because you have a seek doesn't mean that the query will be efficient. For example, if a seek returns many rows there may not be that much difference from doing a scan. You didn't post the full query here, but I'll go over a few ideas which could help.
This query is a bit odd:
It is non-deterministic because you have
Now we have all seeks and those seeks shouldn't process too many extra rows. However, the real query may be more complicated than that so perhaps you can't use a variable.
Let's say I write a different filter condition that doesn't use
Here is what the plan might look like:
Now we'll only do around 1.5 million scans against the
Of course, I'm not saying that you should rewrite your code to use that. It'll probably return incorrect results. The important point is that you can get the index seeks that you want with a subquery. You just need to write your subquery in the right way.
For one more example, let's assume that you absolutely need to keep the
OperatingDate column:Since we get back 1.72 M rows for the outer row set we can expect around 1.72 M index seeks against
ix_days. That is indeed what happens. There are 478k rows for which o.[CreationDate] as time) > '16:00:00' so the CASE statement sends 478k seeks to one branch and the rest to the other.Note that the index that you have isn't the most efficient one possible for this query. We can only do a seek predicate against
PKDate. The rest of the filters are applied as a predicate. This means that the seek might traverse many rows before finding a match. I assume that most days in your calendar table aren't weekends or holidays so it may not make a practical difference for this query. However, you could define an index on is_weekend, is_holiday, PKDate. That should let you immediately seek to the first row that you want.To make the point more clear let's go through a simple example:
-- does a scan
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend <> 1 AND is_holiday <> 1
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
-- does a seek, reads 3 rows to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
-- create new index
CREATE NONCLUSTERED INDEX [ix_days_2] ON [dbo].[days]
(
[is_weekend],
[is_holiday],
PkDate
)
-- does a seek, reads 1 row to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
DROP INDEX [days].[ix_days_2];Let's get to the more interesting part which is the branch to calculate the
DeliveryDate column. I'll only include half of it:I suspect that what you hoped the optimizer would do is to calculate this as a scalar:
dateadd(day,isnull(
(select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1)
,2)+1,Cast(o.[CreationDate] as date))And to use the value of that to do an index seek using
ix_days. Unfortunately, the optimizer does not do that. It instead applies a row goal against the index and does a scan. For each row returned from the scan it checks to see if the value matches the filter against [dbo].[CS]. The scan stops as soon as it finds one row that matches. SQL Server estimated that it would only pull back 3.33 rows on average from the scan until it found a match. If that were true then you'd see around 1.5 M executions against [dbo].[CS]. Instead the optimizer did 2 billion executions against the table, so the estimate was off by over 1000 times.As a general rule you should carefully examine any scans on the inner side of a nested loop. Of course, there are some queries for which that is what you want. And just because you have a seek doesn't mean that the query will be efficient. For example, if a seek returns many rows there may not be that much difference from doing a scan. You didn't post the full query here, but I'll go over a few ideas which could help.
This query is a bit odd:
select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1It is non-deterministic because you have
TOP without ORDER BY. However, the table itself has 1 row and you always pull back the same value for each row from o. If possible, I would just try saving off the value of this query into a local variable and using that in the query instead. That should save you a total of 8 billion scans again [dbo].[CS] and I would expect to see an index seek instead of an index scan against ix_days. I was able to mock up some data on my machine. Here is part of the query plan:Now we have all seeks and those seeks shouldn't process too many extra rows. However, the real query may be more complicated than that so perhaps you can't use a variable.
Let's say I write a different filter condition that doesn't use
TOP. Instead I'll use MIN. SQL Server is able to process that subquery in a more efficient way. TOP can prevent certain query transformations. Here is my subquery:WHERE PKDate > dateadd(day,isnull(
(select MIN([operatingdays])
from [dbo].[CS]
where DefaultService = 1)
,2), Cast(o.[CreationDate] as date))Here is what the plan might look like:
Now we'll only do around 1.5 million scans against the
CS table. we also get a much more efficient index seek against the ix_days index which is able to use the results of the subquery:Of course, I'm not saying that you should rewrite your code to use that. It'll probably return incorrect results. The important point is that you can get the index seeks that you want with a subquery. You just need to write your subquery in the right way.
For one more example, let's assume that you absolutely need to keep the
TOP operator in the subquery. It might be possible to adCode Snippets
-- does a scan
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend <> 1 AND is_holiday <> 1
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
-- does a seek, reads 3 rows to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
-- create new index
CREATE NONCLUSTERED INDEX [ix_days_2] ON [dbo].[days]
(
[is_weekend],
[is_holiday],
PkDate
)
-- does a seek, reads 1 row to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;
DROP INDEX [days].[ix_days_2];dateadd(day,isnull(
(select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1)
,2)+1,Cast(o.[CreationDate] as date))select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1WHERE PKDate > dateadd(day,isnull(
(select MIN([operatingdays])
from [dbo].[CS]
where DefaultService = 1)
,2), Cast(o.[CreationDate] as date))PKDate > Cast(o.[CreationDate] as date) AND
PKDate > dateadd(day,isnull(
(select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1)
,2)+1,Cast(o.[CreationDate] as date))Context
StackExchange Database Administrators Q#171407, answer score: 6
Revisions (0)
No revisions yet.