HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Index scan of table with one record with 2.2 billion executions

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
billionscanwithonerecordexecutionsindextable

Problem

There's something I'm not sure how to tackle with in the query I have.

First, definitions:

Courier Services table. With one record.

CREATE TABLE [dbo].[CS](
    [ServiceID] [int] IDENTITY(1,1) NOT NULL,
    [CSID] [nvarchar](6) NULL,
    [CSDescription] [varchar](50) NULL,
    [OperatingDays] [int] NULL,
    [DefaultService] [bit] NULL,
 CONSTRAINT [CourierServices_PK] PRIMARY KEY CLUSTERED 
(
    [ServiceID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
       ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, FILLFACTOR = 90
) ON [PRIMARY]
) ON [PRIMARY]

GO
SET IDENTITY_INSERT [dbo].[CS] ON 

INSERT [dbo].[CS] ([ServiceID], [CSID], [OperatingDays], [DefaultService])
           VALUES (1, N'RM48', 2, 1)
SET IDENTITY_INSERT [dbo].[CS] OFF
SET ANSI_PADDING ON

GO
/****** Object:  Index [ix_CourierServices]    Script Date: 19/04/2017 14:27:03 ******/
CREATE NONCLUSTERED INDEX [ix_CourierServices] ON [dbo].[CS]
(
    [CSID] ASC,
    [DefaultService] ASC,
    [OperatingDays] ASC
)
INCLUDE (   [CSDescription]) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF,
SORT_IN_TEMPDB = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,
ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
GO


Calendar DB and table, code by Genius Jim Horn :

```
CREATE TABLE [dbo].days NULL,
[calendar_month] [tinyint] NULL,
[calendar_month_name_long] varchar NULL,
[calendar_month_name_short] varchar NULL,
[calendar_week_in_year] [tinyint] NULL,
[calendar_week_in_month] [tinyint] NULL,
[calendar_day_in_year] [smallint] NULL,
[calendar_day_in_week] [tinyint] NULL,
[calendar_day_in_month] [tinyint] NULL,
[dmy_name_long] varchar NULL,
[dmy_name_long_with_suffix] varchar NULL,
[day_name_long] varchar NULL,
[day_name_short] varchar NULL,

Solution

Let's start by looking at the top right of the plan. That part calculates the OperatingDate column:

Since we get back 1.72 M rows for the outer row set we can expect around 1.72 M index seeks against ix_days. That is indeed what happens. There are 478k rows for which o.[CreationDate] as time) > '16:00:00' so the CASE statement sends 478k seeks to one branch and the rest to the other.

Note that the index that you have isn't the most efficient one possible for this query. We can only do a seek predicate against PKDate. The rest of the filters are applied as a predicate. This means that the seek might traverse many rows before finding a match. I assume that most days in your calendar table aren't weekends or holidays so it may not make a practical difference for this query. However, you could define an index on is_weekend, is_holiday, PKDate. That should let you immediately seek to the first row that you want.

To make the point more clear let's go through a simple example:

-- does a scan
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend <> 1 AND is_holiday <> 1
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

-- does a seek, reads 3 rows to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

-- create new index
CREATE NONCLUSTERED INDEX [ix_days_2] ON [dbo].[days]
(
    [is_weekend],
    [is_holiday],
    PkDate
)

-- does a seek, reads 1 row to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

DROP INDEX [days].[ix_days_2];


Let's get to the more interesting part which is the branch to calculate the DeliveryDate column. I'll only include half of it:

I suspect that what you hoped the optimizer would do is to calculate this as a scalar:

dateadd(day,isnull(
                  (select top 1 [operatingdays]
                  from [dbo].[CS]
                  where DefaultService = 1)
                 ,2)+1,Cast(o.[CreationDate] as date))


And to use the value of that to do an index seek using ix_days. Unfortunately, the optimizer does not do that. It instead applies a row goal against the index and does a scan. For each row returned from the scan it checks to see if the value matches the filter against [dbo].[CS]. The scan stops as soon as it finds one row that matches. SQL Server estimated that it would only pull back 3.33 rows on average from the scan until it found a match. If that were true then you'd see around 1.5 M executions against [dbo].[CS]. Instead the optimizer did 2 billion executions against the table, so the estimate was off by over 1000 times.

As a general rule you should carefully examine any scans on the inner side of a nested loop. Of course, there are some queries for which that is what you want. And just because you have a seek doesn't mean that the query will be efficient. For example, if a seek returns many rows there may not be that much difference from doing a scan. You didn't post the full query here, but I'll go over a few ideas which could help.

This query is a bit odd:

select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1


It is non-deterministic because you have TOP without ORDER BY. However, the table itself has 1 row and you always pull back the same value for each row from o. If possible, I would just try saving off the value of this query into a local variable and using that in the query instead. That should save you a total of 8 billion scans again [dbo].[CS] and I would expect to see an index seek instead of an index scan against ix_days. I was able to mock up some data on my machine. Here is part of the query plan:

Now we have all seeks and those seeks shouldn't process too many extra rows. However, the real query may be more complicated than that so perhaps you can't use a variable.

Let's say I write a different filter condition that doesn't use TOP. Instead I'll use MIN. SQL Server is able to process that subquery in a more efficient way. TOP can prevent certain query transformations. Here is my subquery:

WHERE PKDate > dateadd(day,isnull(
                      (select MIN([operatingdays])
                       from [dbo].[CS]
                       where DefaultService = 1)
                      ,2), Cast(o.[CreationDate] as date))


Here is what the plan might look like:

Now we'll only do around 1.5 million scans against the CS table. we also get a much more efficient index seek against the ix_days index which is able to use the results of the subquery:

Of course, I'm not saying that you should rewrite your code to use that. It'll probably return incorrect results. The important point is that you can get the index seeks that you want with a subquery. You just need to write your subquery in the right way.

For one more example, let's assume that you absolutely need to keep the TOP operator in the subquery. It might be possible to ad

Code Snippets

-- does a scan
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend <> 1 AND is_holiday <> 1
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

-- does a seek, reads 3 rows to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

-- create new index
CREATE NONCLUSTERED INDEX [ix_days_2] ON [dbo].[days]
(
    [is_weekend],
    [is_holiday],
    PkDate
)

-- does a seek, reads 1 row to return 1
SELECT TOP 1 PkDate
FROM [Days]
WHERE is_weekend = 0 AND is_holiday = 0
AND PkDate >= '2000-04-01'
ORDER BY PkDate;

DROP INDEX [days].[ix_days_2];
dateadd(day,isnull(
                  (select top 1 [operatingdays]
                  from [dbo].[CS]
                  where DefaultService = 1)
                 ,2)+1,Cast(o.[CreationDate] as date))
select top 1 [operatingdays]
from [dbo].[CS]
where DefaultService = 1
WHERE PKDate > dateadd(day,isnull(
                      (select MIN([operatingdays])
                       from [dbo].[CS]
                       where DefaultService = 1)
                      ,2), Cast(o.[CreationDate] as date))
PKDate > Cast(o.[CreationDate] as date) AND 
  PKDate > dateadd(day,isnull(
      (select top 1 [operatingdays]
      from [dbo].[CS]
      where DefaultService = 1)
     ,2)+1,Cast(o.[CreationDate] as date))

Context

StackExchange Database Administrators Q#171407, answer score: 6

Revisions (0)

No revisions yet.