HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlModerate

Alternative for WHERE clause in SQL query

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
alternativesqlwherequeryforclause

Problem

We have SQL Server 2012 Enterprise Edition running following query via a stored procedure:

declare @TopX int = 1000
declare @stores Table (Store varchar(5), LastDate datetime, LastId int, RangeEnd datetime)
insert into @stores 
  select * 
  from (select SourceStore, '2014-01-01' as i, null as ii, '2014-01-08' as iii 
        from StoreConfig.dbo.Version 
        group by SourceStore
       ) t 
  where (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) as int)) % 100)  s.StartDate 
     or a.AgreementId > s.LastId
      )
order by isnull(a.ModifiedDate, a.CreatedDate), a.AgreementId


We have following indexes for [Customer].[CustomerAgreement] table:

CREATE NONCLUSTERED INDEX [IX_CustomerAgreement_SourceStore_AgreementId] 
ON [Customer].[CustomerAgreement]
([SourceStore] ASC, [AgreementId] ASC, [IsPrimary] ASC)
INCLUDE ([CustomerId]) 
ON [PRIMARY]
GO


And this is index for [POSREPL3Agreement].[Agreement] table:

CREATE NONCLUSTERED INDEX [IX_Agreement_SourceStore_ModifiedDate]  
ON [POSREPL3Agreement].[Agreement]
([SourceStore] ASC, [ModifiedDate] ASC, [CreatedDate] ASC)
INCLUDE ([AgreementId]) 
ON [PRIMARY]
GO


If we remove the WHERE clause index works as expected and we see 1000 records from both the tables, but when we add the listed WHERE clause [Customer].[CustomerAgreement] estimates all the records instead of 1000.

How can we improve the WHERE clause or index to get [Agreement] table line up with [CustomerAgreement] table so that Estimated Row under [CustomerAgreement] is not ALL the records?

Table definitions

```
CREATE TABLE [Customer].CustomerAgreement NOT NULL,
[SourceStore] varchar NOT NULL,
[RowGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Repl_ID] [tinyint] NOT NULL,
CONSTRAINT [PK_Customer_CustomerAgreementID_SourceStore] PRIMAR

Solution

If we remove the WHERE clause index works as expected and we see 1000 records from both the tables, but when we add the listed WHERE clause [Customer].[CustomerAgreement] estimates all the records instead of 1000.

The short answer is that without any filtering of rows in the WHERE clause, the query optimizer estimates that it only needs to read 1000 rows from each table to produce the desired result of 1000 rows.

You did not provide an execution plan for this query, so I can't say much more than that. It is likely your query also omitted the ORDER BY clause in this case, otherwise a Sort would likely have been necessary, which would typically entail reading all rows from its subtree.

With the WHERE clause, the optimizer expects many more rows to be read (due to the expecting filtering effect) before the 1000th row is available to return to the client. The difference between the estimated and actual number of rows read is due to the problem of estimating the selectivity of the complex predicates from the available statistics. This is true even assuming the statistics are representative of the data. Fundamentally: it's too hard, and the optimizer estimates cardinality incorrectly.

  1. Indexed computed column solution



You may find the following changes worthwhile:

-
Index the #stores temporary table:

CREATE UNIQUE CLUSTERED INDEX index_name 
ON #stores 
(
    Store,
    StartDate
);


-
Add a computed column to the Agreements table. This uses no storage, and is a very fast metadata-only operation:

ALTER TABLE POSREPL3Agreement.Agreement
ADD ComputedModifiedDate
AS ISNULL(ModifiedDate, CreatedDate);


-
Create (or modify the existing) index to use the computed column. This will satisfy the ORDER BY clause.

CREATE INDEX index_name 
ON POSREPL3Agreement.Agreement
(
    ComputedModifiedDate,
    AgreementId
)
INCLUDE (SourceStore);


-
Simplify the query to reference the computed column directly:

SELECT TOP (1000)
    A.SourceStore,
    A.AgreementId,
    ModifiedDate = A.ComputedModifiedDate
FROM #stores AS S
JOIN POSREPL3Agreement.Agreement AS A
    ON A.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
    ON CA.SourceStore = A.SourceStore
    AND CA.AgreementId = A.AgreementId
WHERE
    CA.IsPrimary = 1
    AND A.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
    AND 
    ( 
        A.ComputedModifiedDate > S.StartDate
        OR A.AgreementId > S.LastId
    )
ORDER BY
    A.ComputedModifiedDate, 
    A.AgreementId;


If you are unable to rewrite the query as shown, the computed column index will need to include two extra column for technical reasons:

CREATE INDEX index_name 
ON POSREPL3Agreement.Agreement
(
    ComputedModifiedDate,
    AgreementId
)
INCLUDE 
(
    SourceStore,
    CreatedDate,
    ModifiedDate
)
WITH DROP_EXISTING;


The expected execution plan will still show inaccurate estimates (because the optimizer is over-optimistic about how quickly it can stop scanning the ordered computed column index) but the (Top N and Distinct) Sorts are eliminated, and ought to still perform better:

  1. Comprehensive indexed view solution



If you unable to add the computed column and index, you could investigate using an indexed view instead:

CREATE VIEW dbo.ViewName
WITH SCHEMABINDING
AS
SELECT
    A.SourceStore,
    A.AgreementId,
    ComputedModifiedDate = ISNULL(A.ModifiedDate, A.CreatedDate)
FROM POSREPL3Agreement.Agreement AS A
JOIN Customer.CustomerAgreement AS CA
    ON CA.SourceStore = A.SourceStore
    AND CA.AgreementId = A.AgreementId
WHERE
    CA.IsPrimary = 1;
GO
CREATE UNIQUE CLUSTERED INDEX index_name
ON dbo.ViewName
(
    ComputedModifiedDate,
    AgreementId,
    SourceStore
);


The query then becomes:

SELECT TOP (1000)
    VN.SourceStore,
    VN.AgreementId,
    ModifiedDate = VN.ComputedModifiedDate 
FROM #stores AS S
JOIN dbo.ViewName AS VN
    WITH (NOEXPAND)
    ON VN.SourceStore = S.Store
WHERE
    VN.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
    AND 
    ( 
        VN.ComputedModifiedDate > S.StartDate
        OR VN.AgreementId > S.LastId
    )
ORDER BY
    VN.ComputedModifiedDate, 
    VN.AgreementId;


  1. Simpler indexed view solution



It is also possible to more directly reflect the computed column solution in an indexed view, though this idea does not eliminate the join:

CREATE VIEW dbo.ViewName
WITH SCHEMABINDING
AS
SELECT
    A.SourceStore,
    A.AgreementId,
    ComputedModifiedDate = ISNULL(A.ModifiedDate, A.CreatedDate)
FROM POSREPL3Agreement.Agreement AS A;
GO
CREATE UNIQUE CLUSTERED INDEX index_name
ON dbo.ViewName
(
    ComputedModifiedDate,
    AgreementId,
    SourceStore
);


This time the query becomes:

```
SELECT TOP (1000)
VN.SourceStore,
VN.AgreementId,
ModifiedDate = VN.ComputedModifiedDate
FROM #stores AS S
JOIN dbo.ViewName AS VN
WITH (NOEXPAND)
ON VN.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
ON CA.Source

Code Snippets

CREATE UNIQUE CLUSTERED INDEX index_name 
ON #stores 
(
    Store,
    StartDate
);
ALTER TABLE POSREPL3Agreement.Agreement
ADD ComputedModifiedDate
AS ISNULL(ModifiedDate, CreatedDate);
CREATE INDEX index_name 
ON POSREPL3Agreement.Agreement
(
    ComputedModifiedDate,
    AgreementId
)
INCLUDE (SourceStore);
SELECT TOP (1000)
    A.SourceStore,
    A.AgreementId,
    ModifiedDate = A.ComputedModifiedDate
FROM #stores AS S
JOIN POSREPL3Agreement.Agreement AS A
    ON A.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
    ON CA.SourceStore = A.SourceStore
    AND CA.AgreementId = A.AgreementId
WHERE
    CA.IsPrimary = 1
    AND A.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
    AND 
    ( 
        A.ComputedModifiedDate > S.StartDate
        OR A.AgreementId > S.LastId
    )
ORDER BY
    A.ComputedModifiedDate, 
    A.AgreementId;
CREATE INDEX index_name 
ON POSREPL3Agreement.Agreement
(
    ComputedModifiedDate,
    AgreementId
)
INCLUDE 
(
    SourceStore,
    CreatedDate,
    ModifiedDate
)
WITH DROP_EXISTING;

Context

StackExchange Database Administrators Q#125579, answer score: 10

Revisions (0)

No revisions yet.