patternsqlModerate
Alternative for WHERE clause in SQL query
Viewed 0 times
alternativesqlwherequeryforclause
Problem
We have SQL Server 2012 Enterprise Edition running following query via a stored procedure:
We have following indexes for
And this is index for
If we remove the
How can we improve the
Table definitions
```
CREATE TABLE [Customer].CustomerAgreement NOT NULL,
[SourceStore] varchar NOT NULL,
[RowGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Repl_ID] [tinyint] NOT NULL,
CONSTRAINT [PK_Customer_CustomerAgreementID_SourceStore] PRIMAR
declare @TopX int = 1000
declare @stores Table (Store varchar(5), LastDate datetime, LastId int, RangeEnd datetime)
insert into @stores
select *
from (select SourceStore, '2014-01-01' as i, null as ii, '2014-01-08' as iii
from StoreConfig.dbo.Version
group by SourceStore
) t
where (ABS(CAST((BINARY_CHECKSUM(*) * RAND()) as int)) % 100) s.StartDate
or a.AgreementId > s.LastId
)
order by isnull(a.ModifiedDate, a.CreatedDate), a.AgreementIdWe have following indexes for
[Customer].[CustomerAgreement] table:CREATE NONCLUSTERED INDEX [IX_CustomerAgreement_SourceStore_AgreementId]
ON [Customer].[CustomerAgreement]
([SourceStore] ASC, [AgreementId] ASC, [IsPrimary] ASC)
INCLUDE ([CustomerId])
ON [PRIMARY]
GOAnd this is index for
[POSREPL3Agreement].[Agreement] table:CREATE NONCLUSTERED INDEX [IX_Agreement_SourceStore_ModifiedDate]
ON [POSREPL3Agreement].[Agreement]
([SourceStore] ASC, [ModifiedDate] ASC, [CreatedDate] ASC)
INCLUDE ([AgreementId])
ON [PRIMARY]
GOIf we remove the
WHERE clause index works as expected and we see 1000 records from both the tables, but when we add the listed WHERE clause [Customer].[CustomerAgreement] estimates all the records instead of 1000. How can we improve the
WHERE clause or index to get [Agreement] table line up with [CustomerAgreement] table so that Estimated Row under [CustomerAgreement] is not ALL the records?Table definitions
```
CREATE TABLE [Customer].CustomerAgreement NOT NULL,
[SourceStore] varchar NOT NULL,
[RowGUID] [uniqueidentifier] ROWGUIDCOL NOT NULL,
[Repl_ID] [tinyint] NOT NULL,
CONSTRAINT [PK_Customer_CustomerAgreementID_SourceStore] PRIMAR
Solution
If we remove the
The short answer is that without any filtering of rows in the
You did not provide an execution plan for this query, so I can't say much more than that. It is likely your query also omitted the
With the
You may find the following changes worthwhile:
-
Index the #stores temporary table:
-
Add a computed column to the Agreements table. This uses no storage, and is a very fast metadata-only operation:
-
Create (or modify the existing) index to use the computed column. This will satisfy the
-
Simplify the query to reference the computed column directly:
If you are unable to rewrite the query as shown, the computed column index will need to include two extra column for technical reasons:
The expected execution plan will still show inaccurate estimates (because the optimizer is over-optimistic about how quickly it can stop scanning the ordered computed column index) but the (Top N and Distinct) Sorts are eliminated, and ought to still perform better:
If you unable to add the computed column and index, you could investigate using an indexed view instead:
The query then becomes:
It is also possible to more directly reflect the computed column solution in an indexed view, though this idea does not eliminate the join:
This time the query becomes:
```
SELECT TOP (1000)
VN.SourceStore,
VN.AgreementId,
ModifiedDate = VN.ComputedModifiedDate
FROM #stores AS S
JOIN dbo.ViewName AS VN
WITH (NOEXPAND)
ON VN.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
ON CA.Source
WHERE clause index works as expected and we see 1000 records from both the tables, but when we add the listed WHERE clause [Customer].[CustomerAgreement] estimates all the records instead of 1000.The short answer is that without any filtering of rows in the
WHERE clause, the query optimizer estimates that it only needs to read 1000 rows from each table to produce the desired result of 1000 rows.You did not provide an execution plan for this query, so I can't say much more than that. It is likely your query also omitted the
ORDER BY clause in this case, otherwise a Sort would likely have been necessary, which would typically entail reading all rows from its subtree.With the
WHERE clause, the optimizer expects many more rows to be read (due to the expecting filtering effect) before the 1000th row is available to return to the client. The difference between the estimated and actual number of rows read is due to the problem of estimating the selectivity of the complex predicates from the available statistics. This is true even assuming the statistics are representative of the data. Fundamentally: it's too hard, and the optimizer estimates cardinality incorrectly.- Indexed computed column solution
You may find the following changes worthwhile:
-
Index the #stores temporary table:
CREATE UNIQUE CLUSTERED INDEX index_name
ON #stores
(
Store,
StartDate
);-
Add a computed column to the Agreements table. This uses no storage, and is a very fast metadata-only operation:
ALTER TABLE POSREPL3Agreement.Agreement
ADD ComputedModifiedDate
AS ISNULL(ModifiedDate, CreatedDate);-
Create (or modify the existing) index to use the computed column. This will satisfy the
ORDER BY clause.CREATE INDEX index_name
ON POSREPL3Agreement.Agreement
(
ComputedModifiedDate,
AgreementId
)
INCLUDE (SourceStore);-
Simplify the query to reference the computed column directly:
SELECT TOP (1000)
A.SourceStore,
A.AgreementId,
ModifiedDate = A.ComputedModifiedDate
FROM #stores AS S
JOIN POSREPL3Agreement.Agreement AS A
ON A.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
ON CA.SourceStore = A.SourceStore
AND CA.AgreementId = A.AgreementId
WHERE
CA.IsPrimary = 1
AND A.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
AND
(
A.ComputedModifiedDate > S.StartDate
OR A.AgreementId > S.LastId
)
ORDER BY
A.ComputedModifiedDate,
A.AgreementId;If you are unable to rewrite the query as shown, the computed column index will need to include two extra column for technical reasons:
CREATE INDEX index_name
ON POSREPL3Agreement.Agreement
(
ComputedModifiedDate,
AgreementId
)
INCLUDE
(
SourceStore,
CreatedDate,
ModifiedDate
)
WITH DROP_EXISTING;The expected execution plan will still show inaccurate estimates (because the optimizer is over-optimistic about how quickly it can stop scanning the ordered computed column index) but the (Top N and Distinct) Sorts are eliminated, and ought to still perform better:
- Comprehensive indexed view solution
If you unable to add the computed column and index, you could investigate using an indexed view instead:
CREATE VIEW dbo.ViewName
WITH SCHEMABINDING
AS
SELECT
A.SourceStore,
A.AgreementId,
ComputedModifiedDate = ISNULL(A.ModifiedDate, A.CreatedDate)
FROM POSREPL3Agreement.Agreement AS A
JOIN Customer.CustomerAgreement AS CA
ON CA.SourceStore = A.SourceStore
AND CA.AgreementId = A.AgreementId
WHERE
CA.IsPrimary = 1;
GO
CREATE UNIQUE CLUSTERED INDEX index_name
ON dbo.ViewName
(
ComputedModifiedDate,
AgreementId,
SourceStore
);The query then becomes:
SELECT TOP (1000)
VN.SourceStore,
VN.AgreementId,
ModifiedDate = VN.ComputedModifiedDate
FROM #stores AS S
JOIN dbo.ViewName AS VN
WITH (NOEXPAND)
ON VN.SourceStore = S.Store
WHERE
VN.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
AND
(
VN.ComputedModifiedDate > S.StartDate
OR VN.AgreementId > S.LastId
)
ORDER BY
VN.ComputedModifiedDate,
VN.AgreementId;- Simpler indexed view solution
It is also possible to more directly reflect the computed column solution in an indexed view, though this idea does not eliminate the join:
CREATE VIEW dbo.ViewName
WITH SCHEMABINDING
AS
SELECT
A.SourceStore,
A.AgreementId,
ComputedModifiedDate = ISNULL(A.ModifiedDate, A.CreatedDate)
FROM POSREPL3Agreement.Agreement AS A;
GO
CREATE UNIQUE CLUSTERED INDEX index_name
ON dbo.ViewName
(
ComputedModifiedDate,
AgreementId,
SourceStore
);This time the query becomes:
```
SELECT TOP (1000)
VN.SourceStore,
VN.AgreementId,
ModifiedDate = VN.ComputedModifiedDate
FROM #stores AS S
JOIN dbo.ViewName AS VN
WITH (NOEXPAND)
ON VN.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
ON CA.Source
Code Snippets
CREATE UNIQUE CLUSTERED INDEX index_name
ON #stores
(
Store,
StartDate
);ALTER TABLE POSREPL3Agreement.Agreement
ADD ComputedModifiedDate
AS ISNULL(ModifiedDate, CreatedDate);CREATE INDEX index_name
ON POSREPL3Agreement.Agreement
(
ComputedModifiedDate,
AgreementId
)
INCLUDE (SourceStore);SELECT TOP (1000)
A.SourceStore,
A.AgreementId,
ModifiedDate = A.ComputedModifiedDate
FROM #stores AS S
JOIN POSREPL3Agreement.Agreement AS A
ON A.SourceStore = S.Store
JOIN Customer.CustomerAgreement AS CA
ON CA.SourceStore = A.SourceStore
AND CA.AgreementId = A.AgreementId
WHERE
CA.IsPrimary = 1
AND A.ComputedModifiedDate BETWEEN S.StartDate AND S.RangeEnd
AND
(
A.ComputedModifiedDate > S.StartDate
OR A.AgreementId > S.LastId
)
ORDER BY
A.ComputedModifiedDate,
A.AgreementId;CREATE INDEX index_name
ON POSREPL3Agreement.Agreement
(
ComputedModifiedDate,
AgreementId
)
INCLUDE
(
SourceStore,
CreatedDate,
ModifiedDate
)
WITH DROP_EXISTING;Context
StackExchange Database Administrators Q#125579, answer score: 10
Revisions (0)
No revisions yet.