Hash join between master/detail tables produces too-low cardinality estimate
Problem
When joining a master table to a detail table, how can I encourage SQL Server 2014 to use the cardinality estimate of the larger (detail) table as the cardinality estimate of the join output?
For example, when joining 10K master rows to 100K detail rows, I want SQL Server to estimate the join at 100K rows, the same as the estimated number of detail rows. How should I structure my queries and/or tables and/or indexes to help SQL Server's estimator leverage the fact that every detail row always has a corresponding master row? (Meaning that a join between them should never reduce the cardinality estimate.)
Here are more details. Our database has a master/detail pair of tables: VisitTarget has one row for each sales transaction, and VisitSale has one row for each product in each transaction. It's a one-to-many relationship: one VisitTarget row to an average of 10 VisitSale rows. The tables look like this (simplified to only the columns relevant to this question; the final ALTER TABLE statement was cut off in the source and is reconstructed here as the foreign key between the two tables, with an illustrative constraint name):
```
-- "master" table
CREATE TABLE VisitTarget
(
    VisitTargetId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    SaleDate date NOT NULL,
    StoreId int NOT NULL
    -- other columns omitted for clarity
);

-- covering index for date-scoped queries
CREATE NONCLUSTERED INDEX IX_VisitTarget_SaleDate
    ON VisitTarget (SaleDate) INCLUDE (StoreId /*, ...more columns */);

-- "detail" table
CREATE TABLE VisitSale
(
    VisitSaleId int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED,
    VisitTargetId int NOT NULL,
    SaleDate date NOT NULL,  -- denormalized; copied from VisitTarget
    StoreId int NOT NULL,    -- denormalized; copied from VisitTarget
    ItemId int NOT NULL,
    SaleQty int NOT NULL,
    SalePrice decimal(9,2) NOT NULL
    -- other columns omitted for clarity
);

-- covering index for date-scoped queries
CREATE NONCLUSTERED INDEX IX_VisitSale_SaleDate
    ON VisitSale (SaleDate)
    INCLUDE (VisitTargetId, StoreId, ItemId, SaleQty, SalePrice /*, ...more columns */);

-- foreign key from detail to master (constraint name is illustrative)
ALTER TABLE VisitSale
    ADD CONSTRAINT FK_VisitSale_VisitTarget
    FOREIGN KEY (VisitTargetId) REFERENCES VisitTarget (VisitTargetId);
```
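For reference, the query shape that produces the too-low estimate is presumably the inner-join version of the query shown in the solution below (`SomeOtherColumn` stands in for whatever master-side columns are actually selected):

```
-- Inner join with the redundant SaleDate filter on both sides: under the
-- SQL Server 2014 CE this can estimate the join output well below the
-- ~100K filtered VisitSale rows, even though every detail row matches.
SELECT vt.StoreId, vt.SomeOtherColumn, Sales = SUM(vs.SalePrice * vs.SaleQty)
FROM VisitSale vs
INNER JOIN VisitTarget vt ON vt.VisitTargetId = vs.VisitTargetId
WHERE vs.SaleDate BETWEEN '20170101' AND '20171231'
  AND vt.SaleDate BETWEEN '20170101' AND '20171231'
GROUP BY vt.StoreId, vt.SomeOtherColumn;
```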
Solution
Assuming that no improvement can be gained by doing something to statistics or using the legacy CE, the most straightforward way around your problem is to change your INNER JOIN to a LEFT OUTER JOIN:
```
SELECT vt.StoreId, vt.SomeOtherColumn, Sales = SUM(vs.SalePrice * vs.SaleQty)
FROM VisitSale vs
LEFT OUTER JOIN VisitTarget vt ON vt.VisitTargetId = vs.VisitTargetId
    AND vt.SaleDate BETWEEN '20170101' AND '20171231'
WHERE vs.SaleDate BETWEEN '20170101' AND '20171231'
GROUP BY vt.StoreId, vt.SomeOtherColumn;
```
If you have a foreign key between the tables, you always filter on the same SaleDate range for both tables, and SaleDate always matches between the tables, then the results of your query should not change. It may seem odd to use an outer join like this, but think of it as informing the query optimizer that the join to the VisitTarget table will never reduce the number of rows returned by the query. Unfortunately, foreign keys do not change cardinality estimates, so sometimes you need to resort to tricks like this. (Microsoft Connect suggestion: Make optimizer estimations more accurate by using metadata.)
It's possible that writing the query in this form won't work well, depending on what else happens in the query after the join. You could try using a temp table to hold the intermediate results of the result set with the most important cardinality estimate.
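As a sketch of that temp-table approach (the temp table name and the exact column list are assumptions; the idea is that materializing the filtered detail rows gives the optimizer statistics on the intermediate result, so the subsequent join starts from an accurate row count):

```
-- Materialize the date-filtered detail rows first; SQL Server builds
-- statistics on #FilteredSales, so its ~100K-row cardinality carries
-- into the join that follows.
SELECT vs.VisitTargetId, vs.SalePrice, vs.SaleQty
INTO #FilteredSales
FROM VisitSale vs
WHERE vs.SaleDate BETWEEN '20170101' AND '20171231';

SELECT vt.StoreId, vt.SomeOtherColumn, Sales = SUM(fs.SalePrice * fs.SaleQty)
FROM #FilteredSales fs
INNER JOIN VisitTarget vt ON vt.VisitTargetId = fs.VisitTargetId
    AND vt.SaleDate BETWEEN '20170101' AND '20171231'
GROUP BY vt.StoreId, vt.SomeOtherColumn;

DROP TABLE #FilteredSales;
```

Whether this pays off depends on the cost of writing the intermediate rows versus the savings from a better-shaped plan downstream.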
Code Snippets
```
SELECT vt.StoreId, vt.SomeOtherColumn, Sales = SUM(vs.SalePrice * vs.SaleQty)
FROM VisitSale vs
LEFT OUTER JOIN VisitTarget vt ON vt.VisitTargetId = vs.VisitTargetId
    AND vt.SaleDate BETWEEN '20170101' AND '20171231'
WHERE vs.SaleDate BETWEEN '20170101' AND '20171231'
GROUP BY vt.StoreId, vt.SomeOtherColumn;
```
Context
StackExchange Database Administrators Q#195129, answer score: 6