patternsqlMinor
Best Way to Index Multiple Effective Date Columns
Viewed 0 times
columnswaydatemultipleindexeffectivebest
Problem
I've got a table that is primarily to represent relationships between entities (i.e. mostly composed of foreign keys). These relationships change over time, and so the table has a StartDate and an EndDate column. I now need to add another dimension of start and end dates, meaning that the relationship can be viewed with two different date "lenses" (queried with two dates, @Date1 and @Date2), so the schema will look something like this:
The queries will mostly be joins to this table, for example:
My questions are:
MyJoinTable:
| Id | Entity1Id | Entity2Id | StartDate1 | EndDate1 | StartDate2 | EndDate2 |
|----|-----------|-----------|------------|------------|------------|------------|
| 1 | A | B | 1753-01-01 | 2018-09-01 | 1753-01-01 | 2025-01-01 |
| 2 | A | B | 2018-09-01 | 2018-10-01 | 1753-01-01 | 2018-11-01 |
| 3 | A | C | 2018-09-01 | 2018-10-01 | 2018-11-01 | 2025-01-01 |
| 4 | A | B | 2018-10-01 | 2025-01-01 | 1753-01-01 | 2018-11-01 |
| 5 | A | D | 2018-10-01 | 2025-10-01 | 2018-11-01 | 2025-01-01 |The queries will mostly be joins to this table, for example:
SELECT e1.Field, e2.Field
FROM Entity1 e1
INNER JOIN MyJoinTable jt ON jt.Entity1Id = e1.Id
AND StartDate1 @Date1
AND StartDate2 @Date2
INNER JOIN Entity2 e2 ON e2.Id = jt.Entity2IdMy questions are:
- What's the best way to index this join table?
- Index on Entity1Id
- Index on Entity2Id
- Composite Index on all four date columns? (StartDate1, EndDate1, StartDate2, EndDate2)
- What's the best way to constrain the DB so that I only get one relationship row returned for any @Date1, @Date2 combination?
- Do you have any recommendations on a better data model?
Solution
Introduction
I do not have the time for extensive testing, but can suggest from where
to start.
If you rewrite the query in a more symmetrical manner, to
emphasise that both entities are joined to a two-dimensional
cross-section of
you will see that one reasonably efficient way of executing it is first
to extract that cross-section and then to join the entities to it.
The following equivalent query will serve for illustration purposes
(but do not use it!):
It is the more important since you say that there is only
one relationship row returned for any @Date1, @Date2 combination.
For this reason, joining the entities in before
to the required cross-section by
it is likely to have many records for each entity so that the result
will have too many rows. The entites are best joined at the end, using
their natural key field,
The solutions below, therefore, propose different ways of calculating
this cross-section, corresponding to the
above—
Soution I: The easiest
Let us try to add useful indices to the original query. Since MSSQL's
composite indices are hierarchical, they are useless in optimising
interval comparisons, so the best we can do with the given structure is
to index one of the date fields, yet make sure to cover all the other
fields required form the table:
The query then will be executed in the following order:
-
Using the index
to find records by
residual predicate
-
Join the entity tables using their natural key
This method is not very efficient because only one of the four date
predicates is fully optimised by an index, resudual predicates being
boring data grinders.
Solution II: Set intersection
Another symmetrical way to obtain the two-dimensional
cross-section goes through
predicates:
With the suitable indices,
the plan for this query includes only clean index seek (wihout residual
predicates) and hash match operations.
Observe that the entity keys in query and indices present a
fourfold redundancy, which may be removed—at the expense of a more
complicated execution plan—by joining the entities in separately:
with these indices:
But since in either case each of four date predicates, bounding the date
on one end only, does not reduce the amount of data sufficiently, hash
matches may have to
spill
data into
environment.
Solution III: A compromise
Now we can merge solutions I & II in order to come up with a plan that
does not require so much RAM an
I do not have the time for extensive testing, but can suggest from where
to start.
If you rewrite the query in a more symmetrical manner, to
emphasise that both entities are joined to a two-dimensional
cross-section of
MyJoinTable:SELECT E1.Field, E2.Field
FROM MyJoinTable JT
JOIN Entity1 E1 ON E1.Id = JT.Entity1Id
JOIN Entity2 E2 ON E2.Id = JT.Entity2Id
WHERE
StartDate1 @date1 AND
StartDate2 @date2you will see that one reasonably efficient way of executing it is first
to extract that cross-section and then to join the entities to it.
The following equivalent query will serve for illustration purposes
(but do not use it!):
-- A sample to demostrate the desired order or execution:
SELECT E1.Field, E2.Field
FROM
-- 1. Calculate the cross-section:
( SELECT JT.Entity1Id, JT.Entity2Id
FROM MyJoinTable JT
WHERE
StartDate1 @date1 AND
StartDate2 @date2
) SECT
-- 2. Join the entity tables:
JOIN Entity1 E1 ON E1.Id = SECT.Entity1Id
JOIN Entity2 E2 ON E2.Id = SECT.Entity2IdIt is the more important since you say that there is only
one relationship row returned for any @Date1, @Date2 combination.
For this reason, joining the entities in before
MyJoinTable is reducedto the required cross-section by
@date1 and @date2 is inefficient asit is likely to have many records for each entity so that the result
will have too many rows. The entites are best joined at the end, using
their natural key field,
Id, which I assume is the clustered index.The solutions below, therefore, propose different ways of calculating
this cross-section, corresponding to the
SECT subquery of the exampleabove—
Soution I: The easiest
Let us try to add useful indices to the original query. Since MSSQL's
composite indices are hierarchical, they are useless in optimising
interval comparisons, so the best we can do with the given structure is
to index one of the date fields, yet make sure to cover all the other
fields required form the table:
CREATE NONCLUSTERED INDEX SD1 ON MyJoinTable ( StartDate1 )
INCLUDE (Entity1Id, Entity2Id, StartDate2, EndDate1, EndDate2 )The query then will be executed in the following order:
-
Using the index
SD1, perform an index seek on MyJoinTableto find records by
StartDate1, and filter them by StartDate2,EndDate1, and EndDate2 in theresidual predicate
-
Join the entity tables using their natural key
Id.This method is not very efficient because only one of the four date
predicates is fully optimised by an index, resudual predicates being
boring data grinders.
Solution II: Set intersection
Another symmetrical way to obtain the two-dimensional
JOINcross-section goes through
INTERSECTing the results of the four datepredicates:
SELECT Entity1Id, Entity2Id
FROM
( SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date1 >= StartDate1
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date2 >= StartDate2
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date1 < EndDate1
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date2 < EndDate2 )
SECTWith the suitable indices,
CREATE NONCLUSTERED INDEX SD1 ON MyJoinTable (StartDate1)
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX SD2 ON MyJoinTable (StartDate2)
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX ED1 ON MyJoinTable (EndDate1 )
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX ED2 ON MyJoinTable (EndDate2 )
INCLUDE (Id, Entity1Id, Entity2Id)the plan for this query includes only clean index seek (wihout residual
predicates) and hash match operations.
Observe that the entity keys in query and indices present a
fourfold redundancy, which may be removed—at the expense of a more
complicated execution plan—by joining the entities in separately:
SELECT JT.Entity1Id, JT.Entity2Id
FROM
( SELECT Id FROM MyJoinTable WHERE @date1 >= StartDate1
INTERSECT
SELECT Id FROM MyJoinTable WHERE @date2 >= StartDate2
INTERSECT
SELECT Id FROM MyJoinTable WHERE @date1 < EndDate1
INTERSECT
SELECT Id FROM MyJoinTable WHERE @date2 < EndDate2 )
SECT
JOIN MyJoinTable JT ON JT.Id = SECT.Idwith these indices:
CREATE NONCLUSTERED INDEX SD1 ON MyJoinTable (StartDate1) INCLUDE (Id)
CREATE NONCLUSTERED INDEX SD2 ON MyJoinTable (StartDate2) INCLUDE (Id)
CREATE NONCLUSTERED INDEX ED1 ON MyJoinTable (EndDate1 ) INCLUDE (Id)
CREATE NONCLUSTERED INDEX ED2 ON MyJoinTable (EndDate2 ) INCLUDE (Id)But since in either case each of four date predicates, bounding the date
on one end only, does not reduce the amount of data sufficiently, hash
matches may have to
spill
data into
tempdb. If they do, this method is not fit for yourenvironment.
Solution III: A compromise
Now we can merge solutions I & II in order to come up with a plan that
does not require so much RAM an
Code Snippets
SELECT E1.Field, E2.Field
FROM MyJoinTable JT
JOIN Entity1 E1 ON E1.Id = JT.Entity1Id
JOIN Entity2 E2 ON E2.Id = JT.Entity2Id
WHERE
StartDate1 <= @date1 AND EndDate1 > @date1 AND
StartDate2 <= @date2 AND EndDate2 > @date2-- A sample to demostrate the desired order or execution:
SELECT E1.Field, E2.Field
FROM
-- 1. Calculate the cross-section:
( SELECT JT.Entity1Id, JT.Entity2Id
FROM MyJoinTable JT
WHERE
StartDate1 <= @date1 AND EndDate1 > @date1 AND
StartDate2 <= @date2 AND EndDate2 > @date2
) SECT
-- 2. Join the entity tables:
JOIN Entity1 E1 ON E1.Id = SECT.Entity1Id
JOIN Entity2 E2 ON E2.Id = SECT.Entity2IdCREATE NONCLUSTERED INDEX SD1 ON MyJoinTable ( StartDate1 )
INCLUDE (Entity1Id, Entity2Id, StartDate2, EndDate1, EndDate2 )SELECT Entity1Id, Entity2Id
FROM
( SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date1 >= StartDate1
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date2 >= StartDate2
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date1 < EndDate1
INTERSECT
SELECT Id, Entity1Id, Entity2Id FROM MyJoinTable
WHERE @date2 < EndDate2 )
SECTCREATE NONCLUSTERED INDEX SD1 ON MyJoinTable (StartDate1)
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX SD2 ON MyJoinTable (StartDate2)
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX ED1 ON MyJoinTable (EndDate1 )
INCLUDE (Id, Entity1Id, Entity2Id)
CREATE NONCLUSTERED INDEX ED2 ON MyJoinTable (EndDate2 )
INCLUDE (Id, Entity1Id, Entity2Id)Context
StackExchange Database Administrators Q#220489, answer score: 4
Revisions (0)
No revisions yet.