patternMinor
Best way to index a table with two datetime columns that are equally queried
Problem
SQL Server 2008 R2.
I have a table with ~70m records, about 10 inserts per second.
It's currently clustered on a CreatedAt datetime column, which always increases. 50% of queries involve this column.
There is another datetime column "IssuedAt" which has a different meaning, but is generally within a day or so of the CreatedAt. 50% of queries involve this column.
There is a nonclustered (NC) index on this column.
There are a number of other FK columns - probably about 150 bytes wide if that's relevant. And a few more indexes - the table is heavily queried for a variety of reports in different ways.
My question is what the best way is to index this table with regard to the clustered index and the two datetime columns.
a) I'm concerned the clustered key is bigger than it has to be (datetime + the implicit uniquifier), leading to bigger NC indexes. Should I add an INT identity and cluster on that instead?
b) My queries against the IssuedAt column can be expensive due to bookmark lookups. I'm faced with INCLUDING more and more columns into it (hurting write perf). Is there an alternative tactic here?
Thanks in advance.
UPDATE:
Just for clarity: I'm aware of the need for benchmarking, and I can already see that some of the queries are not performing acceptably.
There is an inherent tension between 50% of the queries relying on datetime column A with a CI, and 50% on datetime column B with an NCI. I was hoping there might be some sort of approach/trick worth considering to offset this tension. For example a new filtered index, or moving to a clustered index which is the date component of both if they match, or some other technique along those lines that others use to relieve this tension.
Second Update:
I'm now considering the following:
- Create a new column: IssuedAtOffset (int). This is the difference between CreatedAt and IssuedAt, in seconds. I know from business realities that int is sufficient to capture this delta, and also IssuedAt's milliseconds are irrelevant, so that will
Solution
There is no substitute for benchmarking. To answer the question, I would create and populate several possible tables. Then I would expose these tables to your typical workload, and benchmark.
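As a rough sketch of what one such benchmark candidate might look like, the script below builds a copy of the table clustered on a narrow INT identity, with nonclustered indexes on both datetime columns, and then measures one representative query. Only CreatedAt and IssuedAt come from the question; the table name dbo.Orders_CandidateB, the CustomerId and Amount columns, and the date range are hypothetical placeholders to swap for your own schema and workload.

```sql
-- Candidate layout: cluster on a surrogate INT key instead of CreatedAt.
-- All names except CreatedAt and IssuedAt are hypothetical.
CREATE TABLE dbo.Orders_CandidateB
(
    Id         INT IDENTITY(1,1) NOT NULL,
    CreatedAt  DATETIME NOT NULL,
    IssuedAt   DATETIME NOT NULL,
    CustomerId INT NOT NULL,
    Amount     DECIMAL(18,2) NOT NULL,
    CONSTRAINT PK_Orders_CandidateB PRIMARY KEY CLUSTERED (Id)
);

CREATE NONCLUSTERED INDEX IX_Orders_CandidateB_CreatedAt
    ON dbo.Orders_CandidateB (CreatedAt);

CREATE NONCLUSTERED INDEX IX_Orders_CandidateB_IssuedAt
    ON dbo.Orders_CandidateB (IssuedAt);

-- After loading a realistic volume of data (not shown), report
-- logical reads and CPU/elapsed time for a typical report query.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

SELECT CustomerId, SUM(Amount) AS Total
FROM dbo.Orders_CandidateB
WHERE IssuedAt >= '20120101' AND IssuedAt < '20120201'
GROUP BY CustomerId;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```

Running the same query against each candidate and comparing the logical reads from the Messages output gives a like-for-like comparison. To complete the picture, time the insert workload against each candidate as well.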
Including additional columns in your NCIs will slow down modifications and speed up selects. Based on the frequency of both, we can choose which approach uses fewer resources. If a row is on average read twice per year, your conclusions might be different from the case where every row is on average read twice per minute.
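To make the trade-off concrete, here is a minimal sketch of a covering index on IssuedAt, followed by a query that compares reads and writes per index. The table dbo.Orders and the columns CustomerId and Amount are assumed for illustration and are not from the question.

```sql
-- Covering index: the included columns are stored at the leaf level,
-- so a query that only needs these columns avoids the bookmark lookup.
-- The cost is that every insert, and every update to an included
-- column, must also maintain this index.
CREATE NONCLUSTERED INDEX IX_Orders_IssuedAt_Covering
    ON dbo.Orders (IssuedAt)
    INCLUDE (CustomerId, Amount);

-- Rough read/write counts per index on the table, to weigh select
-- benefit against modification cost. These counters reset when the
-- instance restarts, so sample over a representative period.
SELECT i.name AS IndexName,
       s.user_seeks + s.user_scans + s.user_lookups AS Reads,
       s.user_updates AS Writes
FROM sys.dm_db_index_usage_stats AS s
JOIN sys.indexes AS i
    ON i.object_id = s.object_id
   AND i.index_id  = s.index_id
WHERE s.database_id = DB_ID()
  AND s.object_id   = OBJECT_ID('dbo.Orders');
```

An index with many writes and few reads is a candidate for trimming its included columns. A high user_lookups count on the clustered index suggests some nonclustered index is not covering its queries.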
Besides, not all queries are born equal. If some queries must complete within a certain time no matter what, then you must make sure these requirements are met. Obviously such requirements trump the common-good approach described above. Only you can know the actual requirements.
Context
StackExchange Database Administrators Q#33411, answer score: 8