patternsqlModerate
Is it a good idea to Partition a table (in MS SQL) based on date, when there's a clustered index present on ID (INT)
Viewed 0 times
clusteredpartitionideapresentsqldatebasedgoodindexthere
Problem
I have a table in MS SQL Server .
Table Usage: Logging from the Web Service calls
99.9% is usage is from the logging, developers look rarely into this table in Prod(only when an issue is reported or researching).
Primary Key: Based on the "ID" which is of "INT" data type.
There's a Clustered Index based on that "ID" column.
My Intentions for this change : Want to manage this table (as it has 10 years of data) and going forward (due to a new requirement), there's a possibility of developers/ analysts to dig into this table further (for few months only) and I don't want to create a new table for same purpose.
My Questions:
-
[Main question] Can I partition this table on the basis of "DateCreated" (DATETIME, NOT NULL
column), without causing issues (logically/ performance wise).
-
[Good to know] How much time (I understand it depends upon DB space/ server memory and other details, but a ballpark # would be good) would it take to partition this huge table (If it's ok to
have partition based on date). Asking this question as this is a Production table and rows gets inserted frequently (Now ~ 350 records/ min).
-
[Not exactly a question, but asking for recommendation] Is there a better plan to manage this table(Don't want to keep more
than 3 years of data in Production, plan is mentioned below)?
Current Plan (I am new to MS SQL, so this is what I came up with):
- Table Size: 806 GB
- Rows : 1.2 billion
- Index Space : 1.2 GB
Table Usage: Logging from the Web Service calls
99.9% is usage is from the logging, developers look rarely into this table in Prod(only when an issue is reported or researching).
Primary Key: Based on the "ID" which is of "INT" data type.
There's a Clustered Index based on that "ID" column.
My Intentions for this change : Want to manage this table (as it has 10 years of data) and going forward (due to a new requirement), there's a possibility of developers/ analysts to dig into this table further (for few months only) and I don't want to create a new table for same purpose.
My Questions:
-
[Main question] Can I partition this table on the basis of "DateCreated" (DATETIME, NOT NULL
column), without causing issues (logically/ performance wise).
-
[Good to know] How much time (I understand it depends upon DB space/ server memory and other details, but a ballpark # would be good) would it take to partition this huge table (If it's ok to
have partition based on date). Asking this question as this is a Production table and rows gets inserted frequently (Now ~ 350 records/ min).
-
[Not exactly a question, but asking for recommendation] Is there a better plan to manage this table(Don't want to keep more
than 3 years of data in Production, plan is mentioned below)?
Current Plan (I am new to MS SQL, so this is what I came up with):
- Keep 3 months of data in each partition.
- System to create partitions automatically before each quarter.
- Keep only 3 years worth of partitions in active table.
- Move other partitions to the OLD/ ARCHIEVE table (need to create this). Really OLD data to be purged.
Solution
Short answer: no, you cannot do this. Per the docs...
When partitioning a clustered index, the clustering key must contain the partitioning column.
This means that in order to partition on
This makes a little more sense when you think about what partitioning and clustering actually are under the hood.
-
A clustered index is the logical ordering of your data
-
Partitioning is a technique to manage the physical storage of your data based on a function
These two things can be used together as long as they don't conflict. If you try to make them conflict, you're gonna have a bad time. To answer your point-by-point items:
When partitioning a clustered index, the clustering key must contain the partitioning column.
This means that in order to partition on
[DateCreated], you must also cluster on [DateCreated].This makes a little more sense when you think about what partitioning and clustering actually are under the hood.
-
A clustered index is the logical ordering of your data
-
Partitioning is a technique to manage the physical storage of your data based on a function
These two things can be used together as long as they don't conflict. If you try to make them conflict, you're gonna have a bad time. To answer your point-by-point items:
- You cannot partition on
[DateCreated]at all unless you drop and recreate the clustered index to rebuild the table on that column. This is possible, but it's probably a bigger project than you planned when you asked the question.
- No one can answer this for you. You need to test it for yourself. Several years ago when I dropped and rebuilt a clustered index on a high-traffic OLTP table with an order of magnitude less data-and-index space used by GB than you describe, it took several months of benchmarking which approaches would be safest/fastest and then about 10 hours of off-hours work between 2 DBAs working to complete the rebuild without taking downtime. Addition of a partitioning scheme was not a part of that project but I don't imagine it simplifies things.
- If it's really "a logging table that no one usually cares about", then maybe you can do all of this without worrying about keeping the table online. Almost certainly you don't get to do it without creating at least one extra table (and probably several other objects) too. I encourage you to try a number of techniques on a restored copy of the database. I suspect you will end up renaming the current table and recreating it so that logs can continue to be written to a new, empty table (presumably with the partition scheme) while you archive the hundreds of gigs of old data using whatever approach you choose
Context
StackExchange Database Administrators Q#298411, answer score: 18
Revisions (0)
No revisions yet.