patternsqlMinor
Deciding between an IsDeleted/DateDeleted column or a separate History table?
Viewed 0 times
isdeletedcolumnhistoryseparatebetweendatedeleteddecidingtable
Problem
We have a table that stores nothing but an
I've been asked to start recording tag history, and I'm trying to decide if it would be better to use an
The primary purpose of tracking the history would be to run reports on a specific point in time. For example, getting data on accounts that had TagA on them during January 2013.
What factors should I consider when trying to decide between the two?
AccountId, TagId, and DateCreated. Users can update the Tags on accounts, which adds or remove records from this table. No tag can exist more than once on an account, but the same tag can be added and removed multiple times. This table is primarily used in queries users build to filter accounts using either JOIN or EXISTS on the tag table.I've been asked to start recording tag history, and I'm trying to decide if it would be better to use an
IsDeleted and/or DateDeleted column on the existing table, or if I should create a separate history table that stores this data instead.The primary purpose of tracking the history would be to run reports on a specific point in time. For example, getting data on accounts that had TagA on them during January 2013.
What factors should I consider when trying to decide between the two?
Solution
The problem with
A separate table for history makes a much more cleaner cut from the engine point of view: is a different table, no messy/risky plan choices to cope with. But it is much more impact on the application. If looking up the history of a row is the exception, the application changes are reasonable easy then I would consider this option first, as it has the least risk of (severe) performance regression.
I would strongly advise against mixed solutions (eg. tags and tags_history table and a view that unions them), such solutions have all the disadvantages mentioned above without any advantage. In general, anything that claims 'minimal impact' or 'transparent change' is snake oil. This (adding application layer row versioning/row history) is, by definition, a high impact change which will ripple big time in the application.
IsDeleted is its low cadinality: 0 or 1. Which makes it unindexable and since all your queries will add IsDeleted = 0 to the WHERE (or JOIN) conditions this apparent trivial change complicates a lot of things. You covering indexes will all of the sudden require lookups into the base table to decide if the row qualifies or not. There are measures that can improve behavior, but they have to be properly considered and consistently deployed (note that some of these mitigation are mutually exclusive, evaluate them separately):- add
IsDeletedas the leftmost key in the clustered index. This effectively 'splits' the clustered table into two regions ('deleted' and 'active'). It also forces de-facto theIsDeletedcolumn in every non-clustered index, thus avoiding the need to lookup the value.
- add
IsDeletedas a leftmost key in every non-clustered index. May sound extreme, but can be quite effective, provided that all queries specifyIsDeletedin the WHERE.
- add
IsDeletedas an included column to every non-clustered index. Unlike the options above this does not change seek/range/lookup semantics of the exiting indexes, but makes theIsDeletedcheck filtering available in every access path. Can be quite effective specially ifIsDeleted = 1occurs seldom.
- make the non-clustered indexes filtered by
IsDeleted = 0. May be effective on large tables, but is a bit dangerous as the filtered indexes become unusable for queries interested in the deleted rows.
A separate table for history makes a much more cleaner cut from the engine point of view: is a different table, no messy/risky plan choices to cope with. But it is much more impact on the application. If looking up the history of a row is the exception, the application changes are reasonable easy then I would consider this option first, as it has the least risk of (severe) performance regression.
I would strongly advise against mixed solutions (eg. tags and tags_history table and a view that unions them), such solutions have all the disadvantages mentioned above without any advantage. In general, anything that claims 'minimal impact' or 'transparent change' is snake oil. This (adding application layer row versioning/row history) is, by definition, a high impact change which will ripple big time in the application.
Context
StackExchange Database Administrators Q#34233, answer score: 9
Revisions (0)
No revisions yet.