patternModerate
Is it bad practice to have a "most_recent" boolean column in addition to a "create_at" timestamp column to track the latest version of a record?
Viewed 0 times
most_recentcreate_atthelatesttrackbooleancolumnpracticeversionaddition
Problem
The table looks like this, it's SCD type 2:
For 99% of queries we will be searching the entire table and filtering by additional columns and join tables. For these queries we're only interested in the most recent version of a record per unique ID. We will also be sorting by created_at and other columns.
To make it easy to find the most current records I was considering to add a
https://stackoverflow.com/questions/34495479/add-constraint-to-make-column-unique-per-group-of-rows/34495621#34495621
However I realized we already have the
https://stackoverflow.com/questions/17327043/how-can-i-select-rows-with-most-recent-timestamp-for-each-key-value
However, we'd then have to re-order the results by the column that we actually want to use to show the data.
It seems simpler in the long run to add the extra 'current' field, and like it would be more performant, but is it also bad practice?
+-----------+------------------+------------------------+
| id (text) | version (serial) | created_at (timestamp) |
+-----------+------------------+------------------------+For 99% of queries we will be searching the entire table and filtering by additional columns and join tables. For these queries we're only interested in the most recent version of a record per unique ID. We will also be sorting by created_at and other columns.
To make it easy to find the most current records I was considering to add a
most_recent (boolean) column as described in the answer here:https://stackoverflow.com/questions/34495479/add-constraint-to-make-column-unique-per-group-of-rows/34495621#34495621
However I realized we already have the
created_at column which tells us this information - we could use a DISTINCT clause in our search queries and order by created_date as described by @Svet's answer here:https://stackoverflow.com/questions/17327043/how-can-i-select-rows-with-most-recent-timestamp-for-each-key-value
However, we'd then have to re-order the results by the column that we actually want to use to show the data.
It seems simpler in the long run to add the extra 'current' field, and like it would be more performant, but is it also bad practice?
Solution
it's a great practice
Marking and unmarking a record as most recent is trivial. Having a simple bit field to point you to the rows you care about is far easier than going out to find the most recent date for each group. Especially as data gets larger and larger, you'll kick yourself for not having this.
You haven't tagged the RDBMS you're using here, but all of the most commonly used ones support filtered indexes of some flavor, which would allow you to keep only the most recent rows indexed and easily accessible.
In SQL Server, that would look something like:
Of course, you do need a way to guarantee that only one row will allowed to be active (most recent) per group. The safest way to accomplish that is with a unique index (sort of like the one above).
See also: What is the correct way to ensure unique entries in a temporal database design?
Marking and unmarking a record as most recent is trivial. Having a simple bit field to point you to the rows you care about is far easier than going out to find the most recent date for each group. Especially as data gets larger and larger, you'll kick yourself for not having this.
You haven't tagged the RDBMS you're using here, but all of the most commonly used ones support filtered indexes of some flavor, which would allow you to keep only the most recent rows indexed and easily accessible.
In SQL Server, that would look something like:
CREATE INDEX
whatever
ON dbo.some_table
(key columns)
INCLUDE
(is_most_recent)
WHERE
(is_most_recent = 1);Of course, you do need a way to guarantee that only one row will allowed to be active (most recent) per group. The safest way to accomplish that is with a unique index (sort of like the one above).
CREATE UNIQUE INDEX
uniquely
ON dbo.some_table
(id)
INCLUDE
(is_most_recent)
WHERE
(is_most_recent = 1);See also: What is the correct way to ensure unique entries in a temporal database design?
Code Snippets
CREATE INDEX
whatever
ON dbo.some_table
(key columns)
INCLUDE
(is_most_recent)
WHERE
(is_most_recent = 1);CREATE UNIQUE INDEX
uniquely
ON dbo.some_table
(id)
INCLUDE
(is_most_recent)
WHERE
(is_most_recent = 1);Context
StackExchange Database Administrators Q#334770, answer score: 11
Revisions (0)
No revisions yet.