HiveBrain v1.2.0
Get Started
← Back to all entries
patternModerate

Is it bad practice to have a "most_recent" boolean column in addition to a "create_at" timestamp column to track the latest version of a record?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
most_recentcreate_atthelatesttrackbooleancolumnpracticeversionaddition

Problem

The table looks like this, it's SCD type 2:

+-----------+------------------+------------------------+
| id (text) | version (serial) | created_at (timestamp) |
+-----------+------------------+------------------------+


For 99% of queries we will be searching the entire table and filtering by additional columns and join tables. For these queries we're only interested in the most recent version of a record per unique ID. We will also be sorting by created_at and other columns.

To make it easy to find the most current records I was considering to add a most_recent (boolean) column as described in the answer here:

https://stackoverflow.com/questions/34495479/add-constraint-to-make-column-unique-per-group-of-rows/34495621#34495621

However I realized we already have the created_at column which tells us this information - we could use a DISTINCT clause in our search queries and order by created_date as described by @Svet's answer here:

https://stackoverflow.com/questions/17327043/how-can-i-select-rows-with-most-recent-timestamp-for-each-key-value

However, we'd then have to re-order the results by the column that we actually want to use to show the data.

It seems simpler in the long run to add the extra 'current' field, and like it would be more performant, but is it also bad practice?

Solution

it's a great practice

Marking and unmarking a record as most recent is trivial. Having a simple bit field to point you to the rows you care about is far easier than going out to find the most recent date for each group. Especially as data gets larger and larger, you'll kick yourself for not having this.

You haven't tagged the RDBMS you're using here, but all of the most commonly used ones support filtered indexes of some flavor, which would allow you to keep only the most recent rows indexed and easily accessible.

In SQL Server, that would look something like:

CREATE INDEX 
    whatever
ON dbo.some_table
    (key columns)
INCLUDE
    (is_most_recent)
WHERE
    (is_most_recent = 1);


Of course, you do need a way to guarantee that only one row will allowed to be active (most recent) per group. The safest way to accomplish that is with a unique index (sort of like the one above).

CREATE UNIQUE INDEX 
    uniquely
ON dbo.some_table
    (id)
INCLUDE
    (is_most_recent)
WHERE
    (is_most_recent = 1);


See also: What is the correct way to ensure unique entries in a temporal database design?

Code Snippets

CREATE INDEX 
    whatever
ON dbo.some_table
    (key columns)
INCLUDE
    (is_most_recent)
WHERE
    (is_most_recent = 1);
CREATE UNIQUE INDEX 
    uniquely
ON dbo.some_table
    (id)
INCLUDE
    (is_most_recent)
WHERE
    (is_most_recent = 1);

Context

StackExchange Database Administrators Q#334770, answer score: 11

Revisions (0)

No revisions yet.