HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Design of portable tables with validity interval (historization, temporal databases)

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
temporaltableshistorizationdatabaseswithdesignintervalvalidityportable

Problem

I'm designing a data model for an application which must keep track about the changes of data.

In a first step, my application must support PostgreSQL, but I'd like to add support for other RDBMS (especially Oracle and MS SQL server) in a second step. Therefore, I'd like to choose a portable data model with less usage of proprietary features. (The DDL for the tables may be different from RDBMS vendor to RDBMS vendor. But the SQL queries / statements in the application should be the same for all supported vendors, as far as possible.)

For example, let's say there is a users and a users_versions table. users_versions has a foreign key on users.

An example of the tables could look like:

users
----------------
id | username
---------------- 
 1 | johndoe
 2 | sally

users_versions --> references id of user (userid)
---------------------------------------------------------------------------
id | userid | name     | street      | place     | validfrom  | validuntil
---------------------------------------------------------------------------
 1 |      1 | John Doe | 2nd Fake St | Faketown  | 2018-01-04 | 2018-01-05
 2 |      1 | John Doe | Real St 23  | Faketown  | 2018-01-05 | null
 3 |      2 | Sally Wu | Main St 1   | Lake Fake | 2018-04-02 | 2018-04-20
 4 |      2 | Sally Wu | Other St 99 | Chicago   | 2018-04-20 | null


Most SQL queries will query for the entries currently valid. In the concept example above this woule look like

SELECT *
  FROM users_versions uv 
  INNER JOIN users u ON u.id = uv.userid
  WHERE uv.userid = 123 AND uv.validuntil IS NULL;


Some use cases (reporting etc.) will require SELECTing a historic version of data, as well (e.g. what data were valid at 2017-12-31?). But these won't be performance critical in my application.

In the example above, I might create a filtered unique index on validuntil to ensure that there is only 1 entry with unlimited validity at a time:

```
CREATE UNIQUE INDEX foo
ON users_versio

Solution

I must say that I agree with the spirit of other answers, and I think that you should first focus on building an optimal database with one specific database management system (DBMS) in mind; the portability aspect, although important, should be secondary.

According to the content of your question, you appear to be very familiar with the subject. Anyway, I have shared my take on two scenarios involving temporal capabilities in this post and also in this other post (containing sample diagrams, expository DDL code, etc.), in case you want to take a look and establish some analogies.

Conceptual examination

Starting the analysis at the conceptual level, the business rules under consideration can be formulated as follows:

  • There can be one-to-many Users



  • A User holds exactly-one CurrentVersion



  • A User holds zero-or-many PastVersions



As demonstrated, the entity types CurrentVersion and PastVersion are involved in a one-to-zero-or-many (or zero-or-many-to-one) association. Apart from the cardinalities, it can be inferred that we are dealing with two distinct entity types because, in this case, a CurrentVersion instance does not have a ValidUntil property, while all instances of PastVersion must have it.

Logical-level arrangement

So, I suggest (a) one base table for the “current version” rows and (b) one base table for the “past version” ones. In this way, the assertions (i.e., rows) retained in each table represent what is a clearly different —although associated— kind of fact (as per the relational model theory), avoiding the ad hoc introduction of ambiguities in a single table.

Considering the user example you brought up —and in agreement with the conceptual definitions above—, the structure of the two tables would be almost the same, but user_version (i.e., the one for the “past” versions) includes an additional valid_until column, which along with user_id must make up the composite PRIMARY KEY of said table. The user_version.user_id column must be constrained as a FOREIGN KEY referencing user.user_id.

Manipulation

When a most “up-to-date” version has to be “saved“, the whole row of the “previous” version undergoes an INSERT operation INTO the user_version table, attaching the corresponding valid_until value indicating the exact instant when the operation is carried out. In turn, the values of the “preceding“ row at the user (i.e., “current“) table are replaced with the “most recent“ ones, by means of an UPDATE.

Each row in the user table would cover the need for unlimited validity that you have to ensure (not having a valid_until column, the values remain valid up to the moment when they are UPDATEd, which may never arrive).

Integrity

Of course, the sequentiality of the associated values has to be taken care of (e.g., preventing overlaps, rejecting invalid dates, etc.), just like overall integrity. I would make use of ACID Transactions to guarantee that the pertinent operations are treated as a single Unit of Work within the DBMS itself. Stored procedures (or functions in Postgres) with appropriate permissions would as well be very helpful.

There is no need for NULLable columns —a table with NULL marks does not portray a mathematical relation, so one cannot expect that it behaves as such, it can be normalized, etc.—, nor for a valid column managed by one (or more) application program(s) —which would endanger data quality by violating the principle of self-protection of a database—.

Derivability

The period comprehended between the values in user_version.valid_from and user_version.valid_until stands for the entire validity_interval during which a certain “past” row was “current” or “effective” (it can be calculated in days, minutes, etc. and may be incorporated into a view or computed in application program [app] code as convenient). This and other relevant aspects imply deriving data by virtue of data manipulation operations, mostly SELECTs and a few subqueries.

Accessing the database from the external level by way of one or more apps

Constructing, let us say, an object-oriented programming “intermediate tier” consumed, in turn, by a “higher tier” of one or more apps (or another kind of software component) would as well help in database portability, allowing code reuse and considerable isolation from migrations to other DBMSs. This resource about the Repository Pattern in .NET (C#) can bring about some ideas in this respect.

Portability considerations

It is important to draw a distinction between two of the different levels of abstraction of a database built on a SQL DBMS. The (1) structure and (2) constraints of the tables along with (3) the data manipulation operations —INSERT, SELECT, UPDATE, DELETE, combinations thereof— effectuated on the tables are elements of the logical level. The (4) underlying indexes supporting a table and/or constraints are (“lower“) physical-level components.

In this manner, the same logical design principle is applicabl

Context

StackExchange Database Administrators Q#205407, answer score: 4

Revisions (0)

No revisions yet.