patternsqlMinor
Design of portable tables with validity interval (historization, temporal databases)
Viewed 0 times
temporaltableshistorizationdatabaseswithdesignintervalvalidityportable
Problem
I'm designing a data model for an application which must keep track about the changes of data.
In a first step, my application must support PostgreSQL, but I'd like to add support for other RDBMS (especially Oracle and MS SQL server) in a second step. Therefore, I'd like to choose a portable data model with less usage of proprietary features. (The DDL for the tables may be different from RDBMS vendor to RDBMS vendor. But the SQL queries / statements in the application should be the same for all supported vendors, as far as possible.)
For example, let's say there is a
An example of the tables could look like:
Most SQL queries will query for the entries currently valid. In the concept example above this woule look like
Some use cases (reporting etc.) will require SELECTing a historic version of data, as well (e.g. what data were valid at
In the example above, I might create a filtered unique index on
```
CREATE UNIQUE INDEX foo
ON users_versio
In a first step, my application must support PostgreSQL, but I'd like to add support for other RDBMS (especially Oracle and MS SQL server) in a second step. Therefore, I'd like to choose a portable data model with less usage of proprietary features. (The DDL for the tables may be different from RDBMS vendor to RDBMS vendor. But the SQL queries / statements in the application should be the same for all supported vendors, as far as possible.)
For example, let's say there is a
users and a users_versions table. users_versions has a foreign key on users.An example of the tables could look like:
users
----------------
id | username
----------------
1 | johndoe
2 | sally
users_versions --> references id of user (userid)
---------------------------------------------------------------------------
id | userid | name | street | place | validfrom | validuntil
---------------------------------------------------------------------------
1 | 1 | John Doe | 2nd Fake St | Faketown | 2018-01-04 | 2018-01-05
2 | 1 | John Doe | Real St 23 | Faketown | 2018-01-05 | null
3 | 2 | Sally Wu | Main St 1 | Lake Fake | 2018-04-02 | 2018-04-20
4 | 2 | Sally Wu | Other St 99 | Chicago | 2018-04-20 | nullMost SQL queries will query for the entries currently valid. In the concept example above this woule look like
SELECT *
FROM users_versions uv
INNER JOIN users u ON u.id = uv.userid
WHERE uv.userid = 123 AND uv.validuntil IS NULL;Some use cases (reporting etc.) will require SELECTing a historic version of data, as well (e.g. what data were valid at
2017-12-31?). But these won't be performance critical in my application.In the example above, I might create a filtered unique index on
validuntil to ensure that there is only 1 entry with unlimited validity at a time:```
CREATE UNIQUE INDEX foo
ON users_versio
Solution
I must say that I agree with the spirit of other answers, and I think that you should first focus on building an optimal database with one specific database management system (DBMS) in mind; the portability aspect, although important, should be secondary.
According to the content of your question, you appear to be very familiar with the subject. Anyway, I have shared my take on two scenarios involving temporal capabilities in this post and also in this other post (containing sample diagrams, expository DDL code, etc.), in case you want to take a look and establish some analogies.
Conceptual examination
Starting the analysis at the conceptual level, the business rules under consideration can be formulated as follows:
As demonstrated, the entity types CurrentVersion and PastVersion are involved in a one-to-zero-or-many (or zero-or-many-to-one) association. Apart from the cardinalities, it can be inferred that we are dealing with two distinct entity types because, in this case, a CurrentVersion instance does not have a ValidUntil property, while all instances of PastVersion must have it.
Logical-level arrangement
So, I suggest (a) one base table for the “current version” rows and (b) one base table for the “past version” ones. In this way, the assertions (i.e., rows) retained in each table represent what is a clearly different —although associated— kind of fact (as per the relational model theory), avoiding the ad hoc introduction of ambiguities in a single table.
Considering the
Manipulation
When a most “up-to-date” version has to be “saved“, the whole row of the “previous” version undergoes an INSERT operation INTO the
Each row in the
Integrity
Of course, the sequentiality of the associated values has to be taken care of (e.g., preventing overlaps, rejecting invalid dates, etc.), just like overall integrity. I would make use of ACID Transactions to guarantee that the pertinent operations are treated as a single Unit of Work within the DBMS itself. Stored procedures (or functions in Postgres) with appropriate permissions would as well be very helpful.
There is no need for NULLable columns —a table with NULL marks does not portray a mathematical relation, so one cannot expect that it behaves as such, it can be normalized, etc.—, nor for a
Derivability
The period comprehended between the values in
Accessing the database from the external level by way of one or more apps
Constructing, let us say, an object-oriented programming “intermediate tier” consumed, in turn, by a “higher tier” of one or more apps (or another kind of software component) would as well help in database portability, allowing code reuse and considerable isolation from migrations to other DBMSs. This resource about the Repository Pattern in .NET (C#) can bring about some ideas in this respect.
Portability considerations
It is important to draw a distinction between two of the different levels of abstraction of a database built on a SQL DBMS. The (1) structure and (2) constraints of the tables along with (3) the data manipulation operations —INSERT, SELECT, UPDATE, DELETE, combinations thereof— effectuated on the tables are elements of the logical level. The (4) underlying indexes supporting a table and/or constraints are (“lower“) physical-level components.
In this manner, the same logical design principle is applicabl
According to the content of your question, you appear to be very familiar with the subject. Anyway, I have shared my take on two scenarios involving temporal capabilities in this post and also in this other post (containing sample diagrams, expository DDL code, etc.), in case you want to take a look and establish some analogies.
Conceptual examination
Starting the analysis at the conceptual level, the business rules under consideration can be formulated as follows:
- There can be one-to-many Users
- A User holds exactly-one CurrentVersion
- A User holds zero-or-many PastVersions
As demonstrated, the entity types CurrentVersion and PastVersion are involved in a one-to-zero-or-many (or zero-or-many-to-one) association. Apart from the cardinalities, it can be inferred that we are dealing with two distinct entity types because, in this case, a CurrentVersion instance does not have a ValidUntil property, while all instances of PastVersion must have it.
Logical-level arrangement
So, I suggest (a) one base table for the “current version” rows and (b) one base table for the “past version” ones. In this way, the assertions (i.e., rows) retained in each table represent what is a clearly different —although associated— kind of fact (as per the relational model theory), avoiding the ad hoc introduction of ambiguities in a single table.
Considering the
user example you brought up —and in agreement with the conceptual definitions above—, the structure of the two tables would be almost the same, but user_version (i.e., the one for the “past” versions) includes an additional valid_until column, which along with user_id must make up the composite PRIMARY KEY of said table. The user_version.user_id column must be constrained as a FOREIGN KEY referencing user.user_id. Manipulation
When a most “up-to-date” version has to be “saved“, the whole row of the “previous” version undergoes an INSERT operation INTO the
user_version table, attaching the corresponding valid_until value indicating the exact instant when the operation is carried out. In turn, the values of the “preceding“ row at the user (i.e., “current“) table are replaced with the “most recent“ ones, by means of an UPDATE. Each row in the
user table would cover the need for unlimited validity that you have to ensure (not having a valid_until column, the values remain valid up to the moment when they are UPDATEd, which may never arrive).Integrity
Of course, the sequentiality of the associated values has to be taken care of (e.g., preventing overlaps, rejecting invalid dates, etc.), just like overall integrity. I would make use of ACID Transactions to guarantee that the pertinent operations are treated as a single Unit of Work within the DBMS itself. Stored procedures (or functions in Postgres) with appropriate permissions would as well be very helpful.
There is no need for NULLable columns —a table with NULL marks does not portray a mathematical relation, so one cannot expect that it behaves as such, it can be normalized, etc.—, nor for a
valid column managed by one (or more) application program(s) —which would endanger data quality by violating the principle of self-protection of a database—. Derivability
The period comprehended between the values in
user_version.valid_from and user_version.valid_until stands for the entire validity_interval during which a certain “past” row was “current” or “effective” (it can be calculated in days, minutes, etc. and may be incorporated into a view or computed in application program [app] code as convenient). This and other relevant aspects imply deriving data by virtue of data manipulation operations, mostly SELECTs and a few subqueries.Accessing the database from the external level by way of one or more apps
Constructing, let us say, an object-oriented programming “intermediate tier” consumed, in turn, by a “higher tier” of one or more apps (or another kind of software component) would as well help in database portability, allowing code reuse and considerable isolation from migrations to other DBMSs. This resource about the Repository Pattern in .NET (C#) can bring about some ideas in this respect.
Portability considerations
It is important to draw a distinction between two of the different levels of abstraction of a database built on a SQL DBMS. The (1) structure and (2) constraints of the tables along with (3) the data manipulation operations —INSERT, SELECT, UPDATE, DELETE, combinations thereof— effectuated on the tables are elements of the logical level. The (4) underlying indexes supporting a table and/or constraints are (“lower“) physical-level components.
In this manner, the same logical design principle is applicabl
Context
StackExchange Database Administrators Q#205407, answer score: 4
Revisions (0)
No revisions yet.