HiveBrain v1.2.0
Get Started
← Back to all entries
snippetMinor

Database design: how to handle the "archive" problem?

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
problemthehandledesigndatabasehowarchive

Problem

I'm pretty sure a lot of applications, critical applications, banks and so on do this on a daily basis.

The idea behind all that is:

  • all the rows must have a history



  • all links must stay coherent



  • it should be easy to make requests to get "current" columns



  • clients who have bought obsolete things should still see what they've bought even though this product is not part of the catalogue anymore



and so on.

Here's what I want to do, and I'll explain the problems I'm facing.

All my tables will have those columns:

  • id



  • id_origin



  • date of creation



  • start date of validity



  • start end of validity



And here are the ideas for CRUD operations:

  • create = insert new row with id_origin = id, date of creation=now, start date of validity=now, end date of validity=null (= means it's the current active record)



  • update =



  • read = read all the records with end date of validity==null



  • update the "current" record end date of validity=null with end date of validity=now



  • create a new one with the new values, and end date of validity=null (= means it's the current active record)



  • delete = update the "current" record end date of validity=null with end date of validity=now



So here's my problem: with many-to-many associations. Let's take an example with values:

  • Table A (id = 1, id_origin = 1, start=now, end=null)



  • Table A_B (start=now, end=null, id_A = 1, id_B = 48)



  • Table B (id = 48, id_origin = 48, start=now, end=null)



Now I want to update table A, record id=1

  • I mark record id=1 with end=now



-
I insert a new value into table A and... damn I've lost my relation A_B unless I duplicate the relation, too... this would end to a table:

-
Table A (id = 1, id_origin = 1, start=now, end=now+8mn)

  • Table A (id = 2, id_origin = 1, start=now+8mn, end=null)



  • Table A_B (start=now, end=null, id_A = 1, id_B = 48)



  • Table A_B (start=now, end=null, id_A = 2, id_B = 48)



  • Table B (id = 48, id_origin = 48, start=now, end=null)

Solution

It's not clear to me if these requirements are for auditing purposes or just simple historical reference such as with CRM and shopping carts.

Either way, consider have an main and main_archive table for each major area where this is required. "Main" will only have current / active entries whereas "main_archive" will have a copy of everything that ever goes into main. Insert / update into main_archive can be a trigger from insert / update into main. Deletes against main_archive can then run across a longer period of time, if ever.

For the referential issues such as Cust X bought Product Y, the easiest way to solve your referential concern of cust_archive -> product_archive is to never delete entries from product_archive. Generally, churn should be much lower in that table so size shouldn't be too bad of a concern.

HTH.

Context

StackExchange Database Administrators Q#17309, answer score: 4

Revisions (0)

No revisions yet.