patternsqlMinor
Is there a non-DDL way to preserve back-references when staging data into production?
Viewed 0 times
nonpreserveintoproductionwaydataddlstagingbackwhen
Problem
Let's say I've got this hypothetical schema:
Source (OLTP) DB:
Destination (DSS) DB:
Obviously the real deal is considerably more complicated with more transformations. But assume for the moment that all this ETL does is merge specific transactions ("orders") from an external entity into a DSS that tracks generic "activities". The links between the external customer and the DSS person are in the CustomerMapping table.
The idea of the "Import" table is to provide some sort of audit trail in case something goes wrong. We don't have a lot of control over the source system and know it to be a bit on the flaky side. So it's really important for us to be able to understand the origin of any given activity.
Right now, there's a script that does this with DDL, which looks something like this:
This works fine, in the sense of not crashing and burning, it just makes me nauseous every time I look at the DDL.
-
Permanently adding an
-
Removing the `IDENTI
Source (OLTP) DB:
Table Orders
------------
OrderID int IDENTITY (PK),
CustomerID int NOT NULL,
OrderAmount decimal NOT NULLDestination (DSS) DB:
Table Activity
--------------
ActivityID int IDENTITY (PK),
PersonID int NOT NULL,
Amount decimal NOT NULL
Table ActivityOrderImport
--------------------
ActivityID int NOT NULL,
SourceOrderID int NOT NULL
Table CustomerMapping
---------------------
CustomerID int NOT NULL,
PersonID int NOT NULLObviously the real deal is considerably more complicated with more transformations. But assume for the moment that all this ETL does is merge specific transactions ("orders") from an external entity into a DSS that tracks generic "activities". The links between the external customer and the DSS person are in the CustomerMapping table.
The idea of the "Import" table is to provide some sort of audit trail in case something goes wrong. We don't have a lot of control over the source system and know it to be a bit on the flaky side. So it's really important for us to be able to understand the origin of any given activity.
Right now, there's a script that does this with DDL, which looks something like this:
ALTER TABLE Activity
ADD OrderID int NULL
MERGE Activity
USING #StagingOrders
(...)
INSERT ActivityOrderImport (ActivityID, SourceOrderID)
SELECT a.ActivityID, s.OrderID
FROM #StagingOrders s
INNER JOIN Activity a
ON a.OrderID = s.OrderID
ALTER TABLE Activity
DROP COLUMN OrderIDThis works fine, in the sense of not crashing and burning, it just makes me nauseous every time I look at the DDL.
-
Permanently adding an
OrderID column to the Activity table really isn't an option because the data may come from multiple sources, each currently needing their own log table. Adding a separate column to the main table for all of these would break normalization and quickly turn the production database into a dog's breakfast.-
Removing the `IDENTI
Solution
I would suggest adding a (set of) staging tables to the destination or an intermediary that can be better controlled and more stable. Place the tracking info there. Then do all the transformation from the staging to the final destination, either carrying the tracking info with it or discarding it.
There are several ways you can generate a tracking key from multiple systems, as long as they all follow the same algorithm, it doesn't have to be an INT. It could be a 2 char prefix with a sequential number. For that matter it doesn't have to be just one column.
There are several ways you can generate a tracking key from multiple systems, as long as they all follow the same algorithm, it doesn't have to be an INT. It could be a 2 char prefix with a sequential number. For that matter it doesn't have to be just one column.
Context
StackExchange Database Administrators Q#1395, answer score: 4
Revisions (0)
No revisions yet.