What to do with duplicate lookup information

Submitted by: @import:stackexchange-dba·Mar 10, 2026·

Problem

I have multiple databases that I want to store in one data warehouse database. I am wondering how to design the import process to handle multiple lookup tables.

For example, say I have 5 databases, all with the lookup table CustomerState. In one database it could look like this:

In another database it could look like this:

How should I handle this in my enterprise layer of my DW database? Do I add a SourceSystemId to the lookup table, maybe something like this:

And then use the pkyCustomerStateId in my Customer table rather than the CustomerStateId?

Solution

This type of thing should be handled by the ETL process that brings the data into the data warehouse. In fact, this process is the T in ETL.

What you need to do first is define the logical key column(s) of the tables, so the business meaning of the rows can be equated between the databases. A multi-column key as you propose would complicate matters, and really doesn't solve the problem.

For this example, I would define CustomerState as the logical key column in the dimension, and when the separate tables are merged together, this column would be unique in the result, with new CustomerStateId values assigned. This ensures the dimension primary key is as narrow as possible, which will carry through to the fact tables and make them as narrow as possible as well.
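As a rough sketch, the merged dimension described above could be declared along these lines. The column types and constraint names are assumptions for illustration; they are not taken from the question.

```sql
-- Sketch of the merged dimension; data types and constraint names are assumed.
CREATE TABLE [dbo].[CustomerState]
(
    -- Narrow surrogate key, assigned by the warehouse rather than by any source system.
    CustomerStateId int IDENTITY(1,1) NOT NULL
        CONSTRAINT PK_CustomerState PRIMARY KEY,
    -- Logical (business) key: the value that is equated across the source databases.
    CustomerState   varchar(50) NOT NULL
        CONSTRAINT UQ_CustomerState UNIQUE
);
```

The unique constraint on CustomerState is what enforces "one row per business value", while only the single-column CustomerStateId is carried into the referencing tables.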

The ETL process might do something like this (assuming the CustomerStateId column of the target table is an IDENTITY column):

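-- Insert only the logical key values the dimension does not yet contain;
-- CustomerStateId is generated by the IDENTITY column.
MERGE INTO [dbo].[CustomerState] AS tgt
    -- DISTINCT guards against the same value arriving from several source databases.
    USING (SELECT DISTINCT CustomerState FROM [Staging].[CustomerState]) AS src
        ON src.CustomerState = tgt.CustomerState
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (CustomerState) VALUES (src.CustomerState);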

(I used MERGE rather than INSERT because other dimensions may also need updates handled; that is not the case here, as there are no other columns.)
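To illustrate the update case, here is a minimal sketch for a dimension that does carry an extra attribute. The [Product] tables and the ProductCode and ProductName columns are hypothetical names, not part of the question, and the sketch assumes the staging table holds one row per ProductCode and that ProductName is NOT NULL.

```sql
-- Hypothetical dimension: ProductCode is the logical key, ProductName an attribute.
MERGE INTO [dbo].[Product] AS tgt
    USING [Staging].[Product] AS src
        ON src.ProductCode = tgt.ProductCode
    -- Existing logical key whose attribute changed: overwrite it in place.
    WHEN MATCHED AND tgt.ProductName <> src.ProductName THEN
        UPDATE SET tgt.ProductName = src.ProductName
    -- New logical key: insert it and let IDENTITY assign the surrogate key.
    WHEN NOT MATCHED BY TARGET THEN
        INSERT (ProductCode, ProductName)
        VALUES (src.ProductCode, src.ProductName);
```

Overwriting in place discards the previous attribute value; a dimension that needs to keep history would require a different approach.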

Then, the fact table loading process would use a lookup mechanism (the Lookup Transformation in SSIS) to go from the CustomerState logical value to the newly assigned CustomerStateId value generated by the above statement.
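Outside SSIS, the same lookup can be expressed as a plain join on the logical key. The following is only a sketch: [Staging].[Customer] and the CustomerName column are hypothetical, and it assumes the dimension has been loaded before this statement runs.

```sql
-- Hypothetical tables: [Staging].[Customer] carries the source's state text,
-- while [dbo].[Customer] stores the warehouse-assigned CustomerStateId instead.
INSERT INTO [dbo].[Customer] (CustomerName, CustomerStateId)
SELECT  src.CustomerName,
        cs.CustomerStateId                        -- surrogate key assigned in the dimension
FROM    [Staging].[Customer] AS src
JOIN    [dbo].[CustomerState] AS cs
        ON cs.CustomerState = src.CustomerState;  -- match on the logical key
```

Because this is an inner join, any staged row whose CustomerState value is missing from the dimension would be silently dropped, which is why the dimension load should run first.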

Context

StackExchange Database Administrators Q#34816, answer score: 6
