patternpythonModerate
Slowly Changing Dimensions Type 2: tracking history in dimensional models
Viewed 0 times
slowly changing dimension type 2scd2 dbt snapshotsurrogate key dimensionhistorical dimension trackingscd pattern
Problem
A customer changes their country. The fact table records history but the dimension table gets updated in place, making it impossible to answer 'what country did this customer belong to when they made this purchase?'
Solution
Implement SCD Type 2 with effective date range and current flag:
-- dim_customers SCD2 schema
CREATE TABLE dim_customers (
customer_sk BIGINT PRIMARY KEY, -- surrogate key
customer_id INT NOT NULL, -- natural key
name VARCHAR,
country VARCHAR,
valid_from DATE NOT NULL,
valid_to DATE, -- NULL = current record
is_current BOOLEAN DEFAULT TRUE
);
-- dbt SCD2 via snapshot
{% snapshot customers_snapshot %}
{{ config(
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['country', 'name'],
invalidate_hard_deletes=True,
) }}
SELECT * FROM {{ source('crm', 'customers') }}
{% endsnapshot %}
-- dim_customers SCD2 schema
CREATE TABLE dim_customers (
customer_sk BIGINT PRIMARY KEY, -- surrogate key
customer_id INT NOT NULL, -- natural key
name VARCHAR,
country VARCHAR,
valid_from DATE NOT NULL,
valid_to DATE, -- NULL = current record
is_current BOOLEAN DEFAULT TRUE
);
-- dbt SCD2 via snapshot
{% snapshot customers_snapshot %}
{{ config(
target_schema='snapshots',
unique_key='customer_id',
strategy='check',
check_cols=['country', 'name'],
invalidate_hard_deletes=True,
) }}
SELECT * FROM {{ source('crm', 'customers') }}
{% endsnapshot %}
Why
Fact tables should store the surrogate key (customer_sk) from the dimension, not the natural key. When the dimension record is updated, a new row is inserted with a new surrogate key. Historical facts retain their original surrogate key, preserving the state at transaction time.
Gotchas
- Always join facts to dimensions on both surrogate key AND date range (between valid_from and valid_to) to get the correct historical version
- SCD2 tables grow indefinitely — archive or expire old rows beyond a retention window
- dbt snapshots use DBT_UPDATED_AT and DBT_VALID_TO columns automatically
- SCD Type 1 (overwrite) is correct for fixing data errors; SCD Type 2 is correct for auditable history
Context
Building or maintaining dimensional data models in a data warehouse
Revisions (0)
No revisions yet.