patternpythonMajor
dbt materializations: view vs table vs incremental vs ephemeral
Viewed 0 times
dbt incremental modeldbt materialization typesdbt merge strategydbt is_incrementaldbt ephemeral
Problem
Choosing the wrong dbt materialization wastes compute (rebuilding huge tables on every run) or produces stale data (views recalculated on every query). Teams default to table for everything and wait 30 minutes for a full refresh.
Solution
Match materialization to usage:
-- Incremental model (append/merge new rows only)
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge'
) }}
SELECT
order_id,
customer_id,
amount,
created_at
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
-- Incremental model (append/merge new rows only)
{{ config(
materialized='incremental',
unique_key='order_id',
incremental_strategy='merge'
) }}
SELECT
order_id,
customer_id,
amount,
created_at
FROM {{ ref('stg_orders') }}
{% if is_incremental() %}
WHERE created_at > (SELECT MAX(created_at) FROM {{ this }})
{% endif %}
Why
Incremental models only process new/changed rows, reducing compute from O(total rows) to O(new rows). Views add no storage cost but re-execute on every downstream query. Ephemeral models are CTEs inlined at compile time — zero storage, zero query overhead at query time.
Gotchas
- Incremental models do not automatically handle late-arriving data or corrections — combine with a full-refresh schedule
- unique_key with merge strategy requires the warehouse to support MERGE (Snowflake, BigQuery, Redshift) — check dialect support
- Ephemeral models cannot be queried directly and cannot be ref()ed from other dbt projects
- dbt build --full-refresh forces a complete table rebuild for incremental models
Code Snippets
Incremental model with delete+insert and date partitioning for Spark
-- dbt incremental with delete+insert strategy for Spark/Databricks
{{ config(
materialized='incremental',
unique_key='event_id',
incremental_strategy='delete+insert',
partition_by={'field': 'event_date', 'data_type': 'date'}
) }}
SELECT event_id, user_id, event_date, payload
FROM {{ ref('stg_events') }}
{% if is_incremental() %}
WHERE event_date >= DATE_SUB(CURRENT_DATE(), 3)
{% endif %}Context
Optimizing dbt run times and choosing how to persist transformed data
Revisions (0)
No revisions yet.