pattern · python · Moderate
dbt tests: generic and singular tests for data quality gates
dbt test generic · dbt schema test · dbt singular test · dbt data quality · dbt not_null unique
Problem
dbt transformations run without validating outputs, allowing nulls, duplicates, and invalid values to propagate silently into downstream dashboards and ML features.
Solution
Add generic tests in schema.yml and custom singular tests for complex assertions:
# models/marts/schema.yml
version: 2

models:
  - name: fct_orders
    columns:
      - name: order_id
        tests:
          - not_null
          - unique
      - name: status
        tests:
          - accepted_values:
              values: ['pending', 'completed', 'cancelled']
      - name: customer_id
        tests:
          - relationships:
              to: ref('dim_customers')
              field: customer_id

-- tests/assert_positive_amounts.sql (singular test — fails if rows returned)
SELECT order_id, amount
FROM {{ ref('fct_orders') }}
WHERE amount <= 0
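Singular tests also accept an inline config block at the top of the file. A minimal sketch (the `warn` severity here is illustrative, not part of the original test):

```sql
-- tests/assert_positive_amounts.sql
-- Optional: downgrade this singular test to a warning instead of an error
{{ config(severity = 'warn') }}

SELECT order_id, amount
FROM {{ ref('fct_orders') }}
WHERE amount <= 0
```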
Why
dbt tests run as SELECT queries that return rows when assertions fail. Integrating tests into CI/CD (dbt test after dbt run) creates a data quality gate that blocks bad data from reaching production models.
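The run-then-test gate can be wired into CI. A minimal sketch assuming GitHub Actions and a Postgres adapter (workflow name, target name, and adapter choice are all assumptions, not part of the original):

```yaml
# .github/workflows/dbt.yml (illustrative)
name: dbt-ci
on: [pull_request]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install dbt-core dbt-postgres   # adapter is an assumption
      - run: dbt run --target ci                 # build models first
      - run: dbt test --target ci                # exits non-zero on any error-severity failure
```

Newer dbt versions also offer `dbt build`, which interleaves runs and tests in DAG order so a failing test blocks downstream models immediately.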
Gotchas
- Generic tests run against every row by default — add where: config to scope large tables
- dbt test --select fct_orders runs only tests for that model, not upstream — use --select +fct_orders to include ancestors
- Relationships tests are expensive on large tables — consider sampling or moving to Great Expectations for row-level validation
- severity: warn vs error controls whether test failure fails the pipeline or just logs a warning
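The last two gotchas can be combined in one test config. A sketch that scopes the expensive relationships test to recent rows and downgrades it to a warning (the date filter is illustrative):

```yaml
# models/marts/schema.yml (fragment)
- name: customer_id
  tests:
    - relationships:
        to: ref('dim_customers')
        field: customer_id
        config:
          severity: warn                            # log the failure, don't fail the pipeline
          where: "created_at >= current_date - 7"   # only validate recent rows
```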
Code Snippets
dbt test with severity and where clause to scope validation
# Scoped test with severity and where clause
models:
  - name: fct_orders
    columns:
      - name: amount
        tests:
          - not_null:
              severity: error
          - dbt_utils.accepted_range:
              min_value: 0
              max_value: 100000
              config:
                where: "created_at >= '2023-01-01'"
Context
Adding data quality checks to a dbt project before deploying to production