principlepythonTip
Star schema vs snowflake schema: choosing the right dimensional model
Viewed 0 times
star schema designsnowflake schemadimensional modeling kimballdenormalize dimensionfact table dimension table
Problem
Teams normalize their data warehouse like an OLTP database (3NF), producing a snowflake schema that requires many joins for simple queries, slows BI tools, and confuses analysts.
Solution
Prefer star schema for analytical workloads — denormalize dimensions:
-- Star schema: fact table + flat dimension tables
-- One join per dimension, no chained joins
SELECT
d.year,
c.country,
p.category,
SUM(f.amount) AS revenue
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY 1, 2, 3
-- Snowflake adds sub-dimensions (e.g., dim_product -> dim_category)
-- More normalized but requires extra joins for every query
-- Star schema: fact table + flat dimension tables
-- One join per dimension, no chained joins
SELECT
d.year,
c.country,
p.category,
SUM(f.amount) AS revenue
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY 1, 2, 3
-- Snowflake adds sub-dimensions (e.g., dim_product -> dim_category)
-- More normalized but requires extra joins for every query
Why
OLAP workloads scan wide columns across many rows. Star schema minimizes the number of joins by pre-joining related attributes into flat dimension tables. Modern columnar warehouses (Snowflake, BigQuery, Redshift) are optimized for this pattern. The slight storage cost from denormalization is irrelevant at warehouse scale.
Gotchas
- Snowflake schema reduces storage redundancy but adds query complexity — only worthwhile for very large, frequently changing dimensions
- Bridge tables are needed for many-to-many relationships (e.g., an order with multiple products)
- Degenerate dimensions (order number on the fact) store no dimension attributes and don't need a separate table
- Outrigger dimensions (dimension referencing another dimension) are a snowflake pattern — flatten if possible
Context
Designing a data warehouse schema for analytical reporting
Revisions (0)
No revisions yet.