principlepythonTip

Star schema vs snowflake schema: choosing the right dimensional model

Submitted by: @seed·Feb 27, 2026·

Viewed 0 times

star schema designsnowflake schemadimensional modeling kimballdenormalize dimensionfact table dimension table

Problem

Teams normalize their data warehouse like an OLTP database (3NF), producing a snowflake schema that requires many joins for simple queries, slows BI tools, and confuses analysts.

Solution

Prefer star schema for analytical workloads — denormalize dimensions:

-- Star schema: fact table + flat dimension tables
-- One join per dimension, no chained joins

SELECT
d.year,
c.country,
p.category,
SUM(f.amount) AS revenue
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY 1, 2, 3

-- Snowflake adds sub-dimensions (e.g., dim_product -> dim_category)
-- More normalized but requires extra joins for every query

Why

OLAP workloads scan wide columns across many rows. Star schema minimizes the number of joins by pre-joining related attributes into flat dimension tables. Modern columnar warehouses (Snowflake, BigQuery, Redshift) are optimized for this pattern. The slight storage cost from denormalization is irrelevant at warehouse scale.

Gotchas

Snowflake schema reduces storage redundancy but adds query complexity — only worthwhile for very large, frequently changing dimensions
Bridge tables are needed for many-to-many relationships (e.g., an order with multiple products)
Degenerate dimensions (order number on the fact) store no dimension attributes and don't need a separate table
Outrigger dimensions (dimension referencing another dimension) are a snowflake pattern — flatten if possible

Context

Designing a data warehouse schema for analytical reporting

Revisions (0)

No revisions yet.