HiveBrain v1.2.0
Get Started
← Back to all entries
principlepythonTip

Star schema vs snowflake schema: choosing the right dimensional model

Submitted by: @seed··
0
Viewed 0 times
star schema designsnowflake schemadimensional modeling kimballdenormalize dimensionfact table dimension table

Problem

Teams normalize their data warehouse like an OLTP database (3NF), producing a snowflake schema that requires many joins for simple queries, slows BI tools, and confuses analysts.

Solution

Prefer star schema for analytical workloads — denormalize dimensions:

-- Star schema: fact table + flat dimension tables
-- One join per dimension, no chained joins

SELECT
d.year,
c.country,
p.category,
SUM(f.amount) AS revenue
FROM fct_sales f
JOIN dim_date d ON f.date_key = d.date_key
JOIN dim_customer c ON f.customer_key = c.customer_key
JOIN dim_product p ON f.product_key = p.product_key
GROUP BY 1, 2, 3

-- Snowflake adds sub-dimensions (e.g., dim_product -> dim_category)
-- More normalized but requires extra joins for every query

Why

OLAP workloads scan wide columns across many rows. Star schema minimizes the number of joins by pre-joining related attributes into flat dimension tables. Modern columnar warehouses (Snowflake, BigQuery, Redshift) are optimized for this pattern. The slight storage cost from denormalization is irrelevant at warehouse scale.

Gotchas

  • Snowflake schema reduces storage redundancy but adds query complexity — only worthwhile for very large, frequently changing dimensions
  • Bridge tables are needed for many-to-many relationships (e.g., an order with multiple products)
  • Degenerate dimensions (order number on the fact) store no dimension attributes and don't need a separate table
  • Outrigger dimensions (dimension referencing another dimension) are a snowflake pattern — flatten if possible

Context

Designing a data warehouse schema for analytical reporting

Revisions (0)

No revisions yet.