patternsqlpostgresqlTip
Window functions: running totals, rankings, and lag/lead without self-joins
Viewed 0 times
window functionOVER PARTITION BYranklagleadrunning totalmoving averageROWS BETWEEN
Problem
Computing running totals, row rankings, or comparing a row to the previous/next row requires self-joins or subqueries that are verbose, slow, and hard to maintain.
Solution
Use window functions with OVER (PARTITION BY ... ORDER BY ...):
-- Running total:
SELECT id, amount,
sum(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;
-- Rank within category:
SELECT id, category, sales,
rank() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_in_cat
FROM products;
-- Compare to previous row:
SELECT date, revenue,
lag(revenue, 1) OVER (ORDER BY date) AS prev_revenue,
revenue - lag(revenue, 1) OVER (ORDER BY date) AS delta
FROM daily_revenue;
-- Moving 7-day average:
SELECT date, revenue,
avg(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_revenue;
-- Running total:
SELECT id, amount,
sum(amount) OVER (ORDER BY created_at) AS running_total
FROM orders;
-- Rank within category:
SELECT id, category, sales,
rank() OVER (PARTITION BY category ORDER BY sales DESC) AS rank_in_cat
FROM products;
-- Compare to previous row:
SELECT date, revenue,
lag(revenue, 1) OVER (ORDER BY date) AS prev_revenue,
revenue - lag(revenue, 1) OVER (ORDER BY date) AS delta
FROM daily_revenue;
-- Moving 7-day average:
SELECT date, revenue,
avg(revenue) OVER (
ORDER BY date
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
) AS moving_avg_7d
FROM daily_revenue;
Why
Window functions operate on a set of rows related to the current row (the window frame) without collapsing them into a single output row. They are evaluated after WHERE and GROUP BY, making them composable.
Gotchas
- Window functions cannot be used in WHERE or HAVING; wrap in a subquery or CTE
- Without an ORDER BY in OVER(), the frame defaults to all rows in the partition (unordered); results are non-deterministic
- ROWS vs RANGE frame mode differs when there are ties in the ORDER BY column
- ntile(n) distributes rows into n buckets as evenly as possible, not by value ranges
Code Snippets
DISTINCT ON vs ROW_NUMBER() for latest-per-group
-- Deduplicate keeping only the latest row per user:
SELECT DISTINCT ON (user_id) *
FROM events
ORDER BY user_id, created_at DESC;
-- Equivalent with ROW_NUMBER:
SELECT * FROM (
SELECT *, row_number() OVER (PARTITION BY user_id ORDER BY created_at DESC) AS rn
FROM events
) t WHERE rn = 1;Context
Analytics queries, reporting, and any computation that requires context from neighboring rows
Revisions (0)
No revisions yet.