snippetsqlMinor
PostgreSQL: Generate a series of dates for each group in a table
Viewed 0 times
postgresqleachgroupdatesgenerateforseriestable
Problem
I have a
It only contains balances for dates that a user has made a transaction. I need it to contain a row for each user with their balance on each date in a given date range.
I can reference an
My desired result looks like this:
Note that rows have been added for user 2 for
To generate a series of dates in a date range, I know I can use:
...but I am struggling with
balances table in PostgreSQL 9.3 that looks like this:CREATE TABLE balances (
user_id INT
, balance INT
, as_of_date DATE
);
INSERT INTO balances (user_id, balance, as_of_date) VALUES
(1, 100, '2016-01-03')
, (1, 50, '2016-01-02')
, (1, 10, '2016-01-01')
, (2, 200, '2016-01-01')
, (3, 30, '2016-01-03');It only contains balances for dates that a user has made a transaction. I need it to contain a row for each user with their balance on each date in a given date range.
- If the user does not have a row for a given date in the range, I need to use their balance from the day before.
- If the user created their account after a given date in the range, I need to avoid creating a row for that user/date combination.
I can reference an
accounts table to get users' create_date:CREATE TABLE accounts (
user_id INT
, create_date DATE
);
INSERT INTO accounts (user_id, create_date) VALUES
(1, '2015-12-01')
, (2, '2015-12-31')
, (3, '2016-01-03');My desired result looks like this:
+---------+---------+--------------------------+
| user_id | balance | as_of_date |
+---------+---------+--------------------------+
| 1 | 100 | 2016-01-03T00:00:00.000Z |
| 1 | 50 | 2016-01-02T00:00:00.000Z |
| 1 | 10 | 2016-01-01T00:00:00.000Z |
| 2 | 200 | 2016-01-03T00:00:00.000Z |
| 2 | 200 | 2016-01-02T00:00:00.000Z |
| 2 | 200 | 2016-01-01T00:00:00.000Z |
| 3 | 30 | 2016-01-03T00:00:00.000Z |
+---------+---------+--------------------------+Note that rows have been added for user 2 for
2016-01-02 and 2016-01-03, carrying over the previous balance from 2016-01-01; and that no rows have been added for user 3, who was created on 2016-01-03.To generate a series of dates in a date range, I know I can use:
SELECT d.date FROM GENERATE_SERIES('2016-01-01', '2016-01-03', '1 day'::INTERVAL) d...but I am struggling with
LEFT JOINing thatSolution
CROSS JOIN,LEFT JOIN LATERALto subquery
SELECT a.user_id, COALESCE(b.balance, 0) AS balance, d.as_of_date
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN LATERAL (
SELECT balance
FROM balances
WHERE user_id = a.user_id
AND as_of_date <= d.as_of_date
ORDER BY as_of_date DESC
LIMIT 1
) b ON true
ORDER BY a.user_id, d.as_of_date;Returns your desired result - except that
as_of_date is an actual date, not a timestamp like in your example. That should be more appropriate.Users that are created already, but don't have any transactions, yet, are listed with a balance of 0. You did not define how to deal with the corner case.
Rather use
timestamp input for generate_series():- Generating time series between two dates in PostgreSQL
It's crucial for performance that you back this up with a multicolumn index:
CREATE INDEX balances_multi_idx ON balances (user_id, as_of_date DESC, balance);Very similar case on SO from just this week:
- Aggregating the most recent joined records per week
Find more explanation there.
CROSS JOIN,LEFT JOIN, window functions
SELECT user_id
, COALESCE(max(balance) OVER (PARTITION BY user_id, grp
ORDER BY as_of_date), 0) AS balance
, as_of_date
FROM (
SELECT a.user_id, b.balance, d.as_of_date
, count(b.user_id) OVER (PARTITION BY user_id ORDER BY as_of_date) AS grp
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN balances b USING (user_id, as_of_date)
) sub
ORDER BY user_id, as_of_date;Same result. If you have the multicolumn index mentioned above and can get index-only scans out of it, the first solution is most probably faster.
The main feature is the running count of values to form groups. Since
count() does not count NULL values, all dates without balance fall into the same group (grp) as the most recent balance. Then use a simple max() over the same window frame extended by grp to copy the last balance for dangling gaps.Related:
- Select longest continuous sequence
- How to get max sequence of rows?
Code Snippets
SELECT a.user_id, COALESCE(b.balance, 0) AS balance, d.as_of_date
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN LATERAL (
SELECT balance
FROM balances
WHERE user_id = a.user_id
AND as_of_date <= d.as_of_date
ORDER BY as_of_date DESC
LIMIT 1
) b ON true
ORDER BY a.user_id, d.as_of_date;CREATE INDEX balances_multi_idx ON balances (user_id, as_of_date DESC, balance);SELECT user_id
, COALESCE(max(balance) OVER (PARTITION BY user_id, grp
ORDER BY as_of_date), 0) AS balance
, as_of_date
FROM (
SELECT a.user_id, b.balance, d.as_of_date
, count(b.user_id) OVER (PARTITION BY user_id ORDER BY as_of_date) AS grp
FROM (
SELECT d::date AS as_of_date -- cast to date right away
FROM generate_series(timestamp '2016-01-01', '2016-01-03', interval '1 day') d
) d
JOIN accounts a ON a.create_date <= d.as_of_date
LEFT JOIN balances b USING (user_id, as_of_date)
) sub
ORDER BY user_id, as_of_date;Context
StackExchange Database Administrators Q#133406, answer score: 9
Revisions (0)
No revisions yet.