patternsqlMinor
Fastest way to count how many dates' ranges cover each date from series
Viewed 0 times
fromhoweachdatesserieswaydaterangesfastestmany
Problem
I have a table (in PostgreSQL 9.4) that looks like this:
Now I want to calculate for the given dates and for every kind, into how many rows from
Desired result:
I've come up with two solutions, one with
and one with
I'm wondering is it any better way to write this query? And how to include pairs date-kind with 0 count?
In reality there is a few distinct kinds, period of up to five years (1800 dates), and ~30k rows in
There are no indexes. To be precise in my case it's a result of subquery, but I've wanted to li
CREATE TABLE dates_ranges (kind int, start_date date, end_date date);
INSERT INTO dates_ranges VALUES
(1, '2018-01-01', '2018-01-31'),
(1, '2018-01-01', '2018-01-05'),
(1, '2018-01-03', '2018-01-06'),
(2, '2018-01-01', '2018-01-01'),
(2, '2018-01-01', '2018-01-02'),
(3, '2018-01-02', '2018-01-08'),
(3, '2018-01-05', '2018-01-10');Now I want to calculate for the given dates and for every kind, into how many rows from
dates_ranges each date falls. Zeros could be possibly omitted.Desired result:
+-------+------------+----+
| kind | as_of_date | n |
+-------+------------+----+
| 1 | 2018-01-01 | 2 |
| 1 | 2018-01-02 | 2 |
| 1 | 2018-01-03 | 3 |
| 2 | 2018-01-01 | 2 |
| 2 | 2018-01-02 | 1 |
| 3 | 2018-01-02 | 1 |
| 3 | 2018-01-03 | 1 |
+-------+------------+----+I've come up with two solutions, one with
LEFT JOIN and GROUP BYSELECT
kind, as_of_date, COUNT(*) n
FROM
(SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates
LEFT JOIN
dates_ranges ON dates.as_of_date BETWEEN start_date AND end_date
GROUP BY 1,2 ORDER BY 1,2and one with
LATERAL, which is slightly faster:SELECT
kind, as_of_date, n
FROM
(SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates,
LATERAL
(SELECT kind, COUNT(*) AS n FROM dates_ranges WHERE dates.as_of_date BETWEEN start_date AND end_date GROUP BY kind) ss
ORDER BY kind, as_of_dateI'm wondering is it any better way to write this query? And how to include pairs date-kind with 0 count?
In reality there is a few distinct kinds, period of up to five years (1800 dates), and ~30k rows in
dates_ranges table (but it could grow significantly).There are no indexes. To be precise in my case it's a result of subquery, but I've wanted to li
Solution
And how to include pairs date-kind with 0 count?
Build a grid of all combinations then
Should also be as fast as possible.
I had
If you have a table holding all relevant kinds, use that instead of generating the list with subquery
The cast to
Indexes would help, especially a multicolumn index on
Using set-returning functions like
If you have lots of combinations with few or no rows, this equivalent form may be faster:
Build a grid of all combinations then
LATERAL join to your table, like this:SELECT k.kind, d.as_of_date, c.n
FROM (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
SELECT d::date AS as_of_date
FROM generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
) d
CROSS JOIN LATERAL (
SELECT count(*)::int AS n
FROM dates_ranges
WHERE kind = k.kind
AND d.as_of_date BETWEEN start_date AND end_date
) c
ORDER BY k.kind, d.as_of_date;Should also be as fast as possible.
I had
LEFT JOIN LATERAL ... on true at first, but there is an aggregate in the subquery c, so we always get a row and can use CROSS JOIN as well. No difference in performance.If you have a table holding all relevant kinds, use that instead of generating the list with subquery
k.The cast to
integer is optional. Else you get bigint.Indexes would help, especially a multicolumn index on
(kind, start_date, end_date). Since you are building on a subquery, this may or may not be possible to achieve.Using set-returning functions like
generate_series() in the SELECT list is generally not advisable in Postgres versions before 10 (unless you know exactly what you are doing). See:- What is the expected behaviour for multiple set-returning functions in select clause?
If you have lots of combinations with few or no rows, this equivalent form may be faster:
SELECT k.kind, d.as_of_date, count(dr.kind)::int AS n
FROM (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
SELECT d::date AS as_of_date
FROM generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
) d
LEFT JOIN dates_ranges dr ON dr.kind = k.kind
AND d.as_of_date BETWEEN dr.start_date AND dr.end_date
GROUP BY 1, 2
ORDER BY 1, 2;Code Snippets
SELECT k.kind, d.as_of_date, c.n
FROM (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
SELECT d::date AS as_of_date
FROM generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
) d
CROSS JOIN LATERAL (
SELECT count(*)::int AS n
FROM dates_ranges
WHERE kind = k.kind
AND d.as_of_date BETWEEN start_date AND end_date
) c
ORDER BY k.kind, d.as_of_date;SELECT k.kind, d.as_of_date, count(dr.kind)::int AS n
FROM (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
SELECT d::date AS as_of_date
FROM generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
) d
LEFT JOIN dates_ranges dr ON dr.kind = k.kind
AND d.as_of_date BETWEEN dr.start_date AND dr.end_date
GROUP BY 1, 2
ORDER BY 1, 2;Context
StackExchange Database Administrators Q#208241, answer score: 6
Revisions (0)
No revisions yet.