HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Fastest way to count how many dates' ranges cover each date from series

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
fromhoweachdatesserieswaydaterangesfastestmany

Problem

I have a table (in PostgreSQL 9.4) that looks like this:

CREATE TABLE dates_ranges (kind int, start_date date, end_date date);
INSERT INTO dates_ranges VALUES 
    (1, '2018-01-01', '2018-01-31'),
    (1, '2018-01-01', '2018-01-05'),
    (1, '2018-01-03', '2018-01-06'),
    (2, '2018-01-01', '2018-01-01'),
    (2, '2018-01-01', '2018-01-02'),
    (3, '2018-01-02', '2018-01-08'),
    (3, '2018-01-05', '2018-01-10');


Now I want to calculate for the given dates and for every kind, into how many rows from dates_ranges each date falls. Zeros could be possibly omitted.

Desired result:

+-------+------------+----+
|  kind | as_of_date |  n |
+-------+------------+----+
|     1 | 2018-01-01 |  2 |
|     1 | 2018-01-02 |  2 |
|     1 | 2018-01-03 |  3 |
|     2 | 2018-01-01 |  2 |
|     2 | 2018-01-02 |  1 |
|     3 | 2018-01-02 |  1 |
|     3 | 2018-01-03 |  1 |
+-------+------------+----+


I've come up with two solutions, one with LEFT JOIN and GROUP BY

SELECT
kind, as_of_date, COUNT(*) n
FROM
    (SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates
LEFT JOIN
    dates_ranges ON dates.as_of_date BETWEEN start_date AND end_date
GROUP BY 1,2 ORDER BY 1,2


and one with LATERAL, which is slightly faster:

SELECT
    kind, as_of_date, n
FROM
    (SELECT d::date AS as_of_date FROM generate_series('2018-01-01'::timestamp, '2018-01-03'::timestamp, '1 day') d) dates,
LATERAL
    (SELECT kind, COUNT(*) AS n FROM dates_ranges WHERE dates.as_of_date BETWEEN start_date AND end_date GROUP BY kind) ss
ORDER BY kind, as_of_date


I'm wondering is it any better way to write this query? And how to include pairs date-kind with 0 count?

In reality there is a few distinct kinds, period of up to five years (1800 dates), and ~30k rows in dates_ranges table (but it could grow significantly).

There are no indexes. To be precise in my case it's a result of subquery, but I've wanted to li

Solution

And how to include pairs date-kind with 0 count?

Build a grid of all combinations then LATERAL join to your table, like this:

SELECT k.kind, d.as_of_date, c.n
FROM  (SELECT DISTINCT kind FROM dates_ranges) k
CROSS  JOIN (
   SELECT d::date AS as_of_date
   FROM   generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
   ) d
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS n
   FROM   dates_ranges
   WHERE  kind = k.kind
   AND    d.as_of_date BETWEEN start_date AND end_date
   ) c
ORDER  BY k.kind, d.as_of_date;


Should also be as fast as possible.

I had LEFT JOIN LATERAL ... on true at first, but there is an aggregate in the subquery c, so we always get a row and can use CROSS JOIN as well. No difference in performance.

If you have a table holding all relevant kinds, use that instead of generating the list with subquery k.

The cast to integer is optional. Else you get bigint.

Indexes would help, especially a multicolumn index on (kind, start_date, end_date). Since you are building on a subquery, this may or may not be possible to achieve.

Using set-returning functions like generate_series() in the SELECT list is generally not advisable in Postgres versions before 10 (unless you know exactly what you are doing). See:

  • What is the expected behaviour for multiple set-returning functions in select clause?



If you have lots of combinations with few or no rows, this equivalent form may be faster:

SELECT k.kind, d.as_of_date, count(dr.kind)::int AS n
FROM  (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
   SELECT d::date AS as_of_date
   FROM   generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
   ) d
LEFT   JOIN dates_ranges dr ON dr.kind = k.kind
                           AND d.as_of_date BETWEEN dr.start_date AND dr.end_date
GROUP  BY 1, 2
ORDER  BY 1, 2;

Code Snippets

SELECT k.kind, d.as_of_date, c.n
FROM  (SELECT DISTINCT kind FROM dates_ranges) k
CROSS  JOIN (
   SELECT d::date AS as_of_date
   FROM   generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
   ) d
CROSS  JOIN LATERAL (
   SELECT count(*)::int AS n
   FROM   dates_ranges
   WHERE  kind = k.kind
   AND    d.as_of_date BETWEEN start_date AND end_date
   ) c
ORDER  BY k.kind, d.as_of_date;
SELECT k.kind, d.as_of_date, count(dr.kind)::int AS n
FROM  (SELECT DISTINCT kind FROM dates_ranges) k
CROSS JOIN (
   SELECT d::date AS as_of_date
   FROM   generate_series(timestamp '2018-01-01', timestamp '2018-01-03', interval '1 day') d
   ) d
LEFT   JOIN dates_ranges dr ON dr.kind = k.kind
                           AND d.as_of_date BETWEEN dr.start_date AND dr.end_date
GROUP  BY 1, 2
ORDER  BY 1, 2;

Context

StackExchange Database Administrators Q#208241, answer score: 6

Revisions (0)

No revisions yet.