HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Get a count of consecutive dates

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
countdatesgetconsecutive

Problem

I want to get a list of dates where there are consecutive dates and it's corresponding count.

For example, if the I have the following data set

Date
2021-07-28
2021-07-27
2021-07-26
2021-07-25
2021-07-24
2021-07-23
2021-07-22
2021-07-21
2021-07-18
2021-07-17
2021-07-14
2021-07-11
2021-07-09
2021-07-06
2021-07-04
2021-07-03
2021-07-02


The result I would like is where consecutive date count > x

2021-07-21  8
2021-07-17  2
2021-07-02  3


I'm not really sure how to approach this problem. If an explanation could be provided with the query that would be great, although not required.

Solution

As correctly noted by Charlieface, this is a Gaps and Islands problem. Another way of solving this specific variation – also involving a window function, though a different one this time – would go like this:

WITH
  partitioned AS
  (
    SELECT
      *
    , DATEDIFF(Date, '1970-01-01') - ROW_NUMBER() OVER (ORDER BY Date ASC) AS PartID
    FROM
      YourTable
  )
SELECT
  MIN(Date) AS StartDate
, COUNT(*)  AS DayCount
FROM
  partitioned
GROUP BY
  PartID
HAVING
  COUNT(*) > 1
ORDER BY
  PartID
;


This solution relies on the fact that the difference between a representation of a date as an integer (DATEDIFF(...)) and the date's numerical position in an ordered sequence (ROW_NUMBER() OVER ...) is a constant value. If we looked at the intermediate values returned by the functions in the PartID expression, we would find the following:

Date
DATEDIFF(Date, '1970-01-01')
ROW_NUMBER() OVER (ORDER BY Date ASC)
PartID

2021-07-02
18810
1
18809

2021-07-03
18811
2
18809

2021-07-04
18812
3
18809

2021-07-06
18814
4
18810

2021-07-09
18817
5
18812

2021-07-11
18819
6
18813

2021-07-14
18822
7
18815

2021-07-17
18825
8
18817

2021-07-18
18826
9
18817

2021-07-21
18829
10
18819

2021-07-22
18830
11
18819

2021-07-23
18831
12
18819

2021-07-24
18832
13
18819

2021-07-25
18833
14
18819

2021-07-26
18834
15
18819

2021-07-27
18835
16
18819

2021-07-28
18836
17
18819

As you can see, the difference between DATEDIFF and ROW_NUMBER (represented by the column PartID) is the same where dates are consecutive, and it is different for different sequences, which makes it a perfect candidate for a GROUP BY criterion. And that is exactly what the query is using it for. By the way, the date 1970-01-01 has no specific meaning in this case. Any date could be used instead of it as long as it is a constant value.

Another important note to make – and it makes this answer substantially different from Charlieface's suggestion – is that all the dates must be unique for the method to work as expected.

A live demo of this solution can be found at db<>fiddle.

Code Snippets

WITH
  partitioned AS
  (
    SELECT
      *
    , DATEDIFF(Date, '1970-01-01') - ROW_NUMBER() OVER (ORDER BY Date ASC) AS PartID
    FROM
      YourTable
  )
SELECT
  MIN(Date) AS StartDate
, COUNT(*)  AS DayCount
FROM
  partitioned
GROUP BY
  PartID
HAVING
  COUNT(*) > 1
ORDER BY
  PartID
;

Context

StackExchange Database Administrators Q#297319, answer score: 3

Revisions (0)

No revisions yet.