HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Grouping records with with consecutive start and end dates -Postgresql

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
postgresqlgroupingwithrecordsdatesstartandendconsecutive

Problem

We are migrating to a new record management system and I'm weeding through the issues with the data migration. Source data and destination data are all in Postgresql

In the source database we have multiple rows for the same vendor for each authorization code for that vendor and the start and end date of that authorization.

+----------+---------------+------------+-----------+
|  Vendor  | Authorization | Start Date | End Date  | 
+----------+---------------+------------+-----------+
| Vendor 1 | #####         | 1/1/2017   | 2/28/2017 | 
| Vendor 1 | #####         | 3/1/2017   | 4/15/2017 |
| Vendor 2 | #####         | 4/16/2017  | 5/31/2017 |
| Vendor 2 | #####         | 6/1/2017   | 7/12/2017 | 
| Vendor 1 | #####         | 7/13/2017  | 9/30/2017 |
+----------+---------------+------------+-----------+


In the destination database we have records for the active vendor relationship at a given time with a start date and an end date. (Auth # start and end dates are tracked separately) After migration the example listed previously should look like this.

+----------+------------+-----------+
|  Vendor  | Start Date | End Date  |
+----------+------------+-----------+
| Vendor 1 | 1/1/2017   | 4/15/2017 |
| Vendor 2 | 4/16/2017  | 7/12/2017 |
| Vendor 1 | 7/13/2017  | 9/30/2017 |
+----------+------------+-----------+


I think I should be able to accomplish this joining the table back on it self n-1 number of times where n is the largest number of consecutive authorization numbers for the same vendor which is probably around 10, which is doable but definitely not preferable.

Is there a more common method to group data by consecutive start and end dates?

Solution

You could use a GROUPING AND WINDOW solution in this way:

create table tbl (vendor text, auth text, StartDate Date, EndDate Date);




insert into tbl values
('Vendor 1', '#####', '20170101', '20170228'),
('Vendor 1', '#####', '20170301', '20170415'),
('Vendor 2', '#####', '20170416', '20170531'),
('Vendor 2', '#####', '20170601', '20170712'),
('Vendor 1', '#####', '20170703', '20170930');


5 rows affected

select vendor, min(startdate) as startdate, max(enddate) as enddate, grp
from (
      select vendor, auth, startdate, enddate, 
             sum(rst) over (order by vendor, startdate) as grp
      from (
             select vendor, auth, startdate, enddate, 
                    case when coalesce(lag(enddate) over (partition by vendor order by vendor, startdate), startdate) + 1 <> startdate then 1 end rst
             from   tbl
           ) t1
     ) t2
group by grp, vendor
order by startdate


vendor | startdate | enddate | grp
:------- | :--------- | :--------- | --:
Vendor 1 | 2017-01-01 | 2017-04-15 | 1
Vendor 2 | 2017-04-16 | 2017-07-12 | 3
Vendor 1 | 2017-07-03 | 2017-09-30 | 2

db<>fiddle here

Code Snippets

create table tbl (vendor text, auth text, StartDate Date, EndDate Date);
insert into tbl values
('Vendor 1', '#####', '20170101', '20170228'),
('Vendor 1', '#####', '20170301', '20170415'),
('Vendor 2', '#####', '20170416', '20170531'),
('Vendor 2', '#####', '20170601', '20170712'),
('Vendor 1', '#####', '20170703', '20170930');
select vendor, min(startdate) as startdate, max(enddate) as enddate, grp
from (
      select vendor, auth, startdate, enddate, 
             sum(rst) over (order by vendor, startdate) as grp
      from (
             select vendor, auth, startdate, enddate, 
                    case when coalesce(lag(enddate) over (partition by vendor order by vendor, startdate), startdate) + 1 <> startdate then 1 end rst
             from   tbl
           ) t1
     ) t2
group by grp, vendor
order by startdate

Context

StackExchange Database Administrators Q#207856, answer score: 5

Revisions (0)

No revisions yet.