patternsqlMinor
Grouping records with with consecutive start and end dates -Postgresql
Viewed 0 times
postgresqlgroupingwithrecordsdatesstartandendconsecutive
Problem
We are migrating to a new record management system and I'm weeding through the issues with the data migration. Source data and destination data are all in Postgresql
In the source database we have multiple rows for the same vendor for each authorization code for that vendor and the start and end date of that authorization.
In the destination database we have records for the active vendor relationship at a given time with a start date and an end date. (Auth # start and end dates are tracked separately) After migration the example listed previously should look like this.
I think I should be able to accomplish this joining the table back on it self n-1 number of times where n is the largest number of consecutive authorization numbers for the same vendor which is probably around 10, which is doable but definitely not preferable.
Is there a more common method to group data by consecutive start and end dates?
In the source database we have multiple rows for the same vendor for each authorization code for that vendor and the start and end date of that authorization.
+----------+---------------+------------+-----------+
| Vendor | Authorization | Start Date | End Date |
+----------+---------------+------------+-----------+
| Vendor 1 | ##### | 1/1/2017 | 2/28/2017 |
| Vendor 1 | ##### | 3/1/2017 | 4/15/2017 |
| Vendor 2 | ##### | 4/16/2017 | 5/31/2017 |
| Vendor 2 | ##### | 6/1/2017 | 7/12/2017 |
| Vendor 1 | ##### | 7/13/2017 | 9/30/2017 |
+----------+---------------+------------+-----------+In the destination database we have records for the active vendor relationship at a given time with a start date and an end date. (Auth # start and end dates are tracked separately) After migration the example listed previously should look like this.
+----------+------------+-----------+
| Vendor | Start Date | End Date |
+----------+------------+-----------+
| Vendor 1 | 1/1/2017 | 4/15/2017 |
| Vendor 2 | 4/16/2017 | 7/12/2017 |
| Vendor 1 | 7/13/2017 | 9/30/2017 |
+----------+------------+-----------+I think I should be able to accomplish this joining the table back on it self n-1 number of times where n is the largest number of consecutive authorization numbers for the same vendor which is probably around 10, which is doable but definitely not preferable.
Is there a more common method to group data by consecutive start and end dates?
Solution
You could use a GROUPING AND WINDOW solution in this way:
✓
5 rows affected
vendor | startdate | enddate | grp
:------- | :--------- | :--------- | --:
Vendor 1 | 2017-01-01 | 2017-04-15 | 1
Vendor 2 | 2017-04-16 | 2017-07-12 | 3
Vendor 1 | 2017-07-03 | 2017-09-30 | 2
db<>fiddle here
create table tbl (vendor text, auth text, StartDate Date, EndDate Date);✓
insert into tbl values
('Vendor 1', '#####', '20170101', '20170228'),
('Vendor 1', '#####', '20170301', '20170415'),
('Vendor 2', '#####', '20170416', '20170531'),
('Vendor 2', '#####', '20170601', '20170712'),
('Vendor 1', '#####', '20170703', '20170930');5 rows affected
select vendor, min(startdate) as startdate, max(enddate) as enddate, grp
from (
select vendor, auth, startdate, enddate,
sum(rst) over (order by vendor, startdate) as grp
from (
select vendor, auth, startdate, enddate,
case when coalesce(lag(enddate) over (partition by vendor order by vendor, startdate), startdate) + 1 <> startdate then 1 end rst
from tbl
) t1
) t2
group by grp, vendor
order by startdatevendor | startdate | enddate | grp
:------- | :--------- | :--------- | --:
Vendor 1 | 2017-01-01 | 2017-04-15 | 1
Vendor 2 | 2017-04-16 | 2017-07-12 | 3
Vendor 1 | 2017-07-03 | 2017-09-30 | 2
db<>fiddle here
Code Snippets
create table tbl (vendor text, auth text, StartDate Date, EndDate Date);insert into tbl values
('Vendor 1', '#####', '20170101', '20170228'),
('Vendor 1', '#####', '20170301', '20170415'),
('Vendor 2', '#####', '20170416', '20170531'),
('Vendor 2', '#####', '20170601', '20170712'),
('Vendor 1', '#####', '20170703', '20170930');select vendor, min(startdate) as startdate, max(enddate) as enddate, grp
from (
select vendor, auth, startdate, enddate,
sum(rst) over (order by vendor, startdate) as grp
from (
select vendor, auth, startdate, enddate,
case when coalesce(lag(enddate) over (partition by vendor order by vendor, startdate), startdate) + 1 <> startdate then 1 end rst
from tbl
) t1
) t2
group by grp, vendor
order by startdateContext
StackExchange Database Administrators Q#207856, answer score: 5
Revisions (0)
No revisions yet.