HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Merging two tables by minimal interval values

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
tablesvaluesmergingintervaltwominimal

Problem

I have two tables with two columns keeping tabs of categorical values, such as:

Table 1
+----+-------+-----+-----------+
| ID | Begin | End | Condition |
+----+-------+-----+-----------+
|  1 |     1 |   8 |    Normal |
|  2 |     8 |  23 |  Critical |
|  3 |    23 |  30 |    Normal |
+----+-------+-----+-----------+

Table 2
+----+-------+-----+------------+
| ID | Begin | End | Supervisor |
+----+-------+-----+------------+
|  1 |     1 |  14 |       John |
|  2 |    14 |  30 |     Janice |
+----+-------+-----+------------+


These Begin and End columns represent a continuous interval in which the value is valid. In the above example, the interval would be days in a month, so according to Table 1 I'd have a normal condition from day 1 to day 8, critical from day 8 to day 23, and normal again from the 23rd to the 30th. And I know from Table 2 that John was supervising from the 1st to the 14th, and Janice took over from the 14th to the 30th.

What I want is to merge these two tables, to have both Condition and Supervisor values in the same table, with the minimal interval for each pairing of values. So, this:

Merged Table
+----+-------+-----+-----------+------------+
| ID | Begin | End | Condition | Supervisor |
+----+-------+-----+-----------+------------+
|  1 |     1 |   8 |    Normal |       John |
|  2 |     8 |  14 |  Critical |       John |
|  3 |    14 |  23 |  Critical |     Janice |
|  4 |    23 |  30 |    Normal |     Janice |
+----+-------+-----+-----------+------------+


What can be guaranteed of the tables is that:

  • each will have its respective "Begin" and "End" fields.



  • those will have the same span (the "Begin" value of the first row and the "End" value of the last row are the same for every table).



  • that every interval is sequential (the "End" value of row n is always equal to the "Begin" value of row n+1).



  • that the value of "End" will always be higher than "Begin" for a given row.



I had this worked out programatically in

Solution

I'd approach this in three logical stages:

  • Expand the rows in both tables for all the days in the month and join them together.



  • Assign a 'group' to each series of rows where condition and supervisor don't change for a period (a "gaps and islands" problem).



  • Group the results.



So the solution looks like this:

create table t1(
  id serial primary key
, begin_on integer
, end_on integer
, condition text
);

insert into t1(begin_on,end_on,condition)
values (1,8,'Normal')
     , (8,23,'Critical')
     , (23,30,'Normal');

create table t2(
  id serial primary key
, begin_on integer
, end_on integer
, supervisor text
);

insert into t2(begin_on,end_on,supervisor)
values (1,14,'John')
     , (14,30,'Janice');


select min(g) begin_on, max(g)+1 end_on, condition, supervisor
from( select g
           , condition
           , supervisor
           , row_number() over (order by g) 
             - row_number() over (partition by t1.id, t2.id order by g) grp
      from generate_series(1,30) g
           join t1 on g>=t1.begin_on and g=t2.begin_on and g<t2.end_on ) z
group by condition, supervisor, grp
order by begin_on;


begin_on | end_on | condition | supervisor
-------: | -----: | :-------- | :---------
1 | 8 | Normal | John
8 | 14 | Critical | John
14 | 23 | Critical | Janice
23 | 30 | Normal | Janice

dbfiddle here

I am assuming this is a cut-down example of your real-world problem or I would also suggest changing the way you store the data in the first place, perhaps using date ranges instead of integers.

Code Snippets

create table t1(
  id serial primary key
, begin_on integer
, end_on integer
, condition text
);

insert into t1(begin_on,end_on,condition)
values (1,8,'Normal')
     , (8,23,'Critical')
     , (23,30,'Normal');

create table t2(
  id serial primary key
, begin_on integer
, end_on integer
, supervisor text
);

insert into t2(begin_on,end_on,supervisor)
values (1,14,'John')
     , (14,30,'Janice');
select min(g) begin_on, max(g)+1 end_on, condition, supervisor
from( select g
           , condition
           , supervisor
           , row_number() over (order by g) 
             - row_number() over (partition by t1.id, t2.id order by g) grp
      from generate_series(1,30) g
           join t1 on g>=t1.begin_on and g<t1.end_on
           join t2 on g>=t2.begin_on and g<t2.end_on ) z
group by condition, supervisor, grp
order by begin_on;

Context

StackExchange Database Administrators Q#199846, answer score: 3

Revisions (0)

No revisions yet.