HiveBrain v1.2.0
Get Started
← Back to all entries
patternsqlMinor

Combine two event tables for multiple timelines into a single result set

Submitted by: @import:stackexchange-dba··
0
Viewed 0 times
resulttablescombineintotwosingleformultipletimelinesset

Problem

This question is an extension to a question I've previously asked that was overly simplified. The more accurate example is demonstrated in this SQLFiddle, where I demonstrate a working (but slow) solution, followed by my attempt to adapt the previous answer to the actual problem.

The actual problem comes because the two tables contain events for multiple timelines.

CREATE TABLE foo (ts int, id text, foo text);
INSERT INTO foo (ts, id, foo)
VALUES
    (1, 'A', 'Lorem'),
    (1, 'B', 'ipsum'),
    (4, 'B', 'dolor'),
    (5, 'A', 'sit'),
    (8, 'A', 'amet'),
    (8, 'B', 'consectetur');

CREATE TABLE bar (ts int, id text, bar text);
INSERT INTO bar (ts, id, bar)
VALUES
    (1, 'A', 'adipiscing'),
    (5, 'B', 'elit'),
    (6, 'A', 'sed'),
    (9, 'B', 'do ');


Each table has events for timelines 'A' and 'B'. The goal is to combined the results in to a single result set showing the "state" of each timeline. The two timelines are orthogonal.

ts id foo bar
1 A Lorem adipiscing
5 A sit adipiscing
6 A sit sed
8 A amet sed
1 B ipsum (null)
4 B dolor (null)
5 B dolor elit
8 B consectetur elit
9 B consectetur do

Solution

In addition to the solution of the simple case, add a PARTITION clause to the window functions in the inner query, to get group numbers per partition (per "timeline"). Combine group numbers with the respective timeline (id in your example) keep partitions separate in the second step:

SELECT id, ts
     , min(foo) OVER (PARTITION BY id, foo_grp) AS foo
     , min(bar) OVER (PARTITION BY id, bar_grp) AS bar
FROM (
   SELECT id, ts, f.foo, b.bar
        , count(f.foo) OVER (PARTITION BY id ORDER BY ts) AS foo_grp
        , count(b.bar) OVER (PARTITION BY id ORDER BY ts) AS bar_grp
   FROM   foo f
   FULL   JOIN bar b USING (id, ts)
   ) sub
ORDER  BY 1, 2;


Result as requested (except with id first).

SQL Fiddle

Your attempt to adapt the previous solution was very close. It didn't work because of PARTITION BY f.id / PARTITION BY b.id instead of PARTITION BY id. You really want the combined id to include missing rows in the result - that's where the last non-null value has to be filled in for the missing (NULL) value.

If performance is your paramount requirement consider a server-side function like demonstrated in the previous answer.

Code Snippets

SELECT id, ts
     , min(foo) OVER (PARTITION BY id, foo_grp) AS foo
     , min(bar) OVER (PARTITION BY id, bar_grp) AS bar
FROM (
   SELECT id, ts, f.foo, b.bar
        , count(f.foo) OVER (PARTITION BY id ORDER BY ts) AS foo_grp
        , count(b.bar) OVER (PARTITION BY id ORDER BY ts) AS bar_grp
   FROM   foo f
   FULL   JOIN bar b USING (id, ts)
   ) sub
ORDER  BY 1, 2;

Context

StackExchange Database Administrators Q#106514, answer score: 4

Revisions (0)

No revisions yet.