patternsqlMinor
Joining multiple subqueries
Viewed 0 times
multiplesubqueriesjoining
Problem
I need a query that returns something in this format:
My current data looks like this
The source table has ~160,000,000 rows. All columns are indexed (btree).
Currently I'm doing 5 (the real number is dynamic and maybe around 30, but for example above it's 5) different queries to extract:
1:
2:
3:
etc
and then join these queries. Something like this:
But as I mentioned above number of joins are much more (~30) and it runs really slow on Postgres. I examined the planning and disabled
Any ideas? I think there must be a better way than making 30 subqueries and joining them. Another approach would be grouping rows based on
P.S. It's not applicable to pre-generate the result table. Everything must happen runtime.
pk id v1 v2 v3 v4 v5
----------------------
...
pk1 id1 A1 A2 A3 A4 A5
pk6 id2 B1 B2 B3 B4 B5
...My current data looks like this
CREATE TABLE foo(pk,id,value)
AS VALUES
( 'pk1' , 'id1', 'A1' ),
( 'pk2' , 'id1', 'A2' ),
( 'pk3' , 'id1', 'A3' ),
( 'pk4' , 'id1', 'A4' ),
( 'pk5' , 'id1', 'A5' ),
( 'pk6' , 'id2', 'B1' ),
( 'pk7' , 'id2', 'B2' ),
( 'pk8' , 'id2', 'B3' ),
( 'pk9' , 'id2', 'B4' ),
( 'pk10', 'id2', 'B5' )
;The source table has ~160,000,000 rows. All columns are indexed (btree).
Currently I'm doing 5 (the real number is dynamic and maybe around 30, but for example above it's 5) different queries to extract:
1:
P1 id1 A1
P6 id2 B12:
P2 id1 A2
P7 id2 B23:
P3 id1 A3
P8 id2 B4etc
and then join these queries. Something like this:
SELECT q1.pk,q1.id,q1.v1,q2.v2,q3.v3,q4.v4,q5.v5 FROM (SELECT pk, id, value FROM table WHERE id=1) AS q1, (SELECT pk, id, value FROM table WHERE id=2) AS q2, (SELECT pk, id, value FROM table WHERE id=3) AS q3, (SELECT pk, id, value FROM table WHERE id=4) AS q4, (SELECT pk, id, value FROM table WHERE id=5) AS q5 WHERE q1.id=q2.id and q2.id=q3.id and q3.id=q4.id and q4.id=q5.id;But as I mentioned above number of joins are much more (~30) and it runs really slow on Postgres. I examined the planning and disabled
nestloop and it got much faster but it's still too slow. If I fetch all these queries to memory and join them programmatically (using python for example) it runs in ~1 second. But Postgres takes too long (~30 seconds for joining 110 rows).Any ideas? I think there must be a better way than making 30 subqueries and joining them. Another approach would be grouping rows based on
id column and do some magic (for example with RowNumber()) to generate the desired table.P.S. It's not applicable to pre-generate the result table. Everything must happen runtime.
Solution
For your example, all you want is
If you need the
Note, in all future examples please don't use
SELECT
min(pk),
id,
array_agg(value ORDER BY value)
FROM foo
GROUP BY id;
min | id | array_agg
------+-----+------------------
pk1 | id1 | {A1,A2,A3,A4,A5}
pk10 | id2 | {B1,B2,B3,B4,B5}
(2 rows)If you need the
array_agg to be unwrapped, you can do,SELECT pk, id, a[1] AS v1, a[2] AS v2, a[3] AS v3, a[4] AS v4, a[5] AS v5
FROM (
SELECT
min(pk),
id,
array_agg(value ORDER BY value)
FROM foo
GROUP BY id
) AS t(pk, id, a);
pk | id | v1 | v2 | v3 | v4 | v5
------+-----+----+----+----+----+----
pk1 | id1 | A1 | A2 | A3 | A4 | A5
pk10 | id2 | B1 | B2 | B3 | B4 | B5Note, in all future examples please don't use
id1 and pk1. You're sample values should preferably be just 1 and 2. As you can see here, we're struggling with getting pk10 instead of pk6 and that's a problem only because of the sort order.Code Snippets
SELECT
min(pk),
id,
array_agg(value ORDER BY value)
FROM foo
GROUP BY id;
min | id | array_agg
------+-----+------------------
pk1 | id1 | {A1,A2,A3,A4,A5}
pk10 | id2 | {B1,B2,B3,B4,B5}
(2 rows)SELECT pk, id, a[1] AS v1, a[2] AS v2, a[3] AS v3, a[4] AS v4, a[5] AS v5
FROM (
SELECT
min(pk),
id,
array_agg(value ORDER BY value)
FROM foo
GROUP BY id
) AS t(pk, id, a);
pk | id | v1 | v2 | v3 | v4 | v5
------+-----+----+----+----+----+----
pk1 | id1 | A1 | A2 | A3 | A4 | A5
pk10 | id2 | B1 | B2 | B3 | B4 | B5Context
StackExchange Database Administrators Q#202197, answer score: 2
Revisions (0)
No revisions yet.