patternsqlMinor
Need for recursive lookup, stuck in query design
Viewed 0 times
needstuckquerydesignrecursiveforlookup
Problem
In PostgreSQL 9.4, here is my table:
In a table that stores pairs of user id, i.e. two integer columns (user id and partner), we are requested to find this kind of relations:
(both cases userid = m && partneruserid = n and userid = n && partneruserid = m)
users m .. x .. y .. z .. a .. b .. n (always containing the cases in either two columns).
In this table, if we ask for the relationship between users 1 and 3, the function should return:
Approach 1
Result set so far: [1,3]
-
and the not so obvious row 6: 3-1 (idem)
Result set so far:
-
then if we follow the paths from user 1 (which are users 2, 3 and 6) and are looking for a partner one hop from user 1, we would then reach rows in which partners of user 1 (user 2 and user 6) are in the record as partners of user 3. These rows could have been [2, 3], [3, 2], [6, 3] and [3, 6] if they existed, but only row 10 does, being the result set so far:
-
the next step would be search for rows that have user 2 and user 6 (first hop) partnered with partners of user 3 (1, 5 and 6) (this is the second hop), reaching: row 4 (1..row1..2..row5..6..row10..3), row 5 (1..row1..2..row4..5..row8..3) and row 9 (1..row1..2..row9..6..row10..3).
Result set so far:
-
then with these last rows added to the result set, we would need to
with r(pk, userid,partneruserid) as
(values
(1,1,2),
(2,1,3),
(3,1,6),
(4,2,5),
(5,2,6),
(6,3,1),
(7,4,1),
(8,5,3),
(9,6,2),
(10,6,3)
) select * from r;In a table that stores pairs of user id, i.e. two integer columns (user id and partner), we are requested to find this kind of relations:
- User m and user n in the same record
(both cases userid = m && partneruserid = n and userid = n && partneruserid = m)
- indirect partnership with a third X partner in common (in either two columns)
- more indirect partnership with a third and a fourth X and Y partner in the chain which are partners between them: m .. x .. y .. n
- So on and so forth until there are no more relationships or until the nest level is 5, meaning relation with five steps like:
users m .. x .. y .. z .. a .. b .. n (always containing the cases in either two columns).
In this table, if we ask for the relationship between users 1 and 3, the function should return:
Approach 1
- first, the obvious row 2: 1-3 (single record, no hops)
Result set so far: [1,3]
-
and the not so obvious row 6: 3-1 (idem)
Result set so far:
[1,3]
[3,1]-
then if we follow the paths from user 1 (which are users 2, 3 and 6) and are looking for a partner one hop from user 1, we would then reach rows in which partners of user 1 (user 2 and user 6) are in the record as partners of user 3. These rows could have been [2, 3], [3, 2], [6, 3] and [3, 6] if they existed, but only row 10 does, being the result set so far:
[1,3] [3,1] [6,3]-
the next step would be search for rows that have user 2 and user 6 (first hop) partnered with partners of user 3 (1, 5 and 6) (this is the second hop), reaching: row 4 (1..row1..2..row5..6..row10..3), row 5 (1..row1..2..row4..5..row8..3) and row 9 (1..row1..2..row9..6..row10..3).
Result set so far:
[1,3]
[3,1]
[6,3]
[2,5]
[2,6]
[6,2]-
then with these last rows added to the result set, we would need to
Solution
Data model
First of all: fold duplicate entries. I don't see why you would allow reverse duplicates to begin with, so your setup becomes:
Recursive query
Building on the sanitized model (the constraint
SQL Fiddle
Except for the count. It's unclear how you count duplicate connections exactly. Strictly speaking, you should multiply
Explanation
The adapted table forces
I pick the lower input value (
We have to remember the path so far (
I save the last node of the path as first element (
Stop traversing the graph when
Only rows where
First of all: fold duplicate entries. I don't see why you would allow reverse duplicates to begin with, so your setup becomes:
CREATE TEMP TABLE tbl(
pk serial PRIMARY KEY
, userid int NOT NULL
, partneruserid int NOT NULL
, ct int DEFAULT 1 NOT NULL -- this part is unclear
, CONSTRAINT user_partner_uni UNIQUE (userid, partneruserid)
, CONSTRAINT userid_smaller CHECK (userid < partneruserid)
);
INSERT INTO tbl
VALUES
(1,1,2,1)
, (2,1,3,2) -- folded
, (3,1,6,1)
, (4,2,5,1)
, (5,2,6,2) -- folded
, (7,1,4,1)
, (8,3,5,1)
, (10,3,6,1);Recursive query
Building on the sanitized model (the constraint
userid_smaller is not required for this query, though). This recursive query does basically everything you need:WITH RECURSIVE ids (a, z) AS (SELECT 1, 3) -- enter values here
, cte AS (
SELECT t.pk, CASE i.a WHEN t.userid THEN t.partneruserid ELSE t.userid END = i.z AS found
, ARRAY[CASE i.a WHEN t.userid THEN t.partneruserid ELSE t.userid END, i.a] AS path
FROM tbl t, ids i
WHERE i.a IN (userid, partneruserid) -- constraint user_id_smaller simplifies this task
UNION ALL
SELECT t.pk
, CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END = i.z AS found
, CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END || c.path
FROM cte c, tbl t, ids i
WHERE c.path[1] IN (t.userid, t.partneruserid)
AND CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END <> ALL (c.path)
AND NOT c.found
AND array_length(path,1) < 4
)
SELECT path, array_length(path,1) - 1 AS steps
FROM cte c
WHERE found;SQL Fiddle
Except for the count. It's unclear how you count duplicate connections exactly. Strictly speaking, you should multiply
ct with every step ...Explanation
The adapted table forces
userid to be the lower ID, so the initial predicate can be simplified: userid matches lower ID, partneruserid matches higher ID. If your input is parameterized, you can sort with LEAST (x,y) and GREATEST(x,y) Update: That shortcut only works for short-circuiting in a single step. We need to consider both IDs as starting point for longer paths. Query fixed.I pick the lower input value (
a) as start (could be the other way round) and walk the graph of partnerships until I reach the higher input value (z). Each step can connect to userid or partneruserid, which necessitates the CASE expressions: the match could be on either column, we need to continue with the other column respectively.We have to remember the path so far (
path) and rule out repetitions so not to walk in circles (... <> ALL (c.path)).I save the last node of the path as first element (
path[1]), just to avoid another column for that information. Either way may be more convenient / faster ...Stop traversing the graph when
z is reached (found) or when a given maximum length for the path has been exhausted (array_length(path,1) < 4).Only rows where
z is reached (WHERE found) are in the final result set. I only collect the bare minimum of information along the way. You can add more columns to the query or / and JOIN to the original table in the outer query to return more fields.Code Snippets
CREATE TEMP TABLE tbl(
pk serial PRIMARY KEY
, userid int NOT NULL
, partneruserid int NOT NULL
, ct int DEFAULT 1 NOT NULL -- this part is unclear
, CONSTRAINT user_partner_uni UNIQUE (userid, partneruserid)
, CONSTRAINT userid_smaller CHECK (userid < partneruserid)
);
INSERT INTO tbl
VALUES
(1,1,2,1)
, (2,1,3,2) -- folded
, (3,1,6,1)
, (4,2,5,1)
, (5,2,6,2) -- folded
, (7,1,4,1)
, (8,3,5,1)
, (10,3,6,1);WITH RECURSIVE ids (a, z) AS (SELECT 1, 3) -- enter values here
, cte AS (
SELECT t.pk, CASE i.a WHEN t.userid THEN t.partneruserid ELSE t.userid END = i.z AS found
, ARRAY[CASE i.a WHEN t.userid THEN t.partneruserid ELSE t.userid END, i.a] AS path
FROM tbl t, ids i
WHERE i.a IN (userid, partneruserid) -- constraint user_id_smaller simplifies this task
UNION ALL
SELECT t.pk
, CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END = i.z AS found
, CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END || c.path
FROM cte c, tbl t, ids i
WHERE c.path[1] IN (t.userid, t.partneruserid)
AND CASE c.path[1] WHEN t.userid THEN t.partneruserid ELSE t.userid END <> ALL (c.path)
AND NOT c.found
AND array_length(path,1) < 4
)
SELECT path, array_length(path,1) - 1 AS steps
FROM cte c
WHERE found;Context
StackExchange Database Administrators Q#118887, answer score: 2
Revisions (0)
No revisions yet.