snippetsqlMinor
How to aggregate pairs of id values into groups
Viewed 0 times
groupsintohowvaluespairsaggregate
Problem
I have a table with two id columns id1 and id2. Each row shows that 2 id values have been linked together. My actual data has both id columns as varchar(20) but I have used integers in the example below to illustrate the problem.
I want to group together ids so that all linked ids are grouped together even if there is not a direct link between 2 id values.
I would like to achieve this in sql without using cursors if possible.
All advice gratefully received.
Update: I thought it would be useful to add a bit more background on the data this represents. The data represents records for businesses sourced from different data sources
The main table looks like this
The main table can contain duplicates / similar records which need to be associated. The table with id1 and id2 columns is a result of de-duplication using fuzzy string matching (levenshtein, word matching etc) and spatial proximity matching. I think that this table is actually an undirected graph containing many disconnected sub graphs e.g. group_id 1 and 2.I want to join all ids in the same network and update the group_id in the main table.
I think that the question is now how to identify all disconnected sub graphs on a graph using SQL server Find All Disconnected Sub Graphs - Java
id1 id2
--- ---
1 2
1 3
3 4
3 5
6 7I want to group together ids so that all linked ids are grouped together even if there is not a direct link between 2 id values.
id group_id
-- ----
1 1
2 1
3 1
4 1
5 1
6 2
7 2I would like to achieve this in sql without using cursors if possible.
All advice gratefully received.
Update: I thought it would be useful to add a bit more background on the data this represents. The data represents records for businesses sourced from different data sources
The main table looks like this
id business_name postal_address business_classification x y group_idThe main table can contain duplicates / similar records which need to be associated. The table with id1 and id2 columns is a result of de-duplication using fuzzy string matching (levenshtein, word matching etc) and spatial proximity matching. I think that this table is actually an undirected graph containing many disconnected sub graphs e.g. group_id 1 and 2.I want to join all ids in the same network and update the group_id in the main table.
I think that the question is now how to identify all disconnected sub graphs on a graph using SQL server Find All Disconnected Sub Graphs - Java
Solution
Well this worked for the given sample data but not the cycles scenario / may not work for a more complex data set. Thought I would post anyway as a starter for 10:
DECLARE @t TABLE ( id1 INT, id2 INT )
INSERT INTO @t VALUES
( 1, 2 ),
( 1, 3 ),
( 3, 4 ),
( 3, 5 ),
( 6, 7 )
--( 1, 2 ),
--( 2, 3 ),
--( 3, 1 )
;WITH cte AS
(
SELECT ROW_NUMBER() OVER( ORDER BY id1 ) group_id, id1 id
FROM @t a
CROSS APPLY ( SELECT TOP 1 id2 FROM @t c WHERE a.id1 = c.id1 ) d
WHERE NOT EXISTS ( SELECT * FROM @t b WHERE a.id1 = b.id2 )
AND a.id2 = d.id2
UNION ALL
SELECT group_id, t.id2
FROM cte c
INNER JOIN @t t ON c.id = t.id1
)
SELECT *
FROM cte
ORDER BY 1, 2Code Snippets
DECLARE @t TABLE ( id1 INT, id2 INT )
INSERT INTO @t VALUES
( 1, 2 ),
( 1, 3 ),
( 3, 4 ),
( 3, 5 ),
( 6, 7 )
--( 1, 2 ),
--( 2, 3 ),
--( 3, 1 )
;WITH cte AS
(
SELECT ROW_NUMBER() OVER( ORDER BY id1 ) group_id, id1 id
FROM @t a
CROSS APPLY ( SELECT TOP 1 id2 FROM @t c WHERE a.id1 = c.id1 ) d
WHERE NOT EXISTS ( SELECT * FROM @t b WHERE a.id1 = b.id2 )
AND a.id2 = d.id2
UNION ALL
SELECT group_id, t.id2
FROM cte c
INNER JOIN @t t ON c.id = t.id1
)
SELECT *
FROM cte
ORDER BY 1, 2Context
StackExchange Database Administrators Q#61813, answer score: 3
Revisions (0)
No revisions yet.