patternMinor
Pairwise intersections
Viewed 0 times
pairwiseintersectionsstackoverflow
Problem
I have a table with two columns, let's say FirstName and LastName.
I need to get another table, which for every pair of FirstName's from the first one contains a count of the common LastName's.
Is this even feasible to do in SQL?
There are much more unique LastName's than FirstName's, if this affects efficiency of the query.
A toy example, input:
Output:
As this relation is reflexive and symmetrical, it's OK if the result is just one of the triangles (e.g, the one above the diagonal).
I need to get another table, which for every pair of FirstName's from the first one contains a count of the common LastName's.
Is this even feasible to do in SQL?
There are much more unique LastName's than FirstName's, if this affects efficiency of the query.
A toy example, input:
FirstName, LastName
John, Smith
John, Doe
Jane, DoeOutput:
FirstName1, FirstName2, CommonLastNames
John, John, 2
John, Jane, 1
Jane, Jane, 1
Jane, John, 1As this relation is reflexive and symmetrical, it's OK if the result is just one of the triangles (e.g, the one above the diagonal).
Solution
I'm going to use MS SQL Server to do this since I have a copy to hand. I believe most any of the majors would do it similarly.
First a sample table, with data. I use a table variable but it's the same for any flavour of table.
You can get all pairs by doing a self-join:
Using
Next you need something to count. This is where the
Add in a
First a sample table, with data. I use a table variable but it's the same for any flavour of table.
declare @t table (FirstName char(10), LastName char(10));
insert @t(FirstName,LastName)
values ('John','Smith'),('John','Doe'),('Jane','Doe');You can get all pairs by doing a self-join:
select
a.FirstName, a.LastName, b.FirstName, b.LastName
from @t as a
cross apply @t as b;Using
CROSS APPLY avoids having to jump through hoops to find a join condition for an ON clause.Next you need something to count. This is where the
CASE statement comes in. The case returns an integer value per pair of first names, which is what gets counted. (If I'm reading your question correctly you want where the LastNames match so that's the comparison I have. Hopefully it's obvious how to modify this if I'm wrong.)select
...
case
when a.LastName = b.LastName then 1
else 0
end
...etc.Add in a
SUM() and GROUP BY and you get your answer:select
a.FirstName,
b.FirstName,
sum(
case
when a.LastName = b.LastName then 1
else 0
end
) as CommonLastNames
from @t as a
cross apply @t as b
group by a.FirstName, b.FirstName;Code Snippets
declare @t table (FirstName char(10), LastName char(10));
insert @t(FirstName,LastName)
values ('John','Smith'),('John','Doe'),('Jane','Doe');select
a.FirstName, a.LastName, b.FirstName, b.LastName
from @t as a
cross apply @t as b;select
...
case
when a.LastName = b.LastName then 1
else 0
end
...etc.select
a.FirstName,
b.FirstName,
sum(
case
when a.LastName = b.LastName then 1
else 0
end
) as CommonLastNames
from @t as a
cross apply @t as b
group by a.FirstName, b.FirstName;Context
StackExchange Database Administrators Q#103961, answer score: 5
Revisions (0)
No revisions yet.